ICS 65 Fall 2012, Project #4: Unwashed and Somewhat Slightly Dazed

Due date and time: Monday, December 3, 11:59pm

Introduction

Used in everything from video games to predicting the weather or financial markets to understanding the reactions that underlie nuclear weapons, simulation is a technique that allows computers to help us visualize what might be. In general, the goal is to find a simplified model of reality, which captures the essence of that reality without expressing every detail, then to allow that model to play out, often in accelerated fashion, to see its effects. Depending on how nearly the model resembles reality, a simulation can be an effective way to see a possible future — what the weather will be in five days, which team will win the big game, which candidate will win an election, or where the markets will be next week — or to create the worlds we inhabit in the games we play.

Simulations are implemented using a variety of techniques. This project explores a particular kind of simulation, a discrete event-based simulation, in which events occur according to a schedule determined on the fly as the simulation plays out, with the goal being to focus on the points in time in which interesting events are taking place, skipping the "dead spots" in between. For simulations that are sparse, meaning that relatively few events happen at non-regular intervals, event-based simulations can be quite efficient. Our simulation will be a model of the inner workings of a dry cleaning establishment, though it should be noted that this technique has broader use, and you may find it useful in your own work going forward.

While delving into the details of your simulation, you will also gain experience with some features of C++ that we've yet to explore, such as defining your own template functions and template classes, the random number generation techniques built into the C++ standard library, and overloading operators and using these overloads in template classes.

A model of a dry cleaning establishment

Almost every strip mall and shopping center in Orange County houses a dry cleaning establishment. Let's suppose that you've decided to open one, but you've also decided to leverage your software-writing abilities to search for efficiencies that might help you to compete effectively with the cleaners that are already in business. In particular, you've decided to design and implement a simulation of what will take place in your establishment, so you can assess staffing levels, differing workloads, and how they will affect the overall throughput of your business.

Like any simulation, you'll need to begin with a model. An actual dry cleaning establishment is a physical store occupying physical space, complete with machinery, employees, customers, and so on, but there are more complexities here than you'd be able to effectively model in software. So your simulation will operate according to a simplified model, which still captures the essence of what will go on in your business once it's opened.

What will be cleaned

To keep our simulation reasonably simple, we'll say that all articles of clothing, which we'll call garments, are identical in size and weight — while there are realistic differences between, say, dress shirts and graduation gowns, we won't bother to model them here, because we expect that these differences will even out over the course of time. We will make certain parameters configurable, so we can model different average clothing sizes in different simulation runs.

The machinery used

Your business will consist of the following machinery:

Pre-treatment stations, in which an employee inspects a garment, finding and specially treating stains, so they'll be likelier to be removed during the cleaning process.
Cleaning machines, in which batches of pre-treated garments are cleaned using dry cleaning products. An employee is required to transfer a set of garments into and out of a cleaning machine, but no employee oversight is required while the machine is running.
Finishing stations, in which an employee takes a cleaned garment, (optionally) presses it and packages it for customer pick-up.

All garments will follow a process in which they pass through all three of these stages, though not all garments will spend the same amount of time in these stages.

While interactions with customers — whereby garments are dropped off and picked up — are also part of the business, we will not model these in our simulation, as we expect the cleaning process to be the most labor-intensive and, thus, the limiting factor in our growth.

Employees

Your business will be hiring some number of employees. While, of course, real people differ in terms of their skills and motivations, our simulation will presume that they are interchangeable, though certain aspects of their collective behavior (e.g., how quickly they process garments) will be configurable.

A word of warning

Any resemblance to the inner workings of an actual dry cleaning business here are purely coincidental; in reality, I know no more about dry cleaning than I do about farming, which might also make for an interesting simulation — Zynga certainly thought so, and many casual gameplayers agreed.

An overview of the simulation process

The simulation process works in the following way. Whenever "randomly-selected intervals", "random amounts", "set numbers", or "set amounts" are discussed below, these intervals and amounts are configurable; how to configure these settings is specified below.

At a randomly-selected interval, a customer arrives at the business with a random number of garments. These garments are immediately and automatically placed into an incoming queue, where they will await their pre-treatment. No employees are necessary for handling this step.
When an employee is available, a garment will be taken from the incoming queue and will go through a random amount of pre-treatment. During this time, the employee will not be available to do additional work. When the garment is finished with its pre-treatment, it is placed into a cleaning queue, where it awaits cleaning.
A cleaning machine can clean a set number of garments at a time. When there is an employee available and a cleaning machine is available and either there are a sufficient number of garments in the cleaning queue or there are garments in the cleaning queue and no garments in the middle of pre-treatment, as many garments as can be moved into an available machine are moved (up to the limit that the machine will accept). Moving garments into the machine is immediate, then the machine takes a set amount of time to complete its task, during which time the employee is free to do something else. (So, in general, there needs to be an available employee to do the moving, but the movement is instantaneous.)
When a cleaning machine has finished cleaning a set of garments and an employee is available, all of the garments are moved into a finishing queue, where they await their finishing. Moving garments from a cleaning machine into a finishing queue is immediate, though there does need to be an employee available to do it. After this is done, the cleaning machine and the employee are available.
When an employee is available, a garment will be taken from the finishing queue and will go through a random amount of finishing work. After it's been finished, the garment is done being processed and is ready to be returned to a customer (which is not modeled in our simulation; once it's done, it disappears).

What to do when there is a choice about jobs

Whenever an employee is available and has a choice about which job to do — finishing a garment, unloading a cleaning machine, etc. — jobs later in the process are always chosen ahead of jobs earlier in the process. In other words, an employee will always work on finishing a garment before pre-treating a garment.

Configuring the simulation

The simulation reads a set of input parameters from cin to configure its behavior. An example of that input is below, with an explanation following.

The configuration parameters are:

A positive number of hours that the simulation should run. This number is not limited to 24 hours; the implication is that the hours in which the dry cleaning establishment is open are being modeled, and other hours are being ignored. (Note that this is a measurement of simulation time, not real time; it shouldn't take 12 hours to run a simulation of a 12-hour day.)
A positive number of employees that will be on staff at any given time. For simplicity, this number stays constant throughout the simulation.
A positive number of pre-treatment stations that can be used during the simulation.
A positive number of cleaning machines that can be used during the simulation.
A positive number of finishing stations that can be used during the simulation.
The mean and standard deviation of the time it takes to perform pre-treatment on a garment, expressed as unsigned integers representing numbers of seconds.
A positive number of seconds required to run a batch of garments through a cleaning machine.
A positive number of garments, at maximum, that can be run through a cleaning machine at a time.
The mean and standard deviation of the time it takes to perform finishing on a garment, expressed as unsigned integers representing numbers of seconds.
The mean and standard deviation of the time that elapses in between customer arrivals, expressed as unsigned integers representing numbers of seconds.
The mean and standard deviation of the number of garments that each customer will drop off, expressed as unsigned integers representing numbers of garments.

The input format is designed to be as easy-to-parse as possible; it should be possible to read all of this input using only the >> operator on cin, and your program can freely assume that this input format will be followed (i.e., it's fine if your program misbehaves or crashes if given input that does not follow this format).

How discrete event-based simulations work

An obvious way to build a simulation like this is to implement it as a time-based simulation, which means that you center your implementation around a simulation loop, each iteration of which corresponds to one clock tick (i.e., the minimum meaningful chunk of time in your simulation) and continues until the appropriate number of clock ticks have elapsed. In addition to being simple to implement, this is a nice approach if the simulation is dense with activity, where many events are taking place during every clock tick.

However, considering the nature of this simulation, we realize that it is actually quite sparse. While our simulation runs at a resolution of one second, in most seconds nothing meaningful will happen at all; employees will be in the middle of pre-treating or finishing garments, or there will be no garments to work on at all. So looping over every second, when the majority of the iterations of the loop will do nothing, is tremendously wasteful. In a simulation that runs for s seconds and includes e events, we'd like it to run in much closer to O(e) time than O(s) time — in other words, we'd like the running time of our simulation to be determined by how densely packed with activity it is, rather than by the length of the simulation. Since e is much less than s in the case of our dry cleaning simulation, we should prefer an implementation that does not involve a one-clock-tick-per-iteration simulation loop.

A better solution for sparse simulations like ours is to build a discrete event-based simulation. Instead of looping over every second of simulation time, we'll loop over a sequence events. This will allow us to seamlessly skip the "boring parts" where nothing meaningful is happening, focusing our attention only on the seconds in which activity actually occurs. In a very rough pseudocode form, our simulation loop will look something like this:

set the current time to 0
add the first customer arrival to the event schedule

while the event schedule is not empty
      and the current time is less than the simulation length
{
    get the next event e from the event schedule
    set the current time to e's time (skipping the meaningless time until then)

    if the current time is less than the simulation length
    {
        execute the event e
        schedule any subsequent events implied by the execution of e
    }
}

So, the first question is what data structure we should use to store the event schedule. Initially, it may seem like a queue is a good choice. However, on closer inspection, we find that we need a slightly different approach. Consider this example, based on the set of example parameters in the previous section:

Initially, the simulation time is 0 (i.e., the simulation has just begun).
Before starting the simulation, we'll schedule the first customer arrival, by picking a random number of seconds from a normal distribution with mean 420 and standard deviation 180 (see the section titled About random number generation below for more information). Let's say that the random number is 505. We'll schedule the first customer arrival for time 505.
Now we enter the simulation loop and handle the first event in the schedule. The only event currently scheduled is a customer arrival at time 505. So, we set the current time to 505 and execute the event. Executing the event requires choosing a random number of garments; let's say that number is 3. The customer leaves 3 garments, which enter the incoming queue. Meanwhile, we schedule the next customer arrival by choosing another random number of seconds at random (mean 420, stdev 180); let's say that number is 350, so we schedule the second customer arrival for 505 + 350 = 855.
Since there are six employees and four pre-treatment stations available, three of the employees pick up a garment and begin pre-treatment on it, with the time still at 505. We choose a random pre-treatment time for each; let's say those numbers are 65, 75, and 85. So we schedule pre-treatment completion events for 505 + 65 = 570, 505 + 75 = 580, and 505 + 85 = 590, respectively.
That's it for processing the first event, so it's time to move on to the next event in the schedule. There are currently four events in the schedule:
- A customer is scheduled to arrive at time 855
- An employee will finish pre-treatment a garment at time 570
- An employee will finish pre-treatment a garment at time 580
- An employee will finish pre-treatment a garment at time 590
Of these, the next one that should occur is the employee finishing a pre-treatment at time 570, so we skip forward to time 570 and process that event.
Processing the completion of pre-treatment simply means that we place the garment in the cleaning queue. Since the cleaning queue has only one garment in it, but the cleaning machine can hold 10 garments, we don't start a cleaning batch yet.
Similarly, we handle the completion of pre-treatments at times 580 and 590, placing these garments in the cleaning queue. At this point (at time 590), there are no more garments in pre-treatment and there is an employee available, so we move the garments to a cleaning machine and start it. Cleaning machines take 600 seconds to run, so we schedule an event for the completion of the cleaning batch at time 590 + 600 = 1190.

The simulation proceeds from there in much the same fashion. What should be clear from the example above is that we need a data structure that allows us to schedule events in any order, yet have them emerge from the schedule in the appropriate order (i.e., in ascending order in terms of time). This sounds like a job for a priority queue, where the events are the items, the scheduled times are the priorities, and the item with the lowest priority is considered the most important. The requirements for your priority queue implementation are outlined in the next section.

The other thing you may have noticed from the example above is that you'll need to define a set of event types. For each event type, you'll need to decide what information needs to be stored and what actions should be taken when the event is executed. Notice how customer arrival events were designed in this example:

A customer arrival event, like all events, is scheduled for a particular time
A customer arrival event might also contain the number of garments the customer will drop off
When executing a customer arrival event...
- ...garments are dropped off into the incoming queue
- ...as a result, if employees and pre-treatment stations are available, at least some of the garments begin pre-treatment
- ...regardless of whether garments begin their pre-treatment immediately or remain in the incoming queue, the next customer arrival event is scheduled

So we see, in general, that we schedule an event when we know for sure that something will happen at a particular time in the future. Notice that we don't schedule events for garments emerging from queues; this is because it's something else (e.g., a pre-treatment station becoming available) that causes these garments to be removed from queues, and we're never entirely sure of exactly when that will be, especially as queues get longer and the number of interdependencies between events increases.

You'll need to decide what types of events you'll need, what information they'll carry with them, and what it will mean to execute them; that's one of the concepts at the heart of implementing your simulation.

Implementing your priority queue as a template class

You are required to implement your priority queue as a template class. At a minimum, your template class should take one type parameter, specifying the type for the items. You may assume in your template class that the specified type has a < operator and an == operator defined for it, though you should be sure to document this assumption, as well as any others you're making, about the type parameter.

Priority queues can be implemented in a straightforward way, just as queues can, though the resulting implementation of a priority queue is inefficient compared to a similarly-implemented queue. By storing the items in a sequential data structure, such as an array, vector, or linked list, your implementation will feature either O(n) enqueues or O(n) dequeues (or, at worst, both), whether you keep the items sorted in order of their arrival or their priority.

You have the option of implementing your priority queue in this fashion — by simply having a sorted array or vector of elements. If you prefer, a binary heap is a better approach, which yields O(log n) enqueues and dequeues, which is a significant improvement, especially when there are a large number of items in the priority queue. Implementing a binary heap is optional (and, as usual, no extra credit is offered).

Using polymorphism to your advantage

Overloaded operators and polymorphism don't work especially well together, so I suggest the following design sketch for the events in your simulation:

Create an Event class. Include within it a member variable for the event's scheduled time, and overload comparison operators such as < and ==, so that your priority queue can compare two events using them.
Create an abstract EventHandler class. Include a pure virtual execute method within it.
Derive a class from EventHandler for each kind of event supported by your simulation. Implement the execute method as appropriate in each.
Each Event object should contain an EventHandler*, pointing to a handler of the appropriate type for that event. So, for example, a customer arrival event will be contained in an Event object with the EventHandler* pointing to an event handler that knows how to handle customer arrivals.
In your priority queue, store Event objects.

If you follow this approach, your simulation loop will be very simple, needing only to dequeue an event, then call execute on it (which calls execute, polymorphically, on its EventHandler).

The reason why Event and EventHandler are best separated is because we want to be able to use operators like < and == to compare objects within the priority queue, but we also want the priority queue to store events that can be handled polymorphically, with different kinds of events being handled differently depending on their run-time type. One way to attempt this would be to store Event*'s in the priority queue instead, then have a class derived from Event for each kind of event; the problem with this approach is that the comparison operators on pointers compare the addresses the pointers point to, not the objects!

The design I suggest above provides you with the best of both worlds: Event objects that can be compared with overloaded operators, but polymorphic event handlers that can, at run time, automatically handle the appropriate type of event in the appropriate way.

The value of working incrementally

There is less design advice being given in this project than in the previous one, as I'd like you to experience more freedom in your design process. This almost necessarily means that you will take some missteps, which makes it all the more critical that you work incrementally and try to find a way to reach stable ground — with some part of your program working and tested, even if other parts are missing or even incorrect. Whenever you feel the ground is relatively stable underneath you, save a copy of your work; then, in your next iteration, if you feel that things have run off the rails, you can revert to your previous, stable copy and try again.

Output of your simulation

Your simulation program should generate the following output.

Each time something happens in the simulation (e.g., a garment begins pre-treatment, the cleaning machine is started), print a message to cout specifying what the current simulation time is and what happened.
At the conclusion of your simulation, show the following overall statistics:
- The total number of garments that were completely processed (i.e., completed their finishing step)
- The number of garments still unfinished somewhere in the simulation (including any in the incoming queue that have yet to be started)
- The number of completed cleaning machine batches
- The percentage of time employees spent idle, overall (i.e., the sum of the total number of seconds spent idle by each employee, divided by the number of seconds in the simulation times the number of employees)

About pseudorandom number generation

A thorough treatment of pseudorandom number generation is available in the form of a code example, mirroring and extending a recent conversation we had in lecture about it. Feel free to use the NormalIntDistribution template class from the code example in your implementation of this project.

Starting point

This project has no starting point, as I'd like you to build it from scratch, though I've given a fair amount of design advice along the way. As always, feel free to ask questions as you have them.

Deliverables

Submit all of the source (.cpp) and header (.h) files that comprise your program. Afterward, take a moment to be sure that you submitted all of the files; if you missed one, we won't be able to compile and run your program, which can result in a substantial penalty, since we won't be able to evaluate your program's correctness.

Follow this link for a discussion of how to submit your project via Checkmate. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the project that you want graded. We won't regrade a project simply because you accidentally submitted the wrong version.

Limitations

You must build your own PriorityQueue template class — as opposed to the priority_queue adapter template from the standard library — and you may use the standard library std::vector in the implementation of your PriorityQueue, and you may feel free to use the standard library anywhere else in your program where you believe it will help.