ICS 65 Fall 2012
Project #3: Back on the Chain Gang

Due date and time: Monday, November 19, 11:59pm


Introduction

The openness of the Internet's email service is both a strength and a weakness. It's nice to be able to send messages to people knowing only their email addresses, just as it's nice to be able to receive email from old friends who found your address, say, on your LinkedIn profile. Unfortunately, not everyone on the Internet is well-intentioned, so a variety of problems, such as spam and phishing schemes, have arisen from the ability of anyone to send email to anyone else at virtually no charge.

Even people with good intentions can cause problems on an email network. Chain letters, letters designed to motivate receivers to forward them to their friends and acquaintenances, have been around for many decades in paper form, sent via physical postal services. Since Internet-based email's relatively humble beginnings, electronic equivalents of chain letters have been propagating throughout the network; thanks to the relative ease and low cost of forwarding a message to many recipients at once, it was inevitable.

In this project, you'll write a simulation of chain letters propagating through an email network. As in real life, people in our network will behave differently from one another. Some will be more apt to forward chain letters while others will prefer to ignore them. Some will be tend to forward chain letters to many of the people on their contact list; others will be more selective. Your program will show how individual messages work their way through the network, and also will keep track of how many copies of each message are received by each person.

This project will allow you to practice using inheritance and polymorphism in C++. It will also provide you with an opportunity to use a number of features from the C++ standard library (maps, lists, and generic algorithms) that we have not covered in lecture, building your ability to learn new library features from existing documentation — a critical skill for writing real software. The documentation will acquaint you not only with precisely what functions and classes are available in the library, but will also begin to get you accustomed to C++ idioms and terminology that we might not have covered yet; all of this makes you a better C++ programmer. (Of course, I'm happy to help if you get stuck, though you may discover that I'll need to look up some of the details myself; no one is ever able to keep an entire library memorized, which is, in large part, why being able to quickly look things up in documentation is so important.)


The simulation

Central concepts

Our simulation is centered around the following concepts:

The simulation process

Your program will begin by reading an input file, which specifies the email addresses of all the people in the simulation, the addresses of all the people in each person's contact list, and the strategies that govern each person's behavior. The input file also specifies the set of chain letters that are to be placed in each person's inbox before the simulation begins. (The only email in this simulation are chain letters of various types; "normal" messages are not included.) Once the simulation is ready to begin, it proceeds according to the following algorithm:

The "round robin" nature of this simulation, moving from one person to the next and processing one message at a time for each, is a crude simulation of the way that people often interleave email with other work that they might be doing throughout the day.

Message types

There are four kinds of chain letters that our simulation will include:

Message quality

Messages of each type, separately, have a "quality" rating, which is an unsigned integer that indicates how likely it is that someone would want to forward a message to others. This rating allows us to simulate the fact that some real-world chain letters are better-written than others — some jokes are considered funny by a broader audience than others, some bogus virus warnings are more believable than others — and are thus more likely to be forwarded. Higher ratings are considered better than lower ones. There is no upper bound on the quality rating for a message, other than the implicit upper bound that arises from the fact that the unsigned int type in our implementation of C++ has an upper bound of around 4,000,000,000.

Strategies for determining whether a message is worth forwarding

Each person has a strategy for determining whether a particular message is worth forwarding to others. A person will have one of the following strategies for this:

Strategies for deciding on a list of recipients of a forwarded message

In addition to a strategy for determining whether to forward a message, each person also has a separate strategy for deciding who among the members of his or her contact list should receive a forwarded copy of the message. Once a person decides to forward a message, this strategy is used to decide who the recipients of forwarded copies will be. A person will have one of the following strategies for determining a recipient list.

Handling duplicate messages

Each person keeps track of which message content they've seen previously. If a person receives a message whose content he or she has seen before, the message is disregarded and not forwarded to anyone, regardless of the person's strategies. This rule turns out to be important, not only for "accuracy" of the simulation, but because it will allow the simulation to terminate. (Without this rule, two people might forward a message back and forth between one another forever.)

An assumption about email addresses and people

For simplicity, we'll assume in our simulation that no person has more than one email address. For this reason, email addresses can be assumed to uniquely identify individual people.

Configuring the simulation

An input file specifies all of the information that configures the simulation. Your program should accept one command-line argument, specifying the name of the input file that should be used to configure the simulation before it runs. (More information about command-line arguments and reading input from files is provided later in the write-up.)


Input file format

The input file specifies two things: information about each person, and information about each chain letter that should initially be placed in a person's inbox.

It begins with a number specifying how many people will be included in the simulation, followed by information about each person. For each person, the following information is listed:

After the information about all of the people, there is a number that indicates how many chain letters will initially be placed in people's inboxes before the simulation starts. This is followed by information about each message:

Here is an example input file:

3
alex@alex.com
    1 hello@paul.com
    3 10
    2
hello@paul.com
    2 alex@alex.com bruce@springsteen.net
    1
    1
bruce@springsteen.net
    1 alex@alex.com
    4 3 7
    3
2
3 6 feelgood@happy.com alex@alex.com
4 1 scary@virus.org bruce@springsteen.net

You may assume that any input file is properly formatted according to these rules. I designed the file format so that you can use the >> operator to read from the file, without regard to parsing spaces or worrying about lines, so all of the information could appear on one line or on separate lines as shown above, and your program should easily be able to handle it. (More information about reading from files follows in a later section of this write-up.) It's okay for your program to misbehave or crash if given an improperly-formatted input file. On the other hand, your program should not crash if given the name of a non-existent file, or if the program is not given any command-line argument at all.


The output of your simulation

As your simulation runs, you should print out information about each message that is placed in someone's inbox (including the messages initially sent before the simulation starts). The information must be formatted precisely like this; pay attention to spacing, capitalization, and punctuation here.

Message received!
From   : blah@blah.com
To     : hello@paul.com
Content: ID#1 (Type#2)

where the "Content" field shows the ID of the message content (which is a placeholder for the actual text of the message) and the number corresponding to the message's type. Use the same numbers as in the input file (i.e., jokes are indicated as "Type#1", pyramid schemes as "Type#2", and so on).

For clarity, please follow each such block of message information by a blank line.

At the conclusion of the simulation, print, for each person, how many copies of each message (uniquely identified by its content) were received, as well as the total number of messages received by that person. That output must again be formatted precisely, following this model. Spacing, punctuation, and capitalization are all relevant.

Messages received by hello@paul.com:
    Content#1: 3
    Content#3: 1
    Content#5: 2
    TOTAL RECEIVED: 6

Print only the information that corresponds to messages actually received by a person; in other words, there should be no counts of "0" in this section, except in the "TOTAL RECEIVED" field when a person never received any messages.


Design advice

You'll find that this program is not as big as it sounds; you may not even find that you're writing a whole lot more code than you wrote in the previous project. Rather than focusing your efforts on low-level details of memory management, as you did in the previous project, you'll instead be focused on building a larger program out of many smaller pieces, using inheritance and polymorphism in a couple of instances. The trick, as with most software, is the design; the better your design is, the easier it will be to write the program. With that in mind, I have some design advice that you should read through and understand before you proceed with implementing the project.

You may use anything in the C++ standard library — classes and/or functions — that you'd like. In the "real world," there's never a reason to re-invent the wheel when a suitable pre-built (and pre-tested!) implementation is already available.


Accepting command-line arguments in a C++ program

Writing your program to accept command-line arguments

As in Java, command-line arguments can be passed to C++ programs. The typical mechanism for accepting command-line arguments in a C++ program is to declare the main() function so that it accepts two arguments, argc and argv. argc indicates the number of command-line arguments have been passed, while argv is an array of strings consisting of the actual arguments. (The reason why both an array and a count are necessary in C++, whereas only the array is necessary in Java, is because C++ arrays, unlike their Java counterparts, do not know their own size. We'll talk in more detail about single- and multi-dimension arrays later in the quarter.)

The proper declaration for main(), if you'd like it to accept command-line arguments, is this:

    int main(int argc, char** argv)
    {
        // ...
        
        return 0;
    }

The declaration of argv seems a little strange (pointer to pointer to char?). This is actually C++ lingo for an array of C-style character strings, where a C-style string is implemented as a pointer to an array of characters, and an array is implemented as a pointer to its first element. The details, for now, are not especially important, but suffice it to say that you can treat each element of the argv array as a string. For example, the following code would print the first command-line argument to the console:

        cout << argv[1] << endl;

C-style strings can be implicitly converted to C++ string objects, so you'd also be able to do something like this, if need be:

        string s = argv[1];

The count, argc, will actually be one greater than the number of command-line arguments. This is because the name of the executable program is itself included as the first element of the argv array. So, for example, if you execute a program from the command line with the following command:

    MyProgram alex is happy today

Then argc will be 5, and argv will look like this:

01234
C:\CPP\MyProgram\MyProgram.exealexishappytoday

where argv[0] consists of the entire path to the executable version of your program.

Specifying command-line arguments when executing your program via Visual Studio

When developing your program with Visual Studio, you'll want to execute your program using the Start Without Debugging or Start Debugging menu options (or corresponding keyboard shortcuts), as you have likely been doing all quarter. When a program takes command-line arguments, you need to tell Visual Studio ahead of time what the command-line arguments should be. You can specify your program's arguments using the following procedure:


Reading input from a file instead of the console

File input/output in C++

So far this quarter, we've used two streams: cin, which represents console input, and cout, which represents console output. These streams are part of the iostreams library that is part of the C++ Standard Library. The iostreams library also supports file I/O, as well as I/O to and from string objects. Thanks to inheritance and polymorphism, file I/O works very similarly to the console I/O that you're already familiar with; the differences mostly revolve around the fact that files are less reliable — they may not exist, they have finite length — and files can be accessed in a non-sequential order. In this program, however, we're reading an input file with a predefined format, operating under the assumption that the file (if it exists) is properly formatted. So, we can treat our input file much like the console, except that we have to open it beforehand, close it when we're done, and deal with the fact that it might not exist.

Before you can use file streams, you'll need to include the appropriate header from the C++ Standard Library:

    #include <fstream>

Opening a file for input

An input file can be opened and manipulated using an ifstream object (where the "if" stands for "input file"). The ifstream constructor takes, as its parameter, the name of a file, then tries to open the file. So, to open the input file, you'll just need to do this:

    ifstream inputFile("filename.txt");

One minor caveat: the constructor expects a C-style string as its parameter, not a C++ string object. This means it will be safe for you to pass argv[1] as the parameter, since it's a C-style string. If you want to pass a C++ string to it, you'll first need to convert it to a C-style string, which you can do by calling c_str() on the string, like this:

    string filename;

    // ...

    ifstream inputFile(filename.c_str());

It's possible that opening a file will fail, because the file may not exist, it may be locked by another running program, or the program may not have access to it for security reasons. Unlike in Java, the ifstream constructor will not throw an exception if it fails to open the file, so you'll need to test the stream afterward to make sure that opening the file was successful. You can most easily test this by calling the is_open() method on the stream, like this:

    if (!inputFile.is_open())
    {
        // deal with the fact that the file could not be opened
    }

Reading from the file once it's open

Once the file is open, reading from it is just like reading from the console, using either the >> operator or getline(). For example:

    int i;
    inputFile >> i;
    
    string s;
    getline(inputFile, s);

Closing the file

When you're done reading from the file, it's best to close it. You can close a file by calling the close() method on the ifstream object, like this:

    inputFile.close();

It should be noted that ifstream's destructor automatically closes the underlying file if it's not closed already, so you can omit the call to close() if the file object dies automatically in an appropriate place. For example, if you have one function that reads the input file and then returns, with ifstream being a local variable within that function, you will not need to call close(). (One of the advantages of a programming language where you're in control of when objects die — as opposed to garbage-collected languages where the collector decides — is that you can associate the releasing of resources with the destruction of the object, yet still be certain of when the resources will be released. This is part of a broader design strategy, which is called "resource acquisition is initialization," which we'll talk more about later this quarter.)

Where your program will search for files by default if run via Visual Studio

On the Windows operating system, all running programs have a "working directory," which is the directory that will be searched whenever the program attempts to open a file without specifying the complete path to the file. By default, Visual Studio sets the working directory of your program to the same directory as your source code (.cpp and .h files). When you write test input files, it would be best to place them into your source code directory, so that your program will be able to find them easily.


A word about randomness in simulations

In real simulation work, it's a good idea for simulations to generate random behavior according to statistical distributions, rather than having all of the behavior be deterministic. I've opted to design this simulation to have deterministic behavior, so that it will always be clear exactly what the output of the simulation should be. This allows you to more easily determine whether your program is working; it also more easily allows me to grade it. But it should be pointed out that a "real" simulation would include randomness in its behavior.


Testing your simulation

To test your simulation, you'll need to design some input files, work out their expected output on paper, and check your output to see if it matches your expectations. Because this can be a somewhat arduous task, I encourage you to share your test input and expected output with one another, so everyone can benefit from one another's work.


Finding good reference material online about the standard library

While there is no "default" location where online documentation for the C++ standard library resides, a handful of sites exist that provide some nice documentation. Among the best is cppreference.com. As you find yourself exploring new parts of the library, or wondering whether something exists in the library, this is a great place to start your search.


Starting point

This project has no starting point, as I'd like you to build it from scratch, though I've given a fair amount of design advice along the way. As always, feel free to ask questions as you have them.


A word of warning about compatibility

While this course is entirely focused on standard C++, as defined in the C++11 standard, it's important to realize that the C++11 standard was completed relatively recently and, therefore, compiler support for it varies somewhat dramatically from one compiler to another, particularly in terms of what portions of the new library changes are implemented (and, in some cases, how). Be aware that your submission is required to compile and run on Windows using Visual Studio 2012.

As this project explores the standard library inn more depth, those of you who are doing your work predominantly using a compiler other than Visual Studio 2012's C++ compiler — because, for example, you're running Linux and don't have the option available — are well-advised to test your work on Visual Studio 2012 more thoroughly than you may have done in previous projects, as it will be easier than it has been to fall into one of the gaps between what is supported by one compiler and what is supported by another.


Deliverables

Submit all of the source (.cpp) and header (.h) files that comprise your program. Afterward, take a moment to be sure that you submitted all of the files; if you missed one, we won't be able to compile and run your program, which can result in a substantial penalty, since we won't be able to evaluate your program's correctness.

Follow this link for a discussion of how to submit your project via Checkmate. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the project that you want graded. We won't regrade a project simply because you accidentally submitted the wrong version.