ICS 65 Fall 2012
Project #2: Maps and Legends

Due date and time: Wednesday, November 7, 11:59pm


Introduction

In the last couple of decades, two forces have combined to fundamentally change how software is built:

What were once single, large-scale software systems running on beefy individual servers are now collections of cooperating services communicating with one another over networks. The paradigm of providing software as a service (i.e., providing the ability to access software running on the provider's servers — or even on servers rented by the provider from a third party — rather than providing software to be installed on the user's infrastructure) is becoming increasingly popular and profitable.

So it's become quite useful to break complex problems into small services. In this project, we'll consider one such service: an authentication service that manages usernames and passwords. It will not be battle-ready — it'll store its information only in memory with no redundancy, and will completely ignore security, for example — but it will serve as a vehicle for us to continue our recent exploration into writing well-behaved C++ classes and begin to consider the design of somewhat larger C++ programs, which will seed our work on future projects.

Well-behaved classes

We discussed in lecture what I call well-behaved C++ classes. A well-behaved C++ class is one that "just works" when you use it the way you use any other type in the language. Well-behaved C++ classes have objects that clean up after themselves when they die, that can be copied in a way that makes the copy unique and separate from the original, that can be made constant while preserving the ability to perform whatever operations do not change the publicly observable state of the object, and so on.

Every C++ class you write, starting with this project, will have to be a well-behaved class. What we'll discover as we go forward is that the design choices we make can make this a much simpler goal to achieve than you might think. But first we need to understand where the issues and pitfalls lie, and what tools C++ provides us to solve the problem.


The program

You will be writing an authentication service, whose role is to keep track of username/password combinations, verify that a particular username/password combination is valid, and be able to report on the number of unique username/password combinations that are currently known. As in the previous project, it will read all of its input from the standard input (cin) and will write all of its output to the standard output (cout), though you could certainly imagine it doing its work across a network connection instead. (If text-based communication like this seems primitive, you might be surprised to find out that many well-known Internet protocols actually send text-based commands and responses that are little different than what we're doing here.)

Your program should read one line of input at a time, parse it, and execute one command. Any command that is unrecognized — because, for example, it's an unrecognized command or it has too few parameters — is should be recognized as invalid. Valid commands, on the other hand, should be executed and will have some kind of observable effect.

The program continues reading, parsing, and processing one command at a time until a special "quit" command appears on the input, in which case the program ends.

The program stores a collection of username/password combinations in a hash table stored in memory. Initially, there are no username/password combinations stored; there are commands to create and remove them.

Before the program ends, any objects it has allocated dynamically must be deallocated.

The commands

The following commands must be supported. Every command appears on a line by itself, and the output of every command should appear on a line by itself.

Command Format Description
CREATE username password Create a new username/password combination and stores it in the program's collection. If successful, the output is CREATED. If the username is already stored in the collection, no change is made and the output is EXISTS.
LOGIN username password Checks a username/password combination to see if it is valid. A username/password combination is valid if it exists (i.e., the username is in the collection and is associated with the password), in which case the output is SUCCEEDED. If the username/password combination does not exist, the output is FAILED.
REMOVE username Removes the username/password combination with the given username, if it exists. If so, the output is REMOVED. If no username/password combination with the given username exists, the output is NONEXISTENT.
COUNT The output is the number of username/password combinations currently being stored.
QUIT The output of this command is GOODBYE. Once this command has been processed, the program should end.

All commands require all of the parameters listed above. The output for any invalid command — one that is missing parameters, has too many parameters, or is simply unrecognized (e.g., LISTEN to music) should be INVALID.

Minor but important details

All input and output is case-sensitive. (You'll find that this means you don't have to worry at all about case, as string comparisons, by default, take case into account.)

It is safe to assume that usernames and passwords can contain any character other than whitespace, but they can never have whitespace characters in them. (This simplifies the problem of parsing the commands.)


A complete example of program execution

The following is a complete example of program execution, demonstrating how input and output are interleaved. Input is shown in a regular font weight; output is shown in bold.

CREATE thornton@ics.uci.edu abcdefg
CREATED
CREATE boo@thornton.com sleeping
CREATED
CREATE boo@thornton.com playing
EXISTS
LOGIN thornton@ics.uci.edu abcdefg
SUCCEEDED
LOGIN thornton@ics.uci.edu defg
FAILED
LOGIN bill.gates@microsoft.com windows
FAILED
COUNT
2
REMOVE thornton@ics.uci.edu
REMOVED
REMOVE thornton@ics.uci.edu
NONEXISTENT
REMOVE edge@u2.com
NONEXISTENT
LOGINS hello@hello.com hello
INVALID
LOGIN thornton@ics.uci.edu
INVALID
LOGIN
INVALID
WTF
INVALID

INVALID
QUIT
GOODBYE

Some background on our hash table implementation

Hash tables are implemented in many slightly different ways, but the central concept is always the same: when storing a collection of search keys (and possibly other information attached to each), define a way to determine where each search key "belongs," then use that as a starting point for deciding where to store the key and where to find it later. Deciding where a search key belongs is the role of a hash function, whose job is to take a key and return a hash value. The hash value is, in turn, used to choose a location to store, find, or remove the key.

Our hash table has the specific goal of acting as a map, which is a collection of key/value pairs; so, we'll implement it in a class called HashMap. It will be separately chained, which is to say that it will be implemented as a dynamically-allocated array of buckets, where each bucket is a singly-linked list (or, more specifically, a dynamically-allocated array of pointers to nodes, with an empty list represented by nullptr). Because keys and values are paired together, each linked list node will store both a key and a value.

Hash functions

Each HashMap can optionally be given a hash function as a constructor parameter, or it will use a default (of your choosing) if none is specified. Hash functions have the type std::function<unsigned int(const std::string&)>, which means they are anything that can be treated as a function that takes a const std::string& as a parameter and returns an unsigned int. Since they will be unaware of the number of buckets, they can actually legally return any arbitrary unsigned int, so it will be up to the HashMap class to take the values returned by the hash function and reduce them into the range of available bucket indices (e.g., by using the % operator).

Load factors and rehashing

While linked lists can grow with relative impunity, the performance of a separately-chained hash table is a function of the lengths of its linked lists, so we're strongly incentivized to keep those lists as short as possible. Even with a wonderfully-designed hash function, a separately-chained hash table can still be slow simply because it's become overly full, with every list storing multiple keys. We'd like to avoid this problem.

This leads to a question: How do we measure how "full" a hash table is? We say that the load factor of a hash table is the number of keys it's storing divided by the number of buckets (or, stated differently, the average length of its lists). To avoid the performance hit of becoming overly full, your HashMap class is required to allocate a larger number of buckets and rehash all of the keys whenever the load factor exceeds a threshold of 0.8. (The reason that rehashing is necessary is that the number of buckets has an effect on which bucket a key will be stored in, so changing the number of buckets requires rehashing the keys so they're each stored in their new "home.")

Design requirements for your HashMap class

Your HashMap class must use the following header file as a starting point:

That header file declares a set of members that your class is required to implement as-is — though you're welcome to add anything you'd like to it, you won't be able to change or remove anything — because we'll be running a set of unit tests against your HashMap class to verify its correctness, separately from the rest of your program.

One of the primary goals of this project is to explore the tools provided by C++ to allow you to write a well-behaved class, so your HashMap class will be required to be well-behaved.


Some rules, limitations, and additional challenges

Here are the rules and limitations governing your work on this project.

Additional challenges

As you work on the project, if you're interested in tackling additional challenges, here are a few directions you can go. In general, you should always feel free to explore the use of language features we've yet to cover, though you should also be aware that you sometimes won't be able to submit your work (if you choose features that explicitly violate one of the rules above, such as using C++ standard library containers like std::list); that doesn't stop you from doing it as a learning experience.


Testing

As in the previous project, there is no explicit deliverable demonstrating that you tested your program, but you would nonetheless be well-advised to run your program on test inputs other than the example here, and to test your HashMap class in ways not necessarily exercised by the entire program, as we will be running two kinds of tests:

Why should we test functionality that's not used by the program?

You should think of classes as reusable components. To the extent that we can make their designs clear and their implementations bullet-proof, reuse will be enabled. For example, when you've finished your HashMap class, you should be able to write a second, separate program that uses it in ways that your original program didn't — and I'd suggest doing this in the course of your testing — yet still see, ultimately, that it works as it should; if not, you still have work to do.

Writing two main() functions

The main() function in C++ is special, in the sense that there can only be one of them in a program. If you try to compile a program with more than one main() function, the linker will refuse to link the program, because the initial call to main() — the one that starts the program — will be ambiguous.

There is a lot of value in writing your tests in a separate source file from your regular main(). Given that, there are at least a couple of ways to work around the problem in Visual Studio:

What to submit

As no credit is offered for them, please do not submit your tests, as it runs the risk of making it more difficult for us to compile and run your program using our test automation. The value in writing tests is not to please us; the value in writing them is ensuring that your HashMap class is complete and correct (which will be reflected in your score).


Deliverables

Submit all of the source (.cpp) and header (.h) files that comprise your program. Afterward, take a moment to be sure that you submitted all of the files; if you missed one, we won't be able to compile and run your program, which can result in a substantial penalty, since we won't be able to evaluate your program's correctness.

Follow this link for a discussion of how to submit your project via Checkmate. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the project that you want graded. We won't regrade a project simply because you accidentally submitted the wrong version.