Informatics 102 Spring 2012
Assignment #3: Concurrent Programming in Erlang

Due date and time: Wednesday, May 16, 11:59pm


Introduction

Erlang is a functional programming language that has direct, built-in support for concurrency — the ability to perform more than one task simultaneously on a machine — and distribution — the ability to perform cooperating tasks on multiple machines. In an era of networked, multicore computers, concurrency and distribution are becoming increasingly important. While Erlang doesn't do anything so special that it can't be done in other languages, it clearly demonstrates the difference between a language that allows you to build infrastructure that supports concurrency and distribution (like Java) and one that has this infrastructure built in.

This assignment gives you experience with both sequential and concurrent programming in Erlang, though it will focus on the concurrent aspects, using the sequential problems as a warm up to familiarize you with Erlang syntax and semantics. Whether you've previously written concurrent programs in other languages, such as Java, it's likely that your experience with Erlang will feel different than anything you've done before.


How do I learn Erlang?

I presume that most of you are new to the Erlang programming language. As with any new programming language, there are two hurdles to cross: the syntax hurdle — how do I say what I want to say? — and the semantic hurdle — what should I want to say in the first place? Erlang encourages (and, in fact, requires) you to think differently about programming than Java does, so the semantic hurdle will actually be a little bit tougher, though you will be able to rely on your previous experience with functional languages like Scheme and Haskell, as Erlang has features familiar from each. Our goal here is not to become immediate experts, but to experience the aspects of Erlang that are new and interesting, relative to the things we've done before; this new experience will make us better software writers than we were, even if we never see Erlang again.

To get you started, I've written up a tutorial that describes the subset of Erlang that we discussed in lecture, along with instructions on using Erlang in the ICS labs and installing Erlang on your computer.


Part 1: Sequential Erlang programming (30 points)

In a module named part1, in a file named part1.erl, write and export the following six functions. You will likely need to write helper functions for some of these, but you should not export any of them.

You are not permitted to use the if or case expressions (which we didn't cover in lecture). Use pattern matching instead. It's important, when learning a new language, not to write only using the idioms you know from other languages; to get the most out of the experience, you'll want use the idioms, like pattern matching, that are appropriate in the new language.


Part 2: Processes and message passing (40 points)

Write a module named dictionary_server, whose job is to implement a process that maintains a dictionary (i.e., a collection of key/value pairs). Keys should remain unique in the dictionary, though the values are not required to be. It should export only the following functions.

You're permitted to write any additional non-exported functions that you need, though you are required to export all, and only, the functions described above.

You will need to use message passing to make this work, though users of the dictionary server will never need to send or receive messages explicitly and, thus, won't need to know the format of those messages; instead, all message passing will take place within your module. The reason that you'll need to send and receive messages is that the caller of, say, the insert/2 function will be a different process than the dictionary server process; the only way for these processes to exchange information is to pass messages back and forth. You'll need to decide what the messages should look like, what information is important to carry in them, and how they should be handled.

An important consequence of our design is that we're not requiring users of our dictionary server to know its pid. (This is actually a good thing, in the sense that it makes our dictionary server easier to use; the downside of this design approach is that we can't have more than one dictionary server running at a time. Still, there are lots of practical systems where you only want one of a particular server to be running, so this is a surprisingly common practice in Erlang.) This means that you'll need to use process registration to give the process a name, so that this name can be used throughout your module and the pid will never need to be used once registered.

One thing to remember is that you'll need to process messages recursively, one at a time, since Erlang has no looping construct. Remember, too, that it's important that you process messages tail recursively, so that the dictionary server could potentially run forever without running out of stack space. Tail recursion in server processes isn't just a nice performance optimization; it's absolutely critical.

When you're done with this part, you'll have what we called, in lecture, an actor. Actors are a lot like objects in object-oriented languages like Java, except that they run concurrently with other actors (and other Erlang processes), but can only themselves do one thing at a time. This is one model of concurrent programming that's gaining at least some traction in software design circles; it's not a panacea, but it's a nice solution to some kinds of problems.

An extra efficiency challenge (optional)

You are not required to build your dictionary using an efficient data structure — a list is fine — but you're welcome to build something better, like a binary search tree, if you'd like.

If you want to try building a binary search tree, one way to do it is to represent it as a tuple whose first element is the atom bst. The second element can be either the atom empty, if the tree is empty, or a three-element tuple containing the root of the tree (a two-element tuple containing a key and a value), the left subtree, and the right subtree. So an example binary search tree, whose root contains the key 20 and the value "X" and has a right subtree containing only a node with the key 30 and the value "Y" and no children, might look like this:

    {bst, {{20, "X"}, {bst, empty}, {bst, {{30, "Y"}, {bst, empty}, {bst, empty}}}}}.

Additionally, I'd suggest building your binary search tree in a separate module called bst, separating that concern from the concern of building your dictionary server.


Part 3: Using the MapReduce algorithm (30 points)

In lecture, we implemented the MapReduce algorithm in Erlang. MapReduce is an algorithm that asks concurrent or distributed tasks — in Erlang, these tasks are Erlang processes — to perform some portion of a large job, then collects and combines the intermediate results into one final result. MapReduce can be very efficient if the processes can genuinely run concurrently (e.g., there are enough processors to run all of the processes, or the processes are distributed on different machines, a scenario that Erlang supports natively), so long as the time spent sending the messages and combining the results is not prohibitive, relative to the time that the individual processes spend doing their work.

In this part of the assignment, you are required to take the MapReduce implementation from lecture, which you'll find (along with an deeper explanation of the algorithm) in the Code Examples section of the course web site, and apply it to a new problem. Do not make modifications to the MapReduce algorithm that was provided; the only goal here is to learn how to apply MapReduce to a different kind of problem than was demonstrated in the example.

For this part of the assignment, you'll write two modules:

The integer_finder module should hide all of the details of how the MapReduce algorithm is being used (e.g., what message should be sent to the integers_list processes, what the response messages will look like, what the initial result should be, etc.).


Deliverables

Submit all of the .erl files that comprise each part of the assignment. Do not submit .beam files or any other files.

Follow this link for a discussion of how to submit your assignment via Checkmate. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the assignment that you want graded. We won't regrade an assignment simply because you submitted the wrong version by accident.