Informatics 102 Spring 2012, Assignment #3: Concurrent Programming in Erlang

Introduction

Erlang is a functional programming language that has direct, built-in support for concurrency — the ability to perform more than one task simultaneously on a machine — and distribution — the ability to perform cooperating tasks on multiple machines. In an era of networked, multicore computers, concurrency and distribution are becoming increasingly important. While Erlang doesn't do anything so special that it can't be done in other languages, it clearly demonstrates the difference between a language that allows you to build infrastructure that supports concurrency and distribution (like Java) and one that has this infrastructure built in.

This assignment gives you experience with both sequential and concurrent programming in Erlang, though it will focus on the concurrent aspects, using the sequential problems as a warm up to familiarize you with Erlang syntax and semantics. Whether you've previously written concurrent programs in other languages, such as Java, it's likely that your experience with Erlang will feel different than anything you've done before.

How do I learn Erlang?

I presume that most of you are new to the Erlang programming language. As with any new programming language, there are two hurdles to cross: the syntax hurdle — how do I say what I want to say? — and the semantic hurdle — what should I want to say in the first place? Erlang encourages (and, in fact, requires) you to think differently about programming than Java does, so the semantic hurdle will actually be a little bit tougher, though you will be able to rely on your previous experience with functional languages like Scheme and Haskell, as Erlang has features familiar from each. Our goal here is not to become immediate experts, but to experience the aspects of Erlang that are new and interesting, relative to the things we've done before; this new experience will make us better software writers than we were, even if we never see Erlang again.

To get you started, I've written up a tutorial that describes the subset of Erlang that we discussed in lecture, along with instructions on using Erlang in the ICS labs and installing Erlang on your computer.

Part 1: Sequential Erlang programming (30 points)

In a module named part1, in a file named part1.erl, write and export the following six functions. You will likely need to write helper functions for some of these, but you should not export any of them.

You are not permitted to use the if or case expressions (which we didn't cover in lecture). Use pattern matching instead. It's important, when learning a new language, not to write only using the idioms you know from other languages; to get the most out of the experience, you'll want use the idioms, like pattern matching, that are appropriate in the new language.

fib(N), which calculates the Nth Fibonacci number. Use tail recursion, rather than the non-tail-recursive form shown in the mymath module in the Erlang tutorial. Your solution should run in O(N) time, given the parameter N. Examples:
- fib(0) → 0
- fib(6) → 8
- fib(40) → 102334155
adjacent_duplicates(L), which takes a list and returns a list containing only the elements that have an adjacent duplicate (i.e., all elements E such that the element following E is identical to E). Do not use any of the functions in the predefined lists module. Examples:
- adjacent_duplicates([1, 1, 2, 2, 3, 3]) → [1, 2, 3]
- adjacent_duplicates([1, 2, 2, 2, 3]) → [2, 2]
- adjacent_duplicates([1, 2, 3, 4]) → []
- adjacent_duplicates([1, 2, 3, 2, 1]) → []
deep_sum(L), which takes a possibly nested list of tuples, each containing two numbers, and returns the sum of all the numbers in the list. Use any of the functions in the predefined lists module that you find useful. Examples:
- deep_sum([{1, 2}, {3, 4}]) → 10
- deep_sum([{1, 2}, [{2, 3}, {3, 4}], {4, 5}]) → 24
- deep_sum([[[{1, 2}, {3, 4}], {3, 4}], {4, 5}, [{5, 6}, {6, 7}]]) → 50
- deep_sum([]) → 0
concatenate_all(L), which takes a list of strings and "flattens" them into one long string. Use any of the functions in the predefined lists module that you find useful. Examples:
- concatenate_all(["Alex", "is", "happy"]) → "Alexishappy"
- concatenate_all(["Time", "", " ", "the ", "Conquerer"]) → "Time the Conquerer"
perimeter(Shape), which takes a tuple describing a shape and returns the perimeter of that shape. The tuple will be in one of the following forms:
- {circle, Radius}
- {rectangle, Width, Height}
- {right_triangle, Width, Height, Hypot}
If the given argument is not a tuple in one of these forms, the function should fail. Examples:
- perimeter({circle, 3}) → 18.8495592153876
- perimeter({rectangle, 5, 7}) → 24
- perimeter({right_triangle, 3, 4, 5}) → 12
permutations(L), which takes a list L and returns a list containing all of the permutations of L. Use any of the functions in the predefined lists module that you find useful. It's fine for your output to be in a different order from ours, so long as all of the permutations are included. Examples:
- permutations([1, 2, 3]) → [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
- permutations([1]) → [[1]]
- permutations([]) → []

Part 2: Processes and message passing (40 points)

Write a module named dictionary_server, whose job is to implement a process that maintains a dictionary (i.e., a collection of key/value pairs). Keys should remain unique in the dictionary, though the values are not required to be. It should export only the following functions.

start/0. Starts the dictionary server process. If there is already a dictionary server running, this function should fail, but it's not important what error it returns.
stop/0. Stops the dictionary server process, if it's running. If there is no dictionary server process running, this function has no effect.
insert/2. Takes two parameters, a key and a value, and adds the given key and its value to the dictionary. If the given key is already in the dictionary, the new value should replace the old value in the dictionary. This function returns ok after its work is done.
remove/1. Takes one parameter, a key, and removes the given key from the dictionary if it's present. This function returns ok if the key was removed successfully, or notfound if the key was not in the dictionary.
lookup/1. Takes one parameter, a key, and finds the corresponding value in the dictionary. If the key is in the dictionary, this function returns the tuple {ok, Value}, where Value is the value associated with the key. If the key is not in the dictionary, this function returns notfound.
clear/0. Clears all keys and values out of the dictionary.
size/0. Returns an integer specifying the number of keys in the dictionary.

You're permitted to write any additional non-exported functions that you need, though you are required to export all, and only, the functions described above.

You will need to use message passing to make this work, though users of the dictionary server will never need to send or receive messages explicitly and, thus, won't need to know the format of those messages; instead, all message passing will take place within your module. The reason that you'll need to send and receive messages is that the caller of, say, the insert/2 function will be a different process than the dictionary server process; the only way for these processes to exchange information is to pass messages back and forth. You'll need to decide what the messages should look like, what information is important to carry in them, and how they should be handled.

An important consequence of our design is that we're not requiring users of our dictionary server to know its pid. (This is actually a good thing, in the sense that it makes our dictionary server easier to use; the downside of this design approach is that we can't have more than one dictionary server running at a time. Still, there are lots of practical systems where you only want one of a particular server to be running, so this is a surprisingly common practice in Erlang.) This means that you'll need to use process registration to give the process a name, so that this name can be used throughout your module and the pid will never need to be used once registered.

One thing to remember is that you'll need to process messages recursively, one at a time, since Erlang has no looping construct. Remember, too, that it's important that you process messages tail recursively, so that the dictionary server could potentially run forever without running out of stack space. Tail recursion in server processes isn't just a nice performance optimization; it's absolutely critical.

When you're done with this part, you'll have what we called, in lecture, an actor. Actors are a lot like objects in object-oriented languages like Java, except that they run concurrently with other actors (and other Erlang processes), but can only themselves do one thing at a time. This is one model of concurrent programming that's gaining at least some traction in software design circles; it's not a panacea, but it's a nice solution to some kinds of problems.

An extra efficiency challenge (optional)

You are not required to build your dictionary using an efficient data structure — a list is fine — but you're welcome to build something better, like a binary search tree, if you'd like.

If you want to try building a binary search tree, one way to do it is to represent it as a tuple whose first element is the atom bst. The second element can be either the atom empty, if the tree is empty, or a three-element tuple containing the root of the tree (a two-element tuple containing a key and a value), the left subtree, and the right subtree. So an example binary search tree, whose root contains the key 20 and the value "X" and has a right subtree containing only a node with the key 30 and the value "Y" and no children, might look like this:

    {bst, {{20, "X"}, {bst, empty}, {bst, {{30, "Y"}, {bst, empty}, {bst, empty}}}}}.

Additionally, I'd suggest building your binary search tree in a separate module called bst, separating that concern from the concern of building your dictionary server.

Part 3: Using the MapReduce algorithm (30 points)

In lecture, we implemented the MapReduce algorithm in Erlang. MapReduce is an algorithm that asks concurrent or distributed tasks — in Erlang, these tasks are Erlang processes — to perform some portion of a large job, then collects and combines the intermediate results into one final result. MapReduce can be very efficient if the processes can genuinely run concurrently (e.g., there are enough processors to run all of the processes, or the processes are distributed on different machines, a scenario that Erlang supports natively), so long as the time spent sending the messages and combining the results is not prohibitive, relative to the time that the individual processes spend doing their work.

In this part of the assignment, you are required to take the MapReduce implementation from lecture, which you'll find (along with an deeper explanation of the algorithm) in the Code Examples section of the course web site, and apply it to a new problem. Do not make modifications to the MapReduce algorithm that was provided; the only goal here is to learn how to apply MapReduce to a different kind of problem than was demonstrated in the example.

For this part of the assignment, you'll write two modules:

A module named integers_list, which implements a server process similar to the counterprocess server in the MapReduce code example, except that its job is to remember a list of integers provided to it when it starts up, and to provide the ability to filter those integers using a "filter fun." As in the counterprocess module, the integers_list module should export two functions:
- start/1, which spawns a new integers_list server process and returns its pid. The argument is the list of integers that it should remember while it runs. (Note that there is no way to change the list of integers once the process is spawned.)
- stop/1, which takes the pid of an integers_list server process and tells it to stop, waiting for a response to confirm it really did stop.
and otherwise should contain a run function that comprises the server's tail-recursive server loop.
A module named integer_finder, which exports the following function:
- find_integers/2, which takes two parameters:
  1. A list of pids corresponding to integers_list processes.
  2. A "filter fun" that takes an integer and returns either true or false. This could be a fun answering any arbitrary yes/no question about an integer (e.g., return whether the integer is positive, return whether the integer is odd, return whether the integer ends in the digit 9, and so on).
  The find_integers function should use the provided MapReduce implementation to find and return a list of all the integers in all the integers_lists for which the filter fun returns true. The returned integers should be sorted in ascending order, as opposed to just returned in whatever order the integers_lists return them.

The integer_finder module should hide all of the details of how the MapReduce algorithm is being used (e.g., what message should be sent to the integers_list processes, what the response messages will look like, what the initial result should be, etc.).