ICS 32 Winter 2022, Python Background Notes: Sets and Dictionaries

Why tuples and lists aren't enough

Thus far, we've seen two different data structures that are built into Python: tuples and lists. At a quick glance, they seem to be essentially the same thing, because they each store a collection of elements where the order of those elements is considered relevant. The key difference, though, lies in how flexible their structure is:

Tuples are intended to be used in situations where you know their structure at the time you write the program. For example, you might want to store information about a point in three-dimensional space as its three coordinates: x, y, and z. A tuple would be a handy way to do that.
Lists are intended to be used in situations where you need the structure to be more flexible, and to adapt as the program runs. You might use this to store the information you load from a file when you don't know ahead of time how large the file will be, or a collection of all of the users currently logged into your application, or what have you; the key is that we're talking about a collection that can (and usually does) change while the program runs.

Why tuples and lists are useful for different things has mostly to do with a difference in their mutability. Tuples are set in stone once you create them; they are what they are. Lists can be mutated during their lifetime.

So, ultimately, there's value in having both tuples and lists available to us, because they each fit a different kind of problem. But they aren't enough, because not all problems require storing an ordered sequence of elements. For differently-shaped problems, you'd need different structures. Fortunately, Python provides some additional choices, fitting additional commonly-occurring problems very nicely.

Sets

You've likely been introduced to the mathematical concept of sets at some point. In mathematics, a set is a collection of zero or more unique values. What's important about a set is what's in it and what's not in it; there is no meaningful notion of ordering involved. For example, the set {1, 2, 3} is considered to be equivalent to the set {3, 1, 2}, because they both contain the same collection of values.

As it turns out, sets aren't just a mathematical curiosity. Sometimes, you'll want your program to manage a collection of objects with the same characteristics:

What you're most interested in knowing is whether or not an object is in the collection.
You don't want to allow duplication of objects within the collection. If, for example, it's a collection of integers, you don't want the same integer to appear twice.

When you have a problem that has those characteristics, a set provides a nice way to solve it. Obviously, not all problems have those characteristics, but some certainly do. If, for example, you were implementing a program with a graphical user interface, which contained a collection of buttons that could each be toggled on or off, and you wanted to know which buttons had been toggled on, you might store them in a set. Why? Because this scenario fits those two characteristics:

You're most interested in knowing whether or not one of the buttons has been toggled on.
The same button can't be toggled on more than once. It either is or it isn't.

How to create and use sets

Sets are actually pretty simple to create and use in Python, though there is a bit of additional syntax we'll need to learn. We've seen previously that lists are denoted, literally, by being surrounded by the bracket characters '[' and ']', with their elements separated inside by commas. Sets are written similarly, except they're surrounded instead by the curly-brace characters '{' and '}'.

>>> s = {'a', 'c', 'e'}
>>> type(s)
<class 'set'>
>>> s
{'e', 'a', 'c'}

One thing to note, right away, is that s was displayed to us differently from the way we wrote it. When we wrote it, we listed the strings alphabetically; when it was displayed to us, it was displayed in a different order. There are a couple of things to know about sets:

Because the order of elements within sets is considered irrelevant, Python has no obligation to maintain any information about what order we used when we originally created them.
The order that's used is actually implementation-dependent, which is a fancy way of saying that you simply don't know what it will be; the rules of ordering elements in a set can change from one implementation of Python to another — or even from one version of the same implementation to another. (The reason they would maintain a different ordering than the order in which you stored the elements is because it might allow their implementation to be dramatically faster. We'll explore why, in a lot of detail, in ICS 46.)

Just because sets aren't stored in any kind of order doesn't make them useless; it just means that you can't count on anything having to do with that ordering. But you can nonetheless ask the key questions you'd like to ask. For example, checking whether an element is in a set can be done using the in operator.

>>> 'e' in s
True
>>> 'f' in s
False

(It should be noted that you can use this same in operator on lists, as well, though you might expect it to run a fair bit faster on a large set than on a large list. In ICS 46, we'll talk in more detail about why that is, too.)

Note, too, that the syntax for creating an empty set isn't what you might think. The syntax {} is legal in Python, but it builds something else (which we'll see a little later) called a dictionary.

>>> empty_set = {}
>>> type(empty_set)
<class 'dict'>

All is not lost, however; you can create an empty set, by using the set constructor.

>>> empty_set = set()
>>> type(empty_set)
<class 'set'>

Sets can be constructed from sequences, as well, which means all of the following examples are ways to construct sets. Notice, in every case, that duplicates are (intentionally) dropped, because sets cannot contain duplicate elements.

>>> set([1, 2, 1, 1, 2, 1, 2, 3])
{1, 2, 3}
>>> set((1, 2, 3, 1, 2, 3))
{1, 2, 3}
>>> set(range(10))
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Sets can also be combined with other sets in the ways you've seen in mathematics before: There is a notion of union, intersection, set difference, and symmetric difference. The only tricky part is recognizing the operators.

Union is denoted by |, which is actually a syntax in some programming languages for the Boolean operator or. This makes some sense, actually: The union of the sets s1 and s2 contains elements that are either in s1 or s2 (or both).
Intersection is denoted by &, a syntax that also sometimes means Boolean and. The intersection of the sets s1 and s2 contains elements that are in both s1 and s2.
Set difference is denoted by -, which is somewhat akin to the difference between two numbers. In the case of two sets, the result is everything in s1 that isn't also in s2.
Symmetric difference is denoted by ^, a syntax that sometimes means the Boolean notion of exclusive or (i.e., one or the other, but not both). The symmetric differnce of the sets s1 and s2 contains elements that are in one set or the other, but do not appear in both.

Putting all of these together leads to the following examples.

>>> s1 = {1, 3, 5, 7, 9, 11}
>>> s2 = {2, 5, 8, 11, 14}
>>> s1 | s2
{1, 2, 3, 5, 7, 8, 9, 11, 14}
>>> s1 & s2
{11, 5}
>>> s1 - s2
{1, 3, 9, 7}
>>> s1 ^ s2
{1, 2, 3, 7, 8, 9, 14}

(Note, again, that the order in which you see the elements in each resulting set may vary if you run these same expressions in the Python shell. There are essentially no guarantees about the visible ordering of elements in a set when it gets printed in the shell.)

Like lists, sets are mutable, though the only operators we've seen so far have the characteristic that they build new sets. But not all of a set's operators do. For example, there are "assigning" versions of the operators above, which modify the set on the left, similar to how the += opearator mutates a list (but the + operator does not).

>>> s = set()
>>> s |= {3, 4}
>>> s
{3, 4}
>>> s |= {5}
>>> s
{3, 4, 5}

Still, that's a pretty clunky syntax for adding an element to an existing set, so there is also an add method that does that job more readably, as well as an update method, which adds a sequence of elements to a set.

>>> s.add(6)
>>> s
{3, 4, 5, 6}
>>> s.update([7, 8, 9])
>>> s
{3, 4, 5, 6, 7, 8, 9}

Whether or not other operations that you've seen on lists are supported on sets is mainly a matter of whether they're sensible when used on a set. Indexing, for example, is not supported.

>>> s = {1, 2, 3}
>>> s[0]
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    s[0]
TypeError: 'set' object does not support indexing

This is a sensible restriction, because indexing this way implies that there's a definitive order (i.e., one of them is definitively the first one in the set).

On the other hand, you can iterate over a set using a for loop.

>>> for x in s:
        print(x)

1
2
3

Though it should be noted that you won't be guaranteed that the iteration will happen in any particular order. Note, too, that this means anything else that uses iteration will work as you would expect, with the same caveat about ordering.

>>> list(s)
[1, 2, 3]

Dictionaries

Sets can be handy, but they are somewhat limited in their usefulness. There are problems that have set's "shape," but, in practice, there are a lot of data-storage problems that don't. However, we can extend the idea of a set a little bit and end up with something that has even broader usefulness.

In a dictionary, we store a collection of unique keys, each one associated with a value. The uniqueness of the keys in a dictionary means that the collection of keys, conceptually, forms a set. (This also means that the keys are not ordered, similar to how they aren't ordered in a set.) But the important distiction is that you can know something about each key besides just the key; by being able to associate a value with a key, you have a tool that you can use in different circumstances for which a set would be a poor fit.

How to create and use dictionaries

Creating a dictionary is generally done using the built-in function dict() and passing it no arguments. (Alternatively, the syntax {} is technically legal, though a little bit confusing, because you have to remember that the result is an empty dictionary, rather than an empty set. Clearer is usually better than shorter.)

>>> d = dict()
>>> type(d)
<class 'dict'>
>>> d
{}

Dictionaries are mutable, the way that lists and sets are; the expectation is that you should be able to use them to manipulate a collection of data that changes while your program runs. Mutating them is a matter of changing the value that is associated with some key, which you can do using an operation that looks very similar to indexing a list. The difference is that the index you pass it isn't a numeric index, necessarily; it's the key whose value you want to update.

>>> d['b'] = 3
>>> d['b']
3
>>> d['b'] += 1
>>> d['b']
4

There is wide latitude in which is allowed as a key in a dictionary; more or less anything that is immutable will do. (The reason why immutability of keys is such a big deal is that you don't want changes outside of the dictionary to suddenly have an impact of the set of keys within that dictionary.)

You can have as many different keys in the dictionary as you want, but each key can only appear once; any subsequent attempt to set its value replaces the value that's already there.

>>> d['b'] = 9
>>> d['q'] = -5
>>> d
{'b': 9, 'q': -5}

You can see, also, that a dictionary's representation is similar to the representation of a set, except that each key is followed by a colon and a value. Like sets, the order of the keys is considered irrelevant, and there's no guarantees about what order you'll see them in. Note, too, that this is also legal syntax for writing dictionary literals.

>>> d2 = {'a': 1, 'b': 2, 'c': 3}
>>> d2
{'a': 1, 'b': 2, 'c': 3}

Keys can be removed from dictionaries similar to how you remove values from other collections: by using the del operator.

>>> del d2['b']
>>> d2
{'a': 1, 'c': 3}

You can check for the presence of a key similar to the way you check for the presence of an element in a set, by using the in operator.

>>> 'c' in d2
True
>>> 'b' in d2
False

Dictionaries can also be iterated, though you do need to be aware of what you get when you do so. When you iterate a dictionary, all you get back are the keys.

>>> for x in d2:
        print(x)

a
c

However, you can also ask a dictionary for all of its keys, all of its values, or all of its items.

>>> for x in d2.keys():
        print(x)

a
c
>>> for x in d2.values():
        print(x)

1
3
>>> for x in d2.items():
        print(x)

('a', 1)
('c', 3)

Notice that the latter of these, the items() method, returns both each key and its associated value. The natural way to return two values is a tuple, so that's what Python returns.

Example: Calculating the occurrences of characters in a string

Suppose that we wanted to write a function that takes a string as a parameter and can tell you how many time each different, unique character occurs in that string. So, for example, if you called this function with the argument 'BOO BEAR', then you'd expect to see something that told you the following:

The character B appears twice
The character O appears twice
The character E appears once
The character A appears once
The character R appears once

You might call this kind of information a mapping, similar to a mathematical function. The order of the characters in our answer is irrelevant — except maybe from a standpoint of how we might display it, but that's a user interface problem, rather than a problem that this function should solve. All we want to know is what characters occur and, for each of those, how many times they occur.

Dictionaries are a perfect way to communicate this kind of information. The keys would be the characters; the values would be the number of times each character appears. There would never be more than one count for a particular character, so the fact that keys are unique is a good thing, rather than a limitation. We would only bother to store information for the characters that appeared at all; if a character doesn't appear, it will not be part of our output.

The following is one way you could write that function.

def count_chars(s: str) -> dict:
    # Start with an empty dictionary
    counts = dict()

    # Iterate through the characters of the string.  For characters that
    # we've seen before, increment the count we already have.  For
    # newly-seen characters, add them to the dictionary with a count of 1.
    for c in s:
        if c in counts:
            counts[c] += 1
        else:
            counts[c] = 1

    return counts