ICS 32 Winter 2022, Python Background Notes: Tuples and Lists

The need for data structures

A programming language needs control structures, which provide a way to organize the flow of control within a program, determining which statements happen in which sequence. So far, we've seen a couple of those in Python: the if statement and two kinds of loops, along with the ability to both write and call functions. These tools help us keep our code organized in a way that allows us to solve problems more easily, and also to read and understand those solutions more easily afterward.

Similarly, we also have a need to organize data. You can't get very far writing programs without data structures, which allow us to store and organize collections of many data items, instead of just a single one. We've seen a couple of types of objects so far that could be classified as data structures:

A string is a collection of characters. Some strings have no characters, some have a few, and some have many. Because different strings have different numbers of characters, there's a good chance that a string really does have to store its characters (i.e., longer strings require more memory than shorter ones).
A range acts as a collection of integers, arranged in a particular order. Unlike strings, ranges can be calculated on the fly, because their values follow a definite pattern, such as "the numbers beginning with 0, ending with 10, in a step size of 2." Nonetheless, we can treat it like a data structure, by indexing into it, by iterating through it with a for loop, and so on.

But what if we need to arrange a kind of data that doesn't fit one of these two patterns? Not everything can reasonably be stored and manipulated as a collection of characters. And ranges are actually pretty limited — there's no range that contains the numbers 1, 3, 9, and 27, or that contains numbers that are in a seemingly-random order, like 4, 2, 6, 5, and 9. Once created, strings and ranges can't be modified; you can only build new ones that have different characteristics. So there will certainly be problems whose solutions will require that we organize data in ways that neither strings nor ranges can do. For that reason, we'll need more tools; luckily, Python has additional tools built in.

Tuples

A tuple is a data structure that contains an immutable sequence of values, whose types can be any mixture you'd like. You would use a tuple when you have a sequence where you know, before the program runs, exactly how many values you need and what each of them means. Tuples let you bring together related values and store them in a single object but, conceptually, provide little else. Still, sometimes that's exactly the problem you have, so it's quite useful to have a tool that's focused on solving it.

One example is if you were storing information about points in two-dimensional space (i.e., Cartesian coordinates, as in algebra). Each point in two-dimensional space is described by an x-coordinate and a y-coordinate. No matter what point you're talking about, both coordinates always need to be there, and they always have the same meaning; by convention, we generally write the x-coordinate first, then the y-coordinate, so we're always able to read them back later and make sense out of them.

If we wanted to store a point's coordinates, we could certainly use two variables to do it, but there would be design value in bringing them together. A point, after all, is a kind of object that we might like to be able to store in a single variable, pass as a single argument to a function, or return as the result of a function. A tuple provides one way to do all of these things.

Creating a tuple is as simple as separating multiple expressions with commas, with or without parentheses surrounding them:

>>> 1, 2
(1, 2)
>>> ('Boo', 11, 3.5)
('Boo', 11, 3.5)

The parentheses are generally optional; it's the commas that make these into tuples. (For the most part, I tend to like to include the parentheses, because I find that it sets tuples apart visually from other things.) However, you do sometimes need the parentheses for the purposes of disambiguating syntax, because commas have other meanings in Python, too. For example, commas are used to separate the arguments you pass to a function when you call it. This leads to the following interesting (and perhaps surprising) result, when you try to determine the type of 1, 2.

>>> type(1, 2)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    type(1, 2)
TypeError: type() takes 1 or 3 arguments

The error message indicates that the built-in type() function can only be called with either one argument or three arguments. (So far, we've only seen it used with one argument; the three-argument version is wildly different and is a very long story for another day.) If what we want to do is determine the type of a single argument that is a tuple, we'll need to surround that tuple with parentheses, so Python knows that we don't want the comma to mean that we're passing two arguments to the function.

>>> type((1, 2))
<class 'tuple'>

Tuples have a length that is determined at the time you create them, by virtue of the number of expressions you combined together with commas; once established, that length can't change. The tuple ('a', 'b', 'c') has a length of 3 and it always will. As with other kinds of objects that have lengths, the built-in len function can tell you the length of a tuple.

>>> len((5, 6))
2

Additionally, the values within a tuple can't change, either; once a collection of objects is combined into a tuple, that tuple will only ever contain those objects.

Tuples can be indexed, just like ranges and strings can. The elements of a tuple each have an index; they're numbered consecutively starting at 0.

>>> point = (5, 6)
>>> point[0]
5
>>> point[1]
6
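Putting indexing together with the idea of a point as a single object, here is a small sketch of functions that take points as arguments and return new points as results. The names translate and midpoint are hypothetical, chosen only for this illustration, and each point is assumed to be a tuple whose index 0 is the x-coordinate and index 1 is the y-coordinate.

```python
def translate(point, dx, dy):
    """Return a new point shifted by dx horizontally and dy vertically.

    Assumes point is a tuple (x, y): index 0 is x, index 1 is y.
    """
    return (point[0] + dx, point[1] + dy)


def midpoint(p1, p2):
    """Return the point halfway between the points p1 and p2."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)


start = (5, 6)
moved = translate(start, 1, -2)    # one point in, one point out
middle = midpoint(start, moved)

print(moved)     # (6, 4)
print(middle)    # (5.5, 5.0)
```

Each point travels through the program as one value, even though it carries two coordinates.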

Of course, because the values inside of tuples can't change, you won't be able to assign to these indices.

>>> point[1] = 3
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    point[1] = 3
TypeError: 'tuple' object does not support item assignment
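If what you want is a point with a different y-coordinate, a reasonable approach is to build a new tuple rather than modify the old one. The following is a minimal sketch; the variable names are only for illustration.

```python
original = (5, 6)

# We can't assign to original[1], but we can build a new tuple that reuses
# the x-coordinate and supplies a different y-coordinate.
updated = (original[0], 3)

print(updated)     # (5, 3)
print(original)    # (5, 6), unchanged
```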

Indexing into a tuple can be useful, but you may find that you'll be much better off introducing names for the elements, instead of just using indices everywhere. Remembering things like "Index 0 means the x-coordinate, while index 1 means the y-coordinate" is just one more cognitive burden to carry; when programs get large, hundreds of little details like that start to add up, so we should be motivated to find a way for the meaning of our programs to be self-evident.

Sequence assignment

Python's assignment statement is more powerful than we've seen thus far. Not only can it be used to assign a value to a single variable; it can also be used to assign values to multiple variables at once. Sequence assignment is the name given to this technique.

Sequence assignment means to take a value that can be treated as a sequence, then "unpack" it and assign its elements to multiple variables. This can be done with any kind of object in Python that can be treated as a sequence — strings, for example, also qualify — but the trick is that you have to have the correct number of variables on the left-hand side of the assignment. For example, if there are two elements in the sequence on the right, you'd need exactly two variables on the left. So, sequence assignment tends to be more useful with tuples than with other kinds of sequences, because tuples generally have a length that's encoded directly into your program, and that size can't change once it's been established.

>>> point = (5, 6)
>>> x, y = point
>>> x
5
>>> y
6

Again, if the number of variables doesn't match the number of elements in the sequence, you'll see an error message.

>>> x, y, z = point
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    x, y, z = point
ValueError: not enough values to unpack (expected 3, got 2)

You can also create a tuple on the right-hand side of an assignment and use it to sequence-assign multiple variables on the left-hand side. This leads to some interesting techniques, such as the following way to swap the values of two variables.

>>> a = 12
>>> b = 9
>>> a, b = b, a
>>> a
9
>>> b
12
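The swap works because Python evaluates the entire right-hand side (building the tuple b, a from the current values) before assigning to any of the variables on the left. The same idea shows up when stepping through the Fibonacci sequence; the function below is a sketch, and its name is chosen just for this example.

```python
def fibonacci(n):
    """Return the nth Fibonacci number, where fibonacci(0) is 0."""
    a, b = 0, 1
    for _ in range(n):    # the loop variable itself isn't needed
        # Both right-hand expressions are evaluated using the *old* values
        # of a and b before either variable is reassigned, so no temporary
        # variable is needed.
        a, b = b, a + b
    return a


print(fibonacci(10))    # 55
```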

As mentioned above, you can use sequence assignment with objects of sequence types other than tuples, such as strings or ranges. You'll mostly want to avoid this, because it's risky: if the number of variables on the left-hand side differs from the length of the sequence, you'll see an error message, and you don't necessarily know how long a sequence will be (e.g., a string that originated from user input).

>>> name = 'Boo'
>>> a, b, c = name
>>> a
'B'
>>> b
'o'
>>> c
'o'
>>> nums = range(4)
>>> a1, a2, a3, a4 = nums
>>> a1
0
>>> a2
1
>>> a3
2
>>> a4
3
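If you do need to unpack a sequence whose length you don't control, one defensive approach is to check its length first. The helper below, split_pair, is hypothetical; it's a sketch that returns None instead of failing when the length isn't what it expects.

```python
def split_pair(text):
    """Unpack a two-character string into its two characters.

    Hypothetical helper: returns None if text doesn't have exactly two
    characters, so the sequence assignment below can't fail.
    """
    if len(text) != 2:
        return None
    first, second = text    # safe: we know there are exactly two elements
    return first, second


print(split_pair('ab'))     # ('a', 'b')
print(split_pair('Boo'))    # None
```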

Lists

A list in Python is a sequence of objects that you expect may change over its lifetime. New elements can be added, existing elements can be removed or rearranged, and so on. Unlike with tuples, you won't need to know the length of a list before the program runs, though you also won't be able to count on knowing what its length is.

Syntactically, we write a literal list in Python using a sequence of expressions separated by commas and surrounded by square brackets.

>>> x = [1, 2, 3, 4]
>>> x
[1, 2, 3, 4]
>>> type(x)
<class 'list'>

Like many other objects that contain a collection of other objects within them, lists have a length, which is defined as the number of objects stored in the list; the built-in function len can tell us that length.

>>> len(x)
4

Of course, lists can be empty, too.

>>> nothing = []
>>> len(nothing)
0

The indexing operator that we've seen previously on ranges and tuples can also be used to retrieve the individual values from a list. As with ranges and tuples, the indices are consecutive and begin at 0 (i.e., the index of a list's first element is 0, the index of its second element is 1, and so on).

>>> x[2]
3
>>> x[0]
1

Unlike ranges and tuples, which are immutable — which means that they can't be changed once they're built — lists can be modified during their lifetime. One way to do that is to assign to one of its indices.

>>> x
[1, 2, 3, 4]
>>> x[1] = 9
>>> x
[1, 9, 3, 4]

Note, too, that indices are also numbered negatively, with -1 being the index of the last element, -2 being the index of the second-to-last element, and so on.

>>> x[-1] = 12
>>> x
[1, 9, 3, 12]
>>> x[-3]
9

(One more thing worth noting is that this same negative-indexing trick works just as well on ranges and tuples. Try it.)
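For example, negative indices on a range and a tuple behave just as they do on a list:

```python
r = range(1, 21, 2)    # 1, 3, 5, ..., 19
t = ('Boo', 11, 3.5)

print(r[-1])    # 19, the last value in the range
print(r[-2])    # 17
print(t[-1])    # 3.5
print(t[-3])    # Boo (print shows the string without quotes)
```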

Adding a new element to the end of a list is most easily done by calling its append method. Whatever argument you pass to append becomes the last element in the list.

>>> x.append(15)
>>> x
[1, 9, 3, 12, 15]
>>> x.append([6, 7, 8])
>>> x
[1, 9, 3, 12, 15, [6, 7, 8]]
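A common pattern combines append with a loop: start with an empty list, then add one element per iteration. The function squares_up_to is a hypothetical example of that pattern, sketched here for illustration.

```python
def squares_up_to(n):
    """Return a list of the squares of 1 through n, in order."""
    result = []                 # start with an empty list
    for i in range(1, n + 1):
        result.append(i * i)    # each square becomes the new last element
    return result


print(squares_up_to(5))    # [1, 4, 9, 16, 25]
```

Because the list grows as the loop runs, the function doesn't need to know in advance how many elements it will end up with.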

That last example above is an interesting one, as well, as it demonstrates two additional properties of lists:

Different elements of the same list can have different types.
Appending a list b to a list a means to make b be the last element of a.

If you instead want each element of b to be added to the end of a individually, you extend a rather than appending to it. Here are two ways to do that.

>>> y = [1, 2, 3]
>>> y.extend([4, 5, 6])
>>> y
[1, 2, 3, 4, 5, 6]
>>> y += [7, 8, 9]
>>> y
[1, 2, 3, 4, 5, 6, 7, 8, 9]
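Side by side, the difference between append and extend shows up clearly in the resulting lengths. This is a small illustrative sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.append([4, 5])    # the whole list [4, 5] becomes ONE new element
b.extend([4, 5])    # each element of [4, 5] is added individually

print(a, len(a))    # [1, 2, 3, [4, 5]] 4
print(b, len(b))    # [1, 2, 3, 4, 5] 5
```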

Elements can be removed from a list using the del statement in combination with the indexing syntax.

>>> del y[4]
>>> y
[1, 2, 3, 4, 6, 7, 8, 9]

The built-in function list can take anything that can be treated as a sequence and build a new list containing the elements of that sequence.

>>> list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list((1, 2, 3))
[1, 2, 3]
>>> list([1, 2, 3, 4])
[1, 2, 3, 4]
>>> list('Boo')
['B', 'o', 'o']
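Because list accepts a tuple, one way to get a modified version of a tuple is to convert it to a list, change the list, and convert the result back with the built-in tuple function, which works analogously. This is a sketch of that round trip; for a two-element point, building the new tuple directly is usually simpler.

```python
point = (5, 6)

as_list = list(point)    # a new, mutable list: [5, 6]
as_list[1] = 3           # modify the list, not the tuple
new_point = tuple(as_list)

print(new_point)    # (5, 3)
print(point)        # (5, 6), the original tuple is untouched
```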

Slicing

Lists can also be sliced, which is to say that you can obtain a new list that contains a subsequence of what a list stores, while leaving the original list intact. A slice of a list is described similarly to how we described ranges previously: by specifying a start, a stop, and a step, with these values indicating which indices should be included in the slice. The start, stop, and step values are optional, falling back to defaults when they aren't specified.

Let's start with the following list.

>>> x = list(range(1, 21, 2))
>>> x
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

The simplest example of a slice is one that specifies both a start and a stop index. In that case, the slice contains the indices beginning at the start (and including it), up to but not including the stop index. So, for example, x[3:7] would include the indices 3, 4, 5, and 6, but not 7.

>>> x[3:7]
[7, 9, 11, 13]

Negative indices can be used in either or both the start and stop positions.

>>> x[4:-2]
[9, 11, 13, 15]
>>> x[-5:8]
[11, 13, 15]
>>> x[-5:-1]
[11, 13, 15, 17]
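One way to read a negative index in these slices is to add the list's length to it, so -2 on a ten-element list means index 8. The sketch below checks that reading against the examples above.

```python
x = list(range(1, 21, 2))    # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
n = len(x)                   # 10

# A negative index -k refers to the same position as n - k.
print(x[4:-2] == x[4:n - 2])         # True  (both are x[4:8])
print(x[-5:8] == x[n - 5:8])         # True  (both are x[5:8])
print(x[-5:-1] == x[n - 5:n - 1])    # True  (both are x[5:9])
```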

If the start index isn't specified, the default is 0; in other words, we always start a slice at the beginning of the list, if we don't say otherwise.

>>> x[:5]    # the first five elements of x
[1, 3, 5, 7, 9]

If the stop index isn't specified, the default is the end of the list (inclusive); in other words, we always continue through the end of the list unless we say otherwise.

>>> x[4:]    # all but the first four elements of x
[9, 11, 13, 15, 17, 19]

If both the start and stop index are left as defaults, we've got a convenient (and well-known) syntax for copying a list.

>>> x[:]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
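The copy is a separate list, so changing it leaves the original alone. A minimal sketch:

```python
x = list(range(1, 21, 2))
copy_of_x = x[:]        # a new list containing the same elements

copy_of_x[0] = 100      # modify only the copy

print(copy_of_x[:3])    # [100, 3, 5]
print(x[:3])            # [1, 3, 5], the original is unchanged
```

Note that the slice copies the list itself, not the objects inside it; if a list contains other lists, the original and the copy share those inner lists.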

In every case we've seen so far, the step has been defaulted to 1, because we haven't said what it is. That means we work our way forward through the list, without skipping any indices in between the start and the stop. Adding an additional colon and a step value allows us to do something different, if we choose. (Note in the examples below that the stop index is still exclusive; we don't include it in the result.)

>>> x[2:9:3]
[5, 11, 17]
>>> x[2:6:2]
[5, 9]

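A negative step can be indicated, as well, meaning that we want to build the slice by working our way backward through the list instead. When the step is negative, the defaults for start and stop are reversed: an omitted start means the last element, and an omitted stop means continuing through the first element, which is why x[::-1] produces a reversed copy of the whole list.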

>>> x[6:2:-1]
[13, 11, 9, 7]
>>> x[8:1:-2]
[17, 13, 9, 5]
>>> x[::-1]     # make a reversed copy of a list
[19, 17, 15, 13, 11, 9, 7, 5, 3, 1]
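The reversed-copy slice makes some checks very short. For instance, a list is a palindrome if it equals its own reversal; is_palindrome is a hypothetical name used only in this sketch.

```python
def is_palindrome(items):
    """Return True if the list reads the same forward and backward."""
    return items == items[::-1]    # compare against a reversed copy


print(is_palindrome([1, 2, 3, 2, 1]))    # True
print(is_palindrome([1, 2, 3]))          # False
print(is_palindrome([]))                 # True
```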

The step value can't be zero, though. (If it could, you'd never get to the stop index; the slice would go on forever.)

>>> x[2:6:0]
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    x[2:6:0]
ValueError: slice step cannot be zero

It is also sometimes the case that you'll get back an empty list, when the range of indices you're describing is empty. For example, x[3:2] means "Give me the elements starting at index 3 and continuing forward until you get at least as far as index 2, then stop." There are no indices that meet that criterion, so the result is an empty list.

>>> x[3:2]
[]

Choosing between tuples and lists

Lists may seem like they provide a superset of the functionality of tuples, which might make you wonder why you would want tuples if they're less flexible. The thing to realize is that lack of flexibility is actually a good thing. When you have a problem for which tuples are a good fit, such as our x- and y-coordinate example above, anything that lists can do that tuples can't also do is something you wouldn't want.

You would never want to add a third coordinate, nor would you want to remove one of them; you always want two of them.
That tuples are immutable is a plus. Because there are only two coordinates that make up each point, building a new tuple when you want to represent a new point is inexpensive.

In a case like this, lists only provide the flexibility to do something wrong, which is to say something that is destined to introduce bugs into your program. A necessary mentality for writing large programs is to replace the intention of being careful with techniques that make mistakes impossible. Oftentimes, the right tool for a job is the one that does only what you need and nothing more.

On the other hand, if you need to store a collection of values that will change as a program runs — say, a collection of all of the users currently logged into an application, updated as users log in and out — you'll want something like a list. Suddenly, the mutability is vital; the core of the problem is that the collection will change over time.
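Here is a rough sketch of that logged-in-users scenario, assuming users are identified by name strings. The functions log_in and log_out are hypothetical; besides append, the sketch uses the in operator and the list method remove (which deletes the first matching element), neither of which was covered above.

```python
# Hypothetical sketch: the users currently logged in. The list's length
# isn't known in advance and changes as the program runs.
logged_in = []


def log_in(username):
    """Record that a user has logged in, ignoring duplicate log-ins."""
    if username not in logged_in:
        logged_in.append(username)


def log_out(username):
    """Record that a user has logged out, if they were logged in."""
    if username in logged_in:
        logged_in.remove(username)    # removes the first matching element


log_in('alex')
log_in('boo')
log_out('alex')
log_in('chris')
print(logged_in)    # ['boo', 'chris']
```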

Generally, as we learn about new data structures, we'll be thinking about the shape of the problem they solve. When we recognize that we have a problem of a particular shape, we'll choose the data structure that best fits that shape.