ICS 32 Winter 2022, Python Background Notes: Namedtuples

The limitations of tuples

Tuples have a fair number of uses in Python, because they provide the ability to bring together multiple objects within a single one, in cases where you know at the time you're writing your program how many objects there will be and what their various types and roles are. A simple example would be an object representing a point in three-dimensional space, which you might store in a tuple, with the first element representing the x-coordinate, the second element representing the y-coordinate, and the third element representing the z-coordinate.

>>> point = (2, 5, -1)

All three coordinates making up a point are now stored in the variable point. The point could be passed as a single argument to a function, assigned to another single variable, and so on.

On the other hand, actually using these coordinates requires one of two techniques, neither of which is quite perfect. One technique is to index into the tuple, but this requires us to remember that the x-coordinate is stored in index 0, the y-coordinate is stored in index 1, and the z-coordinate is stored in index 2. Not only do we have to remember that when we're first writing our code, but we have to recall it again every time we read that code in the future. For example, we could calculate the distance from the origin to the point by taking the square root of the sum of the squares of its coordinates.

>>> import math
>>> distance = math.sqrt(point[0] * point[0] + point[1] * point[1] + point[2] * point[2])

But this is kind of an unfortunate bit of syntax. It doesn't look, at a glance, like what it is. The meaning is lost in a sea of point[0], point[2], and so on. How the values are arranged within the tuple ought to be an irrelevant detail but, instead, it has become central to our ability to understand the program.

An alternative is to use sequence assignment to name the coordinates first, by storing them in separate variables; we could then use those separate variables in our calculation instead.

>>> x, y, z = point
>>> distance = math.sqrt(x * x + y * y + z * z)

This is better, but becomes tedious if we have to do it in many different places. For example, if we wanted to write ten different functions that accepted points as parameters, we might start all ten of those functions with the same statement that sequence-assigns the point's elements into the variables x, y, and z. That kind of repetition is worth avoiding, because boring, repetitive work can be a source of bugs when we write programs. (The less we have to think about the code we're writing, the more likely our minds will wander and we'll find ourselves making careless mistakes.) It can also make it difficult for us to change our minds about these details later, because we'd then have to change them in many places instead of just one.

What would be useful is a tuple that's smarter about naming, one where its elements actually have names intrinsically — and whose names are actually known by the tuple. Then, if we ever wanted an element with a particular name, we could just ask for it, and the tuple would know which one to give us back, so that we could ask for an element based on what it is, as opposed to asking based on where it is.

As it turns out, tuples can't do this in Python, but there is a variant called a namedtuple that can.

What is a namedtuple?

A namedtuple is a tuple in which each element is explicitly given a name. Like tuples, a namedtuple has a particular number of elements stored within it at the time it's created. Also like tuples, the number of elements never changes after that; once it's created, it is what it is. The only real difference is that each element has a name and, because of that, the elements can be referred to by their names instead of their indices.

Creating a namedtuple is a two-step process:

First, we have to create a "blueprint" for it, a new type that defines the set of fields that our new type of namedtuple will have. (The fields of a namedtuple are analogous to the elements of a tuple, except that the fields have names.)
Then, we can create an object of our new type from that blueprint. Every object of our new type will have the same collection of fields that have the same names; the only difference between them will be their values.

Defining a new namedtuple type

Technically, namedtuples are not part of the Python language; they're actually part of the Python standard library. So if we want to define a new namedtuple type, we'll first need to import something from the Python standard library that lets us create one. What we need is a function called namedtuple(), which we'll find in a module called collections. The meaning of the name namedtuple is self-evident enough that it's probably no better to say collections.namedtuple, so we can use from..import to import it.

>>> from collections import namedtuple

Having imported the namedtuple() function, we can call it by passing it two arguments: the name of our new type and a list containing the names of its fields. We'll also need to store that type in a variable, so we can refer to it later. By convention, we'll always want to store it in a variable whose name is the same as the name of our new type; things will get confusing if we do something different.

>>> Point = namedtuple('Point', ['x', 'y', 'z'])

Note, too, that the convention for naming new types in Python is to capitalize the first letter of the name. Furthermore, if the type is described by a name that contains multiple words, we also run those words together without underscores and capitalize the first letter of each of them, so we would choose names like BasketballPlayer or AutomaticPurchaseStrategy, rather than basketball_player or automatic_purchase_strategy.

So, what did we get when we did this? What is stored in the variable Point? Let's take a look.

>>> Point
<class '__main__.Point'>
>>> type(Point)
<class 'type'>

Point is a type, separate from the built-in ones like str, int, or list. It's a wholly separate, new type that represents points in three-dimensional space. From that type, we can create as many objects as we'd like.
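To see why it helps to store a new type in a variable with the same name, here is a minimal sketch of what happens when the two names disagree. The variable name Pt is a hypothetical one, chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical mismatch: the type is named 'Point', but it's stored in
# a variable named Pt.
Pt = namedtuple('Point', ['x', 'y', 'z'])

p = Pt(1, 2, 3)

# The type remembers the name we passed to namedtuple(), not the name
# of the variable we stored it in.
print(Pt.__name__)   # Point
print(p)             # Point(x=1, y=2, z=3)
```

The object describes itself as a Point, even though the only way to build one is by writing Pt, which is the kind of confusion the convention avoids.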

Creating an object of a namedtuple type

We've seen previously that you can create an object of a type by using the name of the type like a function — following the name of the type with parentheses and, optionally, passing it arguments. This is sometimes called construction; we're building an object of that type. We've seen multiple examples of that technique already.

>>> int('35')
35
>>> str(5.5)
'5.5'
>>> import socket
>>> s = socket.socket()

If Point is a type, it stands to reason that we should be able to do the same thing and, indeed, we can. The arguments we would pass to its constructor are the values of the fields, which we can do in one of two ways:

By passing the correct number of arguments, in which case the arguments fill in the fields' values in the order the fields were defined when we created the namedtuple type.
By passing keyword arguments, in which case we specify the names of the fields directly in the call to our constructor, which can read more cleanly.

The following three assignments involve the creation of equivalent points, whose x-coordinates are 3, whose y-coordinates are 5, and whose z-coordinates are 7.

>>> p1 = Point(3, 5, 7)
>>> p2 = Point(x = 3, y = 5, z = 7)
>>> p3 = Point(z = 7, x = 3, y = 5)
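One way to convince ourselves that these three points really are equivalent is to compare them. The short sketch below repeats the three assignments in a self-contained script and checks that the results are equal, regardless of how the arguments were passed.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])

p1 = Point(3, 5, 7)
p2 = Point(x = 3, y = 5, z = 7)
p3 = Point(z = 7, x = 3, y = 5)

# The order in which keyword arguments are written doesn't matter;
# each value ends up in the field with the matching name.
all_equal = (p1 == p2) and (p2 == p3)

print(all_equal)   # True
print(p3)          # Point(x=3, y=5, z=7)
```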

Regardless of whether you include names on the arguments, all of the fields must be given a value when you create an object of a namedtuple type; failing to do so leads to a TypeError.

>>> p4 = Point(5, 8)
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    p4 = Point(5, 8)
TypeError: __new__() missing 1 required positional argument: 'z'
>>> p4 = Point(x = 5, y = 8)
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    p4 = Point(x = 5, y = 8)
TypeError: __new__() missing 1 required positional argument: 'z'
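Because a missing field leads to an ordinary TypeError, it can be caught like any other exception. The following is only a sketch of that idea; the function make_point_or_none is a hypothetical helper written for this example, not something provided by the collections module.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])

def make_point_or_none(*args, **kwargs):
    '''Hypothetical helper: builds a Point from the given arguments,
    or returns None if the arguments don't match Point's fields.'''
    try:
        return Point(*args, **kwargs)
    except TypeError:
        return None

print(make_point_or_none(5, 8))           # None
print(make_point_or_none(x = 5, y = 8))   # None
print(make_point_or_none(5, 8, 2))        # Point(x=5, y=8, z=2)
```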

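Similarly, passing too many arguments isn't allowed, either, though the error messages are a bit more perplexing. (In the first message below, the counts are one higher than you might expect, because the Point type itself is passed implicitly as an additional first argument to __new__().)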

>>> p4 = Point(1, 2, 3, 4)
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    p4 = Point(1, 2, 3, 4)
TypeError: __new__() takes 4 positional arguments but 5 were given
>>> p4 = Point(1, x = 2, y = 3, z = 4)
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    p4 = Point(1, x = 2, y = 3, z = 4)
TypeError: __new__() got multiple values for argument 'x'
>>> p4 = Point(x = 1, y = 2, z = 3, 4)
SyntaxError: positional argument follows keyword argument

One quick note on that last error message: This was a syntax error, rather than an exception raised while the code was running, which means that Point's constructor never even got called. It's never permitted in Python to pass keyword arguments (the ones where you specify their names) before positional ones (the ones where you don't), so Python won't even attempt to run a script or module that has this kind of mistake in it.
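One way to observe that the problem is detected before anything runs is to ask Python to parse the offending text without executing it. The sketch below uses the built-in compile() function for that purpose; the helper function parses is a hypothetical name introduced only for this example.

```python
# Neither string is ever executed; compile() only parses them, so it
# doesn't matter that no name called Point is defined in this script.
bad_call = 'Point(x = 1, y = 2, z = 3, 4)'
good_call = 'Point(x = 1, y = 2, z = 3)'

def parses(source: str) -> bool:
    '''Hypothetical helper: returns True if the given text is a
    syntactically legal Python expression, False otherwise.'''
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

print(parses(bad_call))    # False
print(parses(good_call))   # True
```

Notice that the second call is accepted even though Point doesn't exist here; whether a name is defined is a question that only comes up when the code runs.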

Accessing the fields of a namedtuple

Once you've created an object of a namedtuple type, accessing its fields is simply a matter of following the object with a "dot", then following that with the name of one of its fields.

>>> p1.x
3
>>> p2.y
5

This leads to a surprisingly clear way to write a function to determine the distance from the origin to a given Point.

def distance_from_origin(p: Point) -> float:
    return math.sqrt(p.x * p.x + p.y * p.y + p.z * p.z)
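Put together with the imports and the type definition it depends on, that function might appear in a complete script like the sketch below. The particular point (2, 3, 6) is just an example chosen because its distance works out to a whole number.

```python
import math
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])

def distance_from_origin(p: Point) -> float:
    # Square root of the sum of the squares of the three coordinates.
    return math.sqrt(p.x * p.x + p.y * p.y + p.z * p.z)

# 2 * 2 + 3 * 3 + 6 * 6 = 49, whose square root is 7.
print(distance_from_origin(Point(2, 3, 6)))   # 7.0
```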

Replacing fields in a namedtuple

It's important to realize that namedtuples are immutable, just like tuples are; once they're constructed, the values they have are the values they'll always have. However, it is possible to construct a new namedtuple with some of the fields having their values remain and others being replaced, by calling the _replace method on an existing namedtuple.

>>> p4 = p1._replace(y = 9, x = 14)
>>> p4.x
14
>>> p4.y
9
>>> p4.z
7
>>> p1.x
3
>>> p1.y
5
>>> p1.z
7

There are a couple of important things to note in the example above.

The name _replace is actually somewhat misleading, because nothing in the original namedtuple is actually replaced. What p1._replace(...) does is create a new namedtuple that is similar to p1, but it doesn't modify p1 at all. (Namedtuples are immutable; they can't be modified in this way.)
Another reason why the name _replace is misleading is that it begins with an underscore. We've seen that previously as a way of noting a name that's considered private, though that's not what's going on here. The underscore here is used mainly to differentiate the name of the _replace method from the names of any of the fields in a namedtuple, as a way to accommodate the possibility of having a field named replace.
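To make that last possibility concrete, here is a small sketch of a namedtuple type that really does have a field named replace. The Substitution type and its fields are hypothetical, invented only for this example.

```python
from collections import namedtuple

# Hypothetical type describing a search-and-replace operation.
Substitution = namedtuple('Substitution', ['search', 'replace'])

s1 = Substitution(search = 'cat', replace = 'dog')

# s1.replace is the field; s1._replace is the method.  Because of the
# underscore, the two names don't collide.
s2 = s1._replace(replace = 'bird')

print(s1.replace)   # dog
print(s2.replace)   # bird
print(s2.search)    # cat
```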

Finding out the names of a namedtuple's fields

If you've programmed in statically typed languages like Java or C++ previously, the idea of a namedtuple might seem quite familiar. There are at least some similarities between namedtuples in Python and classes in Java (and especially a relatively newly added feature in Java called records), as well as between namedtuples in Python and structs in C++. However, there is one important difference. In both Java and C++, every variable has an explicitly specified type, so it's possible before a program runs to decide, given an expression like p1.x, whether the fields being accessed are ones that actually exist.

In Python, on the other hand, types aren't checked until a program runs, which brings up an important question. How can this possibly work, if we don't know ahead of time what the type of p will be?

def distance_from_origin(p):
    return math.sqrt(p.x * p.x + p.y * p.y + p.z * p.z)

What happens when expressions like p.x or p.z are evaluated when the function runs? The answer is that namedtuples know the names of their own fields, so the expression p.x really boils down to this:

Check that you have a field named x.
If so, return its value.
If not, raise an exception.
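Those three steps can be observed directly. In the sketch below, asking for a field that exists gives back its value, while asking for one that doesn't exist raises an AttributeError when the expression is evaluated. The field name w is a deliberately nonexistent one, chosen for illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(3, 5, 7)

# A field that exists: we get its value back.
print(p.x)   # 3

# A field that doesn't exist: an exception is raised, but only at the
# moment this expression is actually evaluated.
try:
    p.w
    found_w = True
except AttributeError:
    found_w = False

print(found_w)   # False
```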

In the near future, we'll talk in more detail about the mechanisms that make that possible but, for now, it's worth knowing that you can ask a namedtuple's type for the names of its fields.

>>> Point._fields
('x', 'y', 'z')

And, similarly, you can ask an individual namedtuple object for the same thing.

>>> p1._fields
('x', 'y', 'z')
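One thing this makes possible is writing code that works with a namedtuple's fields without knowing their names ahead of time. As a minimal sketch, the example below pairs each field name with its corresponding value.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])
p1 = Point(3, 5, 7)

# _fields lists the names in the same order that the values are stored,
# so the two can be walked through side by side.
pairs = list(zip(p1._fields, p1))

for name, value in pairs:
    print(f'{name} = {value}')

# Prints:
# x = 3
# y = 5
# z = 7
```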

Is a namedtuple a tuple?

As we've seen, namedtuples solve similar kinds of problems to those solved by tuples, with the main difference being that namedtuples allow us to introduce names for the values we store within them. They're otherwise pretty similar, and even their names (namedtuple and tuple) suggest that they might be more similar than meets the eye.

Can we treat a namedtuple like it's a tuple? Let's take a look. Tuples support indexing, for example. Do namedtuples support it?

>>> p1[0]
3
>>> p1[1]
5
>>> p1[2]
7

Tuples are also sequences, which means that we can do things like create lists out of them, iterate over them with for loops, or sequence-assign them. Can we do that with namedtuples, too?

>>> list(p1)
[3, 5, 7]
>>> for x in p1:
        print(x)

3
5
7
>>> a, b, c = p1
>>> a
3
>>> b
5
>>> c
7

The answer is a resounding "Yes!" So, what does this tell us about these two types? Is p1 really a tuple? The answer is "Yes and no."

>>> type(p1)
<class '__main__.Point'>
>>> type(p1) == tuple
False
>>> issubclass(type(p1), tuple)
True

As it turns out, when we create a type using the namedtuple function — as we've done by creating a Point type — the type we get back is a distinct, new type. However, that new type is related to tuple in a special way: It's a subclass of tuple. In other words, a Point can do everything that a tuple can do, but it might do some of those things differently, and it might also be able to do additional things that a tuple can't.
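A few consequences of that relationship can be checked directly. The following sketch shows a Point being accepted wherever a tuple is expected, along with one thing a Point can do that a plain tuple can't.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])
p1 = Point(3, 5, 7)
plain = (3, 5, 7)

# A Point is a tuple, in the sense that isinstance() says so.
print(isinstance(p1, tuple))   # True

# It compares equal to a plain tuple with the same elements.
print(p1 == plain)             # True

# Operations inherited from tuple still work, though concatenation
# gives back a plain tuple rather than a Point.
longer = p1 + (9,)
print(longer)                  # (3, 5, 7, 9)
print(type(longer) is tuple)   # True

# The additional ability: fields can be accessed by name.
print(hasattr(p1, 'x'))        # True
print(hasattr(plain, 'x'))     # False
```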

The topic of subclasses is one we'll return to in ICS 33, but it's worth understanding now that two different types can have a well-defined relationship with each other. On the other hand, as we'll see soon, types can be related in more of an ad hoc way; not all relationships between types are explicitly defined in our programs (and that's got both upsides and downsides, ultimately).