ICS 32A Fall 2023, Notes and Examples: Functions

Functions and abstraction

We've seen already that Python includes a collection of functions built into the language, such as type(), len(), and constructors such as int() and str(). The ones we've seen so far have been pretty fundamental — in the sense that they mainly don't do jobs that we could have done in some other way in Python, but instead provide baseline functionality that we need in order to do bigger things. Fundamental functions like these are a good place to start, but we can achieve more with functions than just fundamental things; certainly, not all functions are fundamental.

The key benefit a function provides in a design is abstraction. Abstraction is hiding complexity beneath a veil of simplicity. It takes something that might be quite complex and makes it easy to use, so that you don't have to know every detail of how it works in order to use it. To use an abstraction, you need to understand how to interact with it — what you have to do, and what result or effect you expect to get back in return — but not how it works. For example, imagine you do something like this in the Python shell.


>>> name = 'Alex'
>>> len(name)
    4

When you do this, it's not important that you understand how the len() function is able to determine the string's length. Maybe there's a while loop in the len() function that counts characters. Maybe the length is just an integer stored in another variable, hidden behind the scenes somewhere. Maybe the operating system or the underlying hardware has some mechanism for tracking it. Whatever the implementation details are, all you need to know is that some objects in Python have a length, and if you call the len() function, you obtain that length. That's it.
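
To make that point concrete, here is a minimal sketch of just one of those possibilities, the counting loop. The function name length_by_counting is hypothetical, and nothing here is meant to suggest that this is how len() is actually implemented; the point is only that a caller gets the same answer without needing to know.

```python
# A hypothetical way to find a string's length: count its characters one by one.
# This is NOT a claim about how the built-in len() works internally.
def length_by_counting(text):
    count = 0
    for _ in text:
        count += 1
    return count


name = 'Alex'
print(length_by_counting(name))   # 4
print(len(name))                  # 4, the same answer, whatever len() does inside
```

From the caller's perspective, the two are interchangeable here, which is exactly what it means for the details to be hidden.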

So, if functions are such a powerful device for hiding complexity, it stands to reason that we should want to write our own functions, too. If Python can provide us built-in functions that make certain commonly-done tasks easier, that's nice in itself. But if we can write our own functions, we can take the complexity that arises in our own designs (things that aren't so common that everyone who writes Python programs would need them, but that are instead specific to the program we're writing) and neatly hide it away. In small, simple programs, this probably isn't that big of a deal. But most software isn't just written once and thrown away; it is instead built and maintained over a long period of time, often by many different people. So, we need a way to allow someone to work on one part of the program without accidentally introducing problems into other parts of it; if someone has to remember every detail of a 50,000-line program in order to successfully change any detail about it, no one will ever be able to change it successfully. Isolating individual portions of our program from one another as much as we can is the only way we can build large programs that stand the test of time. Functions are the most fundamental tool for achieving that isolation.

Writing functions

We can introduce a new function into a Python program by using a statement called def. (The word def is short for definition or function definition.) Like everything else in Python, you can do this in the Python shell, though we'll much more often do it in scripts.

A def statement is a compound statement, the way that if statements and loops are compound; defs have other statements inside of them. The syntactic mechanism for expressing this is the same: We write a colon at the end of the first line of the def, then use indentation to indicate what other statements are meant to be inside of it.

So, what might you need to say in order to define a function? Let's consider how you call a function. (Calling a function is what you do when you want to use it. When you define it, you want it to be ready to be used, so there's a correspondence between how you use it and what you might say in order to define it.)

When you call a function, you specify its name. If you were to define a function, then, you would expect that you would need to give it that name, so you could call it later.
When you call a function, you can pass arguments to it. Defining a function will require specifying what arguments the function needs in order to do its job.
When you call a function, you get back a return value when it finishes running; it gives you an answer. Defining a function will require us to say how that return value is determined.

As a first example, we could define a function gimme_five in the Python shell. Suppose that our intent is to be able to call it by passing it no arguments and, no matter what, it will always return the integer 5.


>>> gimme_five()
    5

How do we achieve that intent? Here's the definition of that function, followed by a call to it.


>>> def gimme_five():
...     return 5
...
>>> gimme_five()
    5

Let's unpack this syntax a bit:

The word def is a way of saying you've started to define a function.
Immediately after that is the name of the function; in this case, that name is gimme_five.
Immediately after that is a pair of parentheses, in which we define the function's parameters. If arguments are what you pass to a function when you call it, parameters are, correspondingly, where those arguments are passed. If you pass three arguments to a function, there would need to be three parameters in the function to accept them. In our case here, we didn't want to have to pass any arguments, so we have no parameters; nonetheless, the parentheses are necessary.
Immediately after the parameters appears a colon, followed by the function's body, beginning on the next line (and indented). The body of a function is a sequence of statements that will be executed when the function is called. You'll notice, in this case, that we have a statement called return in the body; that statement is a way to say "When you reach this statement, exit the function and give back this return value to whoever called it." The word return is followed by an expression, which will be evaluated; whatever that expression's value is, that's what will be returned from the function.

Calling a function that returns an integer, like this one, is a lot like any other expression that returns an integer. This means you can take that integer and do whatever you'd like with it: print it, store it in a variable, use it in an arithmetic expression, and so on. Every time the function is called within an expression, its body is executed, and its return value becomes the value of that call within the expression.


>>> print(gimme_five())
    5
>>> x = gimme_five()
>>> x + 10
    15
>>> gimme_five() + (gimme_five() * 3)
    20

The difference between a function and a function call

I should point out, too, that calling a function requires the parentheses, even if you're not passing any arguments to it; it's the parentheses that establish that you want to call it. That doesn't mean you can't evaluate gimme_five without the parentheses in the Python shell, but you should be aware that you would be doing something very different: Its result would be the function itself, as opposed to the result you'd get from calling it. Functions, as it turns out, are objects, just like strings, ints, bools, and so on; their type is function.


>>> gimme_five
    <function gimme_five at 0x000001E5497A16A8>
>>> type(gimme_five)
    <class 'function'>

(The funny-looking value 0x000001E5497A16A8 is an address in memory; you might find, if you try the same thing, that you get a different address than I did. For the most part, we're unconcerned about addresses in memory when we write Python code, and we have little or no control over where things will be stored. We care about two addresses being equal or different sometimes, but not so much about specifically where things are.)

The implications of functions themselves being objects are more powerful than you might at first realize, but that's a conversation we'll return to later. For now, you should be aware that you need parentheses both around the arguments when you call a function and around the parameters when you define one, even when there aren't any arguments or parameters.

Parameters and arguments

More often than not, the functions you write will need parameters. Functions are each intended to do a job; quite frequently, it takes some kind of input for the function to know what job you want done, so you'll need to pass arguments to it (and the function will need parameters to accept those arguments). The print() function needs arguments because it needs to know what you want printed; the int() function needs an argument because it needs to know what value you want to attempt to convert to an integer; and so on.

If you want to write a function that accepts arguments, you'll need to define the corresponding parameters. You'll also need them to have names, because you'll need to refer to them within the body of the function. Parameters are a lot like variables, in the sense that their job is to store an object and allow it to be used later. The difference is that they are more temporary than the variables we've seen so far; they live only as long as the function is executing, then they're destroyed.

Suppose you wanted to write a function that takes a number and tells you its square (i.e., the result of multiplying that number by itself). After you've written the function, you might expect to be able to do this.


>>> square(3)
    9

How you would write that function is similar to how we wrote gimme_five() above, with the main difference being that we'll need to define one parameter (to accept the number that we want squared), and then we'll need to use that parameter within the body of the function (so we can square it).


>>> def square(n):
...     return n * n
...
>>> square(3)
    9
>>> square(5.5)
    30.25

Defining a parameter in a function is as simple as listing the parameter's desired name within the parentheses after the function's name; this is enough to establish its existence. You can define multiple parameters by simply separating their names with commas. (If you've previously written programs in a language like Java or C++, you might wonder why you don't need to specify parameters' types. There is no explicit restriction on what type of value can be passed into a parameter, similar to how there is no restriction on what type of value can be stored in a variable.) Once you've defined a parameter, you're free to use it within the body of the function by simply specifying its name.
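
As a small sketch of a function with more than one parameter, consider the following. The function rectangle_area and its parameter names are made up for illustration; they don't appear elsewhere in these notes.

```python
# A hypothetical two-parameter function; the parameter names are separated by a comma.
def rectangle_area(width, height):
    return width * height


print(rectangle_area(3, 4))     # 12
print(rectangle_area(2.5, 4))   # 10.0, since no types are specified, floats work too
```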

When you call a function, the arguments are matched to the parameters in the order specified, with the first argument passed into the first parameter, the second argument passed into the second parameter, and so on. (If the number of arguments doesn't match the number of parameters, an error will occur.) The body of the function is then executed, with the values of the parameters being whatever was passed into them. So, in the case of the call square(3) above, the following things happen.

The integer 3 is passed to square.
square's first parameter, n, is given the value 3.
n * n is calculated within the body of square. Since the value of n is 3, n * n is 9.
Since n * n is the expression in a return statement, the integer 9 is returned from the function.
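
The parenthetical remark above, about the number of arguments needing to match the number of parameters, can be seen with a short sketch. It uses try and except, which these notes have not introduced yet, only so that the script keeps running after the failure; the variable call_failed is a hypothetical name used to record what happened.

```python
def square(n):
    return n * n


# square has one parameter, so passing two arguments is an error.
# The try/except here is only an assumption-laden convenience for showing
# the failure without stopping the script.
call_failed = False

try:
    square(3, 4)
except TypeError as error:
    call_failed = True
    print('TypeError:', error)

print(call_failed)   # True
```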

Type errors

Even though the types of the parameters are not explicitly specified, there is nonetheless an assumption being made within the body of the function about their types. By virtue of what we do with those parameters within the function, a particular type of argument might be compatible or incompatible with it.

For example, the square() function we wrote above is making an implicit assumption, even if we didn't say it directly in the code we wrote. The value of n has to be something that can be multiplied by itself. This means that n could certainly accept an integer or a float — because you can certainly multiply numbers by numbers — but could not accept a string.

However, even though our square() function can't successfully process a string argument that you pass to it, it is still possible for a Python program to run with this line of code in it.


square('Boo')

The program would still be syntactically legal Python, so it would still be possible for it to run. However, it wouldn't necessarily run successfully; the function square will fail, at run time, when it's called with an argument that can't be multiplied by itself.

We can see this in the Python shell by trying to call it that way.


>>> square('Boo')
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        square('Boo')
      File "<pyshell#2>", line 2, in square
        return n * n
    TypeError: can't multiply sequence by non-int of type 'str'

A couple of interesting facts emerge from this example.

This kind of failure is the same as other kinds of failure we've seen before, such as dividing by zero or trying to convert a string to an integer when it doesn't contain digits. Aside from syntax errors (code that is structurally incorrect Python, such as an if statement without a colon in the right place, or not following the indentation rules correctly), all error-checking is done at run time, and it is all reported (and, ultimately, handled) the same way.
Tracebacks don't only tell us where the error occurred. Notice that we saw two different lines of code listed in the traceback this time: the line of code in the square() function (which is where the failure occurred, trying to multiply two strings together), as well as the line of code where the square() function was called. This is why they're called "tracebacks," because we're "tracing back" through the entire chain of function calls that have led to where we are now. That can give us a lot of evidence to use when we're trying to understand why something failed.
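
Here is a rough sketch of both facts at once. The function describe_square is hypothetical, added only so there is a longer chain of calls; the try and except are also an assumption on my part, used so the script can continue after the failure.

```python
def square(n):
    return n * n


# A hypothetical function that calls square, so that a failure inside square
# happens two calls deep.
def describe_square(value):
    return 'The square is ' + str(square(value))


# Nothing is wrong with this call, so it runs normally.
print(describe_square(4))   # The square is 16

# This call is syntactically legal, so the script starts running just fine.
# It only fails when this line is actually reached.
try:
    describe_square('Boo')
except TypeError as error:
    print('Failed at run time:', error)
```

Without the try and except, the traceback would list three lines of code: the call to describe_square, the line within describe_square that called square, and the line within square where the multiplication failed.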

Docstrings

The body of a Python function can begin with a docstring, which is a string literal that describes — to a human reader — how a function works. The best docstrings briefly describe what the function's job is, along with anything that one would need to know about its parameters, its return value, and the ways in which it might fail. What you'll find, generally, is that the need to write a long, complex docstring is actually indicative of a function that is solving too many problems; functions that have a single responsibility will tend to have short docstrings, for the simple reason that there won't be that much to say about them.

Writing a function with a docstring is as simple as beginning its body with a string literal.


def square(n):
    'Computes the square of a numeric argument'
    return n * n

When you write functions in this course, you'll generally want to write a docstring, both to communicate your design goals to us, but also to ensure that you're thinking about them yourself. If you can't figure out what to write in a docstring, how can you understand what function you're trying to write? How will you know when you're done?

It's worth noting, too, that Python provides multi-line string literals, which are denoted syntactically by being surrounded by three single-quote characters on either side. That provides a nice way of writing a docstring that is long enough not to fit readably on a single line.


def square(n):
    '''Computes the square of a numeric argument, while failing when
    given an argument that is not numeric.'''
    return n * n
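
One reason docstrings are worth writing is that Python keeps them around. As a brief sketch, a function's docstring is stored in an attribute named __doc__, and the built-in help() function displays it as well.

```python
def square(n):
    'Computes the square of a numeric argument'
    return n * n


# The docstring is stored with the function object itself.
print(square.__doc__)

# In the Python shell, help(square) would display the same text
# along with the function's name and parameters.
```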

Writing functions in Python scripts

Aside from when we're experimenting, most of the functions we write in Python will be written in Python scripts. Functions allow us to take complexity in our program and "hide" it — not in the sense that the complexity can't then be seen, but in the sense that we can then call the function without considering every small detail of how it works. Where this kind of thing pays off is when we're writing programs that we can use again and again, so it stands to reason we would tend to benefit from this when we're writing scripts.

We write functions in Python scripts the same way we do in the Python shell, by writing def statements. For example, we could create a new Python script and write these statements in it.


def square(n):
    return n * n


def cube(n):
    return n * n * n

Suppose we then ran that script in IDLE. Here's what we would see, after the shell restarted.

>>>

Why didn't we see any numbers get squared or cubed? Remember that there is a difference between defining a function and calling it. Here, we have a script that defines two functions, square and cube, but doesn't call either of them. Defining a function makes it available to be called, but doesn't actually call it. Of course, having executed the script, the functions will have been defined, so we could then call them within the Python shell.


>>> square(4)
    16
>>> cube(5)
    125

Many Python scripts we write will only contain definitions. In other words, their role will be to make things available to other scripts, but not to do anything on their own. But when we want to write Python scripts that are stand-alone programs — ones that should do something when we run them — then they'll need not only to define functions, but also to call them somewhere. The simplest way to do that is to include the calls directly within the script.


def square(n):
    return n * n


def cube(n):
    return n * n * n


def read_number():
    return int(input('Enter a number: '))


num = read_number()
print('The square of', num, 'is', square(num))
print('The cube of', num, 'is', cube(num))

The order in which we say all of this matters. Python will execute this script in the order that it's written; executing it will cause the following things to happen in the following order.

The function square will be defined first.
Next, the function cube will be defined.
After that, the function read_number will be defined. (Note that nothing has happened yet, except that there are now functions available to be used later. But, so far, we haven't multiplied any numbers, read any input, or anything else; the functions haven't been called yet.)
Now, the statement num = read_number() executes, which calls the function read_number() that we already defined, then stores the result in a variable called num.
Next, we call square(num) and use it as part of some output that we print out.
Finally, we call cube(num) and use it as part of some output that we print out.

So, if we executed this script in IDLE, we would be able to have the following interaction with it (including evaluating some additional expressions in the shell after it finishes running).


    Enter a number: 5
    The square of 5 is 25
    The cube of 5 is 125
>>> num
    5
>>> cube(7)
    343

The order in which the script is written matters because, broadly, things need to be defined before they're used in Python. The functions square, cube, and read_number can only be called once they've already been defined. So, the last three statements in our script, which call each of the three functions, must appear at the bottom of our script. If they appeared before the definitions of the functions, the script would terminate with an error, because of an attempt to call a function that didn't yet exist.
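
A small sketch shows what happens when a call appears too early. The function double and the variable called_too_early are hypothetical names, and the try and except (not yet covered in these notes) are there only so the script can continue after the failure instead of terminating.

```python
# At this point, double has not been defined yet, so calling it fails.
called_too_early = False

try:
    print(double(4))
except NameError as error:
    called_too_early = True
    print('NameError:', error)


def double(n):
    return n * 2


# Now that the def statement has executed, the same call succeeds.
print(double(4))   # 8
```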

Scope and scoping rules

So far, we've only seen Python functions whose bodies each contain a single return statement. While we will write one-line functions like these sometimes, many of the functions we write will be longer than that. All of the statements that we've seen so far in Python can be used within the body of a function, and functions can legally contain as many statements in their bodies as you'd like, which are subject to the same rules of control flow — if statements for conditionality, loops for repetition, and so on — that we've seen already. (The only "special" statement we've seen so far is return, which can only appear in the body of a function.)

Suppose that we wanted to write a function that asks a user to specify a person's first name and last name separately, then return that name formatted with the last name specified first, the first name specified afterward, and a comma separating them. And, for the sake of argument, let's assume that both the first and last name have to be non-empty — though, of course, there are people who don't have both. All of this actually entails a fair bit of complexity, so it would be worth writing a function to encapsulate it.


def read_name():
    while True:
        first_name = input('What is the first name? ').strip()

        if first_name == '':
            print("You'll need to enter a first name")
        else:
            last_name = input('What is the last name? ').strip()

            if last_name == '':
                print("You'll need to enter a last name after the first")
            else:
                return last_name + ', ' + first_name

If you wrote this function by itself in a Python script and then executed that script in IDLE, the function would be available to call, which might lead to an interaction like this in the Python shell.


>>> name = read_name()
    What is the first name? Boo
    What is the last name? 
    You'll need to enter a last name after the first
    What is the first name? Boo
    What is the last name? Thornton
>>> name
    'Thornton, Boo'

Notice that the body of the function read_name contained assignments to two variables: first_name and last_name. Based on what we've seen so far, it stands to reason that we should be able to obtain their values in the Python shell — whenever we've defined something in a Python script, we've been able to get to it in the Python shell after the script finishes executing. So, let's try it and see what happens.


>>> first_name
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        first_name
    NameError: name 'first_name' is not defined

As usual, this error message turns out not to be an accident; it's indicative of some additional rules in Python that we've not seen yet. When we define a name in Python (such as when we assign a value to a variable or define a function), it is not necessarily available throughout the entire program. Not all names are defined globally; many of them are (purposefully) defined more restrictively than that. (This is an example of a broader principle we'll see throughout this course: Perhaps paradoxically, our programs become more flexible as we restrict the ways that each part of them can be used.)

Global and local scopes

In Python, each definition exists within a scope, which is the portion of the program in which that definition is available. Anything that is named (variables, functions, parameters, and so on) is subject to this rule.

A global definition is one that is made within a Python script (or within the Python shell), but not inside of a function. For example, consider the following Python script.


x = 10
y = 20
z = 30

def foo(a, b, c):
    parameter_sum = a + b + c
    return x + y + z + parameter_sum

In this script, the variables x, y, and z are part of the global scope, as is the function foo(). For that reason, we would expect that we could execute the script and then access x, y, z, and foo() from within the Python shell.


>>> x
    10
>>> y + z
    50
>>> foo(1, 2, 3)
    66

This is because our shell interactions, too, are being made from within the global scope; we can access anything in the shell that is globally accessible.

By way of contrast, the function foo() contains some definitions that are in its local scope, which is to say that they are accessible only from within the function. The parameters a, b, and c, as well as the variable parameter_sum, are all local to foo(). Let's think about why that is.

Every time the function foo() is called, it will be passed a set of arguments. Each time, the three parameters — a, b, and c — may well have different values.
While the function foo() runs, the variable parameter_sum is re-calculated from the values of the arguments passed into the function.
Once the function foo() returns, its job is done; there's little value in being able to access its parameters or variables afterward. Being able to depend on prior values in these parameters or variables invites our program to become more brittle, such that changes to the function later could invalidate other seemingly-unrelated parts of the program.
One thing we'll see later in the course is that functions in Python can be recursive, which means that a function can call itself (directly or indirectly). For this reason, it's vital that there be a local scope, so that each call to the function can have its own separate copy of its parameters and variables.

So, when we assign to a variable from within a function, we're assigning to a local variable, which is to say a variable that is defined in the function's local scope. Note that this is true even when we're talking about variables with the same name within and outside of a function.


q = 5

def example(n):
    q = n * n
    return q

print(example(q))
print(q)

If we executed the Python script above, the output would look like this:


    25
    5

This is because the assignment to q in the example() function does not affect the global variable q; it instead creates a new local variable q, which is local to the function example() and is separate from the global one defined previously. When the same name is defined in more than one scope, there are rules about which one Python will "prefer," which generally boil down to "Prefer things that are defined more closely to where you are."

There is one other additional rule to be aware of, which might come as a surprise if you haven't thought it through carefully. Consider this Python script, which is similar to the previous one, but is not quite the same.


q = 5

def example(n):
    m = q * n
    q = m * n
    return q

print(example(q))
print(q)

Consider what might happen if we executed this script. Knowing what we know about Python already, we would expect the following things to happen.

The global variable q is defined, with its value set to 5.
The function example() is defined. Remember that functions aren't called when they're defined; they're available to be called, but haven't been called yet.
The first call to print() will occur, but only after example(q) returns. So, the next thing to happen is example(q) is called. Since this call is in the global scope, q is the global variable whose value is already 5.
Next, we begin the call to example(). The parameter n is set to the value of the argument, the global variable q, which is 5.
The first statement of example() executes first, in which we multiply q * n, then assign its value to m. We know that m will become a local variable within example(), because we know that assigning to a variable within a function will assign to a local one. But what is the value of q in this statement? There will be an assignment to a local variable q on the second line of the function, where we said q = m * n, but that hasn't happened yet. So, what happens here? The answer might surprise you.


    Traceback (most recent call last):
      File "D:/Examples/32A/scopes3.py", line 8, in <module>
        print(example(q))
      File "D:/Examples/32A/scopes3.py", line 4, in example
        m = q * n
    UnboundLocalError: local variable 'q' referenced before assignment

This outcome may seem a little bit perplexing, because the local variable q doesn't seem to exist yet. Its value will not be assigned until the second statement within the example() function. But Python first scans a function's body and determines all of the local variables it will need. They're all created at the beginning of the function's execution (albeit without values, which is what it means for them to be "unbound"), but they can't be used until a value has been assigned to them. Because q is assigned somewhere within example(), q is local throughout the entire function, so the first statement refers to the local q rather than the global one. So, in this case, we see an error message. (The exact wording of the message varies between Python versions, but the error is an UnboundLocalError either way.) You can't use a local variable until after you've assigned a value to it.
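
One way to avoid this error, sketched below, is to give the local variable a name that differs from the global one. The name result is my own choice for illustration. Since q is then never assigned within example(), q is not local to it, and the first statement reads the global q as we originally expected.

```python
q = 5


def example(n):
    m = q * n          # q is never assigned in this function, so this is the global q
    result = m * n     # a separate local variable, under a hypothetical name
    return result


print(example(q))   # 125
print(q)            # 5, the global q is unchanged
```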

Functions defined inside of functions

Functions don't have to be defined only in the global scope, though that's most often where you'll define them. It turns out that you can define them locally, within other functions, as well. Like local variables, this will render them usable only within the function in which they're defined. (In truth, it's not all that often that I use this technique; it's comparatively rare that I want to write a function this way. But, as we'll see later in the quarter, isolating functions can have its uses. For now, though, we'll focus on the effect this technique has on the scoping rules of Python.)

Suppose you wrote the following Python function.


def read_and_sum_numbers():
    def read_number():
        return int(input('Enter a number, or 0 to stop: '))

    total = 0

    while True:
        number = read_number()

        if number == 0:
            return total

        total += number

Here, we've written a function read_and_sum_numbers(), which reads a sequence of numbers from the user as input, then returns the sum of the numbers it read. Part of what it does is encapsulated in a smaller function read_number() inside of it, which is used to read a single number. Like anything else defined locally within a function, read_number() can only be called from within read_and_sum_numbers(), which we've done here.
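
To see that a locally defined function really is unavailable elsewhere, here is a smaller sketch that doesn't require any input. The function sum_of_squares and the variable inner_was_hidden are hypothetical, and the try and except are used only so the failure doesn't stop the script.

```python
def sum_of_squares(a, b):
    # square is defined locally, so it exists only inside sum_of_squares.
    def square(n):
        return n * n

    return square(a) + square(b)


print(sum_of_squares(3, 4))   # 25

# Outside of sum_of_squares, the name square is not defined at all.
inner_was_hidden = False

try:
    square(3)
except NameError as error:
    inner_was_hidden = True
    print('NameError:', error)
```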

We've seen previously that names defined globally can be accessed from within a function, unless they're "hidden" by a more locally defined version of the same name. But what if there are multiple nested scopes, like we have here? Now we're ready to take a more complete look at how names are looked up in Python when we use them.

Python's name lookup rule: Local, Enclosing, Global, Built-in (LEGB)

At any given time, there may be multiple scopes in play. For example, in the previous example, when we're within the function read_number(), there are three scopes whose names are potentially accessible:

The scope of the function read_number().
The scope of the function read_and_sum_numbers().
The global scope (i.e., the one in which read_and_sum_numbers() is defined).

Note that what makes a scope accessible is structural — it's a matter of which functions are defined within which other functions, not which functions have called which other functions.
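
A short sketch may help with the distinction between structure and calls. The functions helper and caller, and the variable secret, are hypothetical names. Even though caller() calls helper(), helper() is not defined inside of caller(), so it cannot see caller()'s local variables.

```python
def helper():
    # secret is not local here, there is no enclosing function,
    # and it is neither global nor built-in, so this lookup fails.
    return secret * 2


def caller():
    secret = 21        # local to caller, invisible to helper
    return helper()


lookup_failed = False

try:
    caller()
except NameError as error:
    lookup_failed = True
    print('NameError:', error)
```

Had helper() been defined within the body of caller() instead, the scope of caller() would have been an enclosing scope, and the lookup would have succeeded.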

When you use a name in a statement or expression, Python uses a simple rule that is sometimes abbreviated as LEGB to determine which name you've used. LEGB stands for Local, Enclosing, Global, Built-in and is the order in which Python looks for that name.

First, the local scope is checked. If the name is defined in that scope, that's the one you'll get.
If the name is not found in the local scope, next the enclosing scope is checked. This only comes into play when functions are nested within one another — the enclosing scope of read_number() is the scope of read_and_sum_numbers(), for example. If there are multiple levels of nesting, all are checked, in the order of how close they are to where the name was used.
If the name is not found in any enclosing scope, next the global scope is checked. If the name is defined globally, that's the one you'll get.
Finally, if the name doesn't appear in a local, enclosing, or global scope, Python checks the built-in scope, which consists of the names of things that are built into the language, such as the function type() or the type int. If the name is built in, that's the one you'll get.
And, if the name isn't found in any of these scopes, it is an error; you're not allowed to look for a name that isn't there.
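
The four scopes can be seen working together in one small sketch. All of the names here (limit, outer, inner, factor, and doubled) are made up for illustration, while min() is one of Python's built-in functions.

```python
limit = 100                    # global scope


def outer(n):
    factor = 3                 # local to outer, which makes it enclosing for inner

    def inner(value):
        doubled = value * 2    # local to inner
        # doubled is found locally, factor in the enclosing scope,
        # limit in the global scope, and min in the built-in scope.
        return min(doubled * factor, limit)

    return inner(n)


print(outer(5))    # 30
print(outer(50))   # 100
```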

For the most part, we will avoid assigning values to variables defined in scopes other than the local one, though we will feel free to read values from outer scopes (as we will feel free to have global constants — variables whose values we never intend to change — even if we won't use global variables whose values vary). Particularly as our programs grow larger, it is paramount that we be able to understand portions of them without having to consider the fine-grained details from other portions. Global variables are problematic because they essentially tie an entire program together; they're a detail that transcends our entire program, making it more difficult to keep different parts of the program isolated from the others. Since one of our key goals in this course is "leveling up" our skills, so that we can write dramatically larger programs than we could before, we'll stick to techniques that scale up properly.
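
As a minimal sketch of that approach, the script below reads a global constant from within a function but never assigns to it. The names TAX_RATE_PERCENT and tax_on are hypothetical, and writing a constant's name in uppercase is a common Python naming convention rather than something the language enforces.

```python
# A global constant: a variable whose value we never intend to change.
TAX_RATE_PERCENT = 8


def tax_on(price_in_cents):
    # Reading the global constant is fine; we avoid assigning to it here.
    return price_in_cents * TAX_RATE_PERCENT // 100


print(tax_on(1000))   # 80
```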