ICS 32 Winter 2022, Notes and Examples: Exceptions and Files

What are exceptions?

We've seen a number of times that when we write Python, be it expressions evaluated in the Python shell or code written in a Python script, it sometimes fails with an error message instead of producing the result we wanted. While these failures sometimes indicate problems in the code we wrote, that's not always the case. Even in perfectly-written programs, things can go wrong. For example, as soon as a program reads input from an external source, such as a user typing something via the keyboard or the program reading the contents of a text file, there exists the possibility that the input won't conform to what was expected.

>>> x = int(input('Enter a number: '))
Enter a number: This is not an integer!
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    x = int(input('Enter a number: '))
ValueError: invalid literal for int() with base 10: 'This is not an integer!'

So far, you might have only ever seen these failures manifest themselves as error messages. Everything stops, we see the error message, and we can use it to diagnose the problem. And we're not left without a trail of evidence, because we see a traceback with the error message, which specifies where the failure occurred. That's the end of our trail of evidence, the symptom of our problem; it's up to us to follow that trail back to the initial cause.

You might be wondering why errors occur in such a harsh way. Why "panic" and give up so quickly? This is a question with a two-part answer.

If we're just typing an expression or statement into the Python shell, it's generally not going to be possible to just "muddle through," anyway. In the example above, we wrote a statement that was intended to read an integer from the keyboard and then store it in the variable x. If the user types something other than an integer, there's nothing to store in that variable. In fact, we can verify that by trying this afterward.
```
>>> x
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    x
NameError: name 'x' is not defined
```
The assignment to x never happened, for the simple reason that the value we intended to assign was the result of calling the int() function, which failed. If that function failed, it had no result, so there was nothing to assign.

It turns out that Python doesn't just panic and give up immediately. By default, failures cascade and programs crash with a traceback, as we saw in the example above. But it's possible to step in and prevent that chain reaction when it's a problem we can anticipate, such as a user typing input that doesn't conform to the requirements. The goal, in general, is to differentiate mistakes in the program (which we can't do much about, other than fix them) from problems that lie outside the scope of the program. The latter we can potentially handle, as long as we realize they might happen.

Understanding the difference between success and failure

When a function is called in Python, that function is being asked to do some kind of job. The function does the job and returns a result — always an object of some type, though that object might be the special object None if the function's role is to generate some kind of side effect (such as printing output) rather than calculating and giving you back a result. (Even if a function reaches its end without reaching a return statement, it still returns a value: None.) As you've seen, many functions accept parameters, which allow the function to do a slightly different job each time it's called; for example, a function that downloads an image from the Internet would probably take at least one parameter, the address from which the image should be downloaded, so that the function could potentially be used to download any image instead of just one particular image.

The interaction between a function and its caller in Python has at least some similarity to certain kinds of interactions between people. Think about what happens when you ask a friend to do something for you, like "Here's $5. Can you drive over to Starbucks and buy me a latte?", which, conceptually, is a lot like calling a function in Python (with "$5" and "Starbucks" as its parameters and "latte" as its expected result). Even assuming your friend understands your instructions perfectly and is willing to do it, are you guaranteed to get the result you asked for (in this case, a latte), or are there circumstances where you won't get it? Of course, failure is certainly a possibility here. Your friend's car might not be in working order, or it might be in use by someone else, or your friend might not even have one! Starbucks might be closed, or they might have run out of coffee. A latte might cost more than $5, and your friend might not have any more money than you provided.

Now let's think again about a Python function that downloads an image from the Internet. Even assuming that the function is perfectly written, can anything go wrong there? Sure! Your Internet connection might not be working. The web site from which you're trying to download the image might be down, it might not contain the image you asked for, or it might not exist at all. What result should the function return in these cases? Going back to the previous example, when you send someone to Starbucks and it turns out that Starbucks is closed, you get no result at all; instead of handing you a latte, your friend might instead inform you that the job couldn't be done and why. "Sorry," your friend might say, "I couldn't get that coffee for you, because Starbucks was closed." Or, right away, your friend might say, "Are you crazy? I don't have a car, remember?!" Either way, you're not getting the coffee you wanted.

In Python, when a function is called, it is being asked to do a job. Broadly speaking, just like in the case of sending your friend for coffee, there are two possible outcomes, even assuming the function has no bugs:

The function will complete its job successfully and return an object of a type you expect.
The function will fail to complete its job. Functions fail differently than they succeed in Python. Rather than just returning an object that indicates failure, they don't return an object at all, but instead raise an exception. The mechanism of failure is completely separate from the mechanism of success.

Despite their name, there's nothing exceptional about exceptions. They're not rare, they're not necessarily indicative of bugs, and they don't cause well-written programs to crash when we can reasonably anticipate them. An exception just means a function failed to complete its job. Where some finesse is required is in deciding what should be done about it.
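
As a small illustration of the two outcomes, here is a sketch built around the built-in int() function. The helper name describe_outcome is hypothetical, chosen only for this example, and the try statement it relies on is explained later in these notes.

```python
def describe_outcome(text: str) -> str:
    # Hypothetical helper: reports which of the two outcomes occurred
    # when int() was asked to convert the given text.
    try:
        value = int(text)          # success: int() returns an object
    except ValueError as e:
        # failure: int() returned nothing at all; it raised an exception
        return f'failed with {type(e).__name__}'
    return f'succeeded with {value}'


# describe_outcome('42')     returns 'succeeded with 42'
# describe_outcome('latte')  returns 'failed with ValueError'
```

Notice that the failure case never produces a value from int(); control leaves the call by a separate route.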

What happens when an exception is raised

An exception that is not handled anywhere in a program will cause a crash and you'll see a traceback, which specifies information about the unhandled exception and where the program was in its execution at the time the exception was raised. For example, consider this nonsensical Python module.

oops.py

def f():
    x = 3
    g(x)


def g(n):
    print(len(n))


if __name__ == '__main__':
    f()

If you run this module in IDLE, you'll see the following result, which offers some insight about what happens when an exception is raised in Python.

Traceback (most recent call last):
  File "C:\Example\oops.py", line 11, in <module>
    f()
  File "C:\Example\oops.py", line 3, in f
    g(x)
  File "C:\Example\oops.py", line 7, in g
    print(len(n))
TypeError: object of type 'int' has no len()

When you see a traceback like this, it's important to actually pay attention to what it says. Reading a traceback from the bottom up provides a lot of useful information, even if you don't quite understand the error message at first, because tracebacks don't only tell us what the error was, but also where the error occurred.

The type of the exception — exceptions are objects in Python, just like everything else — is a TypeError.
A more descriptive account of the problem is object of type 'int' has no len(). That's a hint that we were trying to get the length of an integer, but that integers have no length.
The exception was actually raised on line 7 of oops.py, by code in the function g(). (The traceback even shows us the code that's on line 7: print(len(n)).)
The function g() had been called by the function f(), on line 3 of oops.py.
The function f() had been called by the "main" if statement, on line 11 of oops.py.

This is the trail of evidence that's been left for us, describing the symptom of our problem. It's our job to take that evidence and determine the underlying cause. Given all the information here, it doesn't take long to figure out what happened:

f() was called.
f() initialized a local variable x to the integer value 3. So x had the type int.
f() called g() and passed x to g()'s parameter n. So, within g(), n also had type int.
g() attempted to get the length of n. But n was an int and ints have no length! The call to len() failed — it couldn't do what we asked — so it raised an exception.

When an exception is raised by a function, that function can be considered to have failed; it couldn't complete the job that it was asked to do. This sets off a chain reaction of sorts, albeit one that can be stopped. If a function raises an exception, control is given back to whatever function called it, which has two options:

Handle the exception.
Choose not to handle the exception. In this case, the calling function will also fail (the failure of one function has implied the failure of the one that called it), and control is given back to whatever function called the caller, which will have the same two options.

The cascading failure of functions continues until a function handles the exception, or until all active functions fail, at which point the program will crash and you'll see a traceback. In the example above, that's why we saw the traceback: g() raised the exception and didn't handle it; f() didn't handle it, either; and the if statement in the "main" block didn't handle it, either. Since the exception was never handled, the program crashed, and the traceback was shown.

In other words, when we see a traceback, it's not because a problem arose. It's because a problem arose and the parts of the program that were active (the functions we were in the midst of running at the time) didn't know what to do about it.
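
To see the chain reaction being stopped, here is a sketch that reuses f() and g() from oops.py, changed slightly so that they return values instead of printing. The function run() is a hypothetical addition, and it uses the try statement introduced in the next section.

```python
def g(n):
    # len() raises a TypeError when n is an int; g() does not handle it
    return len(n)


def f():
    x = 3
    # f() does not handle the exception either, so f() fails when g() fails
    return g(x)


def run() -> str:
    # Hypothetical caller that chooses to handle the failure
    try:
        f()
    except TypeError as e:
        # The cascade stops here, so no traceback is shown
        return f'Handled: {e}'
    return 'No exception was raised'


# run() returns "Handled: object of type 'int' has no len()"
```

The exception still passes through g() and f() on its way out; the only difference from oops.py is that one of the active functions knew what to do about it.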

Catching an exception

We specify what should happen in a function when exceptions are raised by writing a try statement. A try statement is built out of clauses and is structured like this:

try:
    statements that will be attempted once
    if any exception is raised, control leaves the "try" clause immediately
except:
    statements that will execute after any statement in the "try" clause raises an exception
else:
    statements that will execute after leaving the "try", but only if no exception was raised
finally:
    statements that will always execute after leaving the "try", whether an exception was raised or not
    note that these statements will happen after any in the "except" or "else" that also need to execute

There are a few combinations of these clauses that are legal; other combinations are illegal because they are nonsensical. (Think about why.) The two legal combinations are listed below; in both, the clauses must appear in the order shown:

A try and a finally and nothing else
A try, at least one except, (optionally) an else, and (optionally) a finally
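
The order in which the clauses run can be checked with a small experiment. This is only a sketch; the function name clause_order and the list events are made up for illustration. It records which clauses execute when int() succeeds and when it fails.

```python
def clause_order(text: str) -> list[str]:
    # Hypothetical helper: records the name of each clause as it runs
    events = []
    try:
        events.append('try')
        int(text)                  # may raise ValueError
    except ValueError:
        events.append('except')    # runs only if the try clause failed
    else:
        events.append('else')      # runs only if the try clause succeeded
    finally:
        events.append('finally')   # runs last in either case
    return events


# clause_order('12')    returns ['try', 'else', 'finally']
# clause_order('oops')  returns ['try', 'except', 'finally']
```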

Being careful about what kinds of exceptions you catch

Exceptions are Python objects; like all objects, they have a type. An exception's type classifies what kind of failure occurred. When you see a traceback, the exception's type is included in what's displayed, which helps you to understand what went wrong. For example, the last line of the traceback in the example above said this:

TypeError: object of type 'int' has no len()

In this case, the type of exception that was raised was one called TypeError, a type built into Python that represents a problem revolving around incompatibility of types with the operations you're trying to perform on them (e.g., trying to take the length of an integer, or trying to add an integer and a string together). There are other types of exceptions that are built into Python that you might have seen before, as well, such as ValueError, NameError, and IndexError; these represent other ways that functions can fail. As we'll see a little later this quarter, you can also define new types of exceptions, which classify kinds of failures that are specific to the programs you write, as opposed to the ones built into Python, which are more general and could apply to many programs.
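
A quick way to get a feel for these built-in types is to trigger each one on purpose. In the sketch below, failure_type is a hypothetical helper; it catches Exception broadly only so that it can report the type's name, which is a choice made for demonstration and not a recommended habit.

```python
def failure_type(action) -> str:
    # Hypothetical helper: calls action() and reports the name of the
    # type of exception it raised, if any.
    try:
        action()
    except Exception as e:
        return type(e).__name__
    return 'no exception'


# failure_type(lambda: len(3))              returns 'TypeError'
# failure_type(lambda: 3 + 'three')         returns 'TypeError'
# failure_type(lambda: int('three'))        returns 'ValueError'
# failure_type(lambda: [1, 2, 3][10])       returns 'IndexError'
# failure_type(lambda: no_such_variable)    returns 'NameError'
```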

except clauses can optionally — and, more often than not, they do — specify a type of exception that they handle. Python only executes except clauses when the type of exception matches the type that the except clause can handle. except clauses with no type listed can handle any kind of exception, though these are somewhat dangerous in practice, because they'll handle every kind of problem the same way, meaning even a program bug (like misspelling the name of a variable) will be handled the same way as the kinds of problems you expect (like a file not existing when your program tries to open it).

So why is it potentially dangerous to write except clauses that don't specify a type? Consider these three short Python functions:

def foo():
    return 14

def bar():
    b1 = 3 * foo()
    return bi

def example():
    try:
        answer = bar()
        print(f'The answer is {answer}')
    except:
        print('ERROR')

Read these functions carefully and then decide what would happen if we defined those functions and then did this in the Python shell:

>>> example()

The answer might surprise you if you didn't read carefully. All we'll see is this:

ERROR

But why? Let's trace through it:

example() is called.
Control enters the try clause.
The statement answer = bar() is an assignment statement. The assignment of answer can't happen until we know what bar() returns, so bar() is called.
bar() begins with the statement b1 = 3 * foo(). Again, the assignment can't happen until we know what value will be assigned, which depends on the value returned from foo(). So foo() is called.
foo() returns 14.
Now that foo() has returned 14, bar() can calculate 3 * foo() as 3 * 14, which gives the result 42, which is assigned into b1.
Still in the bar() function, we reach the line that reads return bi. This line actually has a subtle typographical error in it. We probably intended to say return b1 here, but we mistyped b1 as bi instead. Because there is no variable called bi, Python raises an exception — a NameError — that (rightly) causes the bar() function to fail.
Since bar() failed, control passes back to the code that called it, which is the line answer = bar() in the example() function. Since the call to bar() failed, the assignment can't occur; instead, example() has the standard two choices: handling the exception or allowing the failure to cascade. In this case, though, answer = bar() is within a try clause that has an except clause that has no type specified. This means that if anything goes wrong in that try, we'll immediately jump into the except; whatever code is in the except clause will run, and then the error will be considered to have been handled. Our except clause simply prints the word ERROR.

So what is the net effect of this? The example() function claims to be quite resilient: If anything goes wrong, it claims to be able to handle the problem. Thinking naively about it, one might consider this to be a good design decision: example() is crash-proof! But think more carefully. Has it really handled the problem? "Swallowing" an exception and printing a generic error message is, in a lot of ways, worse than just letting the program crash. We still didn't get the result we wanted from calling example(), but instead of giving us an error message we can use — a traceback that indicates what went wrong and where — we instead see the word ERROR and are left with no idea of what went wrong.

Consider, instead, if we'd written example() this way:

def example():
    answer = bar()
    print(f'The answer is {answer}')

Now, calling example() will have a different result:

>>> example()
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    example()
  File "<pyshell#12>", line 2, in example
    answer = bar()
  File "<pyshell#6>", line 3, in bar
    return bi
NameError: name 'bi' is not defined

The outcome is the same in one sense: We didn't get the result we wanted. But this time we got an error message that told us exactly what went wrong and where. Especially when we're working on writing a new program, hiding these kinds of details makes our job much more difficult. It's easy to misspell a variable name, to use the name of a function instead of calling it, and so on; for those errors to cause our program to misbehave without telling us why will make it extremely difficult to find and correct these inevitable mistakes.
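
There is also a middle ground between the two versions of example(): keep a handler, but have it name a type. The sketch below assumes, purely for illustration, that ValueError is the only failure example() anticipates. Because the typo in bar() raises a NameError instead, the handler does not match and the bug is not hidden.

```python
def foo():
    return 14


def bar():
    b1 = 3 * foo()
    return bi          # deliberate typo, as in the example above


def example():
    try:
        answer = bar()
        print(f'The answer is {answer}')
    except ValueError:
        # Assumed for illustration: ValueError is the failure we anticipate.
        # A NameError does not match, so it is not swallowed here.
        print('ERROR')


# Calling example() raises NameError and, in the shell, shows a traceback
```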

Catching only one type of exception

An except clause can specify a type of exception by simply listing its name after the word except.

def read_number_and_print_square() -> None:
    try:
        number = int(input())
        print('The square of the number you entered is', (number * number))
    except ValueError:
        print('That is not a number')

In this example, we're catching only the one type of exception that we reasonably expect might be raised. The call to int() may fail if the user's input is something that can't be converted into an integer; if that's the case, it will raise a ValueError. So, here, we've handled just the ValueError. If we had misspelled the name of a variable or made any other minor mistake in writing this function, it would have manifested itself in a different kind of exception, one that this function does not handle, so it would propagate to the caller rather than being silently swallowed.
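
Handling the ValueError also makes it possible to recover instead of just reporting the problem. Here is one possible sketch that keeps asking until the input is valid. The function name read_integer and its read_line parameter are hypothetical; read_line defaults to the built-in input() and exists only so the function can be tried without a keyboard.

```python
def read_integer(prompt: str, read_line=input) -> int:
    # Hypothetical helper: repeats the prompt until the text that was
    # entered can be converted to an integer.
    while True:
        try:
            return int(read_line(prompt))
        except ValueError:
            # Only the anticipated failure is handled; anything else
            # (such as a misspelled variable name) would still propagate.
            print('That is not a number; please try again.')


# Simulating a user who types 'abc' and then '7':
# responses = iter(['abc', '7'])
# read_integer('Enter a number: ', lambda prompt: next(responses))  returns 7
```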

Understanding how and when to handle exceptions

Once you understand the mechanics of how a construct in Python behaves, your next task is understanding the appropriate ways to use it; no part of a programming language is right for every circumstance. We've now seen how you can handle exceptions, but the more nuanced problem is understanding when to handle them and when not to handle them. Here are a few guidelines to consider.

When you're writing a function f that calls another function g, one thing you want to be thinking about is whether g raises exceptions (i.e., whether g can fail to complete its job). If so, you then need to consider whether the failure of g also implies the failure of f, or whether f could reasonably carry on the same job in some way. If f fails whenever g fails, you won't want to catch the exception in f; if f could reasonably continue, f should catch the exception and then continue its work.
All in all, in my experience, it's more common for a function not to catch an exception than it is to catch it. This is especially true when you tend to write relatively short, simple functions, and break larger, more complex functions into smaller ones — as you should be doing — because it's more likely that the failure of one will cause a cascading failure of several others that are each doing a smaller slice of something bigger.
The finally clause is primarily used for what you might call cleanup. This is most especially true when a function acquires some kind of external resource — like a file or a connection across a network — that only it (and the functions it calls) will use. In the case of a file, for example, a finally clause provides an obvious place to close the file if it was opened successfully.
When you don't want to catch an exception, but you do want to ensure that cleanup is done when an exception is raised, a try/finally (with no except or else) is appropriate. That way, whether the code in the try clause completes successfully or an exception is raised and the function fails, the cleanup in the finally will always be done.
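
The last two guidelines can be sketched with a file. The function count_lines below is a hypothetical example, not part of any library. It opens the file before the try statement, so the finally clause runs only if the file was opened successfully, and it catches nothing, so any failure still reaches its caller.

```python
def count_lines(path: str) -> int:
    # If open() fails (for example, because the file does not exist),
    # there is nothing to clean up, so the call sits outside the try.
    the_file = open(path, 'r')

    try:
        count = 0
        for line in the_file:
            count += 1
        return count
    finally:
        # Runs whether the loop finished or an exception was raised
        the_file.close()
```

If reading fails partway through, count_lines() fails too, but not before the file has been closed.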