ICS 32A Fall 2023
Notes and Examples: Modules


Python programs that span multiple files

At a fundamental level, Python programs are written in much the same way as programs in almost every other programming language in popular use: We write our code as text and store that text in files. To differentiate these files easily from others, we use the extension .py on their filenames — and, as we'll see, this turns out to be a requirement, not just a good habit to get into — but there is otherwise nothing special about them; they're just text files. (In fact, you can write Python programs that read Python programs using the same techniques you've learned for reading from text files; you can even execute them using built-in functions like eval() and exec() to run them in whole or in part.)

You can learn a lot about Python while writing only programs that are entirely stored in a single .py file, and there's a pretty good chance that this is all you've done in your prior coursework. However, as programs grow beyond a certain size, it becomes important to split them up into separate files of a practical size, rather than writing them as one giant file. You can imagine, for example, that a 400,000-line program written by many people over a period of multiple years would need some sense of organization, both so it becomes easier to find your way around it (i.e., you know where to look when you're looking for something in particular) and so it becomes easier for more than one person to work on it simultaneously without one person's changes stomping on another's. If two pieces of relatively unrelated functionality are written in separate files (as they should be), then one person could work on one of them while the other worked on another, and there would be much less chance of a conflict.

In this course, we're now reaching the point where our programs are large and complex enough that there is value in writing them so they span multiple files, so we're going to begin to require that you do it, beginning in Project 2. But, in order to do that, you're going to need to know a little more about how Python allows you to write and run programs made up of more than one file, and you're going to need to start developing a sense of how best to decide what belongs in each file, and how to isolate unrelated functionality so that changes in one file have the least likelihood of affecting the others.


What are modules?

A Python module is a file with a name ending in .py that contains a collection of definitions. Each definition establishes a name for something — a function, a constant, or (as we'll see later this quarter) a class. So, for example, a module might be as simple as this one:

powers.py

def square(n: 'number') -> 'number':
    return n * n

def cube(n: 'number') -> 'number':
    return n * n * n

We've previously said that .py files contain Python scripts, and this terminology isn't necessarily wrong. All Python scripts are modules, though not all Python modules are scripts. The powers.py module above is not a script, in the sense we've thought of them, because loading this particular module in IDLE and executing it (by pressing F5) leads to the appearance of nothing having happened; you'll simply see the Python shell prompt, but no other output. Yet it's not as though executing the module had no effect, because you'll now be able to call the functions defined in that module:


>>> square(3)
    9
>>> cube(4)
    64

Going forward, we'll say that a Python script is a .py file that does something when you run it; it's a program, if you will (or, at least, the entry point to a program). On the other hand, we'll say that a Python module is any .py file, including the ones that contain definitions but do nothing when you run them.

Additionally, there is more to the story than we've discussed before. To understand what's happening here, there are a few more details we need to be aware of.

Namespaces

A running Python program, particularly a large one, might have a very large number of names that it knows: the names of variables, functions, parameters, constants, classes, and so on. When programs are very small, organizing these names isn't much of an issue; variables can be global, all of your code can be in one module, and you can pay relatively little regard to organizational problems like these:

In Python, there isn't just one big global set of names. We've seen previously that Python, like most programming languages, has a set of rules about scope, which establishes where in a program each name is considered to have meaning. As a refresher, consider these two fairly nonsensical functions:


def foo(n):
    m = n + n
    return m

def bar(m):
    n = m + m
    return n

When you look at these functions, you might notice that they both have one parameter and use one variable, each with opposite names (i.e., foo calls its parameter n and its variable m, while bar calls its parameter m and its variable n). There are two key questions we should consider here.

The answer, of course, is that in neither of these scenarios is one call affecting another. Every time foo is called, it has a fresh parameter called n and a new local variable called m; bar is similar. We call these variables local because they belong to a particular call to a particular function. And every time these functions return, their parameters and local variables are destroyed, so subsequent calls to functions are unaffected. (Note, too, that this is even true of recursive functions: Every call to the function is separate, so if we're three levels deep in the recursion, we have three sets of the function's local variables.)
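To see this separation in action, here is a minimal sketch that calls foo and bar as written above, along with a small recursive function. The name countdown is made up for this illustration and isn't part of anything we've written previously.

```python
def foo(n):
    m = n + n
    return m

def bar(m):
    n = m + m
    return n

def countdown(n):
    # Hypothetical recursive function: every call has its own n and rest,
    # so deeper calls never overwrite the values held by shallower ones.
    if n == 0:
        return []
    rest = countdown(n - 1)
    return [n] + rest

print(foo(3))        # 6
print(bar(foo(3)))   # 12
print(countdown(3))  # [3, 2, 1]
```

Even though countdown(3) leads to calls with n equal to 2, 1, and 0, the outermost call still sees its own n as 3 when it builds its result.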

What this tells us is that Python must already provide a way to organize its names, so it can differentiate between the names that are local to one function and the names that are local to another. Python does this using what are called namespaces, which is a bit broader than just isolating local variables between functions, though that is part of what they do. You can think of a namespace in Python as a mapping between names and the objects that have those names.
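One way to peek at this mapping is the built-in locals() function, which reports the local namespace of the function call that's currently running. The sketch below uses a hypothetical function, foo_namespace, that is a copy of foo changed to return its own namespace instead of m.

```python
def foo_namespace(n):
    m = n + n
    # locals() describes this call's local namespace as a dictionary
    # that maps each local name to the object it currently refers to.
    return dict(locals())

print(foo_namespace(5))   # {'n': 5, 'm': 10}
```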

In general, the job of namespaces is to give meaning to a name that you use somewhere in a Python expression. For example, consider this skeletal Python function:


def blah(...):
    ...
    print(x)
    ...

When we ask to print the value of x, where is x? The answer is "It depends", and Python looks in a few places. (A few details have been simplified here, but these are things that wouldn't affect programs written only using features we've learned so far.)
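As a simplified sketch of that lookup, Python tries the function's local names first, then the global names of the module containing the function, and finally the names that are built into Python. The three function names below are made up for this illustration.

```python
x = 'global x'            # lives in this module's global namespace

def uses_local():
    x = 'local x'         # a local x hides the global one during this call
    return x

def uses_global():
    return x              # no local x, so the module's global x is found

def uses_builtin():
    return len('abc')     # len is neither local nor global here; it's built in

print(uses_local())    # local x
print(uses_global())   # global x
print(uses_builtin())  # 3
```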

Conspicuously absent from this list are other modules. How does code in one module use definitions in another? The answer is that the other module must be imported first. We've previously seen that the import statement allows us to use definitions from modules that are part of the Python standard library, such as math or collections. But you may never have considered what import actually does, and we've now reached the point where we need to understand it in more detail.

Importing modules

When you want one module's definitions to be available to another, it becomes necessary to import that module. Importing a module causes the code in that module to execute — so, for example, if it contains five function definitions, those functions are now defined — and then makes its definitions available to the module that imported it. There are a couple of different ways to do it, with subtle but important differences in how they behave.
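The fact that importing executes a module's code can be observed directly. The sketch below is only a demonstration device, not how we'd normally organize a program: it writes a tiny hypothetical module called greeter into a temporary directory, makes that directory visible to import, and then imports it.

```python
import os
import sys
import tempfile

# Source code for a hypothetical module, used only for this demonstration.
source = (
    "print('greeter is being executed')\n"
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, 'greeter.py'), 'w') as module_file:
        module_file.write(source)

    sys.path.insert(0, directory)   # let import find the temporary module
    try:
        import greeter              # the module's code runs at this point
    finally:
        sys.path.remove(directory)

print(greeter.greet('Boo'))
```

Running this prints the message from inside greeter first, because that print call executes during the import, and only afterward prints Hello, Boo.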

Suppose you have the powers module written above, stored in the same directory as the module below.

example.py

import powers

def read_and_square() -> None:
    number = int(input('Number: '))
    number_squared = powers.square(number)
    print(f'The square of that number is {number_squared}')

Since we have a file called powers.py in the same directory as example.py, the attempt to import powers will succeed. (The reason it's important that they're stored in the same directory is that this is the first place that Python looks when you import a module. If there's a module with that name in the same directory, it'll find it. If not, there are a few other places it'll look — and you can even configure this list, as it turns out — but we won't write programs in this course except ones where all of the modules are in the same directory.)
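If you're curious about the list of places Python searches, it's available as sys.path in the standard sys module. Here is a minimal sketch for inspecting it; the entries depend on how and where Python was started, so no particular output is shown.

```python
import sys

# Each entry is a place Python will search, in order, when importing a module.
for location in sys.path:
    print(location)
```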

However, note that the only way we can access the functions in powers from within example is to qualify the name (e.g., powers.square). This is because only the powers module is added to example's global namespace; the only way to access what's in the powers module is to say "Give me the square function in the powers module," which you say by writing powers.square.
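The same behavior can be checked with a standard library module. This sketch uses math rather than powers only so that it runs without any other files, and it assumes it's being run as a file of its own.

```python
import math

print('math' in globals())   # True: the module itself is a global name here
print('sqrt' in globals())   # False: sqrt lives inside math, not in this module
print(math.sqrt(16))         # 4.0
```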

In most cases, I find that this is the best way to import a module in Python. While I have to type a few extra characters in order to call that module's functions, my code becomes easier to read, because when I use a function, it's clear what module it comes from. My goal is never strictly to type less — and I would feel that way even if I typed a lot slower than I do — but to be able to read my code much later (after I've forgotten the details) and be able to find my way around and understand it at a glance. Brevity sometimes helps with understanding; quite often, they are opposed.

We've seen previously that there is a variant of the import statement that dumps individual definitions from within one module into the global namespace of another. Consider the example2 module below:

example2.py

from powers import square

def read_and_square() -> None:
    number = int(input('Number: '))
    number_squared = square(number)
    print(f'The square of that number is {number_squared}')

In this example, the from..import variant is used instead. The difference is that we're asking Python to take the square function from the powers module and add it directly to the global namespace of example2. This means that code in example2 can now simply refer to it as square, without having to qualify it as powers.square. It also means that this module can't define anything else called square globally (e.g., a function or a constant); each module can only have one value for each name. (This latter issue is why I never, ever use the from x import * pattern, because this dumps a potentially large set of names into a module's global namespace, colliding with any same-named values in my own module. And every time the x module changes, it might introduce a new conflict that wasn't there before. This is a dangerous choice, especially in a program that will have a long life, as so many real-world programs do.)
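A short sketch of that kind of collision follows. It uses sqrt from the standard math module so that it runs on its own, and the second sqrt is a made-up function whose only purpose is to reuse the name.

```python
from math import sqrt

print(sqrt(16))   # 4.0, because sqrt refers to math's function

def sqrt(n):
    # Hypothetical function that happens to reuse the imported name.
    return 'not a square root'

print(sqrt(16))   # not a square root
```

Python raises no error and prints no warning here. The global name sqrt simply refers to whichever value was assigned to it most recently.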

So, from x import y does appear to slightly simplify uses of y, but you should carefully consider the tradeoff that's being made here. In return for typing less, you've also made your program a little bit harder to read and navigate. When you read the read_and_square function, you have to wonder where square is defined, a detail you may have forgotten since you originally wrote it. In a module as short as this one, it's probably not important one way or another; but as modules get longer and more complex, it becomes a lot more important for details to be as obvious and clear as possible.

Executable modules (if __name__ == '__main__')

When you load a module in IDLE and execute it (by pressing F5), the code in that module is executed. If it generates any observable result, like printing output or asking the user for input, you'll see that in the Python shell. Otherwise, you'll see a standard >>> Python shell prompt, and all of the module's definitions will now be available — so, for example, if the module defines a function, you could now call it.

As we've seen, modules in Python have names. We can check the name of the currently-executing module at any time by accessing the global variable __name__, which is present in every module.

In general, the names of modules are determined by their filenames; a module written in a file boo.py has the name boo. But there's one special case that we haven't talked about: When you execute a module in Python (e.g., by pressing F5 in IDLE), it's given the special name __main__ while it runs. (Anything you define in the Python shell will be considered part of the __main__ module, as well.)
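Here is a brief sketch that displays a few module names. The two imported modules come from the standard library; the last line shows the name of whichever module contains this code.

```python
import collections
import math

print(math.__name__)          # math
print(collections.__name__)   # collections
print(__name__)               # __main__ when this file is the one being executed
```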

This little fact can be a useful way of differentiating between whether a module has been executed (i.e., is it the "entry point" of a program?) or whether it's been imported. In general, importing a module shouldn't suddenly cause things to happen — output to be generated, input to be read, and so on — but should, instead, simply make definitions available that weren't previously. Even executable modules, the ones we expect to be able to execute as programs, should behave differently when imported than they do when executed.

To facilitate this distinction, we can simply check the module's name by accessing the __name__ variable. If its value is '__main__', the module has been executed; if not, the module has been imported. So, in an executable module, we typically write the code that causes things to happen when the module is executed in an if __name__ == '__main__': block, so that it will only happen if the module has been executed. Meanwhile, if the module is imported, its definitions will become available to another module, but there will otherwise be no effect.
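As a minimal sketch of the pattern, here is a module that defines square and a hypothetical run function (the name is ours, chosen for illustration) that acts as the program's entry point.

```python
def square(n):
    return n * n

def run() -> None:
    # Hypothetical entry point: the part that "does something."
    print(f'The square of 7 is {square(7)}')

if __name__ == '__main__':
    # Reached only when this module is executed, not when it's imported.
    run()
```

Executing this module prints one line of output. Importing it from another module prints nothing, but makes square and run available to the importer.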


Public vs. protected definitions

In a module that contains a collection of definitions, we can divide them clearly into two groups:

The definitions in the former category are often called the public definitions; they're the tools that the module provides to other modules. The definitions in the latter category are often called the protected definitions; they're to be used within the module, but not outside of it.

The words "public" and "protected" are chosen for a reason. They connote an idea of whether something is available from the perspective of code outside of the module. A module makes some parts of itself available to others, but, in the process, there is an understanding that if those things change — particularly their "contract" (e.g., if it's a public function, what if a parameter is added to it?) — then code in other modules will certainly be affected. A module hides other details, particularly the ones that are specific to its implementation, which are details that the author reasonably expects might change, be added, and disappear as the module's implementation evolves.

In general, whenever you add a definition to a module, whether a function, a constant, or (as we'll see later) a class, you should consider whether it should be public or protected. One handy default is to think of everything as protected unless it needs to be public, but this can be a bit of a nuanced distinction, and we'll see plenty of examples throughout the rest of this course that will make it easier for you to decide which is which. The good news, too, is that you can change your mind later; you can always make a protected function into a public one later, say, if you discover that you need it to be, and doing so shouldn't affect code in other modules, since no code in other modules will have been calling the function while it was protected.

How to distinguish between public and protected

Python does not make explicit the distinction between "public" and "protected"; there's no special keyword or style you can use that will explicitly enforce a function's "protectedness" within a module, for example. Even if you say a function is protected, it will still be available to any module, anywhere, at all times. Still, we should respect the distinction in the code we write, and when using code written by others, and there needs to be some way for Python programmers to recognize, at a glance, whether a definition is public or protected.

It turns out that there is a well-known naming convention that's long been in use — and that we'll use in this course — to make the distinction evident to readers of the module's code. Python programmers traditionally separate the "public" from the "protected" by prefixing the names of "protected" definitions with a single underscore ('_') character. Since the protected functions, by their nature, are the ones that are more likely to be changed, added, or removed over time, the underscore serves as a sort of warning to users of the module that they should exercise caution and quite probably stay away from such functions, because they're likely to change in ways that will break callers. I'll adopt the underscore convention for protected definitions in modules from now on in this course; we'll expect you to do the same, and we'll also expect your code to respect protected definitions and not use them from outside of the module in which they're defined.


Some sensible style and design guidelines

Since we're concerning ourselves not just with writing programs, but with writing programs well, the design of our programs is becoming as important as whether they work. One of the broad goals of this course is improving your ability to write programs that are much larger than the ones you may have written in previous coursework. So as we learn new techniques, it's important that we also learn some sensible guidelines for how to use them appropriately, with an eye toward helping you to manage the complexity of your own large programs. Below are some good rules of thumb that you should follow this quarter when you're designing and implementing your modules.

Beginning with Project 2, we'll be applying these ideas in our grading of your projects.