ICS 32A Fall 2023
Project 3: From the Faraway Nearby

Due date and time: Friday, November 17, 11:59pm


Background

We saw in the previous project that our Python programs are capable of connecting to the "outside world" around them — to other programs running on the same machine, or even to other programs running on different machines in faraway places. This is a powerful thing for a program to be able to do, because it is no longer limited to taking its input from a user or from a file stored locally; its input is now potentially anything that's accessible via the Internet, making it possible to solve a vast array of new problems and process a much broader collection of information. Once you have the ability to connect your programs to others, a whole new world opens up. Suddenly, the idea that you should be able to write a program that combines, say, Google search queries, the Internet Movie Database, and your favorite social network to find people who like movies similar to the out-of-the-ordinary ones you like doesn't seem so far-fetched.

But we also saw that getting programs to share information is tricky, for (at least) two reasons. Firstly, there's a software engineering problem: A protocol has to be designed that both programs can use to have their conversation. Secondly, there's a social problem: If the same person (or group of people) isn't writing both programs, it's necessary for them to agree on the protocol ahead of time, then to implement it. This second problem has a potentially catastrophic effect on our ability to make things work — how could you ever convince a large entity like Google to agree to use a bespoke protocol just to communicate with a program you wrote?

In practice, both of these problems are largely solved by the presence of standards, such as those defined by the World Wide Web Consortium and the Internet Engineering Task Force. Standards help by providing detailed communication protocols whose details have already been hammered out, with the intention of handling the most common set of needs that will arise in programs. This eliminates the need to design one's own protocol; where the standard protocols will suffice, which is more often than you might think, you can use them as-is. Further, using standards allows programs to be combined in arbitrary ways; as long as two programs support the same protocol, they've taken a big step toward being able to interoperate with each other, so having many programs implementing the same standard protocol is a substantial improvement over having the same programs communicate using one-off techniques. What's more, standard protocols often have standard implementations, so that you won't have to implement the details yourself as you did in the previous project. For example, Python has built-in support for a number of standard Internet protocols, including HTTP (HyperText Transfer Protocol, the protocol that your browser uses to download web pages) among others.

At first blush, HTTP doesn't seem all that important. It appears to be a protocol that will allow you to write programs that download web pages (i.e., programs that play the same role that web browsers do). But it turns out that HTTP is a lot more important than that, since it is the protocol that underlies a wide variety of traffic on the Internet. It is not limited to the conversation that browsers have with web servers in order to download a web page; it also encompasses the way many applications, including those without any user interface at all, communicate. HTTP underlies a growing variety of program-to-program communications using web protocols, where web sites or other software systems communicate directly with what are broadly called web services, fetching data and also making changes to it. This is why you can post messages to a web site like Facebook using either their web site, a client application on your laptop, or a smartphone app; all of these applications use the same protocol to communicate with Facebook's service, differing only in the form of user interface they provide.

Fortunately, since HTTP support (as well as support for its more secure cousin, HTTPS) is built directly into Python, we can write programs that use these web services without having to handle low-level details of the protocol, though there are some details that you'll need to be familiar with if you want to use the provided implementation effectively. We'll be discussing some of these details in lecture soon, and these will be accompanied by a code example, which will give you some background in the tools you'll need to solve these kinds of problems in Python.

This project gives you the opportunity to explore a small part of the vast sea of possibilities presented by web APIs and web services. You'll likely find that you spend a fair amount of your time in this project understanding the web APIs you'll need — being able to navigate technical documentation, experiment with another system, and gradually build an understanding of it is a vital skill in building real software — and that the amount of code you need might not be as much as you expect when you first read the project write-up. As always, work incrementally rather than trying to work on the entire project all at once; there is partial credit available for a partial solution, as long as the portions that you've finished are stable and correct. When you're done, you'll have taken a valuable step toward being able to build Python programs that interact with web services, which opens up your ability to write programs for yourself that are real and useful.

Additionally, you'll get what might be your first experience with writing classes in Python, which will broaden your ability to write clean, expressive Python programs, a topic we'll continue revisiting and refining throughout the rest of this course. Along with that, you'll learn about why it can be a powerful technique to write multiple, similar classes in a way that leaves them intentionally identical in at least one aspect of how they behave.


The problem

Unless we're planning on spending some period of time indoors in a climate-controlled environment, part of how we'll plan that time involves knowing what kind of weather we expect wherever we plan to be. When I was a kid growing up in the 1980s, the most up-to-date way to get that kind of information was television or radio; for heavily populated areas, you might have been able to find an automated service that you could call using a (landline) phone. Nowadays, the Internet provides a valuable resource to help us to monitor weather forecasts on demand, obtaining forecasts for anywhere from anywhere. In your work on this project, you'll write a program that can answer a question similar to the following: What is the hottest it will feel near Bren Hall in Irvine, California over the next 48 hours?

To do that, we'll need some information that we won't have at our fingertips; it's not our ambition to build weather sensors and place them in the locations for which we'd like a forecast, after all. But thanks to the ubiquitous Internet of today, we'll be able to obtain and use (free of charge) information that will allow us to answer a question like this without ever leaving the house. What we'll need are two things.

Given the ability to obtain answers to those kinds of questions and use them as input to our program, the rest of the problem is reduced to interpreting that input appropriately and performing the right calculations on it.

Because we're building a program in a problem domain that's new to us, though, we'll need to know some things about it. We don't need to become experts in weather forecasting or the intricacies of geographic algorithms and mapping, but we need to know enough about those things to be able to build what we seek to build. When we build programs, we're in the automation business, but we have to know something about what we're automating, even if we don't have to know everything.


Temperature

Temperature scales

While temperature is a fairly universal concept, you've likely learned before that there are multiple temperature scales that are commonly used to report it. Depending on what problem you're solving or the preferences of the audience to whom you're communicating a result, you might choose one scale or another. When we measure air temperature as part of a weather forecast, there are two common choices, so if we're trying to report weather results to people, we might want to offer either one according to the preferences of those people.

Fortunately, converting from a temperature in one of these scales to the other requires only a straightforward bit of arithmetic, which you may have learned at some point in the past.
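
As a minimal sketch of that arithmetic, assuming the two scales are Fahrenheit and Celsius (the F and C that appear in the example queries later in this write-up), the conversions might look like this. The function names are hypothetical, not ones the project requires.

```python
def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (degrees_f - 32) * 5 / 9


def celsius_to_fahrenheit(degrees_c: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return degrees_c * 9 / 5 + 32


if __name__ == '__main__':
    print(f'{fahrenheit_to_celsius(75):.4f}')   # prints 23.8889
    print(f'{celsius_to_fahrenheit(100):.1f}')  # prints 212.0
```

Small functions like these are easy to check in the Python shell or with assert-based tests before anything else depends on them.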

The "feels like" temperature

When we check a weather forecast, it's usually because we want to know how to prepare ourselves for being outside in that weather. How will it feel? What kind of clothing should I wear, and what additional clothing should I bring with me? To answer these questions, we'll want to know the air temperature, though it's important to realize that this isn't enough information to answer them properly. At higher temperatures, humidity can make it feel a lot hotter than it might feel otherwise. At lower temperatures, wind can make it feel a lot colder than it might feel otherwise. So, what we really want to know is the combination of these effects: Given the air temperature, humidity, and wind speed, how will it feel to be outside? We often see this reported as a "feels like" temperature.

While there is not a single agreed-upon formula for calculating a "feels like" temperature, we'll need to agree on one for the purposes of this project, so we'll lean on science, which provides two approaches that we can combine to good effect.

Putting these ideas together, the inputs to our formula are as follows.

Given these inputs, you'll perform the following calculation to determine a "feels like" temperature, by summing up a sequence of values determined from these inputs.

    When...               The "feels like" temperature is...    Which is the sum of...
    T ≥ 68°F              Heat Index                            -42.379
                                                                2.04901523 T
                                                                10.14333127 H
                                                                -0.22475541 T H
                                                                -0.00683783 T^2
                                                                -0.05481717 H^2
                                                                0.00122874 T^2 H
                                                                0.00085282 T H^2
                                                                -0.00000199 T^2 H^2
    T ≤ 50°F and W > 3    Wind Chill                            35.74
                                                                0.6215 T
                                                                -35.75 W^0.16
                                                                0.4275 T W^0.16
    Otherwise             Air Temperature                       T
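
The calculation above can be sketched in Python as follows. This is one possible reading, not a required design: the function name is hypothetical, and it assumes T is the air temperature in degrees Fahrenheit, H is the relative humidity as a percentage (e.g., 70 for 70%), and W is the wind speed in miles per hour.

```python
def feels_like(t: float, h: float, w: float) -> float:
    """Return a "feels like" temperature in degrees Fahrenheit.

    Assumed units: t in degrees Fahrenheit, h as a relative humidity
    percentage, w in miles per hour.
    """
    if t >= 68:
        # Heat Index: the sum of the nine terms in the first row.
        return (-42.379
                + 2.04901523 * t
                + 10.14333127 * h
                - 0.22475541 * t * h
                - 0.00683783 * t ** 2
                - 0.05481717 * h ** 2
                + 0.00122874 * t ** 2 * h
                + 0.00085282 * t * h ** 2
                - 0.00000199 * t ** 2 * h ** 2)
    if t <= 50 and w > 3:
        # Wind Chill: the sum of the four terms in the second row.
        return (35.74
                + 0.6215 * t
                - 35.75 * w ** 0.16
                + 0.4275 * t * w ** 0.16)
    # Otherwise, the air temperature is used unchanged.
    return t
```

Writing each term on its own line mirrors the table, which makes it easier to compare the code against the coefficients one by one.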

Finding out more

If you're interested in reading more about these formulas and where they came from, you can follow the links below to articles that describe them in more detail, though these details aren't particularly important for our work here. (This is one of the things you have to decide when you're working in a new problem domain: How much time do you want to spend, especially early on, understanding the broader context in which your work fits? Spend too much time and you won't be able to become productive soon enough. Spend too little time and you won't be able to discuss your work intelligently with the non-technical people who are impacted by your work. So, you gauge your curiosity, apply your time management skills, and make your best decision.)

It's worth noting that you may find other formulas online for solving similar kinds of problems — for example, the question of how a particular combination of temperature, humidity, and wind "feels" is one whose answer has some gray areas — though you'll want to implement the formulas described above regardless of what other research you do or what other formulas you find, so that the output of your program will match what is expected.


Latitudes, longitudes, and geocoding

Before you get too much farther, if you don't know how the latitude and longitude system works (don't feel bad if you don't, but you do need to understand this in order to finish this project!), take a look at the link below:

In particular, note the limits on allowable latitudes and longitudes, as well as the difference between North and South latitude and between West and East longitude. And note, too, that latitude and longitude, generally, don't work the same way, so once you've understood one, you'll still need to be sure you've wrapped your mind around the other. There aren't a lot of details, but if you haven't thought about them in a while (or if you've never seen them before), it's worth taking a few minutes to get your understanding sorted out before continuing.

What is geocoding?

The word geocoding sounds like some kind of programming technique, but it's actually something else: It's a process for converting the descriptions of places on the Earth into their locations and back again. In other words, it allows us to answer questions such as these.

The first of those questions is what we'd call forward geocoding (i.e., taking the description of a location and turning it into geographic coordinates). The second is what we'd instead call reverse geocoding (i.e., taking geographic coordinates and describing what's there).

Of course, answering questions like these requires an enormous amount of data that we don't have, so it won't be up to us to determine these answers; instead, we'll obtain them online as we need them.


Where will we get our data?

While we'll be implementing some calculations of our own, the most meaningful input to our program will need to be obtained online, which raises the question of where we're going to get the information and how we're going to make sense out of it.

Hourly weather forecasts from the National Weather Service

The National Weather Service is a United States government agency that provides weather-related information, such as historical data, forecasts, and warnings. Among its services is a web API that offers real-time weather forecasts, including the hourly forecasts that we'll be using in our work in this project, which specify a handful of data points that forecast the upcoming weather on an hour-by-hour basis. Forecasts are only available in areas covered by the National Weather Service, which means we won't expect to find any forecasts available outside the United States; consequently, we won't expect our program to be able to answer questions about those locations not covered by the National Weather Service. (Our program is only as good as the APIs it relies on, in other words.)

The National Weather Service API includes documentation that describes its use, so your first step is taking a look through it. Based on what you find there, see if you can construct URLs that allow you to find an hourly forecast as near to 33.64324045°N and 117.84185686276017°W as possible. Don't worry if it takes a little while, but do spend some time working on that problem before you try to reach out to the National Weather Service API from your program; you can't use tools that you don't understand how to use, and you especially don't want to use services belonging to others until you understand them well enough to do so within the boundaries of what they permit.

The National Weather Service API is capable of returning information in a variety of formats, but we'll need to agree on which one we'll be using, because, as you'll see in the next section of the write-up, we'll need to know what format your program can handle so we can test it properly. So, we'll agree to always ask for an answer in the application/geo+json format. (As of this writing, that format is the default, which will help when you do browser-based experimentation, but it's still best for your program not to rely on that default remaining unchanged.)
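
One way to ask for a particular format over HTTP is to send an Accept header with the request. The sketch below shows how that might be done with Python's standard urllib.request module; the function name is hypothetical, the URL is a placeholder rather than a real National Weather Service URL, and whether the Accept header is the mechanism you should use is something to confirm in the API's documentation.

```python
import urllib.request


def build_geojson_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks for application/geo+json."""
    return urllib.request.Request(
        url,
        headers={'Accept': 'application/geo+json'})


if __name__ == '__main__':
    # Placeholder URL for illustration only; nothing is downloaded here.
    request = build_geojson_request('https://example.invalid/forecast')
    print(request.get_header('Accept'))  # prints application/geo+json
```

Building the request separately from sending it makes it possible to check the headers without touching the network.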

Geocoding via Nominatim's API

Nominatim is a web API that provides geocoding services using an open set of map data called OpenStreetMap. Nominatim requires no API key, though there are still some restrictions on its use, which we'll discuss a little later in this write-up.

Specifically, we'll be interested in using Nominatim for two things:

Nominatim's API has fairly extensive documentation that describes its use, so you'll want to take a look through that to understand the services it provides and how to access them. Similar to how you experimented with the National Weather Service API, see if you can construct URLs that find the answers to the two examples above.
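
When you construct URLs like these in a program, query parameters need to be encoded properly, since place descriptions contain spaces and commas. The sketch below uses urllib.parse.urlencode from the standard library. The base URLs and the parameter names (q, lat, lon) are assumptions that you should verify against Nominatim's documentation, and the function names are hypothetical.

```python
import urllib.parse

# Assumed base URLs; confirm them in Nominatim's documentation.
SEARCH_URL = 'https://nominatim.openstreetmap.org/search'
REVERSE_URL = 'https://nominatim.openstreetmap.org/reverse'


def build_search_url(description: str) -> str:
    """Build a forward geocoding URL for a free-form place description."""
    query = urllib.parse.urlencode([('q', description)])
    return f'{SEARCH_URL}?{query}'


def build_reverse_url(latitude: float, longitude: float) -> str:
    """Build a reverse geocoding URL for a latitude and longitude."""
    query = urllib.parse.urlencode([('lat', latitude), ('lon', longitude)])
    return f'{REVERSE_URL}?{query}'


if __name__ == '__main__':
    print(build_search_url('Bren Hall, Irvine, CA'))
    print(build_reverse_url(33.5, -117.75))
```

Any additional query parameters that this write-up requires would be added to the same list of pairs before encoding.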

Nominatim's API is capable of returning information in a variety of formats, but we'll need to agree on which one we're using, because, as you'll see in the next section of the write-up, we'll need to know what format your program can handle so we can test it properly. So, we'll agree to always pass this query parameter in the URLs given to Nominatim's API, even if there are other options available:

Attribution requirements

Nominatim has an attribution requirement, which is to say that they require us to generate output in our programs that specifies that we've taken some of our data from their APIs. Not only for reasons of legality, but also for educational reasons (i.e., learning how to take seriously the legal aspects of the work that we're doing), we'll follow those requirements.

The National Weather Service does not have such a requirement specifically, but we'll nonetheless respect their work enough to print an attribution message crediting them for their data, as well.

Any time our program uses any data from Nominatim or the National Weather Service, we'll say so in our program's output. And we'll only generate that attribution when we actually used one or both of the APIs.

Testing without the APIs

Working on a project that depends on an external API, such as this one, you'll face a couple of challenges that you may not have faced before.

For these reasons, your program will need a way to obtain its information from files stored locally, instead of reaching out to APIs. This will allow you to test your program with known-good data, which you'll mostly want to do, except when you're specifically working on the parts of the program where you're reaching out to the APIs. In the next sections of this write-up, you'll see how we'll make that possible.


The program

Your program will read a sequence of lines of input from the Python shell that configure its behavior, then generate and print some output consistent with that configuration. The general goal of the program is this: Given a target location and weather-related queries, show a description of the closest location for which weather data is available, then display the answer to those weather-related queries for that location. (That's a mouthful, so you'll want to read that sentence a few times; there's a lot going on there. Read further, too, and you'll see an example that will help to clarify.)

The input

The first thing your program does is read several lines of input that describe the job you want it to do. Your program should not print any prompts to the user; it should just blindly read this input, expecting that the user understands how to use the program already.

You can freely assume that the input will match the specification described above; we will not be testing your program on any inputs that don't match the specification.

The interplay between geocoding and weather data

There are two forms of data we gather, either from web APIs or files stored locally, and one thing we'll need to be sure we understand is how they relate to one another.

When the National Weather Service API reports an hourly forecast within an area described by a polygon, we'll agree to use the following approach to boil that polygon down to a single location (i.e., a single latitude and longitude).

The output

After reading all of the input, you'd first display the latitude and longitude of the target location, with latitudes and longitudes shown in the following format.


    TARGET 33.64324045/N 117.84185686276017/W
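
Judging from the example above, each coordinate is shown as a non-negative number followed by a slash and a direction letter. A minimal sketch of that formatting follows, assuming that coordinates are held as signed numbers in which negative latitudes are South and negative longitudes are West; the function names are hypothetical, and how many digits to display is left to Python's default float formatting here.

```python
def format_coordinate(value: float, positive: str, negative: str) -> str:
    """Format a signed coordinate as its magnitude plus a direction letter."""
    direction = positive if value >= 0 else negative
    return f'{abs(value)}/{direction}'


def format_target(latitude: float, longitude: float) -> str:
    """Format the TARGET line for a signed latitude and longitude."""
    lat_text = format_coordinate(latitude, 'N', 'S')
    lon_text = format_coordinate(longitude, 'E', 'W')
    return f'TARGET {lat_text} {lon_text}'


if __name__ == '__main__':
    print(format_target(33.5, -117.75))  # prints TARGET 33.5/N 117.75/W
```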

Then, you'd use the information that's either stored in the specified files or downloaded from the specified APIs to determine the results of the weather queries, displaying the result of the queries in the order they were specified in the input. For example, suppose that the input was as follows:


    TARGET NOMINATIM Bren Hall, Irvine, CA
    WEATHER NWS
    TEMPERATURE AIR F 12 MAX
    HUMIDITY 24 MIN
    NO MORE QUERIES
    REVERSE NOMINATIM

This means we're looking up an hourly weather forecast from the National Weather Service as near to Bren Hall at UC Irvine as we can get, using Nominatim to describe the forecast location, then reporting two things:

We might see something like this as a result — though, of course, the results are entirely dependent on the data returned by the APIs.


    1 Sunnyhill, Irvine, CA
    2023-11-07T23:00:00Z 77.0000
    2023-11-07T22:00:00Z 6.0000%

There are a couple of things worth noting in the example above:

After information about all of the locations has been printed, you will wrap up the output by printing attribution messages for any of the sources of data (National Weather Service and/or Nominatim) that were actually used when looking up forward geocoding, reverse geocoding, or real-time weather data. For example, you would only print National Weather Service's attribution if WEATHER NWS was in the input rather than WEATHER FILE.


    **Forward geocoding data from OpenStreetMap
    **Reverse geocoding data from OpenStreetMap
    **Real-time weather data from National Weather Service, United States Department of Commerce

If all three attributions are to be printed, they should be shown in the order above. If fewer than three are printed, they should be shown in the relative order above (e.g., forward geocoding would be shown before reverse geocoding or weather data).
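
One way to keep the attributions in the right relative order is to store them in a fixed sequence and filter it by what was actually used. This is only a sketch; the keys 'forward', 'reverse', and 'weather' and the function name are hypothetical.

```python
# The three attribution messages, in the order they must appear.
ATTRIBUTIONS = [
    ('forward', '**Forward geocoding data from OpenStreetMap'),
    ('reverse', '**Reverse geocoding data from OpenStreetMap'),
    ('weather', '**Real-time weather data from National Weather Service, '
                'United States Department of Commerce'),
]


def attribution_lines(used: set) -> list:
    """Return the attribution messages for the APIs that were used."""
    return [message for key, message in ATTRIBUTIONS if key in used]


if __name__ == '__main__':
    # Only the weather API and reverse geocoding API were used in this run.
    for line in attribution_lines({'weather', 'reverse'}):
        print(line)
```

Because the order comes from the fixed list rather than from the set, the output order does not depend on the order in which the APIs were called.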

A complete example that uses locally stored data

I recommend that you do the majority of your testing with locally stored data. Testing requires not only running a program, but also knowing what the output is supposed to be; only then can you know whether you've got the correct output. But when you're writing a program that reads data from an API that will quite possibly give you different data every time you call it, it becomes difficult to know what the right answer is.

So, as a first step in this direction, you'll find some example data below. Download these files and store them in the same directory as your program's code.

Once you've finished with your program, you should be able to run the following test and see the results shown below.


    TARGET FILE nominatim_center.json
    WEATHER FILE nws_hourly.json
    TEMPERATURE AIR C 24 MAX
    NO MORE QUERIES
    REVERSE FILE nominatim_reverse.json
    TARGET 33.64324045/N 117.84185686276017/W
    Auburn Aisle, Oxford Court, University Town Center, Irvine, Orange County, California, 92612, United States
    2023-11-06T22:00:00Z 23.8889
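
The last line of that output pairs a timestamp with a value shown to four decimal places. The sketch below shows one way such a line might be produced. It rests on assumptions that you should check against the query descriptions in this write-up: that the number in a query is a count of hourly entries to consider, that MAX and MIN select the largest and smallest value among them, and that ties go to the earliest hour. The function names and the list-of-tuples representation are hypothetical.

```python
def select_extreme(hourly: list, hours: int, limit: str) -> tuple:
    """Pick the (timestamp, value) pair with the largest or smallest value.

    hourly is a list of (timestamp, value) pairs in chronological order;
    only the first `hours` entries are considered.
    """
    window = hourly[:hours]
    chooser = max if limit == 'MAX' else min
    # max() and min() return the first of several equal candidates.
    return chooser(window, key=lambda entry: entry[1])


def format_result(timestamp: str, value: float, suffix: str = '') -> str:
    """Format one result line with four digits after the decimal point."""
    return f'{timestamp} {value:.4f}{suffix}'


if __name__ == '__main__':
    # Made-up values for illustration, not real forecast data.
    hourly = [
        ('2023-11-06T21:00:00Z', 22.5),
        ('2023-11-06T22:00:00Z', (75 - 32) * 5 / 9),
        ('2023-11-06T23:00:00Z', 23.0),
    ]
    timestamp, value = select_extreme(hourly, 24, 'MAX')
    print(format_result(timestamp, value))  # prints 2023-11-06T22:00:00Z 23.8889
```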

What to do in the case of API failure

In this project, we face the problem that our program may be written perfectly, yet still might fail in some circumstances. This is because we're dependent on two APIs sending us the data we need, in the format we expect, without which our program can't generate its output. Yet, the APIs are themselves software, and software fails; our communication with the APIs is done via a computer network, and computer networks fail, too. So, we'll need to account for these possibilities in our design, and also have a mechanism for testing them.

First, we'll need to decide what it means for the APIs to have failed. To do that, we'll attack the problem from the opposite angle: What does success look like?

In any other case, we'll say that our program has failed, and we'll print an alternatively formatted set of output — entirely separate from the normal one — that briefly describes the first failure you encountered.

For example, if your program makes an API request whose response contains the HTTP status code 429, your output would be something like this (albeit with the actual URL that failed):


    FAILED
    429 https://whatever.the.url.that/failed/was?including=its&parameters=please
    NOT 200

Or if your program tried to use the file D:\Examples\Python\nws.json but that file didn't exist, your output would be this instead:


    FAILED
    D:\Examples\Python\nws.json
    MISSING

To be clear, you'll print this alternative output (and only this alternative output) if any of the API requests or usages of files fails; otherwise, you'll follow the requirements above and print output describing the center location and the results of any weather queries.
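
The two failure examples above share a three-line shape: the word FAILED, a line identifying what failed, and a short reason. A sketch of a helper that builds that text follows; the function name and parameters are hypothetical, and only the two reasons shown in the examples (NOT 200 and MISSING) are illustrated.

```python
from typing import Optional


def failure_report(target: str, reason: str,
                   status: Optional[int] = None) -> str:
    """Build the three-line failure output.

    target is the URL or file path that failed; status is the HTTP
    status code, when there is one to report.
    """
    second_line = target if status is None else f'{status} {target}'
    return '\n'.join(['FAILED', second_line, reason])


if __name__ == '__main__':
    print(failure_report('https://example.invalid/forecast', 'NOT 200', 429))
    print(failure_report(r'D:\Examples\Python\nws.json', 'MISSING'))
```

Keeping this formatting in one place makes it easier to guarantee that the failure output never gets mixed with the normal output.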


Design requirements and advice

As with the previous project, you'll be required to design your program using multiple Python modules (i.e., multiple .py files), each encapsulating a different major part of the program. We'll leave you some flexibility in determining where to draw the line between what's in one module and what's in another, but the module that you'd execute to run your program must be named precisely project3.py.

Fetching our data with classes

There are three points in your program where you'll need to fetch data from either an API or a file:

  1. When you use forward geocoding to determine the location of the center of your analysis.
  2. When you need to obtain hourly weather forecasts for a given location.
  3. When you use reverse geocoding to determine the description of where hourly weather forecasts are available.

In each of these three cases, there are two separate ways to solve the problem (one using an API and the other using a file). In each case, you'll be required to implement Python classes, which contain attributes that configure them, if necessary (e.g., the path to a file that should be read), and a method that obtains the data. Classes that obtain the same data must share an interface (i.e., they must have a method with the same name, the same parameters, and the same type of return value), so that you can build objects of these types when you read your program's input, then execute them later without knowing which types of objects they actually are.

This is one key benefit of using classes in Python: we can treat different kinds of objects with similar capabilities the same way, which saves us from having to use if statements to differentiate between them. We saw an example of this in lecture, when we talked about duck typing.
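
To illustrate the idea, here is a minimal sketch of two classes that share an interface. The class names, the method name fetch, and the helper function are all hypothetical, and the web-based class is simplified: it leaves out the additional headers and error handling that the rest of this write-up calls for.

```python
import json
import tempfile
import urllib.request
from pathlib import Path


class ForecastFromFile:
    """Reads a previously saved forecast from a local file."""

    def __init__(self, path):
        self._path = Path(path)

    def fetch(self) -> dict:
        with self._path.open(encoding='utf-8') as file:
            return json.load(file)


class ForecastFromWeb:
    """Downloads a forecast from a URL (not exercised in the demo below)."""

    def __init__(self, url: str):
        self._url = url

    def fetch(self) -> dict:
        request = urllib.request.Request(
            self._url, headers={'Accept': 'application/geo+json'})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode('utf-8'))


def top_level_keys(source) -> list:
    # Works with any object that has a fetch() method returning a dict;
    # no if statement is needed to tell the classes apart.
    return sorted(source.fetch())


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory) / 'forecast.json'
        path.write_text('{"type": "Feature", "properties": {}}',
                        encoding='utf-8')
        print(top_level_keys(ForecastFromFile(path)))  # prints ['properties', 'type']
```

The code that reads the input decides which kind of object to build; everything afterward just calls fetch() on whatever it was given.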

Where should I start?

There are lots of ways to start this project, but your goal, as always, is to find stable ground as often as possible. One problem you know you'll need to solve is generating the final report, so you could begin by generating a portion of it — maybe just some details of the output report that are formatted correctly, even if the data is hard-coded. Now you're on stable ground.

One problem you know you'll need to solve is the problem of calculating a "feels like" temperature, given the air temperature, relative humidity, and wind speed; you might consider continuing with that. You can test this using the Python shell or assert-based tests before proceeding, and then you're on stable ground. Temperature conversions aren't a bad idea as a next step; implement and test those, and now you're on stable ground again.

From there, you might continue by implementing a module that obtains the hourly weather forecasts from the National Weather Service API, perhaps first by implementing the class that reads that data from a file, then later implementing the class that loads it from the web instead. (You'll want the part that reads from a file pretty early on, because you'll want stable data you can test with, instead of receiving different weather data as the weather changes.)

Once you've got these implemented, you might continue with forward and reverse geocoding using Nominatim — again, first by implementing the classes that read this data from a file, then later implementing the classes that load them from Nominatim's API instead.

Now you'd have a lot of pieces in place, and you can start thinking about how to tie them together. At this point, you may feel like you don't have a program yet, but that's not so out of the ordinary when you work on a large project; it's often quite a while before you have something that runs an entire end-to-end process, because you first need to build and test a lot of smaller-scale tools. In that sense, this project is a pretty realistic view into what it takes to build realistic programs that interact with complex sets of inputs and outputs.

But, again, there are lots of sequences that could lead to a good solution, and you'll want to consider how you can achieve partial solutions that nonetheless meet the requirements partially, because partial credit is available for those. Still, if you find a way to approach this that's different than what I've suggested, but that leads you to a complete program that meets the design requirements, that's fine; we don't care what order you implement it in, ultimately, but we're happy to help you find an ordering if you're not sure what to work on next.


Limitations

Third-party libraries

Remember that, as stated in the Project Guide, third-party libraries — libraries that are not part of Python's standard library — are off-limits in your work unless they are explicitly permitted. This includes, for example, code you might find online that communicates with the Nominatim or National Weather Service APIs, or third-party libraries such as requests that are commonly used for HTTP-based communication. The intent here is that you be the one to write that code, because that's one of the learning objectives here.

Respecting the terms of service of the APIs we'll use

The APIs we're using in this project are subject to terms of service, which is to say that there are restrictions around how we're permitted to use them. In particular, we'll need to be cognizant of the following restrictions.

National Weather Service

The National Weather Service API requires that we set a header called User-Agent, as a way of letting them know who is connecting to their API. Web browsers generally set this header in a way that identifies themselves — the name of the browser and its version is commonly sent — but we'll be connecting from a Python program instead. So, we'll set the User-Agent header as follows (including the parentheses):

Nominatim

Nominatim has a rate limit of one request per second. Since your program may make multiple requests to Nominatim during one run, it will need to "pause" for one second between subsequent requests.
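
One way to enforce such a pause is to remember when the previous request was made and sleep for whatever part of the interval remains. The sketch below is an assumption about how you might structure this, not a required design; the class name is hypothetical, and the clock and sleep functions are parameters only so that the behavior can be tested without actually waiting.

```python
import time


class RequestPacer:
    """Ensures a minimum interval between successive requests."""

    def __init__(self, min_interval: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self) -> None:
        """Call immediately before each request; sleeps only if needed."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self._min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now


if __name__ == '__main__':
    pacer = RequestPacer()
    pacer.wait()   # first request: no pause
    pacer.wait()   # second request: pauses for about one second
    print('done')
```

A simpler alternative is an unconditional time.sleep(1) before every request after the first, at the cost of sometimes waiting longer than necessary.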

Nominatim requires that we set a header called Referer, which specifies information about where the request came from. Set the Referer header as follows:

More details on the usage policies governing the National Weather Service and Nominatim APIs can be found at the links below.

If you do not respect these terms of use and your API usage is limited or revoked, we will not be offering an extension on the due date of this project, so you'll need to be respectful of these limitations to be sure that your usage of these APIs will be unimpeded throughout your work in this project cycle.


Deliverables

Gathering your files for submission

We've written automation tools to help us to manage your submissions and report your scores, but these tools require us to know that everyone's submission will be structured the same way. For this reason, we're providing you a tool that can gather your files into a single file whose format we can count on, which you'll then submit to Canvas.

To submit your work, follow these instructions:

  1. Make sure that all of the .py files that make up your program are all in the same directory.
  2. Download the Python script linked below, storing it in the same directory as your program:
  3. Run the Python script you downloaded in the previous step. It will gather all of the .py files in the same directory (except for ones that it intentionally skips), verify that they're readable as text, and then generate a submission file named project3.zip in the same directory.
    • If there are any issues — files in the wrong format, for example — they'll be reported to you.
    • The files included in the submission will be listed in the script's output; you'll want to read that output to ensure that all of the files you want to be submitted are included. (You might also want to unzip the file somewhere else, just to be sure it contains what you intend; the responsibility for your submission containing all of the files you want submitted is yours.)
  4. Submit the submission file project3.zip (and only that file) to Canvas.

Note, too, that if you submit separate files, create your own .zip file arranged in your own way, or otherwise don't follow these instructions, we reserve the right to score your project as low as zero. There are no exceptions to this rule.

There are a few additional rules to be aware of.

Can I submit after the deadline?

Yes, it is possible, subject to the late work policy for this course, which is described in the section titled Late work at this link.

What do I do if Canvas adjusts my filename?

Canvas will sometimes modify your filenames when you submit them (e.g., by adding a numbering scheme like -1 or a long sequence of hexadecimal digits to its name). In general, this is fine; as long as the file you submitted has the correct name prior to submission, we'll be able to obtain it with that same name, even if Canvas adjusts it.