ICS 32A Fall 2019
Project #3: Jack of All Trades

Due date and time: Monday, November 18, 11:59pm

This project is to be done individually


Background

We saw in the previous project that our Python programs are capable of connecting to the "outside world" around them — to other programs running on the same machine, or even to other programs running on different machines in faraway places. This is a powerful thing for a program to be able to do, because it is no longer limited to taking its input from a user or from a file stored locally; its input is now potentially anything that's accessible via the Internet, making it possible to solve a vast array of new problems and process a much broader collection of information. Once you have the ability to connect your programs to others, a whole new world opens up. Suddenly, the idea that you should be able to write a program that combines, say, Google search queries, the Internet Movie Database, and your favorite social network to find people who like movies similar to the ones you like doesn't seem so far-fetched.

But we also saw that getting programs to share information is tricky, for (at least) two reasons. Firstly, there's a software engineering problem: A protocol has to be designed that both programs can use to have their conversation. Secondly, there's a social problem: If the same person (or group of people) isn't writing both programs, it's necessary for them to agree on the protocol ahead of time, then to implement it. This second problem has a potentially catastrophic effect on our ability to make things work — how could you ever convince Google to agree to use your protocol just to communicate with you?

In practice, both of these problems are largely solved by the presence of standards, such as those defined by the World Wide Web Consortium and the Internet Engineering Task Force. Standards help by providing detailed communication protocols whose details have already been hammered out, with the intention of handling the most common set of needs that will arise in programs. This eliminates the need to design one's own protocol (where the standard protocols will suffice, which is more often than you might think) and allows programs to be combined in arbitrary ways; as long as they support the protocol, they've taken a big step toward being able to interoperate with each other. What's more, standard protocols often have standard implementations, so that you won't have to code up the details yourself as you did in the previous project. For example, Python has built-in support for a number of standard Internet protocols, including HTTP (HyperText Transfer Protocol, the protocol that your browser uses to download web pages) among others.

At first blush, HTTP doesn't seem all that important. It appears to be a protocol that will allow you to write programs that download web pages (i.e., that allow you to write programs that play the same role that web browsers do). But it turns out that HTTP is a lot more important than that, since it is the protocol that underlies a much wider variety of traffic on the Internet than you might first imagine. This is not limited only to the conversation that your browser has with a web server in order to download a web page, though that conversation most often uses HTTP (or its more secure variant, HTTPS). HTTP also underlies a growing variety of program-to-program communications using web protocols, where web sites or other software systems communicate directly with what are broadly called web services, fetching data and also making changes to it. This is why you can post tweets to Twitter using either their web site, a client application on your laptop, or a smartphone app; all of these applications use the same protocol to communicate with the Twitter service, differing only in the form of user interface they provide.

Fortunately, since HTTP support is built directly into Python, we can write programs that use these web services without having to handle low-level details of the protocol, though there are some details that you'll need to be familiar with if you want to use the provided implementation effectively. We'll be discussing some of these details in lecture soon, and these will be accompanied by a code example, which will give you some background in the tools you'll need to solve these kinds of problems in Python.

This project gives you the opportunity to explore a small part of the vast sea of possibilities presented by web APIs and web services. You'll likely find that you spend a fair amount of your time in this project understanding the web API you'll need — being able to navigate technical documentation and gradually build an understanding of another system is a vital skill in building real software — and that the amount of code you need might not be as much as you expect when you first read the project write-up. As always, work incrementally rather than trying to work on the entire project all at once; there is partial credit available for a partial solution, as long as the portions that you've finished are stable and correct. When you're done, you'll have taken a valuable step toward being able to build Python programs that interact with web services, which opens up your ability to write programs for yourself that are real and useful.

Additionally, you'll get what might be your first experience with writing classes in Python, which will broaden your ability to write clean, expressive Python programs, a topic we'll continue revisiting and refining throughout the rest of this course. Along with that, you'll learn about why it can be a powerful technique to write multiple, similar classes in a way that leaves them intentionally identical in at least one aspect of how they behave.


Reminder: Do not select a partner

Unlike the previous projects, which offered you the option of using the pair programming technique, this project requires that you work individually. So you will not be selecting a partner and you will not be doing pair programming this time; each student is responsible for his or her own submission for this project. While we do believe that pair programming offers a lot of benefits, you'll also need to build your skills at working on your own, as future coursework (and possibly future employment) will depend on them.

Reusing your own work from a previous quarter

Unlike on work done with a partner, you are permitted to reuse your own work if you took this course previously. We recommend doing the projects again from a clean slate, anyway, because there are likely some significant things that you didn't understand if you've failed this course previously. Still, it is not plagiarism to use code that was solely yours to begin with.


Overview of the program

In your work on this project, you will write a program that is capable of using information gathered from a web service to obtain past prices of stocks, calculate some metrics summarizing those prices, and ultimately report on potentially opportune times to buy or sell the stock, based on one of a few automated buying-and-selling strategies.

Some background on the buying and selling of stocks

It's quite possible that you know relatively little about investing, so this section offers the necessary background that you'll need for this project. Like many programs that you write to automate some kind of real-world activity, you need to know enough about that activity in order to understand the problem you're trying to solve; this situation is no different.

First of all, stock represents ownership in a business entity. A given business entity, such as Apple, divides its overall ownership stake into shares of stock. For our purposes, you can think of each share as being a small, equal percentage of ownership of the company, though the arrangement can be more complex than this; the more shares you own, the greater percentage of the company you own. In the case of Apple, as of this writing, there are around 4.8 billion shares that are publicly traded (i.e., individuals like you and me can buy them, provided that we have the money and that someone else is willing to sell them to us). On the day I wrote this, just about anyone could have bought one of those shares for around $212, which would have given that buyer ownership of roughly 1/4,800,000,000th of the company. (It should be noted that far from all companies in the world are publicly traded, but we'll focus our attention in this project on those that are.)

Share prices in publicly-traded companies are determined via a stock exchange, where interested buyers and sellers meet (electronically, most of the time) and negotiate prices. Buyers specify what they're willing to pay, sellers specify what they're willing to accept, and when there is a match between someone willing to buy for a certain amount and someone else willing to sell for that same amount, a trade takes place; the buyer exchanges money in return for shares. Prices can be somewhat volatile; using the example of Apple's trading on the day I originally wrote this in October 2018, the price of one share ranged between $206 and $220 at different times during the day, with a total of around 45,000,000 shares changing hands on that day. (And just to show you how volatile those prices can be, that same share traded for as low as $150 within the previous year and about $235 at one point earlier that month; it was below $150 again by January 2019.)

Stocks that trade on stock exchanges generally have a symbol (sometimes called a ticker symbol) associated with them, which is a shorthand name that is used to uniquely identify a company on that exchange. In the United States, symbols are generally a sequence of uppercase letters; in some parts of the world, digits are also common. For example, the symbol on U.S. stock exchanges for Apple is AAPL, while Microsoft's is MSFT, Johnson & Johnson's is JNJ, and Verizon's is VZ.

There are vast databases of every trade that was made in every stock every day: when the trade was completed, how many shares traded, the symbol identifying the stock that was traded, and at what price. This is an overwhelming amount of data, and is not generally available online for free in its full detail, since the data has so much potential value. Given such a database, one can develop automatic strategies for rapidly buying and selling and test those strategies against massive collections of real-world pricing data, which can lead to shockingly effective ways to both make and lose money. (An error in one firm's automated trading software in 2012 cost it in excess of $400 million in one day, leading to that firm being acquired for pennies on the dollar by a competitor to rescue it from an imminent bankruptcy.)

However, even though we may not be able to easily access an itemized list of every individual trade from every day, it is possible to find summary data at little or no cost, provided that we plan to use it for non-commercial purposes. For example, one thing we can obtain is a daily summary of trade activity, including these data points:

Given this information, we can start to form an understanding of the overall price action, even if we don't have an explicit description of every individual trade.

Trading does not take place every day. Generally speaking, there are no trades on weekends or holidays, nor during nighttime hours where the stock exchange is located — though this is gradually changing as years go by. Your program will not need to do anything special to handle that; it will simply determine from historical data which days included trading and which didn't, then ignore the days on which there were no trades.

What will your program do?

Generally, your program will perform the following tasks.

There is more to be said about the details of each of these requirements, as there will be automation used in grading this project, but we'll need to begin with an understanding of some of the background concepts.

Indicators

The core of our analysis will be comparing daily price and volume data against the values of indicators. There are three kinds of indicators we'll use:

Generally, you can think of indicators as aggregating price information for multiple days together into single values that broadly describe what happened across those days. These can then form the basis of decisions about buying and selling.
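
As a rough illustration of what "aggregating price information for multiple days together into single values" can mean, here is a minimal sketch of an N-day simple moving average. The function name, and the choice to use None for days that don't yet have N days of history behind them, are assumptions made for this sketch rather than requirements of the project. The prices are the first eleven closing prices from the example report later in this write-up.

```python
def simple_moving_average(values, days):
    """Return a list the same length as values.  Each element is the
    average of the most recent `days` values ending on that day, or
    None if fewer than `days` values are available so far."""
    averages = []

    for i in range(len(values)):
        if i + 1 < days:
            # Not enough history yet to fill a whole window.
            averages.append(None)
        else:
            window = values[i + 1 - days : i + 1]
            averages.append(sum(window) / days)

    return averages


closes = [9.00, 9.20, 8.75, 8.45, 8.27, 8.40, 8.29, 8.20, 8.25, 8.10, 8.32]
averages = simple_moving_average(closes, 10)

# Only the last two days have ten days of history available.
print(averages)
```

With a 10-day window, the first nine days have no value at all, which is why the Indicator column in the example report is empty for those days.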

Signal strategies

The main goal of the analysis is to generate buy signals and sell signals, which are recommendations to buy or sell stock at the conclusion of a particular day.

There are multiple signal strategies that we'll implement, each corresponding to one of the indicators above:

In general, any signal — buy or sell — can occur on any day; there is no general limitation that, for example, buys must precede sells or that consecutive signals can't be the same (e.g., two consecutive buys or three consecutive sells), or even that a buy and sell signal can't occur on the same day. There is also not guaranteed to be a signal every day; in fact, in most configurations, there will be relatively few signals generated.


Where to get historical data about stock trades

There are several places online where you can obtain historical data about stocks and trading. One such service, which we'll be using this quarter, is called Alpha Vantage. Alpha Vantage provides a web API, similar to the YouTube Data API we saw in a code example in lecture, that allows you to obtain the information you'll need via Secure HTTP.

Testing out the Alpha Vantage API

To begin to see how the Alpha Vantage API works, try visiting the following link. There's no need to modify it; use it as-is.

If successful, you should receive a result that looks roughly like this (though you may get different data than this, since the information is time sensitive).

{
    "Global Quote": {
        "01. symbol": "MSFT",
        "02. open": "143.8400",
        "03. high": "144.8800",
        "04. low": "143.7700",
        "05. price": "144.2600",
        "06. volume": "17743119",
        "07. latest trading day": "2019-11-07",
        "08. previous close": "144.0600",
        "09. change": "0.2000",
        "10. change percent": "0.1388%"
    }
}

You may recognize from lecture that this format is JSON (JavaScript Object Notation), which is a common format for information that is exchanged with web APIs like this one. (Note, too, that some browsers, such as some recent versions of Firefox, present data in JSON format differently, but there's usually a way to see the "raw data," which would look like what you see above.)
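
To give a sense of how a response like this might be handled once it reaches a Python program, here is a small sketch that uses the json module from the Python Standard Library on an abbreviated copy of the result above. The variable names are just for illustration. Notice that every value arrives as a string, so numbers have to be converted explicitly.

```python
import json

# An abbreviated copy of the response shown above.
json_text = '''
{
    "Global Quote": {
        "01. symbol": "MSFT",
        "05. price": "144.2600",
        "06. volume": "17743119",
        "07. latest trading day": "2019-11-07"
    }
}
'''

# json.loads turns JSON text into Python dictionaries, lists, and strings.
result = json.loads(json_text)
quote = result['Global Quote']

symbol = quote['01. symbol']
price = float(quote['05. price'])     # "144.2600" is a string until converted
volume = int(quote['06. volume'])

print(symbol, price, volume)
```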

Take a quick look through the data. Don't worry if you don't understand every detail, but start to get a rough sense of what kind of information is available and how it's organized. When you want to know the details, the API documentation will explain the details you need, and you'll find that you can discover a lot of the details through additional experimentation. But it's important that you allow yourself to build an understanding gradually; this is not something you'll necessarily be able to figure out right away, but a lot of the information won't turn out to be relevant in this project, anyway. One characteristic that distinguishes real-world work from the often-sanitized kinds of projects you do in courses like this is the need to find small nuggets of information you need amongst large amounts of documentation that is largely irrelevant to the problem you want to solve; I want you to start building those skills (and alleviate your fears about this kind of thing) now, so you can start working on your own programs that are more "real" and, thus, more exciting.

Obtaining an API key

One of the conditions of using the Alpha Vantage API is that you have something called an API key. Your API key identifies you distinctly from other people who use their API, and provides you with whatever access you've arranged for. In our case, we'll be obtaining a free API key, whose only cost is that we provide a legitimate email address (one that belongs to you and whose messages you can access). Click the link below, then fill out the small form, and click the "GET FREE API KEY" button underneath it; you may also need to check your email to complete your authorization.

Once you've obtained your API key, store a copy of it in a safe place where you won't lose it (e.g., in a file in a directory that you back up regularly).
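
Since the example input later in this write-up gives the program a path to a file containing the API key, one way to read the key back out of such a file might look like the sketch below. The function name is hypothetical, and the demonstration writes a made-up key to a temporary file so that it can run anywhere; your real key would live in a file you created yourself.

```python
import os
import tempfile


def read_api_key(path):
    """Return the API key stored in the text file at the given path,
    with surrounding whitespace (such as a trailing newline) removed."""
    with open(path, 'r', encoding='utf-8') as key_file:
        return key_file.read().strip()


# Demonstration using a temporary file in place of a real key file.
with tempfile.TemporaryDirectory() as directory:
    key_path = os.path.join(directory, 'apikey.txt')

    with open(key_path, 'w', encoding='utf-8') as key_file:
        key_file.write('NOT-A-REAL-KEY\n')

    key = read_api_key(key_path)

print(key)
```

Stripping whitespace matters here because a text editor will often add a newline at the end of the file, and that newline is not part of the key.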

Note that a free API key for Alpha Vantage does come with some fairly significant limitations, most notably that you will only be able to run five API requests per minute and no more than 500 per day. This may affect how you test your program (you'll need to be judicious about how often you run it, for example) and will, of course, affect how we test it during grading. For this reason, we'll need each of you to provide us with your API key for our use in testing your program; more details about that later.

This does bring up an interesting question: Why does Alpha Vantage provide this free service? As it turns out, they're actually in the business of charging money for access to this same data; a "premium" API key, which costs money, will allow you to make more frequent requests, which would allow you (for example) to build an application that others could use. So, by providing a free API key, they're allowing people who might want to purchase one the ability to see whether the data suits their needs.

Respecting Alpha Vantage's license

Being the owner of the service, Alpha Vantage has a license that describes the conditions under which you're permitted to use it. For the curious, the license (a "Terms of Service" document) for the Alpha Vantage API is available at the link below. If you've never looked at the license for a software product, take a minute or two to see what one looks like; when you build software that depends on other software, what you can and can't do with the other software will generally be described in a license like this one.

Don't feel like you need to read the whole thing in detail, but at least spend a little time getting familiar with what a license like this looks like. If you want to work in technology, it won't be the last one of these you'll see. I'm certainly not a lawyer, but I do know how to skim through a license to see whether there are obvious red flags that suggest that I won't be able to use a product in the way I plan to. And for the parts I'm less sure about, I can seek legal counsel — and sometimes my employers will require (and provide) legal oversight, but it helps if I understand the basics of the license first, so I know what questions to ask.

In general, what we mainly care about with respect to Alpha Vantage's license is that they are limiting us to five requests per minute, so we'll respect this by making sure that (a) one execution of our programs never makes more than five requests, and (b) we wait one minute between runs of our programs. Beyond that, the exploratory and educational nature of what we're doing makes most of the rest of the license less important; of course we aren't going to sue them, for example, if the data turns out to be incomplete or incorrect.

Skimming the documentation

The Alpha Vantage API is described in detail at the link below. You certainly won't need to read all of the documentation to start with, but you'll want to take a look around and familiarize yourself with what kinds of information the API provides, because part of your goal in this project is to decide what parts of the API you'll need to solve your problem.

Note, too, that alongside the documentation for each of the requests supported by the Alpha Vantage API are one or more example links that you can click to see an example of what the output would look like. The example links use a special API key, demo, which appears to be limited only to the examples provided. But this can be a handy way of getting a sense for which requests you might need to make.
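
Once you've found a request you want to make, your program will need to build its URL and download the result. The sketch below shows one way that might be done using only the Python Standard Library. The base URL, the GLOBAL_QUOTE function name, and the parameter names are my assumptions about the API, so confirm each of them against the documentation; the function names build_quote_url and download_json are hypothetical, too. Deciding which request your program actually needs is still up to you.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the base URL of the API; confirm it in the documentation.
BASE_URL = 'https://www.alphavantage.co/query'


def build_quote_url(symbol, api_key):
    """Build a URL requesting the latest quote for the given symbol."""
    parameters = [
        ('function', 'GLOBAL_QUOTE'),   # assumed name of the request type
        ('symbol', symbol),
        ('apikey', api_key),
    ]

    # urlencode handles any characters that aren't safe to put in a URL.
    return BASE_URL + '?' + urllib.parse.urlencode(parameters)


def download_json(url):
    """Download the given URL and parse the response body as JSON."""
    response = urllib.request.urlopen(url)

    try:
        return json.loads(response.read().decode(encoding='utf-8'))
    finally:
        response.close()


# Building a URL costs nothing; only download_json uses up a request.
print(build_quote_url('MSFT', 'YOUR_API_KEY'))
```

Keeping the URL-building separate from the downloading means you can test the former as often as you'd like without running into the request limit.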


Details of your program's input and output

As in at least one previous project, the intent here is not to write a user-friendly user interface; what you're actually doing is building a program that we can test automatically, so it's vital that your program reads inputs and writes outputs precisely as specified here.

Generally, your program will read a set of input without prompting the user in any way (i.e., without printing anything that tells the user what they should be entering). After that, your program will generate a report and print it.

The input

Your program will read its input from the Python shell in the following format. You can freely assume that the input will always be given in precisely this format; we will not be testing your program on any inputs that don't match this specification.

The report

Your report will be printed out in a tab-delimited format, which is to say that you'll write lines of output on which there are pieces of information separated by tab characters. For example, if you wanted to print a line with three pieces of information A, B, and C, you'd print a line containing (in Python terms) 'A\tB\tC'. This format has a couple of advantages for us:

The format is described in the form of tables below, though you'll write this in a tab-delimited form instead, with one tab character in between the cells on each line (and you'll need to include the tab even if a cell is empty) and a newline on the end of each line.
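
A minimal sketch of building one such line follows; the function name make_line is hypothetical. Joining the cells with the tab character guarantees exactly one tab between adjacent cells, including around a cell that is empty.

```python
def make_line(cells):
    """Join the cells of one report line with tab characters.  Empty
    cells are given as empty strings, so their tabs still appear."""
    return '\t'.join(cells)


# The middle cell is empty, but both tabs around it are still present.
line = make_line(['A', '', 'C'])

# print adds the newline at the end of the line for us.
print(line)
```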

The report begins with what is called a header, which specifies some basic information about the stock that we're analyzing.

The stock's symbol, such as AAPL
The number of trading days for which the analysis was done (i.e., the number of actual days the stock traded between the start and end dates, inclusive)
The indicator and signal strategy used (in the same format as the fifth line of input)

After that, for each day on which there was trading in that stock — based on the information returned to you by the Alpha Vantage API — you would print one line of information containing these fields. (The lines would be sorted in ascending order of the date, i.e., earlier days shown before later ones.) For readability, you'll first print a line that is a header for that table, giving a name to each of those columns. That line should look exactly like this.

Date Open High Low Close Volume Indicator Buy? Sell?

Underneath that, for each day, you would show the following values in those columns.

Some formatting rules apply to the values in this table:

Your decimal values can differ from ours by 1/10000th (e.g., if we expect 17.8046, we would also accept 17.8045 or 17.8047). This will leave room for the minor rounding errors that can occur when you use floating-point numbers, which are beyond the scope of our work here.
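
Here is a sketch of how one day's line might be assembled. The formatting it uses (four digits after the decimal point for prices and the indicator, a whole number for the volume, and empty cells where there is no value) is inferred from the example report later in this write-up, so treat it as an assumption and check it against the formatting rules. The function name and its parameters are hypothetical.

```python
def format_row(date, open_, high, low, close, volume, indicator, buy, sell):
    """Build one tab-delimited line of the report.  indicator may be
    None when no value is available; buy and sell are booleans."""
    cells = [
        date,
        f'{open_:.4f}',
        f'{high:.4f}',
        f'{low:.4f}',
        f'{close:.4f}',
        str(volume),
        '' if indicator is None else f'{indicator:.4f}',
        'BUY' if buy else '',
        'SELL' if sell else '',
    ]

    return '\t'.join(cells)


row = format_row('2018-10-16', 8.35, 8.62, 8.3, 8.6, 9500, 8.363, True, False)

# repr makes the tab characters visible as \t.
print(repr(row))
```

Note that the line ends with a tab when the last cell is empty, since the tab before an empty cell still has to be there.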

An example input and report

An example input and report for the program follows. Note that you will not necessarily get the same answer as I'm showing below if you run the program, mainly because the results are time sensitive (i.e., when you run a one-month report, you're asking for the most recent one month of data; if you do that again next week, you'll be getting different data). But if the prices were the ones I list below, the rest of the calculations are correct.

Suppose that the input to the program was as follows, and suppose that my API key was stored in the file D:\Examples\32A\AlphaVantage\apikey.txt on my hard drive.

D:\Examples\32A\AlphaVantage\apikey.txt
BOO
2018-10-01
2018-10-31
MP 10

What we're doing is an analysis of the last month of a stock with symbol BOO, using the 10-day moving average of closing prices to generate buy and sell signals.
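
Reading input like this, without printing any prompts, might be sketched as follows. The order of the five lines is taken from the example above; the function name and the read_line parameter (which is there only so the function can be tried out without typing) are hypothetical.

```python
def read_input(read_line=input):
    """Read the five lines of input, in order, without printing any
    prompts.  By default, lines come from the built-in input function."""
    api_key_path = read_line()
    symbol = read_line()
    start_date = read_line()
    end_date = read_line()
    strategy = read_line()

    return api_key_path, symbol, start_date, end_date, strategy


# Trying it out with canned lines instead of the keyboard.
example_lines = iter([
    r'D:\Examples\32A\AlphaVantage\apikey.txt',
    'BOO',
    '2018-10-01',
    '2018-10-31',
    'MP 10',
])

print(read_input(lambda: next(example_lines)))
```

Calling input() with no argument is what keeps the program from printing a prompt.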

The report would look like this. (Again, data that would come from the Alpha Vantage API is hypothetical, but the indicators and signals are correct given that data.) Note that I'm simulating tab characters with spaces; what you'll want are tabs between the fields, as specified. (Note, too, that your report may look like it's spaced differently — or even unevenly — since tabs aren't always printed as the same number of spaces. But the fields on each line should be separated by a tab character, even if that means the table appears to be formatted poorly.)

BOO
23
MP 10
Date    Open    High    Low     Close   Volume  Indicator      Buy?    Sell?
2018-10-01     8.0000  9.5000  7.7500  9.0000  5000
2018-10-02     8.9000  9.2000  8.8000  9.2000  7500
2018-10-03     9.1500  9.2500  8.6000  8.7500  10000
2018-10-04     8.7000  8.7200  8.4100  8.4500  8000
2018-10-05     8.5000  8.5000  8.2700  8.2700  9000
2018-10-08     8.3000  8.6100  8.2400  8.4000  6500
2018-10-09     8.3500  8.3800  8.2000  8.2900  4100
2018-10-10     8.2800  8.3000  8.1800  8.2000  3800
2018-10-11     8.2100  8.3500  8.1800  8.2500  5200
2018-10-12     8.2500  8.3100  8.0900  8.1000  7000    8.4910
2018-10-15     8.1500  8.3500  8.0800  8.3200  8200    8.4230
2018-10-16     8.3500  8.6200  8.3000  8.6000  9500    8.3630  BUY
2018-10-17     8.5000  8.6500  8.4900  8.5500  3700    8.3430
2018-10-18     8.5100  8.5300  8.2800  8.2900  11400   8.3270          SELL
2018-10-19     8.2700  8.3600  8.2400  8.3000  8100    8.3300
2018-10-22     8.2800  8.3100  8.1600  8.1900  9900    8.3090
2018-10-23     8.2000  8.4300  8.2000  8.4100  7600    8.3210  BUY
2018-10-24     8.3800  8.4200  8.2900  8.3000  4300    8.3310          SELL
2018-10-25     8.2900  8.5500  8.2600  8.5500  6800    8.3610  BUY
2018-10-26     8.6000  9.1400  8.5600  9.1400  10600   8.4650
2018-10-29     9.2100  9.5500  9.1500  9.5000  11100   8.5830
2018-10-30     9.6000  9.9900  9.5000  9.9800  13000   8.7210
2018-10-31     9.9600  9.9800  9.8500  9.9000  4900    8.8560
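
One way to check your understanding of a report like this is to try to reproduce its signals. The rule in the sketch below is inferred from the example report above rather than taken from the strategy definitions: a buy signal on a day when the closing price is above the indicator after having been below it on the previous day, and a sell signal in the reverse case. Treat that rule as an assumption, along with the function name and the way days are handled when the close exactly equals the indicator.

```python
def crossing_signals(closes, indicators):
    """Return a list of (buy, sell) pairs of booleans, one per day.
    A day can only have a signal if both it and the day before it
    have an indicator value (i.e., neither is None)."""
    signals = []

    for i in range(len(closes)):
        buy = False
        sell = False

        if i > 0 and indicators[i] is not None and indicators[i - 1] is not None:
            was_below = closes[i - 1] < indicators[i - 1]
            was_above = closes[i - 1] > indicators[i - 1]
            buy = was_below and closes[i] > indicators[i]
            sell = was_above and closes[i] < indicators[i]

        signals.append((buy, sell))

    return signals


# Closing prices and indicator values copied from the report above.
closes = [9.00, 9.20, 8.75, 8.45, 8.27, 8.40, 8.29, 8.20, 8.25, 8.10,
          8.32, 8.60, 8.55, 8.29, 8.30, 8.19, 8.41, 8.30, 8.55, 9.14,
          9.50, 9.98, 9.90]
indicators = [None] * 9 + [8.491, 8.423, 8.363, 8.343, 8.327, 8.330,
                           8.309, 8.321, 8.331, 8.361, 8.465, 8.583,
                           8.721, 8.856]

signals = crossing_signals(closes, indicators)
buy_days = [i for i, (buy, sell) in enumerate(signals) if buy]
sell_days = [i for i, (buy, sell) in enumerate(signals) if sell]

# Positions count from 0, so position 11 is the twelfth trading day.
print(buy_days, sell_days)
```

The positions this finds correspond to the BUY lines on 2018-10-16, 2018-10-23, and 2018-10-25 and the SELL lines on 2018-10-18 and 2018-10-24.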

An example implementation

If you'd like to experiment with a completely implemented example of this program, which demonstrates the output you are required to generate based on the input your program receives, a version is available at the link below, which you can run directly within your browser.

You'll want to spend some time with this early on, as you work on understanding how the program is supposed to work, and then along the way, as you compare your own output to what you get back from the example implementation.

Note that you'll need your API key before you can run the example implementation, so obtaining that API key is a very good place to start your work on this project; almost everything else depends on it.

Sanity-checking your output

We are also providing a tool that you can use to sanity-check whether you've followed some of the basic requirements above. It will only give you a "passing" result in these circumstances:

Note that the sanity checker is not checking whether the output is actually correct; it's checking only the format of the output. Still, your program won't be able to pass our automated tests unless it can pass the sanity checker, so best to be sure it can before you proceed with submitting your project.

Running the sanity checker is simple. First, download the Python module linked below:

Put that file into the same directory as your project3.py file. Running the project3_sanitycheck.py module — for example, by loading it in IDLE and pressing F5 (or selecting Run Module from the Run menu) — will run the sanity checker and report a result, which will be printed to the Python shell.

What you can obtain from the Alpha Vantage API

You can obtain any data you'd like from the Alpha Vantage API, but there are a couple of important limitations to be aware of.

Obtaining partial credit for a partial solution

When we grade your project, we will be focusing not only on looking for a perfect solution, but also for a partial one. For example, if you implement one of the indicators and one of the signal strategies, but not the others, we'll aim to offer you partial credit.

What this means, though, is that we need to agree on the basic criteria for offering that partial credit. This means that even if your program can't handle certain calculations, you'll still want to organize your report in the appropriate format.

So, generally, what you'll want to be doing is printing the information you have in the correct places, even if you don't have all of the information you need. That way, our test automation tools will be able to allocate partial credit appropriately and fairly. A program that is partially complete but is unable to generate any output will be graded a lot less favorably than one capable of printing only some of the output, but putting what it prints in the right place, so you'll want to keep that in mind as you work incrementally.


Design requirements and advice

As with the previous project, you'll be required to design your program using multiple Python modules (i.e., multiple .py files), each encapsulating a different major part of the program. The following would be a good way to break this problem down into its component parts.

Indicators and signal strategies as classes

Each of your indicators and each of your signal strategies is required to be implemented as a Python class, which contains attributes that configure it (e.g., for the simple-moving-average-based strategy, an attribute would store the number of days to use in the average calculation), and a method that executes the strategy against a collection of price information.

All of your indicator classes must have a method with the same signature (i.e., the same name, the same parameters, and the same type of return value), so that your user interface could call into it without depending on which kind of indicator object it is. The same must be true of your signal strategy classes.

(This is one key benefit in using classes in Python; we can treat different kinds of objects with similar capabilities the same way, which avoids us having to use if statements to differentiate. We saw an example of this in lecture, when we talked about duck typing.)
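
To make the idea concrete, here is a minimal sketch of two classes that share a method signature. Everything in it is hypothetical: the class names, the method name calculate, the choice of averaging closing prices in one and volumes in the other, and the use of dictionaries with 'close' and 'volume' keys to represent each day. The indicators you're required to implement are the ones listed in this write-up, not necessarily these.

```python
def _moving_average(values, days):
    """Average of the most recent `days` values for each position, or
    None where there aren't yet `days` values available."""
    return [
        None if i + 1 < days else sum(values[i + 1 - days : i + 1]) / days
        for i in range(len(values))
    ]


class ClosingPriceAverage:
    def __init__(self, days):
        self._days = days

    def calculate(self, days_of_prices):
        closes = [day['close'] for day in days_of_prices]
        return _moving_average(closes, self._days)


class VolumeAverage:
    def __init__(self, days):
        self._days = days

    def calculate(self, days_of_prices):
        volumes = [day['volume'] for day in days_of_prices]
        return _moving_average(volumes, self._days)


def run_indicator(indicator, days_of_prices):
    # No if statement is needed to find out which kind of indicator
    # this is; any object with a calculate method will do.
    return indicator.calculate(days_of_prices)


prices = [
    {'close': 1.0, 'volume': 100},
    {'close': 3.0, 'volume': 300},
    {'close': 5.0, 'volume': 500},
]

for indicator in [ClosingPriceAverage(2), VolumeAverage(2)]:
    print(run_indicator(indicator, prices))
```

The loop at the bottom plays the role of the user interface: it treats both objects the same way, relying only on the fact that each has a calculate method taking the same parameter.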

How do I deal with dates?

There are a few places where the program will require you to work with dates. Dates are surprisingly complicated: Different months have different numbers of days, the month of February has different numbers of days depending on the year, and so on.

You might consider using the datetime module in the Python Standard Library to help you manage these kinds of details. More information about that module can be found at the link below:
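
As one illustration of what that module offers, the sketch below converts dates written in the YYYY-MM-DD form used in this project's input into date objects, which can then be compared directly. The function name in_range is hypothetical, and date.fromisoformat assumes Python 3.7 or later.

```python
from datetime import date

start = date.fromisoformat('2018-10-01')
end = date.fromisoformat('2018-10-31')


def in_range(day_text, start, end):
    """Return True if the YYYY-MM-DD date in day_text falls between
    start and end, inclusive."""
    return start <= date.fromisoformat(day_text) <= end


days = ['2018-09-28', '2018-10-01', '2018-10-31', '2018-11-01']

# Keep only the days that fall within the requested range.
print([day for day in days if in_range(day, start, end)])

# Subtracting two dates gives a timedelta, so the calendar arithmetic
# (month lengths, leap years) is handled for us.
print((end - start).days)
```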

Where should I start?

There are lots of ways to start this project, but your goal, as always, is to find stable ground as often as possible. One problem you know you'll need to solve is generating the final report, so you could begin by generating a portion of it — maybe just the header and each day's price, but formatted correctly, since this can lead to partial credit. Now you're on stable ground.

One problem you know you'll need to solve is the problem of calculating a list of indicator values, given a list of price information; you might consider continuing with that. You can test this from the Python shell before proceeding, and then you're on stable ground. Continue with another indicator. Now you're on stable ground again.

From there, you might continue by implementing a module that gives you a set of test price information without reaching out to the Alpha Vantage API. You won't want to submit this one, but it will be useful in allowing you to continue your work without handling the downloading of quotes from the API right away. This might allow you to continue by implementing and testing your signal strategies. It might also help you iterate on your testing more quickly, even though there's a limit of five API requests per minute when you actually call Alpha Vantage's API.

Once you've got these implemented, you might continue with the downloading feature, and, finally, the entire report. But, again, there are lots of sequences that could lead to a good solution, and you'll want to consider how you can achieve partial solutions that nonetheless meet the requirements partially, because partial credit is available for those.

Limitations

Remember that third-party libraries — libraries that are not part of the Python Standard Library — are off-limits in your work unless they are explicitly permitted. This includes, for example, code you might find online that communicates with the Alpha Vantage API in Python; the intent here is that you be the one to write that code, because that's one of the learning objectives here.


A note about the signal strategies presented here

The strategies used here to determine whether to buy or sell stocks are chosen more for their ability to be implemented using skills you have, as opposed to being designed as sound investing strategies. It is not uncommon for programs to use historical data — often over multiple time frames, in addition to daily — to make buying and selling decisions in place of people, and the things we're doing here form a very loose basis for how some of those decisions get made, but the algorithms for making these decisions are generally more complex and tend to vary over time; what works today may no longer work a year from now when people's behavior, as well as the behavior of other automated buying-and-selling programs, may have changed.


Deliverables

Put your name and student ID in a comment at the top of each of your .py files, then submit all of the files to Checkmate. Take a moment to be sure that you've submitted all of your files. Note, too, that the "executable" module (i.e., the one you execute when you want to run your whole program) must be named project3.py.

You'll also need to submit a file called apikey.txt that contains your Alpha Vantage API key. We'll need that in order to test your program (i.e., we'll be testing each student's program with that student's own API key).

Do not submit the sanity checker. That's not part of your project; that's a tool we've provided to help you complete it.

Follow this link for a discussion of how to submit your project via Checkmate. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the project that you want graded. We won't regrade a project simply because you submitted the wrong version accidentally.

Can I submit after the deadline?

Yes, it is possible, subject to the late work policy for this course, which is described in the section titled Late work at this link.