ICS 280D, Spring 2000
Probabilistic Models for Time Series and Sequences
Project Instructions
Project Outline
You can think of your project as consisting of several
basic components.
-
Select a data set (see below for some suggestions).
-
Visualize and explore your data. For a time series, for example, plot it
as a function of time and perhaps calculate the autocorrelation function
(ACF). For discrete-valued sequences, you could estimate the mutual
information function to see how much memory the sequence has, look at
histograms of the frequency of individual symbols, etc.
-
Select a particular task, e.g., prediction, clustering, pattern detection,
or visualization.
-
Develop or find code to carry out your task. If you use a tool like
MATLAB there are many library toolkits available on the Web to help you.
Be sure to test your code on some simple simulated data to verify that
it works the way you expect it to.
-
Conduct your experiments on your data: be sure to define in advance
precisely what your experiments are, and to carefully reserve some portion
of your data for final testing (not to be used in any development or tweaking
of the algorithm, only to be used for testing at the very end).
-
Evaluate your results: try to be objective and note both the successes and
the failures of your approach. You should, for example, compare your results
with some very simple "strawman" alternative, such as simply predicting
the most likely outcome at each time point.
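As an illustration of the exploration step, the following is a minimal Python sketch (assuming NumPy is available) of a sample ACF for a real-valued series and a plug-in estimate of the mutual information between symbols a fixed lag apart. The function names, the simulated AR(1) series, and the alternating symbol sequence are all made up for illustration; they also serve as the kind of simple simulated data on which code can be checked before it is run on real data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a real-valued series at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def mutual_information(seq, lag):
    """Plug-in estimate (in bits) of the mutual information between
    seq[t] and seq[t + lag]; requires lag >= 1."""
    a, b = seq[:-lag], seq[lag:]
    symbols = sorted(set(seq))
    index = {s: i for i, s in enumerate(symbols)}
    joint = np.zeros((len(symbols), len(symbols)))
    for u, v in zip(a, b):
        joint[index[u], index[v]] += 1
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal of seq[t]
    pb = joint.sum(axis=0, keepdims=True)   # marginal of seq[t + lag]
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

# Simulated check 1: an AR(1) series with coefficient 0.8 (assumed values),
# whose lag-1 autocorrelation should come out near 0.8.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
series = np.zeros(2000)
for t in range(1, 2000):
    series[t] = 0.8 * series[t - 1] + noise[t]
acf = sample_acf(series, max_lag=10)

# Simulated check 2: a strictly alternating binary sequence, where each
# symbol determines the next, so the lag-1 estimate should be close to 1 bit.
alternating = list("AB" * 50)
mi_lag1 = mutual_information(alternating, lag=1)

print(acf[:3], mi_lag1)
```

Note that the plug-in estimate is biased upward for short sequences or large alphabets, so small values at long lags should be read with caution.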
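The held-out test set and the strawman comparison described above can be sketched along the following lines. This is only one possible reading: it assumes a discrete-valued sequence, reserves the final portion of the sequence for testing (a chronological split, which is an assumption rather than a requirement stated here), and takes the "most likely outcome" to be the symbol seen most often in the development data. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

def chronological_split(seq, test_fraction=0.2):
    """Split a sequence into (development, held-out test) parts,
    keeping time order so the test part comes last."""
    n_test = max(1, int(round(len(seq) * test_fraction)))
    return seq[:-n_test], seq[-n_test:]

def most_frequent_baseline(dev, test):
    """Strawman: always predict the symbol most frequent in the development
    data, and report its accuracy on the held-out test data."""
    guess = Counter(dev).most_common(1)[0][0]
    hits = sum(1 for s in test if s == guess)
    return guess, hits / len(test)

# Toy symbol sequence, made up for illustration only.
seq = list("AABABAAABBAAABAAABAB")

# The test part is set aside once and not touched during development.
dev, test = chronological_split(seq, test_fraction=0.25)

# At the very end, any real model's test accuracy would be compared
# against this baseline accuracy.
guess, baseline_acc = most_frequent_baseline(dev, test)
print(guess, baseline_acc)
```

A method that cannot beat this kind of baseline on the held-out data has probably not learned much about the sequence, which is worth reporting as plainly as a success.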
Due Dates
You are required to hand in two reports:
-
A 1-page plan describing what you intend to do, due by Wednesday, May 17th.
You should include at least a description of the data set and the task you
intend to pursue, even if details of implementation and experiments are
not yet worked out. This can be submitted by email or as hard copy.
-
A 5- to 10-page project report, due by Monday, June 12th. It should describe
all of the project components above and be written clearly. Figures and
tables are encouraged.
Data Sets
Here are some pointers and suggestions for data sets
you may want to consider: