Course
- Prerequisites - Textbook
- Grading - Schedule - Other
ICS 277B SPECIAL TOPICS IN INFORMATION AND
COMPUTER SCIENCE:
PROBABILISTIC MODELING OF BIOLOGICAL DATA
Course Goals and Description
This is a graduate-level course on probabilistic modeling of biological data. The
course covers computational approaches to understanding and predicting the
structure, function, interactions, and evolution of DNA, RNA, proteins, and
related molecules and processes. The emphasis is on providing a unified
Bayesian statistical framework to mine large noisy data sets that are
becoming the hallmark of modern biology. The methods taught focus on
developing the structure of the models, on model fitting algorithms (machine
learning), and on the application of the resulting models (data mining). Most applications will revolve around DNA,
RNA, protein sequence, and gene-expression-array data, but other types of data will
also be considered depending on participants' interests.
The official catalog description is:
ICS 277B: Probabilistic Modeling of Biological Data. A unified Bayesian probabilistic
framework for modeling and mining biological data. Applications range from sequence (DNA,
RNA, proteins) to gene expression data. Graphical models, Markov models, stochastic
grammars, neural networks, structure prediction, gene finding, evolution, DNA arrays,
single- and multiple-gene analysis.
Prerequisites
A basic course in algorithms (ICS 161 or equivalent) and in molecular biology (Bio Sci
99 or equivalent), or ICS 277A (or equivalent), or consent of instructor. The course assumes
some background in biology, and basic knowledge of probability, statistics, and
programming.
Textbooks
Bioinformatics: The Machine Learning Approach
Pierre Baldi and Soren Brunak, Second Edition, 2001 (MIT Press)
DNA Microarrays and Gene Regulation: From Experiments to Data Analysis and Modeling
Pierre Baldi and G. Wesley Hatfield, 2002 (Cambridge University Press)
Grading
Students will read articles from the literature. Grading will be based on
participation in class discussions, presentations, and possibly a final project requiring a
computational analysis of biological data, which will result in a brief (5-10 page)
conference-style written report. Additional assignments may include homework.
Tentative Schedule
N.B.: Schedule may change to follow class interest,
schedule outside speakers, etc.
| Week 1: Introduction to Bioinformatics. Probabilistic Modeling: the Bayesian Statistical Framework. |
| Week 2: Graphical Models. Simple Markov Models of Biological Sequences (HMMs). |
| Week 3: Hidden Markov Models of Biological Sequences. |
| Week 4: HMMs, Probabilistic Models of Genes, and Gene Finding Algorithms. Probabilistic Models of Genes and Gene Finding Algorithms. |
| Week 5: Probabilistic Models of Evolution and Phylogenetic Trees. Stochastic Grammars and Languages. |
| Week 6: Stochastic Context Free Grammars and RNA Secondary Structure. Beyond Context Free Grammars. |
| Week 7: Probabilistic Modeling and Neural Networks. Machine Learning Approaches for Protein Structure Prediction. |
| Week 8: Machine Learning Approaches for Other Problems (Signal Peptides, etc.). DNA Microarray Data and Gene Regulation. |
| Week 9: Probabilistic Modeling of DNA Microarrays: Single-Gene Level. Probabilistic Modeling of DNA Microarrays: Multiple-Gene Level. Gene and Protein Networks. Systems Biology. |
| Week 10: Project Presentations. |
Relation to Other Courses
This course is intended to complement the existing "hands-on" computer-based courses
Biological Sciences 123/223 (Computer Applications in Molecular Biology/Computational
Molecular Biology), which give a very practical introduction to using computer tools in
molecular biology. In contrast, this course emphasizes the development of probabilistic
models and machine learning approaches for the analysis of biological data. This course is
also intended to closely complement the existing ICS course "Representations and
Algorithms for Molecular Biology" (currently ICS-277 and scheduled to become ICS-277A).
In contrast, this course emphasizes modeling and analysis of biological data using a
probabilistic framework. The probabilistic approach is essential to account for biological
variability brought about by evolutionary tinkering. The course can be viewed as data
mining, machine learning, and probabilistic algorithms, concentrated on biological data
sets, especially sequence data, but also including other data sets, such as gene
expression data, depending on student interest.
There is essentially no overlap between this course and ICS 246 or ICS 248.
There is a small overlap with ICS 275B and with ICS 283. The overlap with 275B is in the use
of graphical models. Not all the graphical models used in 277B, however, are Bayesian
networks. Furthermore, the Bayesian networks used in 277B are very specialized and come
with their own algorithms (forward-backward, inside-outside, etc.). There is also a small
overlap with ICS 273 (machine learning), but the approach in 277B is more probabilistic
and, once more, focused exclusively on biological problems. ICS 277B could benefit
students who have taken ICS 275B and/or ICS 273 by deepening their understanding of
graphical model and machine learning concepts and letting them apply these concepts
systematically to problems in biology.
Finally, ICS 277B complements a course such as 223 (Molecular Biology and Biochemistry)
by focusing on the application of computational methods to the solution of biological
problems.
This course is part of the new ICS concentration: Informatics in Biology
and Medicine.