Additional Reading:
ICS 278: Data Mining
Spring 2006
Below I will provide links to papers and Web pages that complement the material presented in class.
Additional links will be added as the quarter progresses.
- Topic Models for Text Data:
  - M. Steyvers and T. Griffiths, Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, in press.
  - D. Blei, M. Jordan, and A. Ng, Latent Dirichlet allocation, JMLR, 2003.
  - M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, The author-topic model for authors and documents, Proceedings of the 20th Conference on Uncertainty in AI, 2004.
  - Web page for the author-topic model, including an online Java browser for the NIPS and CiteSeer data sets.
  - Papers from the Helsinki Institute of Information Technology ALVIS project on topic models and Web search.
- AskMSR question-answering system:
- Credit Scoring:
- Links to articles on naïve Bayes and spam email:
  - Popfile: a popular open-source email filtering program using naïve Bayes classification
  - Paul Graham’s widely cited article, A Plan for Spam
  - Upcoming first academic conference on email and anti-spam
  - Technical paper on spam email classification (comparing naïve Bayes with memory-based classification methods)
  - SpamBayes: another open-source filtering program using Bayes' rule
  - Mozilla page on spam email (again using naïve Bayes classification)
  - SpamAssassin: a rule-based spam filtering approach