The Author-Topic Model

The author-topic model is a generative model for authors and documents that reduces the process of generating a document to a simple series of probabilistic steps. Each author is associated with a mixture of topics, and the words of a multi-author paper are assumed to be generated from a mixture of its authors' topic mixtures. The model is applied to a collection of 1.7K NIPS conference papers and 160K CiteSeer abstracts.
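The generative steps can be sketched in a few lines. This is a minimal illustration, assuming the standard formulation in which each word of a document is produced by picking one of the document's authors uniformly at random, drawing a topic from that author's topic mixture, and then drawing the word from that topic's word distribution. The names `theta`, `phi` and `generate_document`, the Dirichlet priors, and the toy sizes are hypothetical and are not taken from the browser.

```python
import numpy as np

def generate_document(author_ids, theta, phi, n_words, rng):
    """Sample one document under the author-topic generative process.

    author_ids: indices of the document's co-authors (hypothetical layout)
    theta: (n_authors, n_topics) array; row a is author a's topic mixture
    phi:   (n_topics, vocab_size) array; row t is topic t's word distribution
    """
    words, authors, topics = [], [], []
    for _ in range(n_words):
        # 1. pick one of the co-authors uniformly at random
        x = int(rng.choice(author_ids))
        # 2. draw a topic from that author's topic mixture
        z = int(rng.choice(theta.shape[1], p=theta[x]))
        # 3. draw a word from that topic's word distribution
        w = int(rng.choice(phi.shape[1], p=phi[z]))
        words.append(w)
        authors.append(x)
        topics.append(z)
    return words, authors, topics

rng = np.random.default_rng(0)
n_authors, n_topics, vocab_size = 5, 3, 20
# Assumed symmetric Dirichlet priors, only to build a toy model
theta = rng.dirichlet(np.full(n_topics, 0.5), size=n_authors)
phi = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)
# A two-author paper written by authors 0 and 3
words, authors, topics = generate_document([0, 3], theta, phi, n_words=10, rng=rng)
```

Because each word picks its own author, a collaborative paper ends up as a blend of its authors' topic mixtures, which is the property described above.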

This webpage provides an online query interface to the model for interactively exploring questions such as "what topics does a given author write about?", along with other applications.
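A query such as "what topics does a given author write about?" can be answered from one sample by estimating the author's topic mixture from counts. A minimal sketch, assuming a count matrix where entry (a, t) is the number of words assigned to author a and topic t in the sample, smoothed with a symmetric Dirichlet parameter `alpha`; the function names, the value of `alpha`, and the toy counts are hypothetical and are not the browser's actual settings.

```python
import numpy as np

def author_topic_mixture(author_topic, alpha):
    """Estimate theta[a, t] = (C[a, t] + alpha) / (sum_t' C[a, t'] + T * alpha)."""
    counts = np.asarray(author_topic, dtype=float)
    n_topics = counts.shape[1]
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + n_topics * alpha)

def top_topics(author_topic, author, k=3, alpha=0.1):
    """Return the k most probable topics for one author, with their probabilities."""
    theta = author_topic_mixture(author_topic, alpha)
    order = np.argsort(theta[author])[::-1][:k]  # highest probability first
    return [(int(t), float(theta[author, t])) for t in order]

# Toy counts: 2 authors x 4 topics (hypothetical numbers)
counts = np.array([[50, 5, 0, 45],
                   [0, 80, 20, 0]])
print(top_topics(counts, author=0, k=2))  # topics 0 and 3, in that order
```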

Most of the data presented on this webpage is extracted from a single MCMC sample per data set: one 300-topic solution for the CiteSeer data set and one 100-topic solution for the NIPS data set. Both samples can be queried in the browser.
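One way to obtain a single MCMC sample of this kind is collapsed Gibbs sampling, which sweeps over the word tokens and resamples each token's (author, topic) pair conditioned on all other assignments. The sketch below is a minimal illustration under that assumption, using the standard author-topic conditional written out in the code comments. The function name `gibbs_author_topic`, the data layout, the hyperparameter values, and the toy corpus are hypothetical and are not the settings used to produce the samples on this page.

```python
import numpy as np

def gibbs_author_topic(docs, doc_authors, n_authors, n_topics, vocab_size,
                       alpha=0.5, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for the author-topic model (illustrative sketch).

    docs: list of word-id lists; doc_authors: list of distinct author-id lists.
    Returns word-topic and author-topic counts from the final sample.
    """
    rng = np.random.default_rng(seed)
    cwt = np.zeros((vocab_size, n_topics))  # C^WT: word-topic counts
    cat = np.zeros((n_authors, n_topics))   # C^AT: author-topic counts
    nt = np.zeros(n_topics)                 # total words assigned to each topic
    assign = []                             # current (author, topic) per token

    # Initialise every token with a random co-author and a random topic
    for words, authors in zip(docs, doc_authors):
        doc_assign = []
        for w in words:
            a, t = int(rng.choice(authors)), int(rng.integers(n_topics))
            cwt[w, t] += 1; cat[a, t] += 1; nt[t] += 1
            doc_assign.append((a, t))
        assign.append(doc_assign)

    for _ in range(n_iter):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            authors = np.asarray(authors)
            for i, w in enumerate(words):
                # Remove token i from the counts before resampling it
                a, t = assign[d][i]
                cwt[w, t] -= 1; cat[a, t] -= 1; nt[t] -= 1
                # p(x_i=a, z_i=t | rest) is proportional to
                #   (C^WT[w,t] + beta) / (sum_w C^WT[w,t] + V*beta)
                # * (C^AT[a,t] + alpha) / (sum_t C^AT[a,t] + T*alpha)
                word_term = (cwt[w] + beta) / (nt + vocab_size * beta)
                ca = cat[authors]
                author_term = (ca + alpha) / (ca.sum(axis=1, keepdims=True)
                                              + n_topics * alpha)
                p = (author_term * word_term).ravel()  # row-major: author, then topic
                k = int(rng.choice(p.size, p=p / p.sum()))
                a, t = int(authors[k // n_topics]), k % n_topics
                cwt[w, t] += 1; cat[a, t] += 1; nt[t] += 1
                assign[d][i] = (a, t)
    return cwt, cat

# Toy corpus: 3 documents over a 6-word vocabulary, 2 authors (hypothetical)
docs = [[0, 1, 2, 1], [3, 4, 3, 5], [0, 2, 4, 5]]
doc_authors = [[0], [1], [0, 1]]
cwt, cat = gibbs_author_topic(docs, doc_authors, n_authors=2, n_topics=2,
                              vocab_size=6, n_iter=50)
```

The returned counts are one sample's state, so quantities computed from them (such as an author's most probable topics) describe that particular sample.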

The Author-Topic Browser

Running the browser may require installing Java desktop software, which is available online.


*       User Help

*      Query Examples

*      The Data Sets: CiteSeer, NIPS

*      100 Topics – one sample result from the NIPS data set

*      300 Topics – one sample result from the CiteSeer data set

*      The Online Gibbs Sampler



Applications of the Author-Topic Model to the CiteSeer Data Set:

*      Topic Trends Over Time

*      Assigning Topics and Authors to New Documents

*      Scoring Papers for Authors (Detecting the Most Surprising Papers for an Author)




Finding Scientific Topics.
T. Griffiths and M. Steyvers (2004).
Proceedings of the National Academy of Sciences


This is a joint research project by Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, and Thomas Griffiths.
Graduate Student: Chaitanya Chemudugunta
Programmers: Amnon Meyers, Momo Alhazzazi
Funding: This material is based upon work supported by the National Science
Foundation under Grant No. IIS-0083489 and by the Knowledge
Discovery and Dissemination (KD-D) Program. NSF KDD project.
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation (NSF).

We would like to thank Steve Lawrence and C. Lee Giles for kindly providing us with the CiteSeer data used.

Last updated: 2005-03-02. For comments and questions, contact Michal Rosen-Zvi (email: michal at ics dot uci dot edu).