David Newman >> Moved to Google Los Angeles




CS 277: Data Mining (Spring 2013)

NEW! Newman awarded NSF grant to model effectiveness and efficiency of biomedical research funding
NEW! Newman awarded NSF grant to analyze CBET research portfolio
Topic Modeling Tool

> Interests
Machine Learning, Topic Modeling, Text Mining
Center for Machine Learning and Intelligent Systems

My research focuses on the theory and application of topic models and related text mining and machine learning techniques. My work is marked by a commitment to combining theoretical advances with practical applications in ways that widen access and use for individuals and communities, and that improve the way people find, discover, analyze, and understand information.

> Education
PhD, Princeton University
> Publications
2012
Newman, Koilada, Lau, Baldwin (2012). Bayesian Text Segmentation for Index Term Identification and Keyphrase Extraction. COLING 2012. [pdf]

Lau, Baldwin, Newman (2012). On Collocations and Topic Models. Forthcoming, ACM Transactions on Speech and Language Processing special issue on Multiword Expressions.

Newman, Boyd-Graber, Mimno, Wallach (2012). Topic Models: Problems, Diagnostics and Improvements. Forthcoming, (book chapter).

Cetindil, Esmaelnezhad, Newman, Li (2012). Analysis of Instant Search Query Logs. WebDB 2012. [pdf]

Newman, Balagopalan, Hagedorn, Noh (2012). Learning Topics and Related Passages in Books. JCDL 2012. [pdf]

Lau, Cook, McCarthy, Newman, Baldwin (2012). Word Sense Induction for Novel Sense Detection. EACL 2012. [pdf]

N.Newman, Porter, Newman, Bolan, Courseault (2012). Comparing Methods to Extract Technical Content for Technological Intelligence. PICMET 2012. [link]

Porter, Newman, N.Newman (2012). Text Mining to identify topical emergence: Case Study on Management of Technology. 2012 STI Conference. [doc]

2011
Newman, Bonilla, Buntine (2011). Improving Topic Coherence with Regularized Topic Models. In NIPS 2011. [pdf]

Talley, Newman, Mimno, Herr, Wallach, Burns, Leenders, McCallum (2011). Database of NIH grants using machine-learned categories and graphical clustering. Nature Methods. [link][Map]

Blume-Kohout and Newman (2011). Productivity Differences Across Topics in Federal R&D Funding. Atlanta Conference on Science and Innovation Policy. [link]

Lau, Grieser, Newman, Baldwin (2011). Automatic Labelling of Topic Models. ACL-HLT 2011. [pdf]

Boyack, Newman, Duhon, Ma, Biberstine, Skupin, Schijvenaars, Klavans, Borner (2011). Clustering More Than Two Million Biomedical Publications. PLoS ONE. [link]

Noh, Hagedorn, Newman (2011). Are Learned Topics More Useful Than Subject Headings? JCDL 2011. [link]

Hagedorn, Kargela, Noh, Newman (2011). A New Way to Find: Testing the Use of Clustering Topics in Digital Libraries. In D-Lib Magazine, Sept/Oct 2011.

Gretarsson, O'Donovan, Bostandjiev, Hollerer, Asuncion, Newman, Smyth (2011). TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling. ACM TIST. [link]

Block and Newman (2011). What, Where, When and Sometimes Why: Data Mining Twenty Years of Women's History Abstracts. Journal of Women's History. [link]

2010
Newman, Lau, Grieser, Baldwin (2010). Automatic Evaluation of Topic Coherence. NAACL HLT 2010. [pdf]

Lau, Newman, Karimi, Baldwin (2010). Best Topic Word Selection for Topic Labelling. COLING 2010. [pdf]

Ihler, Newman (2010). Understanding Errors in Approximate Distributed Latent Dirichlet Allocation. TKDE. [pdf]

Newman, Noh, Talley, Karimi, Baldwin (2010). Evaluating Topic Models for Digital Libraries. JCDL 2010. [pdf]

Newman, Baldwin, Cavedon, Karimi, Martinez, Zobel (2010). Visualizing document collections and search results using topic mapping. Journal of Web Semantics. [online]

Asuncion, Newman, Porteous, Smyth, Triglia, Welling (2010). Machine Learning on Very Large Data Sets: Distributed Gibbs Sampling for Latent Variable Models. (book chapter). [link]

2009
Newman, Asuncion, Smyth, Welling (2009). Distributed Algorithms for Topic Models. JMLR (Volume 10). [pdf]

Newman, Karimi, Cavedon (2009). External Evaluation of Topic Models. ADCS (Winner of Best Paper Award). [pdf]

Newman, Karimi, Cavedon (2009). Topic Models to Interpret MESH -- MEDLINE's Medical Subject Headings. AI09. [pdf]

Herr, Talley, Burns, Newman, LaRowe (2009). The NIH Visual Browser: An Interactive Visualization of Biomedical Research. IV09. [pdf]

pre-2009
Porteous, Newman, Ihler, Asuncion, Smyth, Welling (2008). Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation. In ACM SIGKDD 2008. [pdf]

Newman, Asuncion, Smyth, Welling (2007). Distributed Inference for Latent Dirichlet Allocation. In NIPS 2007. [pdf]

Burns, Newman, Herr, Ingulfsen, Pantel, Smyth (2007). A Snapshot of Neuroscience: Unsupervised Natural Language Processing of Abstracts from the Society for Neuroscience. In SfN 2008. [poster]

Hage, Chapman, Newman (2007). Enhancing Search and Browse Using Automated Clustering of Subject Metadata. In D-Lib Magazine, July/August 2007. [online]

Newman, Hage, Chemudugunta, Smyth (2007). Subject Metadata Enrichment using Statistical Topic Models. In JCDL 2007. [doc]

Teh, Newman, Welling (2006). A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation. In NIPS 2006. [pdf]

Newman, Chemudugunta, Smyth, and Steyvers (2006). Analyzing Entities and Topics in News Articles Using Statistical Topic Models. In LNCS -- IEEE ISI. [pdf]

Newman, Chemudugunta, Smyth, and Steyvers (2006). Statistical Entity-Topic Models. In ACM SIGKDD 2006. [pdf]

Newman and Block (2006). Probabilistic Topic Decomposition of an Eighteenth Century Newspaper. In JASIST, March 2006. [pdf]

Newman, Smyth, Steyvers (2006). Scalable Parallel Topic Models. Journal of Intelligence Community Research and Development. [pdf]

Primeau and Newman (2006). Elongation and Contraction of the Western Boundary Current Extension in a Shallow-Water Model: a Bifurcation Analysis. Journal of Physical Oceanography. [pdf]

Primeau and Newman (2006). Bifurcation Structure of a Wind-Driven Shallow Water Model with Layer-Outcropping. Ocean Modelling. doi:10.1016/j.ocemod.2006.10.003. [pdf]

Zender, Newman, and Torres (2003). Spatial Heterogeneity in Aeolian Erodibility: Uniform, Topographic, Geomorphic, and Hydrologic Hypotheses, J. Geophys. Res., 108(D17), 4543. [pdf]

Zender, Bian, and Newman (2003). Mineral Dust Entrainment And Deposition (DEAD) model: Description and 1990s dust climatology, J. Geophys. Res., 108(D14), 4416. [pdf]

Newman and Karniadakis (1997). A Direct Numerical Simulation Study of Flow Past a Freely Vibrating Cable. Journal of Fluid Mechanics. [online]

Newman and Karniadakis (1996). Simulations of Flow Over a Flexible Cable: A Comparison of Forced and Flow-Induced Vibrations. Journal of Fluids and Structures. [online]

Crawford, Evangelinos, Newman, and Karniadakis (1995). Parallel Benchmarks of Turbulence in Complex Geometries. In Parallel CFD. [pdf]

> Students
Mike Stewart (UCI, PhD, co-advised with Alex Ihler)

Jey Han Lau (UniMelb, PhD, co-advised with Tim Baldwin)

Nagendra Koilada; Karthik Balasundaram; Prakash Nagarajan; Deepak Agarwal (UCI, MS students doing Independent Studies)

> Graduated Students

Arun Balagopalan (UCI, MS Thesis)

Xunyu Wang (UniMelb, MS, Independent Study)

YiFan Yang (UniMelb, MS, Independent Study)

> Teaching
CS 277 Data Mining (Spring 2013)

Web Technologies, guest lecturer (Semester 1 2009)

CS 277 Data Mining (Fall 2007)

> News
Newman awarded NSF grant to model effectiveness and efficiency of biomedical research funding

Newman awarded NSF grant to analyze CBET research portfolio

Smyth & Newman awarded IARPA grant

Newman awarded NSF EAGER to analyze grant portfolios

Spent 2009 and 2010 at NICTA in Melbourne, Australia

Newman does pilot study on topic modeling grants

Newman receives Google Research Award for topic mapping

Newman and collaborators awarded $750,000 for topic modeling

UCI researchers 'text mine' the New York Times, demonstrating evolution of potent new technology

> Topic Model and Topic Map Demos
NIH Topic Map and Topic Browser (article in Nature Methods)