Examples of Software and Demos for Text Analysis

CS 175, Winter 2017
The links below are a small sample of the many Web pages that discuss research demos, applications, and sample code for text analysis using AI and machine learning techniques. They are intended as a starting point to help you think about ideas for projects.

Sentiment Analysis
Sentiment140: a comprehensive list of resources related to sentiment analysis, particularly for Twitter. Also includes labeled Twitter data.
NLTK sentiment analysis package, including the VADER module for sentence-level analysis (e.g., with negation detection).
Stanford deep learning demo for sentiment analysis (Web page with code, data, papers)
Sentiment analysis of short informal texts (research paper)
Example of using Python to build a Naive Bayes text classifier for sentiment analysis.
Blog tutorial on building a classifier in NLTK to classify tweets as positive or negative sentiment.
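
To make the Naive Bayes idea behind the last two links concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier using only the Python standard library. It is not the code from either linked tutorial: the function names (tokenize, train_naive_bayes, classify) and the six training sentences are invented for illustration, and add-one (Laplace) smoothing is assumed. A real project would more likely use NLTK or scikit-learn with a much larger labeled corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_doc_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
TRAINING_DATA = [
    ("i love this movie", "pos"),
    ("what a great film", "pos"),
    ("great acting and a great story", "pos"),
    ("i hate this movie", "neg"),
    ("what a terrible film", "neg"),
    ("boring plot and terrible acting", "neg"),
]

model = train_naive_bayes(TRAINING_DATA)
print(classify("a great story", model))
print(classify("terrible and boring", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which matters once documents are longer than a few words.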

Information Extraction
GATE demos and software on information extraction
ReVerb software for information extraction from the Web, from the University of Washington

Question Answering
The Aristo Project from the Allen Institute for AI, for knowledge acquisition and question answering. Includes links to several research papers.
The Quepy system (in Python) for converting natural language questions into queries in a database query language.
Semantic parsing for question answering (from Stanford NLP group)

Embedding Text in Vector Spaces
word2vec from Google (with code)
Recurrent neural networks for word embeddings (with Python code)
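
Once words are embedded as vectors, a common way to compare them is cosine similarity. The sketch below shows only that comparison step. The three-dimensional vectors are hand-picked, hypothetical values, not the output of word2vec or any trained model, and most_similar is an illustrative helper rather than a function from any of the linked packages.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen by hand for illustration only.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity."""
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king", TOY_VECTORS))
```

Real embeddings typically have hundreds of dimensions, but the comparison is computed the same way.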

Other Applications
BookNLP, extracting information from books: GitHub repository (Java code) by David Bamman.
Making predictions from search query data, slide presentation from Choi and Varian
Text mining for analyzing the biomedical literature, introduction by Cohen and Hunter, 2008
Implementing a spelling corrector. Blog post and Python code by Peter Norvig, Director of Research at Google.
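
As a rough illustration of the edit-distance approach to spelling correction, here is a simplified sketch: generate every string one edit away from the input and pick the candidate that is most frequent in a word list. It is written in the spirit of that approach but is not the code from the linked post. WORD_COUNTS is a tiny made-up frequency table, only single-edit corrections are considered, and the function names are chosen for this example.

```python
from collections import Counter

# Hypothetical word frequencies; a real corrector would count words in a large corpus.
WORD_COUNTS = Counter(
    {"the": 50, "there": 12, "their": 10, "spelling": 5, "corrector": 2}
)
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + c + right[1:] for left, right in splits if right for c in LETTERS
    ]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing known nearby; leave the input unchanged
    return max(candidates, key=WORD_COUNTS.__getitem__)


print(correct("speling"))
print(correct("teh"))
```

Choosing the most frequent candidate is a crude stand-in for a language model; it ignores context and how likely each kind of typo is.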

Software Packages for NLP and Text Analysis
(general-purpose toolkits with multiple components and algorithms - see also NLTK 3.0 and scikit-learn as discussed in class)
Gensim: Python library for topic modeling, document indexing, and similarity retrieval
TextBlob: general Python library for analyzing text data (built on top of NLTK and Pattern).
Pattern: A Web mining module in Python (doesn't support Python 3)
MALLET: useful toolkit (in Java) from the University of Massachusetts, Amherst (particularly for classification and topic modeling)
    (also: a tutorial article on how to use MALLET for topic modeling)
Stanford NLP Software
Software packages for NLP from the University of Illinois (UIUC)
List of text mining tools from the Digital Research Tools Website