Software for CS 175, Winter 2021
The software packages linked below (all in Python) will likely be useful to you both for
the initial assignments and for class projects. For the class projects you are welcome (if you wish)
to use other software packages in addition to those below, although the packages below provide a very wide range of
library functions and utilities for text analysis and machine learning and should be sufficient to support most, if not all, aspects of your project.
Anaconda Python Distribution
We recommend that you download and install the
free Anaconda Python distribution
with Python 3.6/3.7/3.8. Anaconda includes Python, the Natural
Language Toolkit (NLTK) and scikit-learn, in addition to
a wide range of other packages
that are useful for data analysis (such as matplotlib, numpy, scipy, and more). If you download Anaconda
you should have many of the packages you will need for both the assignments and for your class project.
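As an optional sanity check after installing Anaconda, a short script can report whether the packages mentioned above are importable. This is a minimal sketch under a few assumptions: it uses the standard import names (for example, scikit-learn is imported as `sklearn`), and it only checks that each package can be found, not which version is installed. The names `ANACONDA_PACKAGES` and `check_packages` are hypothetical helpers chosen for this example.

```python
import importlib.util

# Import names (assumed) for the packages mentioned above;
# note that scikit-learn is imported as "sklearn".
ANACONDA_PACKAGES = ["nltk", "sklearn", "numpy", "scipy", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it.

    find_spec() locates a top-level module without importing it,
    so this stays fast even for large packages.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(ANACONDA_PACKAGES).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be added afterwards with `conda install`.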
Anaconda is available for Mac, Linux, and Windows OS. Anaconda includes (among many other libraries):
Python (3.6 or above)
You should have Python 3.6 or above installed on your computer for this course (if you
installed Anaconda (see above) with the Python 3 option then you should already have it). The
online Python Tutorial
materials are a very useful general reference.
If you are not familiar with Python, you will need to spend time learning it, e.g., via an online tutorial such as the
Beginner's Guide to Python
or an introductory text on Python such as
Python Programming: An Introduction to Computer Science.
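To confirm that the interpreter you are running meets the Python 3.6 requirement, you can compare `sys.version_info` against a minimum version. The sketch below is one simple way to do this; `MIN_VERSION` and `python_ok` are hypothetical names used only for illustration.

```python
import sys

# Minimum interpreter version required for this course.
MIN_VERSION = (3, 6)


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    Tuples compare element by element, so (3, 8) >= (3, 6) is True
    and (2, 7) >= (3, 6) is False.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version)
    if python_ok():
        print("Python version OK")
    else:
        print(f"Please install Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or above")
```

Running this inside the Python you plan to use for the course (for example, the one that came with Anaconda) checks that interpreter rather than some other Python on your system.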
PyTorch and Related NLP Tools
PyTorch is a powerful machine learning framework in Python that you should also download and install for this course. There are also a number of additional (optional) NLP packages that are built on top of PyTorch and that may be useful for your projects:
- Huggingface, a very useful publicly available set of models, datasets, and library functions that extends PyTorch and TensorFlow, for example with multiple varieties of transformer models such as BERT, DistilBERT, ALBERT, GPT-2, etc.
- TorchText (a general-purpose NLP package built on PyTorch)
- PyTorch-NLP (a neural network NLP package built on PyTorch)
- AllenNLP (advanced NLP capabilities from AI2 built on PyTorch)
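After installing PyTorch (and any of the optional packages above), a quick check can show which of them are available and whether PyTorch can see a GPU. This is a minimal sketch, not an official tool: the import names in `NLP_PACKAGES` (`transformers` for Huggingface, `torchtext`, `torchnlp` for PyTorch-NLP, `allennlp`) are assumed from each package's usual distribution, and `installed` and `cuda_available` are hypothetical helper names.

```python
import importlib.util

# Display name -> assumed import name for PyTorch and the optional packages above.
NLP_PACKAGES = {
    "PyTorch": "torch",
    "Huggingface Transformers": "transformers",
    "TorchText": "torchtext",
    "PyTorch-NLP": "torchnlp",
    "AllenNLP": "allennlp",
}


def installed(import_name):
    """Return True if the top-level package can be found (without importing it)."""
    return importlib.util.find_spec(import_name) is not None


def cuda_available():
    """Return True/False for GPU support if PyTorch is installed, else None."""
    if not installed("torch"):
        return None
    import torch  # imported lazily, only when it is known to be present

    return torch.cuda.is_available()


if __name__ == "__main__":
    for label, module in NLP_PACKAGES.items():
        print(f"{label:26s} {'installed' if installed(module) else 'not installed'}")
    print("CUDA available:", cuda_available())
```

A CPU-only PyTorch install is fine for most assignments; `cuda_available()` returning False simply means models will run on the CPU.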
Note: when installing Python packages you may find it useful to use conda to create virtual environments that are specific to this course and/or your project, e.g.,
> conda create --name cs175 python=3.8
> conda activate cs175
> conda install --name cs175 pytorch -c pytorch
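Once the environment has been created, it can be easy to forget to activate it before running your code. The sketch below checks which conda environment is active by reading the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. It assumes the environment is named `cs175` as in the example above; `active_conda_env` and `in_course_env` are hypothetical helper names.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def in_course_env(expected="cs175"):
    """Return True if the active conda environment has the expected name."""
    return active_conda_env() == expected


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    # sys.prefix points at the environment the running interpreter belongs to.
    print("Interpreter prefix:", sys.prefix)
    if not in_course_env():
        print("Warning: the cs175 environment does not appear to be active")
```

Printing `sys.prefix` as well is a useful cross-check: if it does not point inside the `cs175` environment's directory, your code is running under a different Python.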