WEKA software: Java-based package containing a variety of data mining and machine learning algorithms.
R statistical computing environment: powerful environment for statistical computing, widely used by statistical researchers.
MATLAB user-contributed programs: a small subset of the many functions and scripts available on the Web for MATLAB.
StatLib/MATLAB routines: some MATLAB programs available from StatLib; the edatoolbox, for example, is quite useful.
KDNuggets software list: extensive list of pointers to software packages for data mining and machine learning (mostly commercial, but some free).
Software for Graphical Models/Bayesian networks: very comprehensive (as of April 2006) comparison of different software packages for graphical models.
JUNG: open-source project providing Java code for graph/network analysis and visualization.
SVMlight: widely used and very efficient implementation of SVM algorithms. Here are some other links to SVM software packages.
Topic Modeling for Text Documents: topic modeling code in MATLAB from Mark Steyvers and Tom Griffiths. Also David Blei's implementation of LDA in C, and Yee Whye Teh's hierarchical Dirichlet process modeling code.
MALLET: comprehensive Java-based software for statistical natural language processing.
UCI Machine Learning Archive: widely used testbed of data sets for machine learning and data mining, mostly relatively small, classification-oriented data sets.
UCI KDD Archive: contains somewhat larger, more complex data sets than those found in the UCI ML archive.
StatLib: contains pointers to many data sets used in statistics.
KDNuggets Data Sets list: pointers to data sets and archives of data sets.