Download page for Sourcerer Source Code Data Set: SDS_source-repo-18k

"SDS_source-repo-18k" is a tarball containing the source code of about 18,000 open source Java projects, collected from open source repositories and stored in Sourcerer's repository format. We are releasing this tarball so that it can serve as a reference collection of open source projects for research purposes. The tarball was archived on 04-22-2010.

"SDS_source-repo-18k" is a part of the UCI Source Code Data Sets.


By downloading and using the Sourcerer repository, you agree to abide by the following terms of usage.

  1. The source code contained in the tarball was collected from various open source projects, and you should adhere to the respective licenses that come with those projects.
  2. You will use the file strictly for non-commercial and non-profit work (e.g., research or personal use). Any commercial use of this file is prohibited.

Citation Policy

This data set should be cited according to the general Citation Policy.

Publications relevant to this data set

  1. S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An internet-scale software repository. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation, pages 1-4. IEEE Computer Society, 2009.
  2. Linstead, E., Bajracharya, S., Ngo, T., Rigor, P., Lopes, C. V., Baldi, P. (2009). Sourcerer: Mining and Searching Internet-Scale Software Repositories. Journal of Data Mining and Knowledge Discovery, 18(2), 300-336.
  3. Baldi, P., Lopes, C. V., Linstead, E., Bajracharya, S. (2008). A Theory of Aspects as Latent Topics. In Proceedings of the 23rd ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA). (pp. 543-562).

Related Datasets

See CWI's version of this corpus and additional artifacts produced from it, including complete ASTs.

(c) the mondego group