Download page for Sourcerer Source Code Data Set: SDS_source-repo-18k
"SDS_source-repo-18k" is a tarball containing source code of about 18,000
open source Java projects that were collected from open source repositories
and stored in Sourcerer's repository format.
We are releasing this tarball
so that it can serve as a reference collection of
open source projects for various research purposes.
This tarball was archived on April 22, 2010.
"SDS_source-repo-18k" is a part of the UCI Source Code Data Sets.
By downloading and using the Sourcerer repository, you agree to abide
by the following terms of usage.
-
The source code contained in the tarball was collected from various
open source projects and you should adhere to the respective licenses
that come with the projects.
-
You will use the file strictly for non-commercial and non-profit work (e.g.,
research or personal use). Any commercial use of this file is prohibited.
This data set should be cited according to the general
Citation Policy.
Publications relevant to this data set
-
S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An internet-scale software
repository. In Proceedings of the 2009 ICSE Workshop on
Search-Driven Development-Users, Infrastructure, Tools and Evaluation,
pages 1-4. IEEE Computer Society, 2009.
bibtex:
@inproceedings{Bajracharya:2009lr,
Author = {Sushil Bajracharya and Joel Ossher and Cristina Lopes},
Booktitle = {Proceedings of the 2009 {ICSE} Workshop on {Search-Driven}
{Development-Users,} Infrastructure, Tools and Evaluation},
Pages = {1--4},
Publisher = {{IEEE} Computer Society},
Title = {{Sourcerer: An internet-scale software repository}},
Year = {2009}}
-
Linstead, E., Bajracharya, S., Ngo, T., Rigor, P., Lopes, C. V., Baldi, P. (2009).
Sourcerer: Mining and Searching Internet-Scale Software Repositories.
Data Mining and Knowledge Discovery, 18(2), 300-336.
bibtex:
@article{linstead2009,
year={2009},
issn={1384-5810},
journal={Data Mining and Knowledge Discovery},
volume={18},
number={2},
doi={10.1007/s10618-008-0118-x},
title={Sourcerer: mining and searching internet-scale software repositories},
url={http://dx.doi.org/10.1007/s10618-008-0118-x},
publisher={Springer US},
keywords={Mining software; Program understanding; Code search; Software analysis;
Author-topic probabilistic modeling; Code retrieval},
author={Linstead, Erik and Bajracharya, Sushil and Ngo, Trung and Rigor, Paul and Lopes, Cristina and Baldi, Pierre},
pages={300-336},
language={English}
}
-
Baldi, P., Lopes, C. V., Linstead, E., Bajracharya, S. (2008).
A Theory of Aspects as Latent Topics. In Proceedings of the 23rd ACM SIGPLAN
Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA). (pp. 543-562).
bibtex:
@inproceedings{Baldi:2008:TAL:1449764.1449807,
author = {Baldi, Pierre F. and Lopes, Cristina V. and Linstead, Erik J. and Bajracharya, Sushil K.},
title = {A Theory of Aspects As Latent Topics},
booktitle = {Proceedings of the 23rd ACM SIGPLAN Conference on Object-oriented Programming
Systems Languages and Applications},
series = {OOPSLA '08},
year = {2008},
isbn = {978-1-60558-215-3},
location = {Nashville, TN, USA},
pages = {543--562},
numpages = {20},
url = {http://doi.acm.org/10.1145/1449764.1449807},
doi = {10.1145/1449764.1449807},
acmid = {1449807},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {aspect-oriented programming, scattering, tangling, topic models},
}
See CWI's
version of this corpus and additional artifacts produced from it,
including complete ASTs.