The fault in our stars: Designing reproducible large-scale code analysis experiments

Maj P, Muroya Lei S, Siek K, Di Grazia L, Vitek J. 2024. The fault in our stars: Designing reproducible large-scale code analysis experiments. 38th European Conference on Object-Oriented Programming. ECOOP: European Conference on Object-Oriented Programming, LIPIcs, vol. 313, 27.

Download
OA 2024_LIPICs_Maj.pdf 1.76 MB [Published Version]

Conference Paper | Published | English

Scopus indexed
Author
Maj, Petr; Muroya Lei, Stefanie (ISTA); Siek, Konrad; Di Grazia, Luca; Vitek, Jan
Series Title
LIPIcs
Abstract
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size hold the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is and how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects, such as stars, which measure some unclear notion of popularity.
Publishing Year
2024
Date Published
2024-09-01
Proceedings Title
38th European Conference on Object-Oriented Programming
Publisher
Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Acknowledgement
This work was supported by the Czech Ministry of Education, Youth and Sports under program ERC-CZ, grant agreement LL2325, BigCode (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000421), by NSF grants CCF-1910850, CNS-1925644, and CCF-2139612, and by the GACR EXPRO grant 23-07580X. We would like to thank Digital Ocean for their involuntary contribution of computational resources during the early data gathering phase of our research. We acknowledge the reviewers of ICSE’22, and thank the reviewers of ECOOP’23 for their encouragement and for sticking around until 2024.
Volume
313
Article Number
27
Conference
ECOOP: European Conference on Object-Oriented Programming
Conference Location
Vienna, Austria
Conference Date
2024-09-16 – 2024-09-20
ISSN
IST-REx-ID

Cite this

Maj P, Muroya Lei S, Siek K, Di Grazia L, Vitek J. The fault in our stars: Designing reproducible large-scale code analysis experiments. In: 38th European Conference on Object-Oriented Programming. Vol 313. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2024. doi:10.4230/LIPIcs.ECOOP.2024.27
Maj, P., Muroya Lei, S., Siek, K., Di Grazia, L., & Vitek, J. (2024). The fault in our stars: Designing reproducible large-scale code analysis experiments. In 38th European Conference on Object-Oriented Programming (Vol. 313). Vienna, Austria: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.ECOOP.2024.27
Maj, Petr, Stefanie Muroya Lei, Konrad Siek, Luca Di Grazia, and Jan Vitek. “The Fault in Our Stars: Designing Reproducible Large-Scale Code Analysis Experiments.” In 38th European Conference on Object-Oriented Programming, Vol. 313. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024. https://doi.org/10.4230/LIPIcs.ECOOP.2024.27.
P. Maj, S. Muroya Lei, K. Siek, L. Di Grazia, and J. Vitek, “The fault in our stars: Designing reproducible large-scale code analysis experiments,” in 38th European Conference on Object-Oriented Programming, Vienna, Austria, 2024, vol. 313.
Maj P, Muroya Lei S, Siek K, Di Grazia L, Vitek J. 2024. The fault in our stars: Designing reproducible large-scale code analysis experiments. 38th European Conference on Object-Oriented Programming. ECOOP: European Conference on Object-Oriented Programming, LIPIcs, vol. 313, 27.
Maj, Petr, et al. “The Fault in Our Stars: Designing Reproducible Large-Scale Code Analysis Experiments.” 38th European Conference on Object-Oriented Programming, vol. 313, 27, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024, doi:10.4230/LIPIcs.ECOOP.2024.27.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
File Name
2024_LIPICs_Maj.pdf
Access Level
OA Open Access
Date Uploaded
2024-10-07
MD5 Checksum
2e75d305a8c817d76a0c7f136ce34f86

