CARRUTHERS Juan Andrés
Conferences and scientific meetings
Title:
How are software datasets constructed in Empirical Software Engineering studies? A systematic mapping study
Author(s):
JUAN ANDRÉS CARRUTHERS; JORGE ANDRÉS DIAZ-PACE; EMANUEL IRRAZÁBAL
Place:
Gran Canaria
Meeting:
Conference; Euromicro Conference on Software Engineering and Advanced Applications (SEAA); 2022
Organizing institution:
Euromicro
Abstract:
Context: Software projects are common inputs in Empirical Software Engineering (ESE) studies, although they are often selected with ad-hoc strategies that reduce the generalizability of the results. An alternative is the use of available datasets of software projects, which should be current and follow explicit rules to ensure their validity over time. Goal: In this context, it is important to assess the general state of software datasets in terms of purpose, last update, project characterization, source code metrics, and tools to extract source-code-related artifacts. Method: We conducted a systematic mapping study retrieving software datasets used in ESE studies published from January 2013 to December 2021. Results: We selected 74 datasets created mainly for software defects, software estimation, and software maintainability studies. The majority of these datasets (64%) explicitly stated the characteristics used to select the projects, and the most common programming languages were Java and C. Conclusions: Our study identified scarce efforts to keep datasets updated over time and provides recommendations to support their construction and consumption in ESE studies.