sticeran/InconsistentLabels

The replication toolkit for the paper "Inconsistent defect labels: essence, causes, and influence"

Title: Inconsistent defect labels: essence, causes, and influence

Our work studies the existence and influence of inconsistent labels in multi-version-project defect data sets. The "inconsistent label" phenomenon denotes that a module has the same source code (non-blank, non-comment) in multiple versions, but the corresponding instances in these versions have different labels.
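The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the TSILI tool from this repository: the `normalize` comment-stripping is deliberately simplified (blank lines and `//` line comments only), and the instance tuples are hypothetical.

```python
# Sketch: flag modules whose identical (normalized) source code carries
# conflicting defect labels across versions.
from collections import defaultdict

def normalize(source: str) -> str:
    """Drop blank lines and '//' line comments (simplified normalization)."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            kept.append(stripped)
    return "\n".join(kept)

def find_inconsistent(instances):
    """instances: iterable of (module, version, source, label) tuples.
    Returns the set of modules whose identical code has differing labels."""
    labels_by_code = defaultdict(set)
    for module, version, source, label in instances:
        labels_by_code[(module, normalize(source))].add(label)
    return {module for (module, _), labels in labels_by_code.items()
            if len(labels) > 1}

# Hypothetical instances: Foo.java has identical code in 1.0 and 1.1
# (modulo a blank line and a comment) but different labels.
instances = [
    ("Foo.java", "1.0", "int x = 1;\n\n// note\n", "clean"),
    ("Foo.java", "1.1", "int x = 1;", "defective"),
    ("Bar.java", "1.0", "int y = 2;", "clean"),
    ("Bar.java", "1.1", "int y = 2;", "clean"),
]
print(sorted(find_inconsistent(instances)))  # ['Foo.java']
```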

Dataset Summary

| Dataset | Project | Versions | #Instances (#IL-instances) | %Defective | #Metrics |
|---|---|---|---|---|---|
| ECLIPSE-2007 | Eclipse | 2.0, 2.1, 3.0 | 6,727-10,590 (91-145) | 11%~15% | 198 |
| Metrics-Repo-2010 | Ant | 1.3, 1.4, 1.5, 1.6, 1.7 | 124-740 (2-14) | 11%~26% | 20 |
| Metrics-Repo-2010 | Camel | 1.0, 1.2, 1.4, 1.6 | 339-927 (28-123) | 4%~37% | 20 |
| Metrics-Repo-2010 | Forrest | 0.6, 0.7, 0.8 | 6-30 (0-1) | 7%~17% | 20 |
| Metrics-Repo-2010 | Jedit | 3.2, 4.0, 4.1, 4.2, 4.3 | 259-487 (1-18) | 2%~34% | 20 |
| Metrics-Repo-2010 | Log4j | 1.0, 1.1, 1.2 | 103-193 (39-48) | 29%~95% | 20 |
| Metrics-Repo-2010 | Lucene | 2.0, 2.2, 2.4 | 186-330 (55-89) | 49%~61% | 20 |
| Metrics-Repo-2010 | Pbeans | 1.0, 2.0 | 26-51 (1-1) | 20%~77% | 20 |
| Metrics-Repo-2010 | Poi | 1.5, 2.0, 2.5, 3.0 | 235-438 (49-240) | 12%~64% | 20 |
| Metrics-Repo-2010 | Synapse | 1.0, 1.1, 1.2 | 157-256 (10-20) | 10%~34% | 20 |
| Metrics-Repo-2010 | Velocity | 1.4, 1.5, 1.6 | 195-229 (41-81) | 34%~75% | 20 |
| Metrics-Repo-2010 | Xalan | 2.4, 2.5, 2.6, 2.7 | 676-899 (314-481) | 16%~99% | 20 |
| Metrics-Repo-2010 | Xerces | Init, 1.2, 1.3, 1.4 | 162-451 (25-212) | 15%~64% | 20 |
| JIRA-HA-2019/JIRA-RA-2019 | ActiveMQ | 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.8.0 | 1,884-3,420 (1-136/13-282) | 4%~8% / 6%~16% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Camel | 1.4.0, 2.9.0, 2.10.0, 2.11.0 | 1,503-8,809 (3-140/4-131) | 2%~24% / 2%~19% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Derby | 10.2.1.6, 10.3.1.4, 10.5.1.1 | 1,963-2,704 (42-56/163-281) | 7%~9% / 14%~34% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Groovy | 1.5.7, 1.6.0.Beta 1, 1.6.0.Beta 2 | 756-883 (19-28/20-21) | 11%~14% / 3%~9% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | HBase | 0.94.0, 0.95.0, 0.95.2 | 1,047-1,801 (0-17/0-101) | 6%~7% / 21%~27% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Hive | 0.9.0, 0.10.0, 0.12.0 | 1,319-2,512 (14-26/43-68) | 2%~4% / 8%~19% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | JRuby | 1.1, 1.4, 1.5, 1.7.0.preview1 | 723-1,551 (9-78/4-17) | 10%~24% / 5%~19% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Lucene | 2.3.0, 2.9.0, 3.0.0, 3.1.0 | 803-1,802 (15-29/10-32) | 14%~20% / 6%~24% | 65 |
| JIRA-HA-2019/JIRA-RA-2019 | Wicket | 1.3.0.beta1, 1.3.0.beta2, 1.5.3 | 1,669-2,570 (0-0/0-49) | 4%~17% / 4%~7% | 65 |
| MA-SZZ-2020 | Zeppelin | 0.5.0, 0.5.5, 0.5.6, 0.6.0, 0.6.1, 0.6.2, 0.7.0, 0.7.1, 0.7.2, 0.7.3 | 129-413 (6-24) | 21%~38% | 44 |
| MA-SZZ-2020 | Shiro | 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.3.0, 1.3.1 | 381-493 (0-1) | 4%~7% | 44 |
| MA-SZZ-2020 | Maven | 2.2.0, 2.2.1, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1 | 318-703 (0-9) | 4%~13% | 44 |
| MA-SZZ-2020 | Flume | 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.7.0, 1.8.0 | 272-574 (0-35) | 4%~32% | 44 |
| MA-SZZ-2020 | Mahout | 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.10.0, 0.11.0, 0.12.0, 0.13.0 | 1,027-1,267 (0-1) | 17%~29% | 44 |
| IND-JLMIV+R-2020 | 38 projects | 395 versions | 3-1,708 (0-35) | 0%~36% | 4,198 |

Quick Start

In addition to using existing publicly available multi-version-project defect data sets, we collected a new file-level multi-version-project defect data set (MA-SZZ-2020) using our replication of the MA-SZZ [1] defect collection approach.

(1) /RQ1 experimental data/ The result folder of RQ1 (Section 5.1) contains the original data sets (in the "original" folder) and the inconsistent-label detection results (in the "(type)original" folder) for the six multi-version-project defect data sets we investigated: ECLIPSE-2007 [2], Metrics-Repo-2010 [3-5], JIRA-HA-2019 [6], JIRA-RA-2019 [6], MA-SZZ-2020, and IND-JLMIV+R-2020 [7]. Of these, the first two are widely used benchmark data sets in the literature, the next two were recently published at ICSE 2019, the fifth is a data set we collected ourselves using the state-of-the-art SZZ algorithm, and the last is a recently published data set collected with a semi-automatic approach. With the exception of our newly collected MA-SZZ-2020 data set, each data set has been cited extensively (9-817 citations).

(2) /RQ2 and RQ3 experimental data and program/ The result folder of RQ2 (Section 5.2) and RQ3 (Section 5.3) contains the clean and noisy training and test data (ARFF files) needed to build the defect prediction models (CC vs. NC and NC vs. NN), as well as the executable JAR program. The raw results of RQ2 and RQ3 can be obtained by executing the JAR program; please refer to the /RQ2 and RQ3 experimental data and program/README.md file to learn how to run it.

(3) /other program codes/ This folder contains the other programs we developed and used, including the metric extractor, TSILI (which detects inconsistent labels), and the replicated MA-SZZ algorithm. Each sub-folder contains a README document explaining how to run the corresponding program.

If you use the MA-SZZ-2020 defect data set or the code released with this work, please cite our paper "Inconsistent defect labels: essence, causes, and influence".

References

[1] D.A. Costa, S. McIntosh, W. Shang, U. Kulesza, R. Coelho, A.E. Hassan. A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, 43(7), 2017: 641-657.
[2] T. Zimmermann, R. Premraj, A. Zeller. Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering (PROMISE '07), IEEE Computer Society, 2007.
[3] M. Jureczko, D. Spinellis. Using object-oriented design metrics to predict software defects. In Models and Methods of System Dependability. Oficyna Wydawnicza Politechniki Wrocławskiej, 2010: 69-81.
[4] M. Jureczko, L. Madeyski. Towards identifying software project clusters with regard to defect prediction. In Proceedings of the 6th International Conference on Predictive Models in Software Engineering (PROMISE 2010): 1-10.
[5] T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, B. Turhan. The PROMISE repository of empirical software engineering data, June 2012.
[6] S. Yatish, J. Jiarpakdee, P. Thongtanunam, C. Tantithamthavorn. Mining software defects: should we consider affected releases? In Proceedings of the 41st International Conference on Software Engineering (ICSE 2019): 654-665.
[7] S. Herbold, A. Trautsch, F. Trautsch. Issues with SZZ: An empirical assessment of the state of practice of defect prediction data collection. arXiv preprint arXiv:1911.08938v2, 2020.
