The replication toolkit for the paper "Inconsistent defect labels: essence, causes, and influence"

Title: Inconsistent defect labels: essence, causes, and influence

Our work is to study the existence and influence of inconsistent labels in multi-version-project defect data sets. The “inconsistent label” phenomenon denotes that a module in multiple versions has the same source code (non-blank, non-comment) but the corresponding instances in these versions have different labels.

Dataset Summary

Dataset	Project	Versions	#Instances (#IL-instances)	%Defective	#Metrics
ECLIPSE-2007	Eclipse	2.0, 2.1, 3.0	6,727-10,590 (91-145)	11%~15%	198
Metrics-Repo-2010	Ant	1.3, 1.4, 1.5, 1.6, 1.7	124-740 (2-14)	11%~26%	20
Metrics-Repo-2010	Camel	1.0, 1.2, 1.4, 1.6	339-927 (28-123)	4%~37%
Metrics-Repo-2010	Forrest	0.6, 0.7, 0.8	6-30 (0-1)	7%~17%
Metrics-Repo-2010	Jedit	3.2, 4.0, 4.1, 4.2, 4.3	259-487 (1-18)	2%~34%
Metrics-Repo-2010	Log4j	1.0, 1.1, 1.2	103-193 (39-48)	29%~95%
Metrics-Repo-2010	Lucene	2.0, 2.2, 2.4	186-330 (55-89)	49%~61%
Metrics-Repo-2010	Pbeans	1.0, 2.0	26-51 (1-1)	20%~77%
Metrics-Repo-2010	Poi	1.5, 2.0, 2.5, 3.0	235-438 (49-240)	12%~64%
Metrics-Repo-2010	Synapse	1.0, 1.1, 1.2	157-256 (10-20)	10%~34%
Metrics-Repo-2010	Velocity	1.4, 1.5, 1.6	195-229 (41-81)	34%~75%
Metrics-Repo-2010	Xalan	2.4, 2.5, 2.6, 2.7	676-899 (314-481)	16%~99%
Metrics-Repo-2010	Xerces	Init, 1.2, 1.3, 1.4	162-451 (25-212)	15%~64%
JIRA-HA-2019/JIRA-RA-2019	ActiveMQ	5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.8.0	1,884-3,420 (1-136/13-282)	4%~8% / 6%~16%	65
JIRA-HA-2019/JIRA-RA-2019	Camel	1.4.0, 2.9.0, 2.10.0, 2.11.0	1,503-8,809 (3-140/4-131)	2%~24% / 2%~19%
JIRA-HA-2019/JIRA-RA-2019	Derby	10.2.1.6, 10.3.1.4, 10.5.1.1	1,963-2,704 (42-56/163-281)	7%~9% / 14%~34%
JIRA-HA-2019/JIRA-RA-2019	Groovy	1.5.7, 1.6.0.Beta 1, 1.6.0.Beta 2	756-883 (19-28/20-21)	11%~14% / 3%~9%
JIRA-HA-2019/JIRA-RA-2019	HBase	0.94.0, 0.95.0, 0.95.2	1,047-1,801 (0-17/0-101)	6%~7% / 21%~27%
JIRA-HA-2019/JIRA-RA-2019	Hive	0.9.0, 0.10.0, 0.12.0	1,319-2,512 (14-26/43-68)	2%~4% / 8%~19%
JIRA-HA-2019/JIRA-RA-2019	JRuby	1.1, 1.4, 1.5, 1.7.0.preview1	723-1,551 (9-78/4-17)	10%~24% / 5%~19%
JIRA-HA-2019/JIRA-RA-2019	Lucene	2.3.0, 2.9.0, 3.0.0, 3.1.0	803-1,802 (15-29/10-32)	14%~20% / 6%~24%
JIRA-HA-2019/JIRA-RA-2019	Wicket	1.3.0.beta1, 1.3.0.beta2, 1.5.3	1,669-2,570 (0-0/0-49)	4%~17% / 4%~7%
MA-SZZ-2020	Zeppelin	0.5.0, 0.5.5, 0.5.6, 0.6.0, 0.6.1, 0.6.2, 0.7.0, 0.7.1, 0.7.2, 0.7.3	129-413 (6-24)	21~38%	44
MA-SZZ-2020	Shiro	1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.3.0, 1.3.1	381-493 (0-1)	4~7%
MA-SZZ-2020	Maven	2.2.0, 2.2.1, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1	318-703 (0-9)	4~13%
MA-SZZ-2020	Flume	1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.7.0, 1.8.0	272-574 (0-35)	4~32%
MA-SZZ-2020	Mahout	0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.10.0, 0.11.0, 0.12.0, 0.13.0	1027-1267 (0-1)	17~29%
IND-JLMIV+R-2020	38 projects	395 versions	3-1708 (0-35)	0~36%	4198

Quick Start

In our work, in addition to using existing publicly available multi-version-project defect data sets, we collected a file-level multi-version-project defect data set (MA-SZZ-2020) using our replicated MA-SZZ [1] defect collection approach.

(1) `/RQ1 experimental data/` The result folder of RQ1 (Section 5.1) contains the original data set (in the "original" folder) and the detection results of inconsistent labels (in the "(type)original" folder) for the six multi-version-project defect data sets we investigated: ECLIPSE-2007 [2], Metrics-Repo-2010 [3-5], JIRA-HA-2019 [6], JIRA-RA-2019 [6], MA-SZZ-2020, and IND-JLMIV+R-2020 data set [7]. Of these data sets, the former two data sets are widely used benchmark data sets in the literature, the next two are two recently pub-lished data sets in ICSE 2019, the fifth is a data set col-lected by ourselves using the state-of-the-art SZZ algo-rithm, and the last one is a recently published data set collected by a semi-automatic approach. With the ex-ception of our own newly collected MA-SZZ-2020 data sets, each data set has been cited extensively (9-817 cita-tions).

(2) `/RQ2 and RQ3 experimental data and program/` The results folder of RQ2 (Section 5.2) and RQ3 (Section 5.3) contains clean and noise training data and test data (arff files) needed to build defect prediction models (CC vs. NC and NC vs. NN), as well as the executable JAR program. The raw results of RQ2 and RQ3 can be obtained by executing the JAR program. Please refer to the `/RQ2 and RQ3 experimental data and program/README.md` file to learn how to run the JAR program.

(3) `/other program codes/` This folder contains the other code programs we developed that were used, including the metric extractor, TSILI (detecting inconsistent labels), and the replicated MA-SZZ algorithm. Each sub-folder in this folder contains README document. Please refer to these README documents to see how to run the corresponding programs.

If you use the MA-SZZ-2020 defect data set or the method code that this work implements, please cite our paper "Inconsistent defect labels: essence, causes, and influence", thanks.

References

[1] D.A. Costa, S. McIntosh, W. Shang, U. Kulesza, R. Coelho, A.E. Hassan. A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, 43(7), 2017: 641-657.
[2] T. Zimmermann, R. Premraj, A. Zeller. Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering, ser. PROM-ISE ’07. IEEE Computer Society, 2007: 9–.
[3] M. Jureczko, D. Spinellis. Using object-oriented design met-rics to predict software defects. In Models and Methods of System Dependability. Oficyna Wydawnicza Politechniki Wrocławskiej, 2010: 69-81.
[4] M. Jureczko, L. Madeyski. Towards identifying software project clusters with regard to defect prediction. In: Proceed-ings of the 6th International Conference on Predictive Models in Software Engineering, 2010: 1–10.
[5] T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, B. Turhan. The promise repository of empirical software engi-neering data, June 2012.
[6] S. Yatish, J. Jiarpakdee, P. Thongtanunam, C. Tan-tithamthavorn. Mining software defects: should we consider affected releases? ICSE 2019: 654-665.
[7] S. Herbold, A. Trautsch, F. Trautsch. Issues with SZZ: An empirical assessment of the state of practice of defect predic-tion data collection. arXiv preprint arXiv:1911.08938v2, 2020.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
RQ1 experimental data		RQ1 experimental data
RQ2 and RQ3 experimental data and program		RQ2 and RQ3 experimental data and program
other program codes		other program codes
Appendix-Inconsistent defect labels essence, causes, and influence.pdf		Appendix-Inconsistent defect labels essence, causes, and influence.pdf
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The replication toolkit for the paper "Inconsistent defect labels: essence, causes, and influence"

Title: Inconsistent defect labels: essence, causes, and influence

Dataset Summary

Quick Start

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

The replication toolkit for the paper "Inconsistent defect labels: essence, causes, and influence"

Title: Inconsistent defect labels: essence, causes, and influence

Dataset Summary

Quick Start

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages