The replication package of MAT

Title: How far have we progressed in identifying self-admitted technical debts? A comprehensive empirical study

This repository stores the source code of the four state-of-the-art SATD comment detection approaches, and the comments of 20 Java projects that were manually labeled by Maldonado et al. (10 projects) and by us (10 projects).

1. Folders Introduction

(1) `MAT/dataset/` This folder stores the comment data of the 20 Java projects. It consists of 40 files: 20 `comments` files (e.g., `data--Ant.txt`) and 20 `labels` files (e.g., `label--Ant`).

(2) `MAT/src/` This folder stores the source code of `Pattern`, `NLP`, `TM`, and `MAT` written in Java.

(3) `MAT/CNN_Code/` This folder stores the source code of `CNN`, written in Python. The code was provided by Ren et al. [1]; we modified parts of it so that it can be used for cross-project prediction.

[1] X. Ren, Z. Xing, X. Xia, D. Lo, X. Wang, J. Grundy. Neural network based detection of self-admitted technical debt: From performance to explainability. ACM Transactions on Software Engineering and Methodology, 28(3), 2019: 1-45.

(4) `MAT/exp_data/{approach}/` This folder stores the experimental data and classification results of a specific `approach` on a specific `dataset`. Note that `approach` is the name of one of the studied approaches (e.g., `Pattern`).

(5) `MAT/result/` This folder stores all classification results of each approach. In particular, `MAT/result/predictions/` stores the detailed classification result for each comment of each project.
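
As a rough illustration of how a `comments` file and its `labels` file from `MAT/dataset/` might be read together, here is a minimal Python sketch. It assumes (this is not confirmed by this README) that both files hold one entry per line and that line i of the labels file belongs to line i of the comments file. The function name `load_project` and the `label_suffix` parameter are hypothetical helpers introduced only for this example; the label values themselves are passed through unchanged.

```python
from pathlib import Path
from typing import List, Tuple


def _read_lines(path: Path) -> List[str]:
    """Read a text file and return its lines without trailing newlines."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]


def load_project(dataset_dir: str, project: str,
                 label_suffix: str = "") -> List[Tuple[str, str]]:
    """Pair each comment of one project with its label.

    Assumption: one entry per line, and the two files are line-aligned.
    `label_suffix` is a hypothetical knob in case the labels file
    carries an extension such as ".txt".
    """
    root = Path(dataset_dir)
    comments = _read_lines(root / f"data--{project}.txt")
    labels = _read_lines(root / f"label--{project}{label_suffix}")
    if len(comments) != len(labels):
        raise ValueError(
            f"{project}: {len(comments)} comments but {len(labels)} labels"
        )
    return list(zip(comments, labels))
```

For example, `load_project("MAT/dataset", "Ant")` would return a list of `(comment, label)` pairs, or raise a `ValueError` if the two files are not line-aligned as assumed.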

2. Studied Approaches

| Year | Authors | Approach | isSupervised | Description |
|------|---------|----------|--------------|-------------|
| 2015 | Potdar et al. | Pattern | No | Pattern (keyword) matching |
| 2017 | Maldonado et al. | NLP | Yes | Natural language processing |
| 2018 | Huang et al. | TM | Yes | Text mining |
| 2019 | Ren et al. | CNN | Yes | Convolutional neural network |
| 2020 | Yu et al. | Jitterbug | Yes | Pattern matching & human effort |
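
To make the idea behind the unsupervised `Pattern` approach concrete, the following Python sketch flags a comment as SATD when it contains any keyword from a list. This is only an illustration of keyword matching in general: the keywords in `EXAMPLE_PATTERNS` are placeholders chosen for this example, not the pattern list used by the Java code in `MAT/src/`, and `is_satd` is a hypothetical function name.

```python
from typing import Iterable

# Placeholder keywords for illustration only; they are NOT the pattern
# list shipped in this repository.
EXAMPLE_PATTERNS = ("todo", "fixme", "hack", "workaround")


def is_satd(comment: str, patterns: Iterable[str] = EXAMPLE_PATTERNS) -> bool:
    """Return True if the comment contains any pattern (case-insensitive).

    Plain substring matching is used, so "hack" would also match
    "hackathon"; a real implementation may need word boundaries.
    """
    text = comment.lower()
    return any(pattern in text for pattern in patterns)
```

Because no training data is involved, such a matcher can be applied to any project directly, which is what "isSupervised: No" refers to in the table.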

3. Dataset Summary

3.1 Projects labeled by Maldonado et al.

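| Project | Release | Contributors | #Classes | #Comments | #After filtering | SATD | % of SATD |
|---------|---------|--------------|----------|-----------|------------------|------|-----------|
| Ant | 1.7.0 | 74 | 1,475 | 21,587 | 3,052 | 102 | 0.47% |
| ArgoUML | 0.34 | 87 | 2,609 | 67,716 | 5,426 | 969 | 1.43% |
| Columba | 1.4 | 9 | 1,711 | 33,895 | 4,090 | 128 | 0.38% |
| EMF | 2.4.1 | 30 | 1,458 | 25,229 | 2,585 | 74 | 0.29% |
| Hibernate | 3.3.2 | 226 | 1,356 | 11,630 | 2,492 | 377 | 3.24% |
| JEdit | 4.2 | 57 | 800 | 16,991 | 4,644 | 195 | 1.15% |
| JFreeChart | 1.0.19 | 19 | 1,065 | 23,474 | 2,494 | 101 | 0.43% |
| JMeter | 2.10 | 33 | 1,181 | 20,084 | 4,148 | 282 | 1.40% |
| JRuby | 1.4.0 | 328 | 1,486 | 11,149 | 3,652 | 383 | 3.44% |
| SQuirrel | 3.0.3 | 46 | 3,108 | 27,474 | 4,473 | 201 | 0.73% |
| Total | - | - | 16,249 | 259,229 | 37,056 | 2,812 | 1.08% |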

3.2 Projects labeled by ourselves.

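| Project | Release | Contributors | #Files | #Comments | #After filtering | SATD | % of SATD |
|---------|---------|--------------|--------|-----------|------------------|------|-----------|
| Dubbo | 2.7.4 | 255 | 1,493 | 5,875 | 1,649 | 85 | 1.45% |
| Gradle | 5.6.3 | 409 | 7,965 | 15,901 | 3,324 | 321 | 2.02% |
| Groovy | 2.5.8 | 284 | 1,526 | 14,199 | 4,435 | 249 | 1.75% |
| Hive | 3.1.2 | 192 | 5,817 | 81,127 | 29,340 | 1,046 | 1.29% |
| Maven | 3.6.2 | 87 | 886 | 5,448 | 1,219 | 136 | 2.50% |
| Poi | 4.1.1 | 12 | 3,477 | 45,666 | 15,033 | 618 | 1.35% |
| SpringFramework | 5.2.0 | 401 | 6,355 | 42,574 | 7,712 | 98 | 0.23% |
| Storm | 2.1.0 | 304 | 2,267 | 12,258 | 3,639 | 92 | 0.75% |
| Tomcat | 9.0.27 | 31 | 2,343 | 37,038 | 12,218 | 287 | 0.77% |
| Zookeeper | 3.5.6 | 93 | 677 | 6,894 | 2,691 | 63 | 0.91% |
| Total | - | - | 32,806 | 266,980 | 81,260 | 2,995 | 1.12% |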
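
The "% of SATD" column in both tables is consistent with dividing the SATD count by #Comments (the count before filtering), not by the count after filtering. The short Python check below recomputes a few rows under that reading; the helper name `satd_percentage` is introduced here only for illustration.

```python
def satd_percentage(satd: int, comments: int) -> float:
    """SATD comments as a percentage of all comments, to two decimals."""
    return round(100.0 * satd / comments, 2)


# (SATD, #Comments) pairs copied from the two tables above.
rows = {
    "Ant": (102, 21587),
    "Hibernate": (377, 11630),
    "Total (Maldonado et al.)": (2812, 259229),
    "Dubbo": (85, 5875),
    "Total (ours)": (2995, 266980),
}

for name, (satd, comments) in rows.items():
    print(f"{name}: {satd_percentage(satd, comments):.2f}%")
```

Under this reading the recomputed values match the tables, e.g. 0.47% for Ant and 1.12% for the total of the ten projects labeled by us.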
