Update: WDD (Wddmin and WProbDD) have been integrated into Perses. Check the Perses repo for more details.
Thank you for evaluating this artifact.
To evaluate this artifact, a Linux machine with Docker is required.
- WDD introduces the concept of weight to the classical delta debugging algorithms, enabling a more rational partitioning strategy during delta debugging.
- $W_{ddmin}$ and $W_{ProbDD}$, the implementations of WDD on ddmin and ProbDD, outperform ddmin and ProbDD, respectively, in both efficiency and effectiveness within the tree-based test input minimization techniques HDD and Perses.
- All the experiments take a very long time to finish, so it is recommended to use tools like screen and tmux to manage sessions if the experiments are run on a remote server.
- The experiments involving ProbDD in this paper were repeated 5 times to mitigate the randomness of the ProbDD algorithm.
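The intuition behind weight-aware partitioning can be sketched as follows. This is a hypothetical illustration only (the names `partition_by_count` and `partition_by_weight` are made up, not from the artifact): classical ddmin splits the input into chunks of equal length, whereas a weighted variant can balance chunks by total weight so a few heavy elements do not dominate one partition.

```python
# Hypothetical sketch, not the artifact's implementation: compare a
# count-balanced split (classical ddmin) with a weight-balanced split.

def partition_by_count(elems, n):
    """Classical ddmin-style split: n chunks of roughly equal length."""
    k, m = divmod(len(elems), n)
    chunks, i = [], 0
    for j in range(n):
        size = k + (1 if j < m else 0)
        chunks.append(elems[i:i + size])
        i += size
    return chunks

def partition_by_weight(elems, weights, n):
    """Greedy split into n contiguous chunks of roughly equal total weight."""
    target = sum(weights) / n
    chunks, chunk, acc = [], [], 0.0
    for e, w in zip(elems, weights):
        chunk.append(e)
        acc += w
        if acc >= target and len(chunks) < n - 1:
            chunks.append(chunk)
            chunk, acc = [], 0.0
    chunks.append(chunk)
    return chunks

elems = list("abcdefgh")
weights = [9, 1, 1, 1, 1, 1, 1, 1]  # the first element carries most of the weight
print(partition_by_count(elems, 2))            # [['a','b','c','d'], ['e','f','g','h']]
print(partition_by_weight(elems, weights, 2))  # [['a'], ['b','c','d','e','f','g','h']]
```

With equal weights the two strategies coincide; the difference only shows when weights are skewed, which is exactly the situation WDD targets.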
If Docker is not installed, install it by following the official installation instructions.

Pull the Docker image:

```bash
docker pull wddartifact/wdd:latest
```
Start a container:

```bash
docker container run --cap-add SYS_PTRACE --interactive --tty wddartifact/wdd:latest /bin/bash
# For all operations in Docker, use 'sudo' when meeting permission-denied issues; the password is 123.
cd /tmp/WeightDD/
```
Under the root directory of the project, the benchmarks are located in:

- `./c_benchmarks`: benchmark-C, which consists of 32 C programs;
- `./xml_benchmarks`: benchmark-XML, which consists of 30 XML files.
```bash
cd /tmp/WeightDD

# For C benchmarks:
./run_stat_parallel_c.py -s c_benchmarks/* -r perses_ddmin_stat hdd_ddmin_stat -o stat_result_c -j 20
# For XML benchmarks:
./run_stat_parallel_xml.py -s xml_benchmarks/xml-* -r perses_ddmin_stat hdd_ddmin_stat -o stat_result_xml -j 20

# To calculate the correlations and export the results into csv files, go to the
# directory of the results and run stat.py; use '-h' to see the usage notes.
python3 stat.py -d ./stat_result_c/perses_ddmin_stat_0/ -o rq1_csv/perses_ddmin_c.csv -t correlation
python3 stat.py -d ./stat_result_c/hdd_ddmin_stat_0/ -o rq1_csv/hdd_ddmin_c.csv -t correlation
python3 stat.py -d ./stat_result_xml/perses_ddmin_stat_0/ -o rq1_csv/perses_ddmin_xml.csv -t correlation
python3 stat.py -d ./stat_result_xml/hdd_ddmin_stat_0/ -o rq1_csv/hdd_ddmin_xml.csv -t correlation
```

```bash
# For C benchmarks:
./run_exp_parallel_c.py -s c_benchmarks/* -r perses_ddmin perses_wdd hdd_ddmin hdd_wdd -o result_wdd_c -j 20
# For XML benchmarks:
./run_exp_parallel_xml.py -s xml_benchmarks/xml-* -r perses_ddmin perses_wdd hdd_ddmin hdd_wdd -o result_wdd_xml -j 20

# Run convert_result_to_csv.py to export the results into csv files; use '-h' to see the usage notes.
./convert_result_to_csv.py -d result_wdd_c/hdd_ddmin_0/* -o hdd_ddmin_c.csv
./convert_result_to_csv.py -d result_wdd_c/hdd_wdd_0/* -o hdd_wdd_c.csv
./convert_result_to_csv.py -d result_wdd_c/perses_ddmin_0/* -o perses_ddmin_c.csv
./convert_result_to_csv.py -d result_wdd_c/perses_wdd_0/* -o perses_wdd_c.csv
./convert_result_to_csv.py -d result_wdd_xml/hdd_ddmin_0/* -o hdd_ddmin_xml.csv
./convert_result_to_csv.py -d result_wdd_xml/hdd_wdd_0/* -o hdd_wdd_xml.csv
./convert_result_to_csv.py -d result_wdd_xml/perses_ddmin_0/* -o perses_ddmin_xml.csv
./convert_result_to_csv.py -d result_wdd_xml/perses_wdd_0/* -o perses_wdd_xml.csv
```

```bash
# For C benchmarks:
./run_exp_parallel_c.py -s c_benchmarks/* -r perses_probdd perses_wprobdd hdd_probdd hdd_wprobdd -o result_wprobdd_c -j 20
# For XML benchmarks:
./run_exp_parallel_xml.py -s xml_benchmarks/xml-* -r perses_probdd perses_wprobdd hdd_probdd hdd_wprobdd -o result_wprobdd_xml -j 20

# Run convert_result_to_csv.py to export the results into csv files; use '-h' to see the usage notes.
./convert_result_to_csv.py -d result_wprobdd_c/hdd_probdd_0/* -o hdd_probdd_c.csv
./convert_result_to_csv.py -d result_wprobdd_c/hdd_wprobdd_0/* -o hdd_wprobdd_c.csv
./convert_result_to_csv.py -d result_wprobdd_c/perses_probdd_0/* -o perses_probdd_c.csv
./convert_result_to_csv.py -d result_wprobdd_c/perses_wprobdd_0/* -o perses_wprobdd_c.csv
./convert_result_to_csv.py -d result_wprobdd_xml/hdd_probdd_0/* -o hdd_probdd_xml.csv
./convert_result_to_csv.py -d result_wprobdd_xml/hdd_wprobdd_0/* -o hdd_wprobdd_xml.csv
./convert_result_to_csv.py -d result_wprobdd_xml/perses_probdd_0/* -o perses_probdd_xml.csv
./convert_result_to_csv.py -d result_wprobdd_xml/perses_wprobdd_0/* -o perses_wprobdd_xml.csv
```

RQ1: The raw data of the weight information collected during delta debugging are put in results_rq1, and the correlation values are exported to the csv files under results_rq1_csv. Visualizing the results with a box plot:
From this figure, the probability of elements being deleted is negatively correlated with their weights in ddmin executions in both HDD and Perses, to varying degrees. This validation provides a solid foundation for the design of WDD.
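The negative correlation discussed above can be quantified with a rank correlation. The sketch below uses made-up placeholder values (the real ones live in the csv files under results_rq1_csv) and a self-contained Spearman implementation, which may differ from what stat.py actually computes:

```python
# Hypothetical sketch: Spearman rank correlation between element weights
# and how often each element was deleted. Data values are placeholders.

def ranks(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

weights = [1, 2, 4, 8, 16]           # hypothetical element weights
deleted = [0.9, 0.7, 0.5, 0.2, 0.1]  # hypothetical deletion frequencies
print(spearman(weights, deleted))    # -1.0: perfectly anti-monotone
```

A coefficient near -1 means heavy elements are rarely deleted, which is the relationship the figure shows to varying degrees across benchmarks.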
RQ2 & RQ3: The raw data of the evaluation results used in RQ2 & RQ3 are saved in the results_c and results_xml directories, and the csv files exported using convert_result_to_csv.py are put under results_csv. The detailed results are shown in Table 1 of the paper, and the following tables provide a brief illustration of the overall (average) results of each algorithm.
RQ2:
| Benchmark | HDD-ddmin Time (s) | HDD-ddmin Size | HDD-Wddmin Time (s) | HDD-Wddmin Size | Perses-ddmin Time (s) | Perses-ddmin Size | Perses-Wddmin Time (s) | Perses-Wddmin Size |
|---|---|---|---|---|---|---|---|---|
| C Programs | 39,108 | 518 | 18,022 (-53.92%) | 477 (-7.81%) | 4,582 | 281 | 4,169 (-9.01%) | 278 (-1.04%) |
| XML Files | 3,152 | 98 | 2,621 (-16.86%) | 82 (-16.51%) | 1,295 | 37.6 | 1,273 (-1.67%) | 37.5 (-0.27%) |
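As a quick sanity check, the time-reduction percentages can be reproduced from the averaged values in the table (`reduction` is a helper made up for this illustration; columns that average per-benchmark reductions instead of reducing the averages may differ slightly):

```python
# Reduction of the averaged values from the RQ2 table above.
# Negative means the W variant is faster or produces a smaller result.

def reduction(baseline, improved):
    return (improved - baseline) / baseline * 100

print(f"{reduction(39108, 18022):.2f}%")  # HDD time on C programs: -53.92%
print(f"{reduction(4582, 4169):.2f}%")    # Perses time on C programs: -9.01%
```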
RQ3:
| Benchmark | HDD-ProbDD Time (s) | HDD-ProbDD Size | HDD-WProbDD Time (s) | HDD-WProbDD Size | Perses-ProbDD Time (s) | Perses-ProbDD Size | Perses-WProbDD Time (s) | Perses-WProbDD Size |
|---|---|---|---|---|---|---|---|---|
| C Programs | 6,042 | 567 | 5,384 (-10.89%) | 488 (-13.91%) | 3,455 | 280 | 2,975 (-13.89%) | 273 (-2.43%) |
| XML Files | 3,004 | 89 | 2,574 (-14.31%) | 80 (-9.74%) | 1,838 | 37.4 | 1,813 (-1.35%) | 37.5 (+0.42%) |
Our evaluation results demonstrate that Wddmin and WProbDD outperform ddmin and ProbDD, respectively, in both efficiency and effectiveness.
