Usage:
-
Prepare datasets: Devign, Big-Vul and DiverseVul
-
Use data_creator.ipynb to process the datasets, then move the processed datasets into the data folder of each model.
-
Train and test a model:
- For CodeBERT or ReGVD:
  codebert:
  python run.py --output_folder_name=devign --output_dir=./saved_models --model_type=roberta --tokenizer_name=microsoft/codebert-base --model_name_or_path=microsoft/codebert-base --do_train --train_data_file=../data/devign/train.jsonl --do_eval --eval_data_file=../data/devign/valid.jsonl --do_test --test_data_file=../data/devign/test.jsonl --epoch 5 --block_size 400 --train_batch_size 8 --eval_batch_size 8 --learning_rate 2e-5 --max_grad_norm 1.0 --evaluate_during_training
  regvd:
  python run_reproduction.py --dataset devign --batch_size 64
- For RoBERTa:
  Run 1.Pretraining_RoBERTa.ipynb, 2.Finetuning_RoBERTa.ipynb and 3.Evaluation_RoBERTa.ipynb, in that order.
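The CodeBERT command above is written for Devign. Below is a minimal Python sketch that rebuilds the same argument list for another dataset folder. It assumes, hypothetically, that each processed dataset sits in ../data/<name>/ with train.jsonl, valid.jsonl and test.jsonl, as in the Devign example; the helper name codebert_command is illustrative and not part of this repository.

```python
import shlex


def codebert_command(dataset: str, data_root: str = "../data") -> list[str]:
    """Build the run.py argument list for one dataset folder.

    Hypothetical layout: <data_root>/<dataset>/{train,valid,test}.jsonl,
    mirroring the devign paths in this README. Hyperparameters are copied
    unchanged from the Devign command.
    """
    base = f"{data_root}/{dataset}"
    return [
        "python", "run.py",
        f"--output_folder_name={dataset}",
        "--output_dir=./saved_models",
        "--model_type=roberta",
        "--tokenizer_name=microsoft/codebert-base",
        "--model_name_or_path=microsoft/codebert-base",
        "--do_train", f"--train_data_file={base}/train.jsonl",
        "--do_eval", f"--eval_data_file={base}/valid.jsonl",
        "--do_test", f"--test_data_file={base}/test.jsonl",
        "--epoch", "5",
        "--block_size", "400",
        "--train_batch_size", "8",
        "--eval_batch_size", "8",
        "--learning_rate", "2e-5",
        "--max_grad_norm", "1.0",
        "--evaluate_during_training",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the Devign run.
    print(shlex.join(codebert_command("devign")))
```

To launch a run instead of printing it, the list can be passed to subprocess.run(...) from the directory that contains run.py.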
-
Results: selected results are listed below. For RQ3, devign_test_result.csv is provided (see the paper).
RQ1:
CodeBERT Devign
Accuracy: 0.6350658857979502
Precision: 0.6062658763759525
F-measure: 0.5895430218196789
Recall: 0.5737179487179487
CodeBERT Big-Vul
Accuracy: 0.9459316194010071
Precision: 0.6615384615384615
F-measure: 0.2018779342723005
Recall: 0.11911357340720222
ReGVD Devign
accuracy = 62.0059
f1_score = 51.1299
precision = 61.9863
recall = 43.5096
ReGVD Big-Vul
accuracy = 94.678
f1_score = 20.1908
precision = 72.5714
recall = 11.7267
VulBERTa Devign
Confusion matrix:
[[944 533]
[572 683]]
TP: 683
FP: 533
TN: 944
FN: 572
Accuracy: 0.5955344070278185
Precision: 0.5616776315789473
Recall: 0.5442231075697211
F-measure: 0.5528126264670173
Precision-Recall AUC: 0.5720734956125912
AUC: 0.6304636565451126
MCC: 0.18386200837300865
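The threshold-based VulBERTa figures above follow directly from the confusion matrix, read in scikit-learn's layout [[TN, FP], [FN, TP]] (consistent with the TP/FP/TN/FN lines printed above). The sketch below recomputes them with the standard formulas; Precision-Recall AUC and AUC are left out because they need the model's scores, not just the counts.

```python
import math


def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Metrics derivable from a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        # F-measure is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
        # Matthews correlation coefficient.
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# VulBERTa on Devign: [[944 533] [572 683]]
for name, value in binary_metrics(tn=944, fp=533, fn=572, tp=683).items():
    # Matches the values above to four decimals:
    # accuracy 0.5955, precision 0.5617, recall 0.5442, f1 0.5528, mcc 0.1839
    print(f"{name}: {value:.4f}")
```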
RQ4:
CodeBERT qemu qemu
Accuracy: 0.6170921198668147
Precision: 0.6833855799373041
F-measure: 0.5582586427656849
Recall: 0.47186147186147187
qemu FFmpeg
Accuracy: 0.5671476137624861
Precision: 0.5852272727272727
F-measure: 0.34563758389261745
Recall: 0.24523809523809523
FFmpeg FFmpeg
Accuracy: 0.5982241953385128
Precision: 0.5601659751037344
F-measure: 0.5986696230598669
Recall: 0.6428571428571429
FFmpeg qemu
Accuracy: 0.4938956714761376
Precision: 0.5127118644067796
F-measure: 0.3467048710601719
Recall: 0.2619047619047619
RQ5:
ReGVD DiverseVul
accuracy = 97.055
f1_score = 72.5047
precision = 81.741
recall = 65.1438
diversevul_119
04/28/2024 06:14:04 - INFO - __main__ - accuracy = 90.015
04/28/2024 06:14:04 - INFO - __main__ - f1_score = 80.024
04/28/2024 06:14:04 - INFO - __main__ - precision = 81.4559
04/28/2024 06:14:04 - INFO - __main__ - recall = 78.6415
diversevul_125
04/28/2024 10:53:50 - INFO - __main__ - accuracy = 91.4035
04/28/2024 10:53:50 - INFO - __main__ - f1_score = 81.9519
04/28/2024 10:53:50 - INFO - __main__ - precision = 84.8111
04/28/2024 10:53:50 - INFO - __main__ - recall = 79.2793
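ReGVD reports its metrics as percentages, while the CodeBERT and VulBERTa results above are fractions in [0, 1]. A small sketch for pulling the ReGVD values out of log output like the DiverseVul runs above and rescaling them for comparison follows. The regular expression is an assumption based only on the log lines shown in this README; it is not part of ReGVD.

```python
import re

# Assumed line shape, taken from the logs above: an optional logger prefix such as
# "04/28/2024 06:14:04 - INFO - __main__ - ", then "<metric> = <value>".
METRIC_LINE = re.compile(
    r"\b(accuracy|f1_score|precision|recall) = ([0-9]+(?:\.[0-9]+)?)\s*$"
)


def parse_regvd_metrics(log_text: str) -> dict[str, float]:
    """Return ReGVD metrics found in log text, rescaled from percent to [0, 1]."""
    metrics: dict[str, float] = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            # A later line with the same metric name overwrites an earlier one.
            metrics[match.group(1)] = float(match.group(2)) / 100
    return metrics


log = """\
04/28/2024 06:14:04 - INFO - __main__ - accuracy = 90.015
04/28/2024 06:14:04 - INFO - __main__ - f1_score = 80.024
04/28/2024 06:14:04 - INFO - __main__ - precision = 81.4559
04/28/2024 06:14:04 - INFO - __main__ - recall = 78.6415
"""
print(parse_regvd_metrics(log))
```

Bare lines such as "accuracy = 97.055" also match, since the logger prefix is not required.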