This document provides instructions on how to run the evaluation experiments for our method as presented in our paper. Before proceeding, ensure that you have completed the installation and data preparation steps outlined in installation, prepare_data, and prepare_prompts.
The evaluation scripts are located in the CodeGraph directory. Each script corresponds to the evaluations for a specific table in the paper.
To reproduce the results in Table 1, which evaluates our method (CodeGraph) against baseline prompting methods using GPT-3.5, run the following scripts:

```shell
cd CodeGraph
./run_cg_gpt35_table1.sh
./run_cot_gpt35_table1.sh
./run_few_shot_gpt35_table1.sh
./run_zero_shot_gpt35_table1.sh
```

To compare the accuracy of various graph generators on different graph tasks using two graph encoding functions with GPT-3.5 Turbo, run:
```shell
cd CodeGraph
./run_diff_graphs_cg_gpt35_table2.sh
```

This script evaluates our method with different graph structures, including "er", "ba", "sbm", "sfn", "complete", "star", and "path", using the two encoding methods "adjacency" and "friendship".
To evaluate the generalization performance of our method across various graph structures using GPT-3.5, run:

```shell
cd CodeGraph
./run_generalization_graphs_cg_gpt35_table3.sh
```

This script evaluates the generalization performance of CodeGraph on tasks such as Connected Nodes and Node Degree with different graph encoding functions.
To evaluate our method with other models such as `Llama_3_70B`, `Mixtral_8x7B`, and `Mixtral_8x22B`, modify the `MODEL_NAMES` variable in the existing scripts or create new scripts:
- Copy the existing script:

  ```shell
  cd CodeGraph
  cp run_cg_gpt35_table1.sh run_cg_table4.sh
  ```

- Edit `run_cg_table4.sh`:

  Open `run_cg_table4.sh` and modify the `MODEL_NAMES` variable:

  ```shell
  MODEL_NAMES=("Llama_3_70B")  # Options: "Mixtral_8x7B", "Mixtral_8x22B"
  ```

- Run the modified script:

  ```shell
  ./run_cg_table4.sh
  ```
Note: Ensure that you have the appropriate API keys and configurations for the new models as per the installation instructions.
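If you prefer not to open an editor, the same substitution can be scripted. A minimal sketch of the `sed` expression, demonstrated on a sample line — the original value `"GPT35"` in the copied script is an assumption; in the real workflow you would apply this in place (e.g. with `sed -i`) to `run_cg_table4.sh`:

```shell
# Demonstrate the substitution on a sample line; the assumed original value
# is "GPT35". Apply the same expression in place to run_cg_table4.sh.
echo 'MODEL_NAMES=("GPT35")' | sed 's/"GPT35"/"Llama_3_70B"/'
# prints: MODEL_NAMES=("Llama_3_70B")
```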
To evaluate our method with different numbers of shots (e.g., 2-shot, 3-shot), ensure that you have prepared the prompts before evaluation, then follow these steps:
- Copy the existing script:

  ```shell
  cd CodeGraph
  cp run_cg_gpt35_table1.sh run_cg_shots_table5.sh
  ```

- Edit `run_cg_shots_table5.sh`:

  Modify the `K_SHOT` variable according to the prepared prompts, e.g.:

  ```shell
  K_SHOT=2
  ```

  Note: The `K_SHOT` value should match the number of exemplars used in the prepared prompts. For instance, if you created 2-shot prompts via `prepare_prompts.md`, set `K_SHOT=2` in the script.

- Run the modified script:

  ```shell
  ./run_cg_shots_table5.sh
  ```
You can customize the evaluation scripts by modifying variables to evaluate different tasks, encodings, graph generators, prompt methods, models, number of questions, and number of shots.
- Task Names (`TASK_NAMES`):

  ```shell
  TASK_NAMES=("edge_count" "connected_nodes" "cycle_check" "node_count" "node_degree" "edge_existence")
  ```

- Text Encodings (`TEXT_ENCS`):

  ```shell
  TEXT_ENCS=("adjacency" "coauthorship" "incident" "expert" "friendship" "social_network")
  ```

- Graph Generators (`GRAPH_GENS`):

  ```shell
  GRAPH_GENS=("er" "ba" "sbm" "sfn" "complete" "star" "path")
  ```

- Prompt Method (`PROMPT_METHOD`):

  ```shell
  PROMPT_METHOD="cg"  # Options: "cg", "few_shot", "cot", "zero_shot"
  ```

- Model Names (`MODEL_NAMES`):

  ```shell
  MODEL_NAMES=("GPT35" "Llama_3_70B" "Mixtral_8x7B" "Mixtral_8x22B")
  ```

- Number of Questions (`NUMBER_OF_QUESTIONS`):

  ```shell
  NUMBER_OF_QUESTIONS=500
  ```

- Number of Shots (`K_SHOTS`):

  ```shell
  K_SHOTS=1  # Specify the number of exemplars for the cg method
  ```
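Putting these variables together, a run script typically sweeps over the lists and invokes `evaluate.py` once per combination. A minimal sketch of that pattern — how the shipped scripts actually structure their loops is an assumption; this version only echoes each command so the plan can be inspected before anything is executed:

```shell
# Hypothetical sweep: print one evaluate.py invocation per task/encoding pair.
# Remove the leading "echo" to actually run the evaluations.
TASK_NAMES=("node_count" "cycle_check")
TEXT_ENCS=("adjacency" "friendship")
for task in "${TASK_NAMES[@]}"; do
  for enc in "${TEXT_ENCS[@]}"; do
    echo python evaluate.py \
      --prompt_source "codegraph" \
      --task_name "$task" \
      --text_enc "$enc" \
      --graph_gen "er" \
      --prompt_method "cg" \
      --model_name "GPT35" \
      --number_of_questions 500 \
      --k_shot 1
  done
done
```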
You can also run evaluations manually using the `evaluate.py` script. The available arguments are:
- `--prompt_source`: Specify the prompt source (`codegraph` or `graphqa`).
- `--task_name`: Choose from `edge_count`, `connected_nodes`, `cycle_check`, `node_count`, `node_degree`, `edge_existence`.
- `--text_enc`: Choose from `adjacency`, `coauthorship`, `incident`, `expert`, `friendship`, `social_network`, `politician`, `got`, `south_park`.
- `--graph_gen`: Choose from `er`, `ba`, `sbm`, `sfn`, `complete`, `star`, `path`, and their variants.
- `--prompt_method`: Choose from `few_shot`, `cot`, `zero_shot`, `cg`.
- `--model_name`: Choose from `GPT35`, `Llama_3_70B`, `Mixtral_8x7B`, `Mixtral_8x22B`.
- `--number_of_questions`: Specify the number of questions to evaluate.
- `--k_shot`: Specify the number of exemplars used for the codegraph method.
```shell
python evaluate.py \
    --prompt_source "codegraph" \
    --task_name "node_count" \
    --text_enc "adjacency" \
    --graph_gen "er" \
    --prompt_method "cg" \
    --model_name "Llama_3_70B" \
    --number_of_questions 500 \
    --k_shot 1
```

- Logs and Results: The records for each thread and the evaluation results will be stored in the `logs` and `results` directories, respectively.
- API Keys: Ensure you have set up your API credentials as per the installation instructions.
- Contact: For questions or issues, please refer to the README.md or contact the project maintainers.