Title: Question about doc_to_decontamination_query and should_decontaminate keys impact on evaluation results
Body:
Hi,
Thanks for the great work!
I have a question regarding the YAML configuration for tasks. In the task settings, I noticed there are two keys: doc_to_decontamination_query and should_decontaminate.
I would like to ask: do these two keys affect the final evaluation results?
The reason I'm asking is that when I evaluated Qwen2.5-32B, I observed noticeable differences from the published results, especially on CruxEval and TriviaQA-Adv. The results on GSM8K and GSM8K-Platinum, on the other hand, were quite close to the reported ones.
Would appreciate any clarification or suggestions on this!
Thanks again!