Hi, thank you for the cool work!
I have a question regarding the Fashion200K numbers reported in the paper.
In Table 1, for qt → ci (Fashion200K), the paper reports:
Train: 15K, Dev: 1.7K, Test: 1.7K, Pool: 201K
However, in Section 6.3 (Data Collection), under the Fashion200k paragraph, it says:
“The original test data is evenly divided into a validation and test set. We converted the dataset into M-BEIR format. In total, we have 15K task 1 (qt → ci).”
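Table 1 seems to list 15K as the train split alone, with an additional 1.7K each for dev and test, while Section 6.3 describes 15K as the total for task 1. Could you please clarify how the 1.7K dev and test splits relate to this 15K total?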