
Max samples param for MTEB test #3966

Open
dkalinowski wants to merge 2 commits into main from max-samples-for-mteb

Conversation


@dkalinowski dkalinowski commented Feb 10, 2026

🛠 Summary

Requested on standup

To run the evaluation on only 20 samples per split, use:

python ovms_mteb.py --model emb --service_url http://localhost:11338/v3/embeddings --eval_splits test --max_samples 20


Copilot AI left a comment


Pull request overview

Adds CLI options to control which MTEB splits/subsets to evaluate and optionally cap dataset size for faster smoke-test runs.

Changes:

  • Added --eval_splits, --hf_subsets, and --max_samples CLI parameters.
  • Implemented in-place dataset truncation after task.load_data() when --max_samples is set.
  • Tightened OVMSModel.encode() return type annotation to np.ndarray.
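The three new flags could be wired up with argparse roughly as follows. This is a hedged sketch, not the PR's actual code: the helper name `build_parser`, the defaults, and the `nargs="+"` choice for the list-valued flags are assumptions; only the flag names and the example invocation come from the PR description.

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI surface described in the PR;
    # defaults of None mean "evaluate everything" (assumption).
    parser = argparse.ArgumentParser(
        description="Run MTEB against an OVMS embeddings endpoint")
    parser.add_argument("--model", required=True,
                        help="Model name exposed by the service")
    parser.add_argument("--service_url", required=True,
                        help="Embeddings endpoint URL")
    parser.add_argument("--eval_splits", nargs="+", default=None,
                        help="Restrict evaluation to these splits (e.g. test)")
    parser.add_argument("--hf_subsets", nargs="+", default=None,
                        help="Restrict evaluation to these Hugging Face subsets")
    parser.add_argument("--max_samples", type=int, default=None,
                        help="Cap each split at this many samples for smoke tests")
    return parser
```

Parsing the example command from the summary would then yield `max_samples=20` and `eval_splits=["test"]`.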


Comment on lines 57 to 64
rng = random.Random(seed)

def _truncate_split(dataset, n):
    if len(dataset) <= n:
        return dataset
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    return dataset.select(sorted(indices[:n]))

Copilot AI Feb 10, 2026


This truncation path materializes a full indices list of size len(dataset), shuffles it, then sorts a slice. For large datasets this is unnecessarily memory/CPU heavy. Prefer using the Hugging Face datasets primitives (e.g., dataset.shuffle(seed=...).select(range(n)) or equivalent) to avoid building/shuffling a full index list and to remove the extra sorted(...) cost.

Suggested change

- rng = random.Random(seed)
-
- def _truncate_split(dataset, n):
-     if len(dataset) <= n:
-         return dataset
-     indices = list(range(len(dataset)))
-     rng.shuffle(indices)
-     return dataset.select(sorted(indices[:n]))
+ def _truncate_split(dataset, n):
+     if len(dataset) <= n:
+         return dataset
+     # Use Hugging Face datasets primitives for efficient shuffling and selection
+     shuffled = dataset.shuffle(seed=seed)
+     return shuffled.select(range(n))

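The cost the review flags can also be avoided in plain Python with `random.sample`, which draws n distinct indices directly instead of shuffling a full index list. A minimal sketch under stated assumptions: the helper name `truncate_indices` is hypothetical and operates on index counts rather than a real `Dataset`.

```python
import random

def truncate_indices(n_total, n_keep, seed=0):
    # random.sample picks n_keep distinct indices without materializing
    # and shuffling a copy of the whole index list; sorting only the
    # kept slice preserves the original dataset order.
    if n_total <= n_keep:
        return list(range(n_total))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))
```

The sorted result can then be passed to something like `dataset.select(...)`, matching the behavior of the original snippet at a fraction of the work for large splits.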
Comment on lines 67 to 75
for key in task.dataset:
    value = task.dataset[key]
    if isinstance(value, DatasetDict):
        # Multilingual: subset_name -> DatasetDict(split -> Dataset)
        for split in value:
            value[split] = _truncate_split(value[split], max_samples)
    else:
        # Flat: split -> Dataset
        task.dataset[key] = _truncate_split(value, max_samples)

Copilot AI Feb 10, 2026


The key/value naming here makes it harder to follow the two supported shapes (multilingual subset→splits vs flat split→dataset). Renaming to something shape-specific (e.g., subset_name/splits and split_name/split_ds) would make the control flow and data model much clearer, especially since both levels use dictionary iteration.

Suggested change

- for key in task.dataset:
-     value = task.dataset[key]
-     if isinstance(value, DatasetDict):
-         # Multilingual: subset_name -> DatasetDict(split -> Dataset)
-         for split in value:
-             value[split] = _truncate_split(value[split], max_samples)
-     else:
-         # Flat: split -> Dataset
-         task.dataset[key] = _truncate_split(value, max_samples)
+ for subset_name in task.dataset:
+     subset_data = task.dataset[subset_name]
+     if isinstance(subset_data, DatasetDict):
+         # Multilingual: subset_name -> DatasetDict(split_name -> Dataset)
+         for split_name, split_ds in subset_data.items():
+             subset_data[split_name] = _truncate_split(split_ds, max_samples)
+     else:
+         # Flat: split_name -> Dataset
+         task.dataset[subset_name] = _truncate_split(subset_data, max_samples)
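The two shapes this walk handles can be exercised with plain dicts standing in for `DatasetDict`. In this sketch, `truncate_nested` is a hypothetical stand-in for the PR's loop, dict values model the multilingual subset shape, lists model the flat split shape, and list slicing replaces `_truncate_split`:

```python
def truncate_nested(dataset, max_samples):
    # Walks the two supported shapes:
    #   multilingual: subset_name -> {split_name -> samples}
    #   flat:         split_name  -> samples
    # and caps every leaf at max_samples, mutating in place.
    for key, value in dataset.items():
        if isinstance(value, dict):
            for split_name, split_ds in value.items():
                value[split_name] = split_ds[:max_samples]
        else:
            dataset[key] = value[:max_samples]
    return dataset
```

Running it over a mixed structure caps both the nested `en`/`test` split and the top-level flat split at the same limit, mirroring how the real loop treats both shapes uniformly.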
