Pull request overview
Adds CLI options to control which MTEB splits/subsets to evaluate and optionally cap dataset size for faster smoke-test runs.
Changes:
- Added `--eval_splits`, `--hf_subsets`, and `--max_samples` CLI parameters.
- Implemented in-place dataset truncation after `task.load_data()` when `--max_samples` is set.
- Tightened the `OVMSModel.encode()` return type annotation to `np.ndarray`.
demos/embeddings/ovms_mteb.py
```python
rng = random.Random(seed)


def _truncate_split(dataset, n):
    if len(dataset) <= n:
        return dataset
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    return dataset.select(sorted(indices[:n]))
```
This truncation path materializes a full indices list of size `len(dataset)`, shuffles it, then sorts a slice. For large datasets this is unnecessarily memory- and CPU-heavy. Prefer the Hugging Face `datasets` primitives (e.g., `dataset.shuffle(seed=...).select(range(n))` or equivalent) to avoid building and shuffling a full index list and to remove the extra `sorted(...)` cost.
Suggested change:

```diff
-rng = random.Random(seed)
-def _truncate_split(dataset, n):
-    if len(dataset) <= n:
-        return dataset
-    indices = list(range(len(dataset)))
-    rng.shuffle(indices)
-    return dataset.select(sorted(indices[:n]))
+def _truncate_split(dataset, n):
+    if len(dataset) <= n:
+        return dataset
+    # Use Hugging Face datasets primitives for efficient shuffling and selection
+    shuffled = dataset.shuffle(seed=seed)
+    return shuffled.select(range(n))
```
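As a rough illustration of the reviewer's point, the same truncation behavior can be had by drawing only `n` indices rather than shuffling a full index list. The sketch below uses a hypothetical `ToyDataset` stand-in (plain Python, no Hugging Face dependency) purely to show the shape of the logic; it is not the PR's actual code:

```python
import random


class ToyDataset:
    """Minimal, hypothetical stand-in for a Hugging Face Dataset."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def select(self, indices):
        # Mirrors Dataset.select: keep only the given indices, in order
        return ToyDataset(self.rows[i] for i in indices)


def truncate_split(dataset, n, seed=42):
    """Keep at most n rows without materializing a full index list."""
    if len(dataset) <= n:
        return dataset
    # random.sample over a range draws n distinct indices directly;
    # sorting them keeps the surviving rows in their original order
    indices = sorted(random.Random(seed).sample(range(len(dataset)), n))
    return dataset.select(indices)


ds = ToyDataset(range(1000))
small = truncate_split(ds, 20)
assert len(small) == 20
assert small.rows == sorted(small.rows)  # original row order preserved
```

This keeps the deterministic-sampling behavior of the original helper while avoiding both the full shuffle and the `sorted(...)` over a sliced copy.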
demos/embeddings/ovms_mteb.py
```python
for key in task.dataset:
    value = task.dataset[key]
    if isinstance(value, DatasetDict):
        # Multilingual: subset_name -> DatasetDict(split -> Dataset)
        for split in value:
            value[split] = _truncate_split(value[split], max_samples)
    else:
        # Flat: split -> Dataset
        task.dataset[key] = _truncate_split(value, max_samples)
```
The `key`/`value` naming here makes it harder to follow the two supported shapes (multilingual subset → splits vs. flat split → dataset). Renaming to something shape-specific (e.g., `subset_name`/`subset_data` and `split_name`/`split_ds`) would make the control flow and data model much clearer, especially since both levels use dictionary iteration.
Suggested change:

```diff
-for key in task.dataset:
-    value = task.dataset[key]
-    if isinstance(value, DatasetDict):
-        # Multilingual: subset_name -> DatasetDict(split -> Dataset)
-        for split in value:
-            value[split] = _truncate_split(value[split], max_samples)
-    else:
-        # Flat: split -> Dataset
-        task.dataset[key] = _truncate_split(value, max_samples)
+for subset_name in task.dataset:
+    subset_data = task.dataset[subset_name]
+    if isinstance(subset_data, DatasetDict):
+        # Multilingual: subset_name -> DatasetDict(split_name -> Dataset)
+        for split_name, split_ds in subset_data.items():
+            subset_data[split_name] = _truncate_split(split_ds, max_samples)
+    else:
+        # Flat: split_name -> Dataset
+        task.dataset[subset_name] = _truncate_split(subset_data, max_samples)
```
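To make the two shapes concrete, here is a self-contained sketch of the traversal using plain dicts in place of `DatasetDict` and list slicing in place of the real `_truncate_split` (names and structures are illustrative assumptions, not the PR's actual code):

```python
def _truncate_split(split_ds, n):
    # Stand-in for the real helper: keep at most n examples
    return split_ds[:n] if len(split_ds) > n else split_ds


def truncate_task_dataset(dataset, max_samples):
    # dataset is either subset_name -> {split_name -> examples} (multilingual)
    # or split_name -> examples (flat); a plain dict stands in for DatasetDict
    for subset_name in dataset:
        subset_data = dataset[subset_name]
        if isinstance(subset_data, dict):
            # Multilingual: truncate every split within this subset
            for split_name, split_ds in subset_data.items():
                subset_data[split_name] = _truncate_split(split_ds, max_samples)
        else:
            # Flat: the top-level key is itself a split
            dataset[subset_name] = _truncate_split(subset_data, max_samples)


multilingual = {"en": {"test": list(range(100))}, "de": {"test": list(range(10))}}
flat = {"test": list(range(100)), "validation": list(range(10))}
truncate_task_dataset(multilingual, max_samples=20)
truncate_task_dataset(flat, max_samples=20)
assert len(multilingual["en"]["test"]) == 20
assert len(multilingual["de"]["test"]) == 10  # already under the cap, untouched
assert len(flat["test"]) == 20
```

The shape-specific names make it immediately visible which dictionary level is a subset and which is a split, which is the point of the suggestion above.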
Force-pushed from `1ab48e4` to `c202f29`.
🛠 Summary
Requested on standup
To test only 20 samples, run:
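A plausible invocation, assuming the `--max_samples` flag added in this PR and the demo path above (the exact arguments may differ from the author's intended command):

```shell
python demos/embeddings/ovms_mteb.py --max_samples 20
```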