feat(rankings): multimodal support for Cohere ranking endpoint by jzakrzew · Pull Request #896 · ai-dynamo/aiperf

jzakrzew · 2026-05-07T14:46:20Z

Add multimodal input support to the cohere_rankings endpoint for vLLM’s vision rerank API, including structured text, image, and video document payload formatting. See:
https://docs.vllm.ai/en/latest/examples/pooling/score/#vision-rerank-api-online

We already have text-only Cohere rerank support. This keeps the existing text-only payload shape unchanged, while switching to structured Cohere documents when media is present.
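The switch between the two payload shapes can be sketched as follows. This is an illustrative sketch, not the code from this PR: the function name `build_rerank_payload` is hypothetical, and the content-part keys (`content`, `image_url`, `video_url`) are assumed to follow the OpenAI-style layout, so check them against the vLLM documentation linked above. It also assumes that, when media is present, there is exactly one image and/or video per passage.

```python
from collections.abc import Sequence
from typing import Any


def build_rerank_payload(
    query: str,
    passages: Sequence[str],
    model: str,
    *,
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
) -> dict[str, Any]:
    """Build a Cohere-style rerank body (hypothetical helper, assumed field names)."""
    if not images and not videos:
        # Text-only requests keep the existing shape: a flat list of strings.
        documents: list[Any] = list(passages)
    else:
        documents = []
        for i, text in enumerate(passages):
            # Each document pairs the i-th passage with the i-th media item.
            content: list[dict[str, Any]] = [{"type": "text", "text": text}]
            if images:
                content.append({"type": "image_url", "image_url": {"url": images[i]}})
            if videos:
                content.append({"type": "video_url", "video_url": {"url": videos[i]}})
            documents.append({"content": content})
    return {"model": model, "query": query, "documents": documents}
```

Keeping the text-only branch as a plain list of strings means existing text-only servers see no change in the request body.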

Changes

Extend the shared rankings base endpoint to pass media inputs into endpoint-specific payload builders.
Add multimodal payload support to cohere_rankings for text, image, and video rerank documents.
Add synthetic multimodal ranking dataset generation.
Add validation, mock server support, tests, and documentation for multimodal rankings inputs.

Example usage

Set up a vLLM server:

vllm serve nvidia/llama-nemotron-rerank-vl-1b-v2 \
  --runner pooling \
  --trust-remote-code \
  --chat-template "$(curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm/main/examples/pooling/score/template/nemotron-vl-rerank.jinja)"

Run AIPerf with synthetic multimodal rankings inputs:

aiperf profile \
      -m nvidia/llama-nemotron-rerank-vl-1b-v2 \
      --endpoint-type cohere_rankings \
      --custom-endpoint /rerank \
      --url localhost:8000 \
      --request-count 10 \
      --rankings-passages-mean 4 \
      --rankings-passages-stddev 0 \
      --rankings-passages-prompt-token-mean 32 \
      --rankings-passages-prompt-token-stddev 0 \
      --rankings-query-prompt-token-mean 16 \
      --rankings-query-prompt-token-stddev 0 \
      --image-width-mean 224 \
      --image-width-stddev 0 \
      --image-height-mean 224 \
      --image-height-stddev 0 \
      --image-batch-size 1

Summary by CodeRabbit

New Features
- Multimodal reranking: Cohere Rankings now supports ranking with text, images, and videos (index-aligned across modalities) and synthetic multimodal dataset generation.
Documentation
- Added vLLM multimodal reranking guidance, CLI examples, and instructions for JSONL inputs with base64 image data.
Tests
- Expanded unit and integration tests to cover multimodal payloads, dataset composition, and endpoint metadata.
Bug Fixes / Validation
- Added media count validations and explicit rejection of unsupported audio inputs.
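A minimal sketch of the validation described in the last bullet, assuming text passages are always present and each supplied modality must match the passage count. The function name `validate_media_counts` and the error messages are hypothetical and not taken from the PR.

```python
from collections.abc import Sequence


def validate_media_counts(
    passages: Sequence[str],
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
    audios: Sequence[str] = (),
) -> None:
    """Raise ValueError for audio input or for media that is not index-aligned."""
    if audios:
        # Audio is rejected explicitly rather than silently dropped.
        raise ValueError("Audio inputs are not supported for rankings.")
    for name, media in (("image", images), ("video", videos)):
        # An empty modality is fine; a non-empty one must pair 1:1 with passages.
        if media and len(media) != len(passages):
            raise ValueError(
                f"Expected {len(passages)} {name} input(s) to match the passages, "
                f"got {len(media)}."
            )
```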

Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

copy-pr-bot · 2026-05-07T14:46:25Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-07T14:46:38Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@17968c9298a63a47b928b20975eee427c2da4702

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@17968c9298a63a47b928b20975eee427c2da4702

Last updated for commit: 17968c9

codecov · 2026-05-07T14:55:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

coderabbitai · 2026-05-07T14:57:26Z

Walkthrough

This PR extends AIPerf's Cohere Rankings endpoint to support multimodal requests containing image and video content alongside text. The implementation includes enhanced dataset composition, index-aligned payload construction, metadata-driven validation, and comprehensive test coverage across integration and unit scopes.

Changes

Multimodal Cohere Rankings Feature

Layer / File(s)	Summary
Data Contracts & Signatures `src/aiperf/endpoints/base_rankings_endpoint.py`, `src/aiperf/endpoints/cohere_rankings.py`, `src/aiperf/endpoints/hf_tei_rankings.py`, `src/aiperf/endpoints/nim_rankings.py`, `src/aiperf/plugin/plugins.yaml`, `tests/aiperf_mock_server/models.py`	`BaseRankingsEndpoint.build_payload` abstract signature expanded to accept optional `images`, `videos`, `audios` keyword parameters; `CohereRankingsEndpoint` introduces multimodal parameters and document helpers; HF TEI and NIM accept multimodal params for interface consistency; plugin metadata declares `supports_images` and `supports_videos`; mock server broadens `CohereRerankRequest.documents` to accept multimodal structures.
Cohere Multimodal Logic `src/aiperf/endpoints/cohere_rankings.py`	`build_payload` rewritten to accept multimodal inputs and reject audio; new `_build_documents`, `_validate_document_counts`, and `_document_count` helpers construct index-paired document objects with content arrays combining text and optional image/video URL references.
Base Endpoint Extraction & Validation `src/aiperf/endpoints/base_rankings_endpoint.py`	`format_payload` refactored with new extraction helpers (`_extract_rankings_texts`, `_extract_media_contents`, `_select_query_text`, `_warn_if_no_documents`, `_validate_media_support`) to handle media-aware turn parsing, metadata-driven validation, and error handling.
Text-Only Endpoint Stubs `src/aiperf/endpoints/hf_tei_rankings.py`, `src/aiperf/endpoints/nim_rankings.py`	Payload construction logic remains unchanged while accepting multimodal parameters for interface consistency.
Synthetic Dataset Composition `src/aiperf/dataset/composer/synthetic_rankings.py`	`_create_turn` extended to conditionally generate and append `Image` and `Video` payloads via new helpers; `include_image` and `include_video` properties determine generation based on batch size and dimension configuration.
Request/Response Handling `tests/aiperf_mock_server/models.py`	`CohereRerankRequest.passage_texts` property now parses multimodal `documents` field, extracting text and media URL representations via new `_document_to_text` and `_media_url_text` helpers.
Integration & Unit Tests `tests/integration/utils.py`, `tests/component_integration/endpoints/test_rankings_endpoint.py`, `tests/unit/dataset/composer/test_synthetic_rankings_composer.py`, `tests/unit/endpoints/test_cohere_rankings_endpoint.py`, `tests/unit/endpoints/test_hf_tei_rankings_endpoint.py`, `tests/unit/endpoints/test_nim_rankings_endpoint.py`, `tests/unit/server/test_models.py`	New `create_multimodal_rankings_dataset` utility; integration tests run multimodal profiling against Cohere endpoint; unit tests validate synthetic media generation per passage, payload formatting with mixed modalities, count mismatches, unsupported media rejection, and mock server parsing.
Documentation `docs/tutorials/rankings.md`	"Profile vLLM Vision Rerank Models" section added covering multimodal request payload structure, index alignment requirements, synthetic input generation, and AIPerf CLI usage examples with custom `multimodal-rankings.jsonl` datasets.
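The "Synthetic Dataset Composition" row says media generation is gated on batch size and dimension configuration. One way such a gate could look is sketched below. The `ImageSettings` class and the `include_image` function are hypothetical stand-ins, and the exact condition used in the PR may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSettings:
    """Hypothetical stand-in for the image section of the input config."""

    batch_size: int = 0
    width_mean: float = 0.0
    height_mean: float = 0.0


def include_image(settings: ImageSettings) -> bool:
    """Generate images only when a batch size and both dimensions are set."""
    return (
        settings.batch_size > 0
        and settings.width_mean > 0
        and settings.height_mean > 0
    )
```

Under this assumption, the CLI example earlier in the PR description (`--image-batch-size 1` with 224x224 means) would enable image generation, while the defaults would leave rankings text-only.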

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through mocks and payload streams,
Images and videos joining ranking dreams.
Per-index pairing kept each item true,
Text and vision stitched into the view.
A little rabbit cheers the multimodal crew!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding multimodal support to the Cohere ranking endpoint, which is the core objective of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/aiperf/endpoints/nim_rankings.py (1)

23-29: ⚡ Quick win

Prefer explicit rejection of unsupported media instead of silently discarding it.

At Line 28, media args are ignored. Failing fast here prevents accidental data loss when build_payload is called directly and keeps behavior explicit.

Proposed fix

     def build_payload(
         self,
         query_text: str,
         passages: Sequence[str],
         model_name: str,
         *,
         images: Sequence[str] = (),
         videos: Sequence[str] = (),
         audios: Sequence[str] = (),
     ) -> dict[str, Any]:
         """Build payload to match NIM rankings API schema."""
-        _ = images, videos, audios
+        if images or videos or audios:
+            raise ValueError(
+                "NIM rankings does not support image, video, or audio input."
+            )
         payload = {
             "model": model_name,
             "query": {"text": query_text},
             "passages": [{"text": p} for p in passages],
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/aiperf/endpoints/nim_rankings.py` around lines 23 - 29, The build_payload
function currently ignores the media arguments (images, videos, audios) by
assigning them to _ and silently discarding any input; change this to fail-fast
by validating those parameters at the start of build_payload and raising a clear
exception (e.g., ValueError) if any of images, videos, or audios is non-empty,
mentioning which unsupported media was passed so callers know why the call
failed.
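The prompt asks for an error that names which unsupported media was passed, which the proposed diff does not do. A small sketch of that variant, written as a standalone helper so it runs on its own (the name `reject_unsupported_media` and the message wording are assumptions, not part of the PR):

```python
from collections.abc import Sequence


def reject_unsupported_media(
    endpoint: str,
    *,
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
    audios: Sequence[str] = (),
) -> None:
    """Fail fast, listing every non-empty modality the endpoint cannot handle."""
    unsupported = [
        name
        for name, values in (("images", images), ("videos", videos), ("audios", audios))
        if values
    ]
    if unsupported:
        raise ValueError(
            f"{endpoint} does not support media input; got: {', '.join(unsupported)}."
        )
```

A text-only `build_payload` could call such a helper as its first statement instead of assigning the arguments to `_`.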

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a61ee8c9-460f-40fb-866e-9d7012727fdf

📥 Commits

Reviewing files that changed from the base of the PR and between 1393442 and 3a66f4e.

📒 Files selected for processing (15)

docs/tutorials/rankings.md
src/aiperf/dataset/composer/synthetic_rankings.py
src/aiperf/endpoints/base_rankings_endpoint.py
src/aiperf/endpoints/cohere_rankings.py
src/aiperf/endpoints/hf_tei_rankings.py
src/aiperf/endpoints/nim_rankings.py
src/aiperf/plugin/plugins.yaml
tests/aiperf_mock_server/models.py
tests/component_integration/endpoints/test_rankings_endpoint.py
tests/integration/utils.py
tests/unit/dataset/composer/test_synthetic_rankings_composer.py
tests/unit/endpoints/test_cohere_rankings_endpoint.py
tests/unit/endpoints/test_hf_tei_rankings_endpoint.py
tests/unit/endpoints/test_nim_rankings_endpoint.py
tests/unit/server/test_models.py

coderabbitai · 2026-05-07T14:57:29Z

+    def _generate_video_payloads(self, count: int) -> Video:
+        """Generate one synthetic video per ranking passage."""
+        video = Video(name="video_url")
+        for _ in range(count):
+            data = self.video_generator.generate()
+            if data:
+                video.contents.append(data)
+        return video


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid silently dropping generated videos; fail fast on generation miss.

At Line 115, falsy data is skipped, which can produce fewer videos than passages and trigger downstream count-mismatch errors far from the source. Raise immediately (or append a placeholder) to keep per-passage alignment deterministic.

Proposed fix

     def _generate_video_payloads(self, count: int) -> Video:
         """Generate one synthetic video per ranking passage."""
         video = Video(name="video_url")
         for _ in range(count):
             data = self.video_generator.generate()
-            if data:
-                video.contents.append(data)
+            if not data:
+                raise ValueError(
+                    "Video generation returned empty content while multimodal rankings are enabled."
+                )
+            video.contents.append(data)
         return video

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/aiperf/dataset/composer/synthetic_rankings.py` around lines 110 - 117, The
_generate_video_payloads function currently skips falsy results from
video_generator.generate(), which can silently reduce the number of videos and
break passage alignment; change it to fail fast by checking the result of
video_generator.generate() and raising a clear exception (e.g., RuntimeError or
ValueError) that includes context (count requested and index) when data is
falsy, or alternatively append a deterministic placeholder object to
Video.contents to preserve one-to-one correspondence; update references in
_generate_video_payloads, Video (contents), and video_generator.generate
accordingly.
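The fail-fast behavior with the extra context the prompt asks for (requested count and index) can be sketched independently of the AIPerf classes. Here `generate` is any zero-argument callable standing in for `video_generator.generate`, and `generate_aligned_videos` is a hypothetical name.

```python
from collections.abc import Callable


def generate_aligned_videos(generate: Callable[[], str], count: int) -> list[str]:
    """Return exactly `count` videos, or raise at the first empty result."""
    videos: list[str] = []
    for index in range(count):
        data = generate()
        if not data:
            # Raising here keeps the failure next to its cause instead of
            # surfacing later as a passage/video count mismatch.
            raise ValueError(
                f"Video generation returned empty content for passage "
                f"{index} of {count}."
            )
        videos.append(data)
    return videos
```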

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

coderabbitai

🧹 Nitpick comments (2)

tests/unit/dataset/composer/test_synthetic_rankings_composer.py (2)

128-130: ⚡ Quick win

Avoid hard-coded passage multiplier in expected call count

expected_media_count is tied to a magic number (* 3). This makes the test brittle if passage generation defaults/config change. Derive the expected count from generated turns (sum of passage counts) or from the configured mean variable in the test setup.

Proposed test hardening

-    expected_media_count = synthetic_config.input.conversation.num_dataset_entries * 3
-    assert generate_image.call_count == expected_media_count
-    assert generate_video.call_count == expected_media_count
+    expected_media_count = sum(
+        len(conversation.turns[0].texts[1].contents) for conversation in dataset
+    )
+    assert generate_image.call_count == expected_media_count
+    assert generate_video.call_count == expected_media_count

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/dataset/composer/test_synthetic_rankings_composer.py` around lines
128 - 130, Replace the hard-coded multiplier used to compute
expected_media_count with a calculation derived from the actual passages
produced by the composer or from the test's configured mean; specifically,
compute expected_media_count by summing the number of passages across the
generated turns (or reading the configured passages-per-turn mean used in the
test setup) rather than using "* 3", then assert generate_image.call_count and
generate_video.call_count against that computed value (references:
expected_media_count, generate_image, generate_video,
synthetic_config.input.conversation.num_dataset_entries).

162-181: ⚡ Quick win

Assert generators are not called when media batch size is zero

This test validates output shape ([]) but not the “disabled generation” behavior. Patch both generators and assert zero calls to prevent regressions where media is still generated then discarded.

Proposed coverage extension

-    composer = SyntheticRankingsDatasetComposer(synthetic_config, mock_tokenizer)
-    dataset = composer.create_dataset()
+    composer = SyntheticRankingsDatasetComposer(synthetic_config, mock_tokenizer)
+    with (
+        patch.object(composer.image_generator, "generate") as generate_image,
+        patch.object(composer.video_generator, "generate") as generate_video,
+    ):
+        dataset = composer.create_dataset()
 
     for conversation in dataset:
         turn = conversation.turns[0]
         assert turn.images == []
         assert turn.videos == []
+    generate_image.assert_not_called()
+    generate_video.assert_not_called()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/dataset/composer/test_synthetic_rankings_composer.py` around lines
162 - 181, Extend the test to patch/mock the media generator functions used by
SyntheticRankingsDatasetComposer (the image and video generator callables used
internally when producing turns) before calling
SyntheticRankingsDatasetComposer.create_dataset, then assert those mocks were
not called when synthetic_config.input.image.batch_size and ...video.batch_size
are 0; this ensures generation is disabled (reference
SyntheticRankingsDatasetComposer and its create_dataset path that invokes the
image/video generators) and prevents regressions where media is produced then
discarded.
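The pattern suggested above, patching a generator and asserting it was never called, can be tried in isolation with `unittest.mock`. The `MediaGenerator` and `TinyComposer` classes below are hypothetical stand-ins for the real composer and only mimic the "batch size zero disables generation" behavior.

```python
from unittest.mock import patch


class MediaGenerator:
    """Hypothetical stand-in for an image or video generator."""

    def generate(self) -> str:
        return "synthetic-media"


class TinyComposer:
    """Hypothetical composer that skips generation when batch_size is 0."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.image_generator = MediaGenerator()

    def create_images(self, passages: list[str]) -> list[str]:
        if self.batch_size <= 0:
            return []
        return [self.image_generator.generate() for _ in passages]


def check_generation_disabled() -> int:
    """Return how many times the generator ran with media disabled."""
    composer = TinyComposer(batch_size=0)
    with patch.object(composer.image_generator, "generate") as generate_image:
        images = composer.create_images(["a", "b"])
    # Checking the output shape alone would miss "generate, then discard".
    assert images == []
    generate_image.assert_not_called()
    return generate_image.call_count
```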


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bb54f89-c4c2-40b9-825f-97343ed6d4b2

📥 Commits

Reviewing files that changed from the base of the PR and between 3a66f4e and 17968c9.

📒 Files selected for processing (2)

tests/unit/dataset/composer/test_synthetic_rankings_composer.py
tests/unit/endpoints/test_cohere_rankings_endpoint.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/endpoints/test_cohere_rankings_endpoint.py

jzakrzew and others added 3 commits May 7, 2026 14:43

feat(rankings): support multimodal Cohere rerank payloads

5edd7a5

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

feat(rankings): add synthetic multimodal rankings data

c2af23a

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

fix an unnecessary lazy import

3a66f4e

Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

github-actions Bot added the feat label May 7, 2026

dynamo-ops reviewed May 7, 2026

Comment thread src/aiperf/endpoints/base_rankings_endpoint.py

coderabbitai Bot reviewed May 7, 2026

test(rankings): cover multimodal edge cases

17968c9

Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

coderabbitai Bot reviewed May 7, 2026

jzakrzew wants to merge 4 commits into ai-dynamo:main from jzakrzew:cohere-rerank-endpoint-multimodal
