Add Nvidia inference specification #5794

Jan-Kazlouski-elastic · 2025-12-05T10:55:28Z

This PR adds the specification changes required by elastic/elasticsearch#132388.

Additional actions

Signed the CLA
Executed make contrib

github-actions · 2025-12-05T10:59:56Z

Below are the validation changes against the target branch for the API.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| `inference.put_nvidia` | ➕ ⚪ | Missing test | Missing test |

You can validate this API yourself by using the `make validate` target.


DonalEvans · 2025-12-12T00:06:03Z

package.json

  },
  "dependencies": {
-    "@redocly/cli": "^1.34.5"
+    "@redocly/cli": "^1.34.6"


I don't think this should be getting changed here.

This change is produced by running `make setup` and `make contrib`, which must be run before merging. Reverting it to `^1.34.5` does not stick: it goes back to `^1.34.6` as soon as `make setup` is called.

DonalEvans · 2025-12-12T00:35:08Z

specification/_json_spec/inference.put_nvidia.json

+                "rerank",
+                "text_embedding",
+                "completion",
+                "chat_completion"


Nitpick, but could these be in alphabetical order?
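
For reference, a minimal sketch of the ordering being requested, using the four task types from the diff above. The constant names are illustrative only and are not identifiers from the specification.

```typescript
// Task types in the order they appear in the diff under review.
const taskTypesAsSubmitted: string[] = [
  "rerank",
  "text_embedding",
  "completion",
  "chat_completion",
];

// Copy before sorting so the original array is left untouched.
// The default sort compares UTF-16 code units, which is alphabetical
// for these lowercase ASCII names.
const taskTypesAlphabetical: string[] = [...taskTypesAsSubmitted].sort();
// → ["chat_completion", "completion", "rerank", "text_embedding"]
```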

DonalEvans · 2025-12-12T00:41:01Z

specification/inference/_types/CommonTypes.ts

+   */
+  model_id: string
+  /**
+   * For a `text_embedding` task, the maximum number of tokens per input before chunking occurs.


This should be "For a `text_embedding` task, the maximum number of tokens per input. Inputs exceeding this value are truncated prior to sending to the Nvidia API."

This is wrong almost everywhere in the docs; there's an issue describing some of the problems with max_input_tokens.
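
A rough sketch of the distinction drawn above between truncating and chunking. It assumes tokens can be modeled as a plain string array, which is a simplification since real tokenization is model-specific. Both function names are hypothetical and do not come from Elasticsearch or the Nvidia API.

```typescript
// Reject limits that would make either operation meaningless.
function assertValidLimit(maxInputTokens: number): void {
  if (!Number.isInteger(maxInputTokens) || maxInputTokens < 1) {
    throw new RangeError("maxInputTokens must be a positive integer");
  }
}

// Truncation: keep only the first `maxInputTokens` tokens of one input.
// Anything past the limit is dropped before the request is sent.
function truncateInput(tokens: string[], maxInputTokens: number): string[] {
  assertValidLimit(maxInputTokens);
  return tokens.slice(0, maxInputTokens);
}

// Chunking, shown only for contrast: split one input into several pieces,
// none longer than `maxInputTokens`. Nothing is dropped.
function chunkInput(tokens: string[], maxInputTokens: number): string[][] {
  assertValidLimit(maxInputTokens);
  const chunks: string[][] = [];
  for (let i = 0; i < tokens.length; i += maxInputTokens) {
    chunks.push(tokens.slice(i, i + maxInputTokens));
  }
  return chunks;
}
```

With five tokens and a limit of two, `truncateInput` returns the first two tokens, while `chunkInput` returns three pieces that together still contain all five.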

DonalEvans · 2025-12-12T00:41:41Z

specification/inference/_types/CommonTypes.ts

+  text_embedding,
+  completion,
+  chat_completion,
+  rerank


For consistency, could these be in alphabetical order?

DonalEvans · 2025-12-12T00:49:13Z

specification/inference/_types/CommonTypes.ts

+   */
+  input_type?: NvidiaInputType
+  /**
+   * For a `text_embedding` task, the method to handle inputs longer than the maximum token length.


To help differentiate this from max_input_tokens it might be better to word it like "the method used by the Nvidia model to handle inputs longer than..."

Good thinking. Fixed.

DonalEvans · 2025-12-12T00:54:58Z

specification/inference/_types/CommonTypes.ts

+  /**
+   * The URL of the Nvidia model endpoint.
+   */


Would it be helpful to include the default URLs for each task type if url isn't specified?

Good idea. Added.

DonalEvans · 2025-12-12T00:59:38Z

specification/inference/_types/TaskType.ts

+  text_embedding,
+  chat_completion,
+  completion,
+  rerank


For consistency, could these be in alphabetical order?


Jan-Kazlouski-elastic · 2025-12-16T12:50:01Z

Hi @DonalEvans
Thank you for your review. Comments are addressed. Could you please give this PR another look?

DonalEvans · 2025-12-16T18:55:04Z

specification/inference/_types/CommonTypes.ts

  mistral
 }

+export class NvidiaServiceSettings {


There should also be a dimensions field documented for the text_embedding task, sorry for missing this in the first review.

I don't think we should declare the parameters the user can't specify.
According to your suggestion from this comment - dimensions cannot be set during the creation of an endpoint and are not sent to the Nvidia

Urgh, my mistake, sorry for forgetting about that, I've been up to my eyeballs in the Jina text embedding code lately (which does support specifying dimensions) and the context switch tripped me up.

DonalEvans · 2025-12-16T18:57:39Z

specification/inference/_types/CommonTypes.ts

+   * * `ingest`: Mapped to Nvidia's `passage` value in request. Used when generating embeddings during indexing.
+   * * `search`: Mapped to Nvidia's `query` value in request. Used when generating embeddings during querying.
+   *
+   * IMPORTANT: If not specified `input_type` field in request to Nvidia endpoint is set as `query` by default.


This would be better as "For Nvidia endpoints, if the `input_type` field is not specified, it defaults to `query`."
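
A small sketch of the mapping described in the doc comment above, assuming only the two values quoted in the diff (`ingest` and `search`). The real `NvidiaInputType` may accept further values, and the type and function names here are hypothetical.

```typescript
// The two Elasticsearch-side values quoted in the doc comment.
type ElasticInputType = "ingest" | "search";

// The values they are mapped to in the request sent to Nvidia.
type NvidiaRequestInputType = "passage" | "query";

function toNvidiaInputType(inputType?: ElasticInputType): NvidiaRequestInputType {
  switch (inputType) {
    case "ingest":
      return "passage"; // generating embeddings during indexing
    case "search":
      return "query"; // generating embeddings during querying
    default:
      return "query"; // `input_type` not specified: defaults to `query`
  }
}
```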

DonalEvans · 2025-12-16T19:00:32Z

specification/inference/_types/CommonTypes.ts

+   */
+  max_input_tokens?: integer
+  /**
+   * For a `text_embedding` task, the similarity measure. One of cosine, dot_product, l2_norm.


It would be good to add that if no similarity measure is specified, the default value is dot_product.
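
As a sketch of the default being described, assuming the three measures listed in the doc comment are the full set. The type and function names are illustrative, not taken from the specification.

```typescript
// The three similarity measures named in the doc comment.
type SimilarityMeasure = "cosine" | "dot_product" | "l2_norm";

// Fall back to `dot_product` when no similarity measure is specified.
function resolveSimilarity(similarity?: SimilarityMeasure): SimilarityMeasure {
  return similarity ?? "dot_product";
}
```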


Add Nvidia inference specification

f9ac65a

Jan-Kazlouski-elastic assigned DonalEvans Dec 5, 2025

Jan-Kazlouski-elastic requested a review from a team as a code owner December 5, 2025 10:55

Jan-Kazlouski-elastic added the specification, ml, skip-backport, and Team:ML labels Dec 5, 2025

Jan-Kazlouski-elastic requested a review from DonalEvans December 5, 2025 11:04

Merge remote-tracking branch 'origin/main' into nvidia-integration

3b8f3a5

# Conflicts: # output/openapi/elasticsearch-openapi.json # output/openapi/elasticsearch-serverless-openapi.json # output/schema/schema.json

DonalEvans reviewed Dec 12, 2025


Jan-Kazlouski-elastic added 2 commits December 16, 2025 11:52

Merge remote-tracking branch 'origin/main' into nvidia-integration

43756d4

Enhance Nvidia integration by updating task descriptions and reordering task types

a5ba9ef

Jan-Kazlouski-elastic requested a review from DonalEvans December 16, 2025 12:49

DonalEvans reviewed Dec 16, 2025


Jan-Kazlouski-elastic added 2 commits December 16, 2025 19:12

Update @redocly/cli and related dependencies to version 1.34.5

8041d23

Clarify documentation for Nvidia input_type and similarity fields

472f855

Jan-Kazlouski-elastic requested a review from DonalEvans December 16, 2025 20:08

Merge remote-tracking branch 'origin/main' into nvidia-integration

4b85eee

# Conflicts: # output/openapi/elasticsearch-openapi.json # output/openapi/elasticsearch-serverless-openapi.json # output/schema/schema.json

DonalEvans approved these changes Dec 16, 2025


Update output

caff811

Jan-Kazlouski-elastic enabled auto-merge (squash) December 16, 2025 20:16

Jan-Kazlouski-elastic merged commit e5d156f into main Dec 16, 2025
9 checks passed

Jan-Kazlouski-elastic deleted the nvidia-integration branch December 16, 2025 20:16
