Support vector search on Cosmos DB #33991

Merged: ajcvickers merged 2 commits into main from VelocityIsAVector on Aug 12, 2024

@ajcvickers
Contributor

Fixes #33783

This PR introduces:

  • IsVector() to configure a property as a vector (embedding) in the document.
    • The distance function and dimensions are specified.
    • The data type can be specified, or otherwise is inferred.
  • HasIndex().ForVectors() to configure a vector index over a vector property.
  • VectorDistance(), which translates to the Cosmos VectorDistance function.
    • The distance function and data type are taken from the property mapping, or can be overridden.
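
As a rough illustration of how the three pieces above might fit together, here is a minimal sketch. The method names IsVector(), HasIndex().ForVectors(), and VectorDistance() come from this description; everything else is an assumption for illustration only: the `Book` entity, the `LibraryContext` class, the 1536 dimension count, the exact parameter lists, and the use of the Cosmos SDK's `DistanceFunction` and `VectorIndexType` enums. The API is marked experimental, so the corresponding compiler diagnostic may need to be suppressed before this builds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;          // assumed home of DistanceFunction and VectorIndexType
using Microsoft.EntityFrameworkCore;

// Hypothetical entity; the embedding is stored as ReadOnlyMemory<float>.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; } = "";
    public ReadOnlyMemory<float> Embedding { get; set; }
}

// Hypothetical context; connection details are placeholders.
public class LibraryContext : DbContext
{
    public DbSet<Book> Books => Set<Book>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseCosmos("<connection string>", "<database name>");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Book>(b =>
        {
            // Mark the property as a vector, giving the distance function and dimensions.
            // The data type is left to be inferred from ReadOnlyMemory<float>.
            b.Property(e => e.Embedding).IsVector(DistanceFunction.Cosine, 1536);

            // Configure a vector index over the vector property.
            b.HasIndex(e => e.Embedding).ForVectors(VectorIndexType.Flat);
        });
    }
}

public static class VectorSearchExample
{
    // Orders by the translated VectorDistance call and returns the first five results.
    // The distance function and data type are taken from the property mapping.
    public static Task<List<Book>> NearestAsync(LibraryContext context, ReadOnlyMemory<float> query)
        => context.Books
            .OrderBy(e => EF.Functions.VectorDistance(e.Embedding, query))
            .Take(5)
            .ToListAsync();
}
```

The ordering query is only one possible shape; VectorDistance() could presumably also be projected in a Select to inspect the returned score.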

Known issues:

  • Float16 (Half) is not working in Cosmos--needs investigation
  • Exception on int array case--could be EF or Cosmos--needs investigation
  • Owned types mess up the materialization--this will be fixed by the ReadItem improvements I am working on

@ajcvickers ajcvickers requested a review from a team June 14, 2024 12:03
Comment thread src/EFCore.Cosmos/EFCore.Cosmos.csproj Outdated
Comment thread src/EFCore.Cosmos/Extensions/CosmosDbFunctionsExtensions.cs Outdated
Comment thread src/EFCore.Cosmos/Extensions/CosmosDbFunctionsExtensions.cs Outdated
@ajcvickers
Contributor Author

@roji I added a JSON fragment expression, as we discussed.

Member

@roji roji left a comment

Looks pretty good! See mainly nits.

Probably a good idea for @AndriySvyryd to take a look for the metadata/model and CosmosClientWrapper stuff.

Comment thread src/EFCore.Cosmos/Extensions/CosmosDbFunctionsExtensions.cs Outdated
Comment thread src/EFCore.Cosmos/Extensions/CosmosDbFunctionsExtensions.cs
Comment thread src/EFCore.Cosmos/Extensions/CosmosIndexBuilderExtensions.cs
Comment thread src/EFCore.Cosmos/Extensions/CosmosPropertyBuilderExtensions.cs
Comment thread src/EFCore.Cosmos/Extensions/CosmosPropertyBuilderExtensions.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/VectorSearchCosmosTest.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/VectorSearchCosmosTest.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/VectorSearchCosmosTest.cs
Comment thread test/EFCore.Cosmos.FunctionalTests/VectorSearchCosmosTest.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/VectorSearchCosmosTest.cs Outdated
@ajcvickers ajcvickers force-pushed the VelocityIsAVector branch 3 times, most recently from 8c3fdf6 to 15f5a2c Compare June 17, 2024 11:34
@ajcvickers ajcvickers requested a review from roji June 17, 2024 11:34
@ajcvickers
Contributor Author

@Pilchie What is the new guidance here given that the latest SDK has made all the code to configure vector indexes and embeddings internal?

@ajcvickers ajcvickers requested a review from a team July 30, 2024 17:40
@ajcvickers
Contributor Author

This has been updated to the latest SDK and the container configuration has been made to throw. Verified that all the tests pass if the container has already been created.

Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosClientWrapper.cs Outdated
Member

@AndriySvyryd AndriySvyryd left a comment

Add a test to CompiledModelCosmosTest

@ajcvickers ajcvickers force-pushed the VelocityIsAVector branch 4 times, most recently from 59ce915 to b2b21ae Compare August 6, 2024 13:41
@ajcvickers
Contributor Author

@roji This is now constrained.

Member

@roji roji left a comment

LGTM.

As discussed offline, we really should deliver this feature with support for arrays as well (e.g. float[]). Maybe the best way forward is to build in support for it in this PR for the type mapping side and have failing tests for the query translation, which I'd work on fixing after rc1 (given feature work stops in a few days).

BTW at what level is Half (float16) not supported? Is the Cosmos SDK the blocking one here? If the intention is for it to be supported soon, it may be worth just implementing it on our side and letting Cosmos/the SDK fail, since it'll be harder to add it in later.

Comment thread src/EFCore.Cosmos/Extensions/CosmosPropertyBuilderExtensions.cs Outdated
/// <param name="property">The property.</param>
/// <param name="vectorType">The type of vector stored in the property.</param>
[Experimental(EFDiagnostics.CosmosVectorSearchExperimental)]
public static void SetVectorType(this IMutableProperty property, CosmosVectorType? vectorType)
Member

BTW does Cosmos actually support nullable vector properties? Might be worth checking

<value>The type '{givenType}' cannot be mapped as a dictionary because it does not implement '{dictionaryType}'.</value>
</data>
<data name="BadVectorDataType" xml:space="preserve">
<value>The type '{clrType}' is being used as a vector, but the vector data type cannot be inferred. Only 'ReadOnlyMemory&lt;byte&gt;, ReadOnlyMemory&lt;sbyte&gt;, and ReadOnlyMemory&lt;float&gt; are supported.</value>
Member

As discussed offline, arrays should also work.

Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosTypeMappingSource.cs Outdated
@ajcvickers ajcvickers merged commit ce41847 into main Aug 12, 2024
@ajcvickers ajcvickers deleted the VelocityIsAVector branch August 12, 2024 10:48

Development

Successfully merging this pull request may close these issues.

Cosmos: support vector search

7 participants