From 3b8fe72b0f94834e4efa94dc3c772de63ac73ef5 Mon Sep 17 00:00:00 2001
From: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Date: Mon, 19 Aug 2024 20:37:35 -0700
Subject: [PATCH 1/4] Initial data for Entity Framework connector

---
 .../0051-entity-framework-as-connector.md     | 110 ++++++++
 dotnet/Directory.Packages.props               |   8 +-
 dotnet/SK-dotnet.sln                          |  11 +-
 .../Connectors.Memory.EntityFramework.csproj  |  30 +++
 ...ityFrameworkVectorStoreRecordCollection.cs | 248 ++++++++++++++++++
 ...eworkVectorStoreRecordCollectionOptions.cs |  21 ++
 .../QueryableExtensions.cs                    |  33 +++
 .../EntityFramework/ApplicationDbContext.cs   |  16 ++
 .../EntityFramework/EntityFrameworkHotel.cs   |  43 +++
 ...ameworkVectorStoreRecordCollectionTests.cs | 243 +++++++++++++++++
 .../IntegrationTests/IntegrationTests.csproj  |   2 +
 11 files changed, 761 insertions(+), 4 deletions(-)
 create mode 100644 docs/decisions/0051-entity-framework-as-connector.md
 create mode 100644 dotnet/src/Connectors/Connectors.Memory.EntityFramework/Connectors.Memory.EntityFramework.csproj
 create mode 100644 dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollection.cs
 create mode 100644 dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollectionOptions.cs
 create mode 100644 dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
 create mode 100644 dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/ApplicationDbContext.cs
 create mode 100644 dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkHotel.cs
 create mode 100644 dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkVectorStoreRecordCollectionTests.cs

diff --git a/docs/decisions/0051-entity-framework-as-connector.md b/docs/decisions/0051-entity-framework-as-connector.md
new file mode 100644
index 000000000000..fd3d86ad7905
--- /dev/null
+++ b/docs/decisions/0051-entity-framework-as-connector.md
@@ -0,0 +1,110 @@
+---
+# These are optional elements. Feel free to remove any of them.
+status: proposed
+contact: dmytrostruk
+date: 2024-08-19
+deciders: sergeymenshykh, markwallace, rbarreto, westey-m
+---
+
+# Entity Framework as Vector Store Connector
+
+## Context and Problem Statement
+
+This ADR contains the results of an investigation into adding Entity Framework as a Vector Store connector to the Semantic Kernel codebase.
+
+Entity Framework is a modern object-relational mapper (ORM) that lets you build a clean, portable, and high-level data access layer with .NET (C#) across a variety of databases, including SQL Database (on-premises and Azure), SQLite, MySQL, PostgreSQL, Azure Cosmos DB, and more. It supports LINQ queries, change tracking, updates, and schema migrations.
+
+One of the biggest benefits of Entity Framework for Semantic Kernel is its support for multiple databases. In theory, a single Entity Framework connector could act as a hub for multiple databases at the same time, which should simplify the development and maintenance of integrations with these databases.
+
+However, several limitations prevent Entity Framework from fitting into the updated Vector Store design.
+
+### Collection Creation
+
+In the new Vector Store design, the `IVectorStoreRecordCollection<TKey, TRecord>` interface contains methods for managing database collections:
+- `CollectionExistsAsync`
+- `CreateCollectionAsync`
+- `CreateCollectionIfNotExistsAsync`
+- `DeleteCollectionAsync`
+
+In Entity Framework, creating collections (also known as schemas/tables) programmatically is not recommended for production scenarios. The recommended approach is to use Migrations (for the code-first approach) or Reverse Engineering (also known as scaffolding, for the database-first approach). Programmatic schema creation, such as `EnsureCreated`, is recommended only for testing and local scenarios. The collection creation process also differs between databases. For example, the MongoDB EF Core provider doesn't support schema migrations or database-first/model-first approaches. Instead, a collection is created automatically when a document is inserted for the first time, if the collection doesn't already exist. This complicates methods such as `CreateCollectionAsync` from the `IVectorStoreRecordCollection<TKey, TRecord>` interface, since EF has no collection-management abstraction that works for most databases. In such cases, the recommended approach is to handle collection creation individually for each database. For example, for MongoDB it's recommended to use the MongoDB C# Driver directly.
+
+Sources:
+- https://learn.microsoft.com/en-us/ef/core/managing-schemas/
+- https://learn.microsoft.com/en-us/ef/core/managing-schemas/ensure-created
+- https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations/applying?tabs=dotnet-core-cli#apply-migrations-at-runtime
+- https://github.com/mongodb/mongo-efcore-provider?tab=readme-ov-file#not-supported--out-of-scope-features
+
+### Key Management
+
+It won't be possible to define a single set of valid key types, since not all databases support all types as keys. The connector could support only a common key type such as `string`, and then convert keys to satisfy the restrictions of each specific database. This removes the advantage of a unified connector implementation, since key management would have to be handled for each database individually.
+
+Sources:
+- https://learn.microsoft.com/en-us/ef/core/modeling/keys?tabs=data-annotations
+
+### Vector Management
+
+The `ReadOnlyMemory<float>` type, which most SK connectors use today to hold embeddings, is not supported by Entity Framework out of the box. Trying to use this type results in the following error:
+
+```
+The property '{Property Name}' could not be mapped because it is of type 'ReadOnlyMemory<float>?', which is not a supported primitive type or a valid entity type. Either explicitly map this property, or ignore it using the '[NotMapped]' attribute or by using 'EntityTypeBuilder.Ignore' in 'OnModelCreating'.
+```
+
+However, it's possible to use the `byte[]` type instead, or to create an explicit mapping that supports `ReadOnlyMemory<float>`. A similar mapping is already implemented in the `pgvector` package for PostgreSQL, but it's not clear whether this approach would work with other databases.
+
+Sources:
+- https://github.com/pgvector/pgvector-dotnet/blob/master/README.md#entity-framework-core
+- https://github.com/pgvector/pgvector-dotnet/blob/master/src/Pgvector/Vector.cs
+- https://github.com/pgvector/pgvector-dotnet/blob/master/src/Pgvector.EntityFrameworkCore/VectorTypeMapping.cs
+
+### Testing
+
+Creating an Entity Framework connector and testing it against a SQLite database doesn't guarantee that the integration will work for other EF-supported databases. Each database provider implements its own subset of Entity Framework features, so to ensure that the Entity Framework connector covers the main use cases for a specific database, unit/integration tests would have to be added for each database separately.
+
+Sources:
+- https://github.com/mongodb/mongo-efcore-provider?tab=readme-ov-file#supported-features
+
+### Compatibility
+
+It's not possible to use the latest Entity Framework Core package while targeting .NET Standard. The last version of EF Core that supports .NET Standard is 5.0 (the latest EF Core version is 8.0). This means the Entity Framework connector could target only .NET 8.0, unlike the other SK connectors available today, which target both net8.0 and netstandard2.0.
+
+Another option would be Entity Framework 6, which runs on both .NET Framework and modern .NET (including net8.0), but this version of Entity Framework is no longer being actively developed. Entity Framework Core offers new features that won't be implemented in EF6.
+
+Sources:
+- https://learn.microsoft.com/en-us/ef/core/miscellaneous/platforms
+- https://learn.microsoft.com/en-us/ef/efcore-and-ef6/
+
+### Existing SK connectors
+
+Since Semantic Kernel already integrates with some databases that are also supported by Entity Framework, there are several ways to proceed:
+- Support both the Entity Framework connector and the DB-specific connector (e.g. `Microsoft.SemanticKernel.Connectors.EntityFramework` and `Microsoft.SemanticKernel.Connectors.MongoDB`) - in this case both connectors should produce exactly the same outcome, so additional work will be required to guarantee this (such as implementing the same set of unit/integration tests). Any changes to the logic would also have to be applied to both connectors.
+- Support only the Entity Framework connector (e.g. `Microsoft.SemanticKernel.Connectors.EntityFramework`) - in this case, the existing DB connector would be removed, which may be a breaking change for existing customers. Additional work will be required to ensure that the Entity Framework connector covers exactly the same set of features as the previous DB connector.
+- Support only the DB-specific connector (e.g. `Microsoft.SemanticKernel.Connectors.MongoDB`) - in this case, if such a connector already exists, no additional work is required. If it doesn't exist and it's important to add it, additional work is required to implement that DB connector.
+
+
+The following table shows Entity Framework and Semantic Kernel support for each database (only databases that support vector search are listed):
+
+|Database Engine|Maintainer / Vendor|Supported in EF|Supported in SK|Updated to SK memory v2 design|
+|-|-|-|-|-|
+|Azure Cosmos DB|Microsoft|Yes|Yes|Yes|
+|Azure SQL and SQL Server|Microsoft|Yes|Yes|No|
+|SQLite|Microsoft|Yes|Yes|No|
+|PostgreSQL|Npgsql Development Team|Yes|Yes|No|
+|MongoDB|MongoDB|Yes|Yes|No|
+|MySQL|Oracle|Yes|No|No|
+|Oracle DB|Oracle|Yes|No|No|
+|Google Cloud Spanner|Cloud Spanner Ecosystem|Yes|No|No|
+
+**Note**:
+One database engine can have multiple Entity Framework integrations, which can be maintained by different vendors (e.g. there are two MySQL EF NuGet packages: one maintained by Oracle and another maintained by the Pomelo Foundation Project).
+
+Sources:
+- https://learn.microsoft.com/en-us/ef/core/providers/?tabs=dotnet-core-cli#current-providers
+
+## Considered Options
+
+- Add a new `Microsoft.SemanticKernel.Connectors.EntityFramework` connector.
+- Do not add the `Microsoft.SemanticKernel.Connectors.EntityFramework` connector; instead, add a new connector for an individual database when needed.
+
+## Decision Outcome
+
+TBD.
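The ADR's "Vector Management" section notes that `byte[]` can stand in for `ReadOnlyMemory<float>`, and this patch's test model stores its embedding that way. A minimal sketch of the conversion a connector would need in both directions is shown here, using only BCL APIs. `EmbeddingByteConverter` and its methods are hypothetical names, not part of Semantic Kernel or EF Core. The byte layout is the host platform's native float representation (little-endian on common hardware), which is an assumption worth checking if data moves between machines.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of this patch: converts embeddings between the
// ReadOnlyMemory<float> type used by SK connectors and a byte[] column that
// relational EF Core providers typically map without a custom type mapping.
public static class EmbeddingByteConverter
{
    // Reinterprets the floats as raw bytes (native byte order) and copies them into a new array.
    public static byte[] ToBytes(ReadOnlyMemory<float> embedding) =>
        MemoryMarshal.AsBytes(embedding.Span).ToArray();

    // Reverses ToBytes. The input length must be a whole number of floats.
    public static ReadOnlyMemory<float> FromBytes(byte[] bytes)
    {
        ArgumentNullException.ThrowIfNull(bytes);

        if (bytes.Length % sizeof(float) != 0)
        {
            throw new ArgumentException("Length must be a multiple of sizeof(float).", nameof(bytes));
        }

        // Copy so the result does not alias the caller's byte array.
        return MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).ToArray();
    }
}
```

The `ConvertToByteArray` helper in this patch's integration tests performs the same forward conversion. Wrapping these two methods in an EF Core value converter (for example via `HasConversion`) might let a model keep a `ReadOnlyMemory<float>` property, but whether each provider then stores and compares that column correctly is exactly the per-database uncertainty the ADR describes.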
diff --git a/dotnet/Directory.Packages.props b/dotnet/Directory.Packages.props index b1c7dc58eddc..c6ed28fe1043 100644 --- a/dotnet/Directory.Packages.props +++ b/dotnet/Directory.Packages.props @@ -26,6 +26,8 @@ + + @@ -134,8 +136,8 @@ runtime; build; native; contentfiles; analyzers; buildtransitive - - - + + + \ No newline at end of file diff --git a/dotnet/SK-dotnet.sln b/dotnet/SK-dotnet.sln index b6cd87d2040b..47ad928d8e68 100644 --- a/dotnet/SK-dotnet.sln +++ b/dotnet/SK-dotnet.sln @@ -320,7 +320,9 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Connectors.Qdrant.UnitTests EndProject Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "StepwisePlannerMigration", "samples\Demos\StepwisePlannerMigration\StepwisePlannerMigration.csproj", "{38374C62-0263-4FE8-A18C-70FC8132912B}" EndProject -Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "AIModelRouter", "samples\Demos\AIModelRouter\AIModelRouter.csproj", "{E06818E3-00A5-41AC-97ED-9491070CDEA1}" +Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "AIModelRouter", "samples\Demos\AIModelRouter\AIModelRouter.csproj", "{E06818E3-00A5-41AC-97ED-9491070CDEA1}" +EndProject +Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Connectors.Memory.EntityFramework", "src\Connectors\Connectors.Memory.EntityFramework\Connectors.Memory.EntityFramework.csproj", "{C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}" EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution @@ -803,6 +805,12 @@ Global {E06818E3-00A5-41AC-97ED-9491070CDEA1}.Publish|Any CPU.Build.0 = Debug|Any CPU {E06818E3-00A5-41AC-97ED-9491070CDEA1}.Release|Any CPU.ActiveCfg = Release|Any CPU {E06818E3-00A5-41AC-97ED-9491070CDEA1}.Release|Any CPU.Build.0 = Release|Any CPU + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Debug|Any CPU.Build.0 = Debug|Any CPU + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Publish|Any CPU.ActiveCfg = Debug|Any CPU + 
{C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Publish|Any CPU.Build.0 = Debug|Any CPU + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Release|Any CPU.ActiveCfg = Release|Any CPU + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4}.Release|Any CPU.Build.0 = Release|Any CPU EndGlobalSection GlobalSection(SolutionProperties) = preSolution HideSolutionNode = FALSE @@ -913,6 +921,7 @@ Global {E92AE954-8F3A-4A6F-A4F9-DC12017E5AAF} = {0247C2C9-86C3-45BA-8873-28B0948EDC0C} {38374C62-0263-4FE8-A18C-70FC8132912B} = {5D4C0700-BBB5-418F-A7B2-F392B9A18263} {E06818E3-00A5-41AC-97ED-9491070CDEA1} = {5D4C0700-BBB5-418F-A7B2-F392B9A18263} + {C16F77B2-BFD7-41FF-9BDF-4CF2C4DE79E4} = {24503383-A8C4-4255-9998-28D70FE8E99A} EndGlobalSection GlobalSection(ExtensibilityGlobals) = postSolution SolutionGuid = {FBDC56A3-86AD-4323-AA0F-201E59123B83} diff --git a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/Connectors.Memory.EntityFramework.csproj b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/Connectors.Memory.EntityFramework.csproj new file mode 100644 index 000000000000..a163804d6844 --- /dev/null +++ b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/Connectors.Memory.EntityFramework.csproj @@ -0,0 +1,30 @@ + + + + + Microsoft.SemanticKernel.Connectors.EntityFramework + $(AssemblyName) + net8.0 + $(NoWarn);NU5104;SKEXP0001,SKEXP0010 + alpha + + + + + + + + + Semantic Kernel - Entity Framework Connector + Entity Framework connector for Semantic Kernel plugins and semantic memory + + + + + + + + + + + diff --git a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollection.cs b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollection.cs new file mode 100644 index 000000000000..73f2b54d8e75 --- /dev/null +++ b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollection.cs @@ -0,0 +1,248 @@ +// Copyright (c) Microsoft. All rights reserved. 
+
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Runtime.CompilerServices;
+using System.Threading;
+using System.Threading.Tasks;
+using Microsoft.EntityFrameworkCore;
+using Microsoft.SemanticKernel.Data;
+
+namespace Microsoft.SemanticKernel.Connectors.EntityFramework;
+
+/// <summary>
+/// Service for storing and retrieving vector records, that uses Entity Framework as the underlying storage.
+/// </summary>
+/// <typeparam name="TRecord">The data model to use for adding, updating and retrieving data from storage.</typeparam>
+#pragma warning disable CA1711 // Identifiers should not have incorrect suffix
+public class EntityFrameworkVectorStoreRecordCollection<TRecord> : IVectorStoreRecordCollection<string, TRecord> where TRecord : class
+#pragma warning restore CA1711 // Identifiers should not have incorrect suffix
+{
+    /// <summary>A set of types that a key on the provided model may have.</summary>
+    private static readonly HashSet<Type> s_supportedKeyTypes =
+    [
+        typeof(string)
+    ];
+
+    /// <summary>A set of types that data properties on the provided model may have.</summary>
+    private static readonly HashSet<Type> s_supportedDataTypes =
+    [
+        typeof(string),
+        typeof(int),
+        typeof(long),
+        typeof(double),
+        typeof(float),
+        typeof(bool),
+        typeof(DateTimeOffset),
+        typeof(int?),
+        typeof(long?),
+        typeof(double?),
+        typeof(float?),
+        typeof(bool?),
+        typeof(DateTimeOffset?),
+    ];
+
+    /// <summary>A set of types that vector properties on the provided model may have.</summary>
+    private static readonly HashSet<Type> s_supportedVectorTypes =
+    [
+        typeof(byte[]),
+    ];
+
+    /// <summary><see cref="DbContext"/> that can be used to manage tables in Entity Framework.</summary>
+    private readonly DbContext _dbContext;
+
+    /// <summary>Optional configuration options for this class.</summary>
+    private readonly EntityFrameworkVectorStoreRecordCollectionOptions<TRecord> _options;
+
+    /// <summary>A definition of the current storage model.</summary>
+    private readonly VectorStoreRecordDefinition _vectorStoreRecordDefinition;
+
+    /// <summary>The key property of the current storage model.</summary>
+    private readonly VectorStoreRecordKeyProperty _keyProperty;
+
+    /// <inheritdoc />
+    public string CollectionName => string.Empty;
+
+    /// <summary>
+    /// Initializes a new instance of the <see cref="EntityFrameworkVectorStoreRecordCollection{TRecord}"/> class.
+    /// </summary>
+    /// <param name="dbContext"><see cref="DbContext"/> that can be used to manage tables in Entity Framework.</param>
+    /// <param name="options">Optional configuration options for this class.</param>
+    public EntityFrameworkVectorStoreRecordCollection(
+        DbContext dbContext,
+        EntityFrameworkVectorStoreRecordCollectionOptions<TRecord>? options = default)
+    {
+        Verify.NotNull(dbContext);
+
+        this._dbContext = dbContext;
+        this._options = options ?? new();
+        this._vectorStoreRecordDefinition = this._options.VectorStoreRecordDefinition ?? VectorStoreRecordPropertyReader.CreateVectorStoreRecordDefinitionFromType(typeof(TRecord), true);
+
+        var (keyProperty, dataProperties, vectorProperties) = VectorStoreRecordPropertyReader.SplitDefinitionAndVerify(typeof(TRecord).Name, this._vectorStoreRecordDefinition, supportsMultipleVectors: true, requiresAtLeastOneVector: false);
+        VectorStoreRecordPropertyReader.VerifyPropertyTypes([keyProperty], s_supportedKeyTypes, "Key");
+        VectorStoreRecordPropertyReader.VerifyPropertyTypes(dataProperties, s_supportedDataTypes, "Data", supportEnumerable: true);
+        VectorStoreRecordPropertyReader.VerifyPropertyTypes(vectorProperties, s_supportedVectorTypes, "Vector");
+
+        this._keyProperty = keyProperty;
+    }
+
+    /// <inheritdoc />
+    public Task<bool> CollectionExistsAsync(CancellationToken cancellationToken = default)
+    {
+        throw new System.NotImplementedException();
+    }
+
+    /// <inheritdoc />
+    public Task CreateCollectionAsync(CancellationToken cancellationToken = default)
+    {
+        throw new System.NotImplementedException();
+    }
+
+    /// <inheritdoc />
+    public Task CreateCollectionIfNotExistsAsync(CancellationToken cancellationToken = default)
+    {
+        throw new System.NotImplementedException();
+    }
+
+    /// <inheritdoc />
+    public Task DeleteCollectionAsync(CancellationToken cancellationToken = default)
+    {
+        throw new System.NotImplementedException();
+    }
+
+    /// <inheritdoc />
+    public async Task DeleteAsync(string key, DeleteRecordOptions? options = null, CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+
+        var entity = await dbSet.FindAsync([key], cancellationToken).ConfigureAwait(false);
+
+        if (entity != null)
+        {
+            dbSet.Remove(entity);
+            await this._dbContext.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
+        }
+    }
+
+    /// <inheritdoc />
+    public async Task DeleteBatchAsync(IEnumerable<string> keys, DeleteRecordOptions? options = null, CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+
+        var entities = await dbSet
+            .FilterByIds(keys.ToList(), this._keyProperty.DataModelPropertyName)
+            .ToListAsync(cancellationToken)
+            .ConfigureAwait(false);
+
+        dbSet.RemoveRange(entities);
+
+        await this._dbContext.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
+    }
+
+    /// <inheritdoc />
+    public async Task<TRecord?> GetAsync(string key, GetRecordOptions? options = null, CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+        return await dbSet.FindAsync([key], cancellationToken).ConfigureAwait(false);
+    }
+
+    /// <inheritdoc />
+    public async IAsyncEnumerable<TRecord> GetBatchAsync(
+        IEnumerable<string> keys,
+        GetRecordOptions? options = null,
+        [EnumeratorCancellation] CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+
+        var query = dbSet
+            .FilterByIds(keys.ToList(), this._keyProperty.DataModelPropertyName);
+
+        await foreach (var item in query.AsAsyncEnumerable().WithCancellation(cancellationToken).ConfigureAwait(false))
+        {
+            yield return item;
+        }
+    }
+
+    /// <inheritdoc />
+    public async Task<string> UpsertAsync(TRecord record, UpsertRecordOptions? options = null, CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+
+        var id = this.GetEntityId(record);
+        var existingEntry = await dbSet.FindAsync([id], cancellationToken).ConfigureAwait(false);
+
+        if (existingEntry != null)
+        {
+            this._dbContext.Entry(existingEntry).CurrentValues.SetValues(record);
+        }
+        else
+        {
+            dbSet.Add(record);
+        }
+
+        await this._dbContext.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
+
+        return id;
+    }
+
+    /// <inheritdoc />
+    public async IAsyncEnumerable<string> UpsertBatchAsync(
+        IEnumerable<TRecord> records,
+        UpsertRecordOptions? options = null,
+        [EnumeratorCancellation] CancellationToken cancellationToken = default)
+    {
+        var dbSet = this._dbContext.Set<TRecord>();
+
+        var entityDictionary = records.ToDictionary(this.GetEntityId);
+        var ids = entityDictionary.Keys.ToList();
+
+        var existingEntities = await dbSet
+            .FilterByIds(ids, this._keyProperty.DataModelPropertyName)
+            .ToListAsync(cancellationToken)
+            .ConfigureAwait(false);
+
+        // Update existing entities.
+        foreach (var existingEntity in existingEntities)
+        {
+            var entityId = this.GetEntityId(existingEntity);
+
+            if (entityDictionary.TryGetValue(entityId, out var newEntity))
+            {
+                this._dbContext.Entry(existingEntity).CurrentValues.SetValues(newEntity);
+
+                // Remove updated entity from dictionary to insert new entities later.
+                entityDictionary.Remove(entityId);
+            }
+        }
+
+        // Insert new entities.
+        dbSet.AddRange(entityDictionary.Values);
+
+        await this._dbContext.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
+
+        foreach (var id in ids)
+        {
+            yield return id;
+        }
+    }
+
+    #region private
+
+    private string GetEntityId(TRecord entity)
+    {
+        var keyPropertyName = this._keyProperty.DataModelPropertyName;
+        var keyProperty = typeof(TRecord).GetProperty(keyPropertyName)!;
+
+        var id = keyProperty.GetValue(entity) as string;
+
+        if (string.IsNullOrWhiteSpace(id))
+        {
+            throw new VectorStoreOperationException($"Key property {keyPropertyName} is not initialized.");
+        }
+
+        return id;
+    }
+
+    #endregion
+}
diff --git a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollectionOptions.cs b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollectionOptions.cs
new file mode 100644
index 000000000000..7a6f7a908150
--- /dev/null
+++ b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/EntityFrameworkVectorStoreRecordCollectionOptions.cs
@@ -0,0 +1,21 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using Microsoft.SemanticKernel.Data;
+
+namespace Microsoft.SemanticKernel.Connectors.EntityFramework;
+
+/// <summary>
+/// Options when creating a <see cref="EntityFrameworkVectorStoreRecordCollection{TRecord}"/>.
+/// </summary>
+public sealed class EntityFrameworkVectorStoreRecordCollectionOptions<TRecord> where TRecord : class
+{
+    /// <summary>
+    /// Gets or sets an optional record definition that defines the schema of the record type.
+    /// </summary>
+    /// <remarks>
+    /// If not provided, the schema will be inferred from the record model class using reflection.
+    /// In this case, the record model properties must be annotated with the appropriate attributes to indicate their usage.
+    /// See <see cref="VectorStoreRecordKeyAttribute"/>, <see cref="VectorStoreRecordDataAttribute"/> and <see cref="VectorStoreRecordVectorAttribute"/>.
+    /// </remarks>
+    public VectorStoreRecordDefinition? VectorStoreRecordDefinition { get; init; } = null;
+}
diff --git a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
new file mode 100644
index 000000000000..0a6cb5da4b80
--- /dev/null
+++ b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
@@ -0,0 +1,33 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Linq.Expressions;
+
+namespace Microsoft.SemanticKernel.Connectors.EntityFramework;
+
+internal static class QueryableExtensions
+{
+    internal static IQueryable<TEntity> FilterByIds<TEntity>(this IQueryable<TEntity> source, List<string> ids, string idPropertyName)
+    {
+        if (ids is not { Count: > 0 })
+        {
+            return source.Where(_ => false); // No keys must match no records, not the whole table.
+        }
+
+        var parameter = Expression.Parameter(typeof(TEntity), "entity");
+        var property = Expression.Property(parameter, idPropertyName);
+
+        var idsExpression = Expression.Constant(ids);
+        var containsMethod = typeof(Enumerable)
+            .GetMethods()
+            .First(m => m.Name == nameof(Enumerable.Contains) && m.GetParameters().Length == 2)
+            .MakeGenericMethod(typeof(string));
+
+        var containsExpression = Expression.Call(containsMethod, idsExpression, property);
+        var lambda = Expression.Lambda<Func<TEntity, bool>>(containsExpression, parameter);
+
+        return source.Where(lambda);
+    }
+}
diff --git a/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/ApplicationDbContext.cs b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/ApplicationDbContext.cs
new file mode 100644
index 000000000000..c89133cbb35b
--- /dev/null
+++ b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/ApplicationDbContext.cs
@@ -0,0 +1,16 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using Microsoft.EntityFrameworkCore;
+
+namespace SemanticKernel.IntegrationTests.Connectors.Memory.EntityFramework;
+
+public sealed class ApplicationDbContext(DbContextOptions<ApplicationDbContext> options) : DbContext(options)
+{
+    public DbSet<EntityFrameworkHotel> Hotels { get; set; }
+
+    protected override void OnModelCreating(ModelBuilder modelBuilder)
+    {
+        modelBuilder.Entity<EntityFrameworkHotel>()
+            .HasKey(l => l.HotelId);
+    }
+}
diff --git a/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkHotel.cs b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkHotel.cs
new file mode 100644
index 000000000000..f295b2c90775
--- /dev/null
+++ b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkHotel.cs
@@ -0,0 +1,43 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using System.Collections.Generic;
+using Microsoft.SemanticKernel.Data;
+
+namespace SemanticKernel.IntegrationTests.Connectors.Memory.EntityFramework;
+
+#pragma warning disable CS8618, CA1819
+
+public record EntityFrameworkHotel()
+{
+    /// <summary>The key of the record.</summary>
+    [VectorStoreRecordKey]
+    public string HotelId { get; init; }
+
+    /// <summary>A string metadata field.</summary>
+    [VectorStoreRecordData(IsFilterable = true)]
+    public string? HotelName { get; set; }
+
+    /// <summary>An int metadata field.</summary>
+    [VectorStoreRecordData(IsFullTextSearchable = true)]
+    public int HotelCode { get; set; }
+
+    /// <summary>A float metadata field.</summary>
+    [VectorStoreRecordData]
+    public float? HotelRating { get; set; }
+
+    /// <summary>A bool metadata field.</summary>
+    [VectorStoreRecordData]
+    public bool ParkingIncluded { get; set; }
+
+    /// <summary>An array metadata field.</summary>
+    [VectorStoreRecordData]
+    public List<string> Tags { get; set; } = [];
+
+    /// <summary>A data field.</summary>
+    [VectorStoreRecordData]
+    public string Description { get; set; }
+
+    /// <summary>A vector field.</summary>
+    [VectorStoreRecordVector(Dimensions: 4, IndexKind: IndexKind.Flat, DistanceFunction: DistanceFunction.CosineSimilarity)]
+    public byte[] DescriptionEmbedding { get; set; }
+}
diff --git a/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkVectorStoreRecordCollectionTests.cs b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkVectorStoreRecordCollectionTests.cs
new file mode 100644
index 000000000000..a15635303c45
--- /dev/null
+++ b/dotnet/src/IntegrationTests/Connectors/Memory/EntityFramework/EntityFrameworkVectorStoreRecordCollectionTests.cs
@@ -0,0 +1,243 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+using System;
+using System.Data.Common;
+using System.Linq;
+using System.Runtime.InteropServices;
+using System.Threading.Tasks;
+using Microsoft.Data.Sqlite;
+using Microsoft.EntityFrameworkCore;
+using Microsoft.SemanticKernel.Connectors.EntityFramework;
+using Xunit;
+
+namespace SemanticKernel.IntegrationTests.Connectors.Memory.EntityFramework;
+
+public sealed class EntityFrameworkVectorStoreRecordCollectionTests : IDisposable
+{
+    private readonly DbConnection _connection;
+    private readonly DbContextOptions<ApplicationDbContext> _contextOptions;
+
+    public EntityFrameworkVectorStoreRecordCollectionTests()
+    {
+        this._connection = new SqliteConnection("Filename=:memory:");
+        this._connection.Open();
+
+        this._contextOptions = new DbContextOptionsBuilder<ApplicationDbContext>()
+            .UseSqlite(this._connection)
+            .Options;
+
+        using var context = new ApplicationDbContext(this._contextOptions);
+
+        context.Database.EnsureCreated();
+    }
+
+    [Fact]
+    public async Task ItCanUpsertAndGetAsync()
+    {
+        // Arrange
+        const string HotelId = "55555555-5555-5555-5555-555555555555";
+
+        await using var context = this.CreateContext();
+
+        var sut = new EntityFrameworkVectorStoreRecordCollection<EntityFrameworkHotel>(context);
+
+        var record = this.CreateTestHotel(HotelId);
+
+        // Act
+        var upsertResult = await sut.UpsertAsync(record);
+        var getResult = await sut.GetAsync(HotelId);
+
+        // Assert
+        Assert.Equal(HotelId, upsertResult);
+        Assert.NotNull(getResult);
+
+        Assert.Equal(record.HotelId, getResult.HotelId);
+        Assert.Equal(record.HotelName, getResult.HotelName);
+        Assert.Equal(record.HotelCode, getResult.HotelCode);
+        Assert.Equal(record.HotelRating, getResult.HotelRating);
+        Assert.Equal(record.ParkingIncluded, getResult.ParkingIncluded);
+        Assert.Equal(record.Tags.ToArray(), getResult.Tags.ToArray());
+        Assert.Equal(record.Description, getResult.Description);
+        Assert.Equal(record.DescriptionEmbedding, getResult.DescriptionEmbedding);
+    }
+
+    [Fact]
+    public async Task ItCanDeleteAsync()
+    {
+        // Arrange
+        const string HotelId = "55555555-5555-5555-5555-555555555555";
+
+        await using var context = this.CreateContext();
+
+        var sut = new EntityFrameworkVectorStoreRecordCollection<EntityFrameworkHotel>(context);
+
+        var record = this.CreateTestHotel(HotelId);
+
+        // Act
+        var upsertResult = await sut.UpsertAsync(record);
+        var getResult = await sut.GetAsync(HotelId);
+
+        Assert.Equal(HotelId, upsertResult);
+        Assert.NotNull(getResult);
+
+        await sut.DeleteAsync(HotelId);
+
+        getResult = await sut.GetAsync(HotelId);
+
+        Assert.Null(getResult);
+    }
+
+    [Fact]
+    public async Task ItCanGetAndDeleteBatchAsync()
+    {
+        // Arrange
+        const string HotelId1 = "11111111-1111-1111-1111-111111111111";
+        const string HotelId2 = "22222222-2222-2222-2222-222222222222";
+        const string HotelId3 = "33333333-3333-3333-3333-333333333333";
+
+        await using var context = this.CreateContext();
+
+        var sut = new EntityFrameworkVectorStoreRecordCollection<EntityFrameworkHotel>(context);
+
+        var record1 = this.CreateTestHotel(HotelId1);
+        var record2 = this.CreateTestHotel(HotelId2);
+        var record3 = this.CreateTestHotel(HotelId3);
+
+        var upsertResults = await sut.UpsertBatchAsync([record1, record2, record3]).ToListAsync();
+        var getResults = await sut.GetBatchAsync([HotelId1, HotelId2, HotelId3]).ToListAsync();
+
+        Assert.Equal([HotelId1, HotelId2, HotelId3], upsertResults);
+
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId1));
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId2));
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId3));
+
+        // Act
+        await sut.DeleteBatchAsync([HotelId1, HotelId2, HotelId3]);
+
+        getResults = await sut.GetBatchAsync([HotelId1, HotelId2, HotelId3]).ToListAsync();
+
+        // Assert
+        Assert.Empty(getResults);
+    }
+
+    [Fact]
+    public async Task ItCanUpsertRecordAsync()
+    {
+        // Arrange
+        const string HotelId = "55555555-5555-5555-5555-555555555555";
+
+        await using var context = this.CreateContext();
+
+        var sut = new EntityFrameworkVectorStoreRecordCollection<EntityFrameworkHotel>(context);
+
+        var record = this.CreateTestHotel(HotelId);
+
+        var upsertResult = await sut.UpsertAsync(record);
+        var getResult = await sut.GetAsync(HotelId);
+
+        Assert.Equal(HotelId, upsertResult);
+        Assert.NotNull(getResult);
+
+        // Act
+        record.HotelName = "Updated name";
+        record.HotelRating = 10;
+
+        upsertResult = await sut.UpsertAsync(record);
+        getResult = await sut.GetAsync(HotelId);
+
+        // Assert
+        Assert.NotNull(getResult);
+        Assert.Equal("Updated name", getResult.HotelName);
+        Assert.Equal(10, getResult.HotelRating);
+    }
+
+    [Fact]
+    public async Task ItCanUpsertBatchAsync()
+    {
+        // Arrange
+        const string HotelId1 = "11111111-1111-1111-1111-111111111111";
+        const string HotelId2 = "22222222-2222-2222-2222-222222222222";
+        const string HotelId3 = "33333333-3333-3333-3333-333333333333";
+
+        await using var context = this.CreateContext();
+
+        var sut = new EntityFrameworkVectorStoreRecordCollection<EntityFrameworkHotel>(context);
+
+        var record1 = this.CreateTestHotel(HotelId1);
+        var record2 = this.CreateTestHotel(HotelId2);
+        var record3 = this.CreateTestHotel(HotelId3);
+
+        var upsertResults = await sut.UpsertBatchAsync([record1, record2, record3]).ToListAsync();
+        var getResults = await sut.GetBatchAsync([HotelId1, HotelId2, HotelId3]).ToListAsync();
+
+        Assert.Equal([HotelId1, HotelId2, HotelId3], upsertResults);
+
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId1));
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId2));
+        Assert.NotNull(getResults.First(l => l.HotelId == HotelId3));
+
+        // Act
+        record1.HotelName = "Updated name 1";
+        record1.HotelRating = 1;
+
+        record2.HotelName = "Updated name 2";
+        record2.HotelRating = 2;
+
+        record3.HotelName = "Updated name 3";
+        record3.HotelRating = 3;
+
+        upsertResults = await sut.UpsertBatchAsync([record1, record2, record3]).ToListAsync();
+        getResults = await sut.GetBatchAsync([HotelId1, HotelId2, HotelId3]).ToListAsync();
+
+        // Assert
+        Assert.NotNull(getResults);
+
+        Assert.Equal("Updated name 1", getResults[0].HotelName);
+        Assert.Equal(1, getResults[0].HotelRating);
+
+        Assert.Equal("Updated name 2", getResults[1].HotelName);
+        Assert.Equal(2, getResults[1].HotelRating);
+
+        Assert.Equal("Updated name 3", getResults[2].HotelName);
+        Assert.Equal(3, getResults[2].HotelRating);
+    }
+
+    public void Dispose()
+    {
+        using var context = this.CreateContext();
+
+        context.Database.EnsureDeleted();
+
+        this._connection.Dispose();
+    }
+
+    #region private
+
+    private ApplicationDbContext CreateContext() => new(this._contextOptions);
+
+    private EntityFrameworkHotel CreateTestHotel(string hotelId)
+    {
+        return new EntityFrameworkHotel
+        {
+            HotelId = hotelId,
+            HotelName = $"My Hotel {hotelId}",
+            HotelCode = 42,
+            HotelRating = 4.5f,
+            ParkingIncluded = true,
+            Tags = { "t1", "t2" },
+            Description = "This is a great hotel.",
+            DescriptionEmbedding = ConvertToByteArray(new[] { 30f, 31f, 32f, 33f }),
+        };
+    }
+
+    private static byte[] ConvertToByteArray(ReadOnlyMemory<float> memory)
+    {
+        var length = memory.Length * sizeof(float);
+        var bytes = new byte[length];
+        MemoryMarshal.AsBytes(memory.Span).CopyTo(bytes);
+        return bytes;
+    }
+
+    #endregion
+}
diff --git a/dotnet/src/IntegrationTests/IntegrationTests.csproj b/dotnet/src/IntegrationTests/IntegrationTests.csproj
index 55a6ac6d1006..5067d7463149 100644
--- a/dotnet/src/IntegrationTests/IntegrationTests.csproj
+++ b/dotnet/src/IntegrationTests/IntegrationTests.csproj
@@ -35,6 +35,7 @@
+
@@ -60,6 +61,7 @@
+

From 6b54d124e682ca5029c8e6c6b5e839c4075aa6fc Mon Sep 17 00:00:00 2001
From: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Date: Mon, 19 Aug 2024 20:38:18 -0700
Subject: [PATCH 2/4] Updated date

---
 docs/decisions/0051-entity-framework-as-connector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/decisions/0051-entity-framework-as-connector.md b/docs/decisions/0051-entity-framework-as-connector.md
index fd3d86ad7905..3ff5b3baa8d4 100644
--- a/docs/decisions/0051-entity-framework-as-connector.md
+++ b/docs/decisions/0051-entity-framework-as-connector.md
@@ -2,7 +2,7 @@
 # These are optional elements. Feel free to remove any of them.
 status: proposed
 contact: dmytrostruk
-date: 2024-08-19
+date: 2024-08-20
 deciders: sergeymenshykh, markwallace, rbarreto, westey-m
 ---

From 8dc570d964e1524083ba0e0b97c10687d5afea18 Mon Sep 17 00:00:00 2001
From: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Date: Mon, 19 Aug 2024 20:39:57 -0700
Subject: [PATCH 3/4] Small improvement

---
 docs/decisions/0051-entity-framework-as-connector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/decisions/0051-entity-framework-as-connector.md b/docs/decisions/0051-entity-framework-as-connector.md
index 3ff5b3baa8d4..a02dd006c4ec 100644
--- a/docs/decisions/0051-entity-framework-as-connector.md
+++ b/docs/decisions/0051-entity-framework-as-connector.md
@@ -26,7 +26,7 @@ In new Vector Store design, interface `IVectorStoreRecordCollection` interface, since there is no abstraction around collection management in EF that will work for most databases. For such cases, the recommended approach is to handle collection creation individually for each database. As an example, in MongoDB it's recommended to use MongoDB C# Driver directly.
+In Entity Framework, creating a collection (also known as a schema/table) programmatically is not recommended in production scenarios. The recommended approach is to use Migrations (for the code-first approach) or Reverse Engineering (also known as scaffolding, for the database-first approach). Programmatic schema creation is recommended only for testing/local scenarios. Also, the collection creation process differs between databases. For example, the MongoDB EF Core provider doesn't support schema migrations or database-first/model-first approaches. Instead, a collection is created automatically when a document is inserted for the first time, if the collection doesn't already exist. This adds complexity to methods such as `CreateCollectionAsync` from the `IVectorStoreRecordCollection` interface, since EF has no collection-management abstraction that works for most databases. For such cases, the recommended approach is to rely on automatic creation or to handle collection creation individually for each database. For example, in MongoDB it's recommended to use the MongoDB C# Driver directly.

 Sources:
 - https://learn.microsoft.com/en-us/ef/core/managing-schemas/

From 94cc32ccae392e4ba2050b11f4bc177888b880ad Mon Sep 17 00:00:00 2001
From: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Date: Mon, 19 Aug 2024 20:52:26 -0700
Subject: [PATCH 4/4] Fixed usings

---
 .../Connectors.Memory.EntityFramework/QueryableExtensions.cs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
index 0a6cb5da4b80..8abb2263b63d 100644
--- a/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
+++ b/dotnet/src/Connectors/Connectors.Memory.EntityFramework/QueryableExtensions.cs
@@ -1,9 +1,9 @@
 // Copyright (c) Microsoft. All rights reserved.

+using System;
 using System.Collections.Generic;
-using System.Linq.Expressions;
 using System.Linq;
-using System;
+using System.Linq.Expressions;

 namespace Microsoft.SemanticKernel.Connectors.EntityFramework;