Since Cosmos DB is a schema-less database, a query can reference a property that is not defined in every document. For example, the following query:
SELECT c.MissingProperty FROM Customers c
returns the following results for a collection of three documents, none of which contains MissingProperty:
[
{},
{},
{}
]
As EF Core models mapped to Cosmos DB evolve, we expect it to be common for a new version of an entity type to contain a property that is not defined in documents already stored in the database.
From the perspective of materialization, this could be handled by simply skipping properties that are missing in the store. The properties on the materialized objects would then keep whatever value they were initialized to. For example, for optional properties, a value missing in the store would become equivalent to the property being null.
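A minimal Python sketch of the skip-missing-properties approach, assuming a hypothetical `Customer` entity and a hypothetical `JSON_NAMES` mapping from attribute names to stored JSON property names. It only models the idea; it is not how the EF Core Cosmos provider materializes entities.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Customer:
    # Hypothetical entity; missing_property was added in a newer model version.
    id: str
    name: str = ""
    missing_property: Optional[str] = None


# Hypothetical mapping from attribute names to JSON property names in the store.
JSON_NAMES = {"id": "id", "name": "Name", "missing_property": "MissingProperty"}


def materialize(document: dict[str, Any]) -> Customer:
    """Build a Customer, skipping properties absent from the stored document.

    Attributes whose JSON property is missing keep their dataclass default,
    so an optional property that was never written reads back as None.
    """
    kwargs = {}
    for f in fields(Customer):
        json_name = JSON_NAMES[f.name]
        if json_name in document:
            kwargs[f.name] = document[json_name]
    return Customer(**kwargs)
```

For instance, `materialize({"id": "1", "Name": "Alice"})` yields a `Customer` whose `missing_property` is `None`, which is indistinguishable from a document that stored `"MissingProperty": null`.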
However, there is an important caveat with this approach: because of how indexing works in Cosmos DB, queries that reference the missing property anywhere other than in the projection could return unexpected results. For example:
If a property that is missing in some documents is referenced in a predicate that compares it with null, only documents that contain the property will be returned.
If a property that is missing in some documents is referenced in a sort expression (an ORDER BY clause), documents that contain the property with any value (including null) will be sorted, but documents that do not define the property will be filtered out, because they are not in the index used to resolve ORDER BY.
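The two cases above can be modeled in a few lines of Python, with a sentinel standing in for Cosmos DB's "undefined". This is a sketch of the behavior as described in this issue, not of the query engine itself; how undefined values are handled by ORDER BY may differ across service versions and indexing policies. The `where_is_null_or_undefined` function models a predicate built with the Cosmos DB SQL system function `IS_DEFINED`, which also matches documents that lack the property.

```python
from typing import Any

# Sentinel standing in for Cosmos DB's "undefined" (property absent from the document).
UNDEFINED = object()


def get(doc: dict[str, Any], prop: str) -> Any:
    return doc.get(prop, UNDEFINED)


def where_is_null(docs: list[dict[str, Any]], prop: str) -> list[dict[str, Any]]:
    """Models `WHERE c.prop = null`: undefined is not equal to null, so those documents drop out."""
    return [d for d in docs if get(d, prop) is None]


def where_is_null_or_undefined(docs: list[dict[str, Any]], prop: str) -> list[dict[str, Any]]:
    """Models `WHERE NOT IS_DEFINED(c.prop) OR c.prop = null`."""
    return [d for d in docs if get(d, prop) is UNDEFINED or get(d, prop) is None]


def order_by(docs: list[dict[str, Any]], prop: str) -> list[dict[str, Any]]:
    """Models ORDER BY as described above: only documents that define prop are returned.

    Nulls sort before other values (as in Cosmos DB's cross-type ordering);
    the remaining values are assumed to be mutually comparable.
    """
    indexed = [d for d in docs if get(d, prop) is not UNDEFINED]
    nulls = [d for d in indexed if d[prop] is None]
    values = sorted((d for d in indexed if d[prop] is not None), key=lambda d: d[prop])
    return nulls + values
```

With `docs = [{"id": "1", "Rank": 2}, {"id": "2", "Rank": None}, {"id": "3"}]`, the null comparison returns only document 2, and the ORDER BY model returns documents 2 and 1; document 3 disappears from both results.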
Although for ORDER BY we could compensate by issuing two separate queries (the first to get all the data, and the second to get the order, potentially returning less data), this could be relatively expensive. This approach would not help in the WHERE clause case, because it could require retrieving all the data in the collection.
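The client-side half of the two-query compensation for ORDER BY might look like the following sketch. It assumes the first query returns every document unordered and the second returns only the ids of documents that define the sort property, in sorted order. Where documents lacking the property belong in the final order is an assumption, exposed here as the hypothetical `missing_first` parameter.

```python
from typing import Any, Iterable


def merge_order(
    all_docs: Iterable[dict[str, Any]],
    ordered_ids: list[str],
    missing_first: bool = True,
) -> list[dict[str, Any]]:
    """Combine the results of the two queries into one ordered list.

    all_docs: result of the query without ORDER BY (every document).
    ordered_ids: ids returned by the ORDER BY query, which only covers
    documents that define the sort property.
    """
    by_id = {d["id"]: d for d in all_docs}
    ordered = [by_id[i] for i in ordered_ids if i in by_id]
    seen = set(ordered_ids)
    # Anything the ORDER BY query did not return is treated as missing the property.
    missing = [d for d in by_id.values() if d["id"] not in seen]
    return missing + ordered if missing_first else ordered + missing
```

The cost concern is visible in the signature: `all_docs` still has to hold every document in the collection before the merge can run.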
For the WHERE clause case, however, we could use IS_DEFINED (see #13131 (comment)).
What can we do?
The alternatives I can see are:
At some point, come up with "schema" evolution tooling that makes sure new properties are added to existing documents.
Try to figure out a way (probably with the help of annotations in the model) to warn when properties that could be missing values in existing documents are used in queries.
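Both alternatives can be sketched in Python under loudly stated assumptions. `DEFAULTS` and `ADDED_IN_VERSION` are hypothetical model metadata (the second standing in for the annotations idea), and `oldest_stored_version` assumes the tooling knows the oldest model version still present in the store. Writing the backfilled documents back to Cosmos DB (for example with an upsert) is left out.

```python
from typing import Any, Iterable

# Hypothetical model metadata: default values for properties, and the model
# version in which each property was introduced.
DEFAULTS: dict[str, Any] = {"Name": "", "MissingProperty": None}
ADDED_IN_VERSION: dict[str, int] = {"Name": 1, "MissingProperty": 2}


def backfill(documents: Iterable[dict[str, Any]], defaults: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of the documents that lack properties, with defaults filled in.

    Documents that already define every property are skipped; the inputs are not mutated.
    """
    changed = []
    for doc in documents:
        missing = {k: v for k, v in defaults.items() if k not in doc}
        if missing:
            changed.append({**doc, **missing})
    return changed


def warn_on_new_properties(referenced: Iterable[str], oldest_stored_version: int) -> list[str]:
    """List query-referenced properties introduced after the oldest stored model version."""
    return [p for p in referenced if ADDED_IN_VERSION.get(p, 0) > oldest_stored_version]
```

A query referencing `Name` and `MissingProperty` against a store that still holds version-1 documents would flag `MissingProperty`, which is exactly the property for which the WHERE and ORDER BY caveats apply.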