Skip to content

Cosmos: Deal with missing property values #13131

@divega

Description

@divega

Since Cosmos DB is schema-less database, it is possible to reference a property that is not defined in all documents in a query. For example, the following query:

SELECT c.MissingProperty FROM Customers c

Returns the following results for a collection that contains three documents that do not contain MissingProperty:

[
    {},
    {},
    {}
]

When EF Core models mapped to Cosmos DB evolve, we expect that it will be common to use a new version of an entity type that contains a property that is not defined in existing documented already stored in the database.

From the perspective of materialization, this could be dealt with by just skipping properties that are missing in the store. This would result in the properties on the objects to keep whatever value they were initialized to. For example, for optional properties, a missing value in the store would become equivalent to the property being null.

However there is an important caveat with this approach: because of how indexing works in Cosmos DB, queries that reference the missing property somewhere else than in the projection could return unexpected results. For example:

  1. If a property that is missing in some documents is referenced in a predicate that tests it against null, only documents that contain the property will be returned

  2. If a property that is missing in some documents is referenced in the sort expression or an ORDER BY clause, documents that contain the property with any value (including null) will be sorted, but documents that do not define the properties will be filtered out because they are not in the index used to resolve ORDER BY

Although for ORDER BY there is a way we could compensate by issuing two separate queries (the first one to get all the data, and the second one to get the order an potentially less data), it seems that this could be relatively expensive. This approach would not help for the WHERE clause case because it could require all the data from the collection to be retrieved.

But for WHERE we could use IS_DEFINED (see #13131 (comment)).

What we can do?

The alternatives I can see are:

  1. Make sure the behavior is well documented
  2. At some point come up with "schema" evolution tooling that makes sure new properties are added to existing documents
  3. Try to figure out a way (probably with help of annotations in the model) we can warn when properties are used in queries which could be missing values in existing documents.
  4. Add an extension method to EF.Functions that maps to IS_DEFINED() (see Cosmos: Deal with missing property values #13131 (comment)) so that it can be explicitly used in LINQ queries.
  5. Compensate for null navigation properties (see Cosmos: Deal with missing property values #13131 (comment)) and scalar properties by expanding IS NULL to also check for NOT IS_DEFINED().

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions