Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
0f85255
engineering-plans: Start subgraph composition engineering plan
Dec 9, 2019
1fdcb49
engineering-plans: subgraph-schema-merging
Dec 13, 2019
9905197
engineering-plans: Typo
Dec 13, 2019
db843ae
engineering-plans: Remove template content
Dec 13, 2019
d841b1e
engineering-plan: Update link for PR
Dec 13, 2019
323c8eb
engineering-plans: Add sammple GraphQL documents to describe merging
Dec 13, 2019
4856a89
engineering-plans: Fix sentence
Dec 13, 2019
7e16211
engineering-plans: Shorten name and update plan #
Dec 16, 2019
8636b8e
engineering-plans: Update markup
Dec 16, 2019
a2d9b47
engineering-plans: Address feedback on pull #6
Dec 16, 2019
4139d9c
engineering-plans: @mergeInfo -> @originalName
Dec 16, 2019
20edb4b
engineering-plans: Add more detail to the merge algorithm
Dec 16, 2019
84480da
engineering-plans: Add detail to cache invalidation
Dec 17, 2019
111c875
engineering-plans: Add detail to schema merging cache invalidation
Dec 17, 2019
75ff610
engineering-plans: Add entry to approved.md
Dec 17, 2019
b367c3f
SUMMARY: Add subgraph schema merging to list
Dec 17, 2019
10acac4
engineering-plans: Address importing types for non scalar fields
Dec 20, 2019
1da8b07
engineering-plans: Rename subgraph schema merging plan
Jan 14, 2020
7bd8982
engineering-plans: Add entry to approved.md
Dec 17, 2019
77d2bc7
engineering-plans: Add plan for subgraph composition query execution
Jan 14, 2020
2eb16c3
engineering-plans: Subgraph composition query execution
Jan 15, 2020
5a822c4
engineering-plans: Address typo
Jan 15, 2020
8b6aa11
engineering-plans: Add more detail to query execution plan
Jan 16, 2020
8dbc3bb
engineering-plans: touch up subgraph composition query execution
Jan 16, 2020
5b0926e
subgraph-composition-query-execution: Small updates
Feb 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,5 +12,6 @@
- [PLAN-0001: GraphQL Query Prefetching](./engineering-plans/0001-graphql-query-prefetching.md)
- [PLAN-0002: Ethereum Tracing Cache](./engineering-plans/0002-ethereum-tracing-cache.md)
- [PLAN-0003: Remove JSONB Storage](./engineering-plans/0003-remove-jsonb-storage.md)
- [PLAN-0004: Subgraph Schema Merging](./engineering-plans/0004-subgraph-schema-merging.md)
- [Obsolete Plans](./engineering-plans/obsolete.md)
- [Rejected Plans](./engineering-plans/rejected.md)
284 changes: 284 additions & 0 deletions engineering-plans/0004-subgraph-schema-merging.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,284 @@
# PLAN-0004: Subgraph Schema Merging

<dl>
<dt>Author</dt>
<dd>Jorge Olivero</dd>

<dt>Implements</dt>
<dd><a href="../rfcs/0001-subgraph-composition.md">RFC-0001 Subgraph Composition</a></dd>

<dt>Engineering Plan pull request</dt>
<dd><a href="https://github.com/graphprotocol/rfcs/pull/5">Engineering Plan PR</a></dd>

<dt>Obsoletes (if applicable)</dt>
<dd>-</dd>

<dt>Date of submission</dt>
<dd>2019-12-09</dd>

<dt>Date of approval</dt>
<dd>TBD</dd>

<dt>Approved by</dt>
<dd>TBD</dd>
</dl>

## Summary

Subgraph composition allows a subgraph to import types from another subgraph. Imports are designed to be loosely coupled throughout the deployment and indexing phases, and tightly coupled during query execution for the subgraph.

To generate an API schema for the subgraph, the subgraph schema needs to be merged with the imported subgraph schemas. The merging process needs to take the following into account:

1. An imported schema not being available on the Graph Node
2. An imported schema mising a type imported by the subgraph schema
3. A schema imported by subgraph name changes
4. Ability to tell which subgraph/schema each type belongs to

## Implementation

The schema merging implementation consists of two parts:

1. A cache of subgraph schemas
2. The schema merging logic

### Schema Merging

Add a `merged_schema(&Schema, HashMap<SchemaReference, Arc<Schema>) -> Schema` function to the `graphql::schema` crate which will add each of the imported types to the provided document with a `@subgraphId` directive denoting which subgraph the type came from. If any of the imported types have non scalar fields, import those types as well.

The `HashMap<SchemaReference, Arc<Schema>` includes all of the schemas in the subgraph's import graph which are available on the Graph Node. For each `@import` directive on the subgraph, find the imported types by tracing their path along the import graph.

- If any schema node along that path is missing or if a type is missing in the schema, add a type definition to the subgraphs merged schema with the proper name, a `@subgraphId(id: "...")` directive (if available), and a `@placeholder` directive denoting that type was not found.
sc- If the type is found, copy it and add a `@subgraphId(id: "...")` directive.
- If the type is imported with the `{ name: "", as: "" }` format, the merged type will include an `@originalName(name: "...")` directive preserving the type name from the original schema.

The `api_schema` function will add all the necessary types and fields for the imported types without requiring any changes.

#### Example #1: Complete merge

Local schema before calling `merged_schema`:

```graphql
type _Schema_
@import(
types: ["B"],
from: { id: "X" }
)

type A @entity {
id: ID!
foo: B!
}
```

Imported Schema X:

```graphql
type B @entity {
id: ID!
bar: String
}
```

Schema after calling `merged_schema`:

```graphql
type A @entity {
id: ID!
foo: B!
}

type B @entity @subgraphId(id: "X") {
id: ID!
bar: String
}
```

#### Example #2: Incomplete merge

Schema before calling `merged_schema`:

```graphql
type _Schema_
@import(
types: ["B"],
from: { id: "X" }
)

type A @entity {
id: ID!
foo: B!
}
```

Imported Schema X:

```graphql
NOT AVAILABLE
```

Schema after calling `merged_schema`

```graphql
type A @entity @subgraphId(id: "...") {
id: ID!
foo: B!
}

type B @entity @placeholder {
id: ID!
}
```

#### Example #3: Complete merge with `{ name: "...", as: "..." }`

Schema before calling `merged_schema`

```graphql
type _Schema_
@imports(
types: [{ name: "B", as: "BB" }]
from: { id: "X" }
)

type B @entity {
id: ID!
foo: BB!
}
```

Imported Schema X:

```graphql
type B @entity {
id: ID!
bar: String
}
```

Schema after calling `merged_schema`

```graphql
type B @entity {
id: ID!
foo: BB!
}

type BB @entity @subgraphId(id: "X") @originalName(name: "B") {
id: ID!
bar: String
}
```

#### Example #4: Complete merge with nested types

Schema before calling `merged_schema`

```graphql
type _Schema_
@imports(
types: [{ name: "B", as: "BB" }]
from: { id: "X" }
)

type B @entity {
id: ID!
foo: BB!
}
```

Imported Schema X:

```graphql
type _Schema_
@imports(
types: [{ name: "C", as: "CC" }]
from: { id: "Y" }
)

type B @entity {
id: ID!
bar: CC!
baz: DD!
}

type DD @entity {
id: ID!
}
```

Imported Schema Y:

```graphql
type C @entity {
id: ID!
}
```

Schema after calling `merged_schema`

```graphql
type B @entity {
id: ID!
foo: BB!
}

type BB @entity @subgraphId(id: "X") @originalName(name: "B") {
id: ID!
bar: CC!
baz: DD!
}

type CC @entity @subgraphId(id: "Y") @originalName(name: "C") {
id: ID!
}

type DD @entity @subgraphId(id: "X") {
id: ID!
}
```



After the schema document is merged, the `api_schema` function will be called.

### Cache Invalidation

For each schema in the cache, keep a vector of subgraph pointers containing an element for each schema in the subgraph's import graph which was imported by name and the subgraph ID which was used during the schema merge. When a schema is accessed from the schema cache (and possibly only if this check hasn't happened in the last N seconds), check the current version for each of these schemas and run a diff against the versions used for the most recent schema merge. If there are any new versions, re merge the schema.

Currently the `schema_cache` in the `Store` is a `Mutex<LruCache<SubgraphDeploymentId, SchemaPair>>`. A `SchemaPair` consists of two fields: `input_schema` and `api_schema`. To support the refresh flow, `SchemaPair` would be extended to be a `SchemaEntry`, with the fields `input_schema`, `api_schema`, `schemas_imported` (`Vec<(SchemaReference, SubgraphDeploymentId)>`), and a `last_refresh_check` timestamp.

A more performant invalidation solution would be to have the cache maintain a listener notifying it every time a subgraph's current version changes. Upon receiving the notification the listener scans the schemas in the cache for those which should be remerged.


## Tests

1. Schemas are merged correctly when all schemas and imported types are available.

2. Placeholder types are properly inserted into the merged schema when a schema is not available.

3. Placeholder types are properly inserted into the merged schema when the relevant schemas are available but the types are not.

4. The cache is invalidated when the store returns an updated version of a cache entry's dependency.

## Migration

Subgraph composition is an additive feature which doesn't require a special migration plan.

## Documentation

Documentation on https://thegraph.com/docs needs to outline:

1. The reserved _Schema_ type and how to define imports on it.
2. The semantics of importing by subgraph ID vs. subgraph name, i.e. what happens when a subgraph imported by name removes expected types from the schema.
3. How queries are processed when imported subgraphs schemas or types are not available on the graph-node processing the query.

## Implementation Plan

- Implement the `merged_schema` function (2d)
- Write tests for the `merged_schema` function (1d)
- Integrate `merged_schema` into `Store::cached_schema` and update the cache to include the relevant information for imported schemas and types (1d)
- Add cache invalidation logic to `Store::cached_schema` (2d)

## Open Questions

- The execution of queries and subscriptions needs to be updated to leverage the types in a merged schema.
Loading