Block Triggers hash write-up #3554

@evaporei

One of The Graph's coolest features is that queries are deterministic: given
a Qm subgraph hash, indexing it should always give the same result. This is
possible because we inherit the blockchain's determinism property. However,
there's a big loophole that can break this amazing feature: the chain
provider.

Currently the main (or only) type of connection we offer to indexers (in The
Graph Network) is JSON-RPC. To use it, they can either run a node themselves
or use a third-party service like Alchemy. Either way, the provider can be
faulty and give incorrect results for a number of different reasons.

To be a little more specific, let's say there are indexers/nodes A and B.
Both are indexing subgraph Z. Indexer A is using Alchemy and B is using
Infura.

Given block 14_722_714 with a specific hash, both providers will very likely
return the same values for those two fields (block number and hash). However,
other fields such as gas_used or total_difficulty could be incorrect.
Ideally they would always be correct, since serving chain data is a
provider's main job, but what I'm describing is the exact issue we've faced
when testing indexing Ethereum mainnet with the Firehose.
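
To make the scenario above concrete, here is a minimal sketch of a field-by-field comparison of the same block as returned by two providers. The function name `diff_block_fields`, the dictionary layout, and every value except the block number are made up for illustration; they are not real provider responses or graph-node code.

```python
def diff_block_fields(block_a: dict, block_b: dict) -> dict:
    """Return {field: (value_a, value_b)} for every field whose values differ."""
    fields = sorted(set(block_a) | set(block_b))
    return {
        field: (block_a.get(field), block_b.get(field))
        for field in fields
        if block_a.get(field) != block_b.get(field)
    }


# Hypothetical payloads: the hash, gas_used and total_difficulty values are
# placeholders, chosen only to show agreement on (number, hash) alongside a
# disagreement on another field.
block_from_a = {
    "number": 14_722_714,
    "hash": "0xaaaa",
    "gas_used": 1000,
    "total_difficulty": 5000,
}
block_from_b = {
    "number": 14_722_714,
    "hash": "0xaaaa",
    "gas_used": 1001,
    "total_difficulty": 5000,
}

mismatches = diff_block_fields(block_from_a, block_from_b)
print(mismatches)
```

In this made-up case the two providers agree on the block's identity, yet the comparison still reports `gas_used` as differing, which is the kind of mismatch a check on number and hash alone would miss.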

These field/value differences between providers are fed directly into the
subgraph mappings, which are the current input of the POI algorithm and the
basis of The Graph's determinism property. Not taking the possible faultiness
of chain providers into account can break determinism altogether.

And the biggest problem today is that, to spot these POI differences, we have
to index subgraphs that use those values in their mappings. If, in the
Firehose shootout we ran in the integration cluster, no subgraph had happened
to use these values, we wouldn't have spotted any POI differences, which is a
very severe issue.

POI differences described in the Firehose shootout for reference:
https://gist.github.com/evaporei/660e57d95e6140ca877f338426cea200.

So in summary, the problems described above are:

  • We currently treat the chain provider as a source of truth, which can
    only be questioned on account of re-orgs;
  • We don't have a good way to compare provider input (that could spot POI
    differences) without the indirection of a subgraph mapping.
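
One possible way to compare provider input directly, sketched here only as an illustration, is to hash a canonical serialization of the block data each provider returns and compare the digests. The function name `provider_input_digest`, the use of SHA-256, and the JSON canonicalization are all assumptions made for this sketch, not an existing graph-node API or a settled design.

```python
import hashlib
import json


def provider_input_digest(block: dict) -> str:
    """Hash a block's fields in a key-order-independent way.

    Assumption: the block is a flat, JSON-serializable dict. Sorting keys
    and fixing separators makes the serialization canonical, so equal
    field values always produce the same digest.
    """
    canonical = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical blocks; every value except the block number is a placeholder.
block_a = {"number": 14_722_714, "hash": "0xaaaa", "gas_used": 1000}
block_a_reordered = {"gas_used": 1000, "hash": "0xaaaa", "number": 14_722_714}
block_b = {"number": 14_722_714, "hash": "0xaaaa", "gas_used": 1001}

same_input = provider_input_digest(block_a) == provider_input_digest(block_a_reordered)
diverged = provider_input_digest(block_a) != provider_input_digest(block_b)
print(same_input, diverged)
```

Two indexers could exchange such digests per block and detect a divergence in provider input even when no indexed subgraph reads the affected field.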
