Merge post-mortem #18483

@AskAlexSharov

During the 0-2048 steps merge on Ethereum mainnet we observed chain-tip impact. Collecting the reasons here:

  • During merge: new bloom filters are opened while the old ones are not yet closed. This caused +2 GB of RAM use. It is by design (we open new files before closing the old ones, and close the old ones only after no readers are left), but the .kv bloom filters add some constraints here:
Image
  • We agreed to prioritize work on "prohibit large merges on the user's side", but work on this issue has not started because we had no free hands: handling 2x domain folder free space when merge #15343 (comment)

  • commitment.kv: need to disable key compression (or replace it with page-level compression), just to speed up the merge.

  • commitment.kv merge: it traverses files sequentially, but it reads the acc.kv/storage.kv files ("resolve short keys") in random order. I think this is the main reason for the chain-tip impact:

1 @ 0x47a539 0x4bca3b 0x122b862 0x122bf2d 0x123605e 0x1755531 0x1756ff2 0x14bd4ec 0x1756e6a 0x17842d3 0x1729d5e 0x91a350 0x4bb461
#	0x122b861	github.com/erigontech/erigon/db/seg.(*Getter).nextPos+0x101					github.com/erigontech/erigon/db/seg/decompress.go:620
#	0x122bf2c	github.com/erigontech/erigon/db/seg.(*Getter).Next+0x4c						github.com/erigontech/erigon/db/seg/decompress.go:726
#	0x123605d	github.com/erigontech/erigon/db/seg.(*Reader).Next+0x3d						github.com/erigontech/erigon/db/seg/seg_auto_rw.go:65
#	0x1755530	github.com/erigontech/erigon/db/state.(*DomainRoTx).lookupByShortenedKey+0x170			github.com/erigontech/erigon/db/state/domain_committed.go:281
#	0x1756ff1	github.com/erigontech/erigon/db/state.(*DomainRoTx).commitmentValTransformDomain.func1.1+0x111	github.com/erigontech/erigon/db/state/domain_committed.go:370
#	0x14bd4eb	github.com/erigontech/erigon/execution/commitment.BranchData.ReplacePlainKeys+0xdab		github.com/erigontech/erigon/execution/commitment/commitment.go:478
#	0x1756e69	github.com/erigontech/erigon/db/state.(*DomainRoTx).commitmentValTransformDomain.func1+0x6c9	github.com/erigontech/erigon/db/state/domain_committed.go:424
#	0x17842d2	github.com/erigontech/erigon/db/state.(*DomainRoTx).mergeFiles+0xd12				github.com/erigontech/erigon/db/state/merge.go:502
#	0x1729d5d	github.com/erigontech/erigon/db/state.(*AggregatorRoTx).mergeFiles.func2+0x31d			github.com/erigontech/erigon/db/state/aggregator.go:1462
#	0x91a34f	golang.org/x/sync/errgroup.(*Group).Go.func1+0x4f						golang.org/x/sync@v0.18.0/errgroup/errgroup.go:93
  • We don't have an IO rate limiter (to make the chain-tip impact predictable).
  • If Erigon is restarted while files are being indexed, there will be downtime (indexing is currently a blocking operation at startup). Example: building v2.0-commitment.0-2048.kvi

7 days of monitoring data (to keep more evidence in one place):

Image Image

Prune also got slower during that time (the number of steps in the DB keeps growing; prune can't keep up). This means other "prune issues" may be related to the merge's impact on the node:
Image

Image

Raw Ideas:

  • need to experiment with embedding state into commitment.kv to avoid the "resolve keys" random reads of acc.kv/storage.kv
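
A toy model of why resolving references can turn a sequential traversal into random reads. This is not Erigon's file layout. It assumes, for illustration only, that the referencing file is ordered by the hash of the plain key, while the referenced file is ordered by the plain key itself. `lookupOrder` and `countNonSequential` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// countNonSequential returns how many consecutive lookups do not land on the
// position right after the previous one, i.e. how many would need a seek.
func countNonSequential(positions []int) int {
	seeks := 0
	for i := 1; i < len(positions); i++ {
		if positions[i] != positions[i-1]+1 {
			seeks++
		}
	}
	return seeks
}

// lookupOrder models a referencing file whose records are ordered by the hash
// of the plain key (an assumption of this sketch). Each record holds the
// position of that key in a second file sorted by plain key. The result is the
// sequence of positions visited when the first file is traversed in order.
func lookupOrder(n int) []int {
	type ref struct {
		hash [32]byte
		pos  int
	}
	refs := make([]ref, n)
	for i := range refs {
		key := fmt.Sprintf("key-%06d", i) // plain keys, generated in sorted order
		refs[i] = ref{hash: sha256.Sum256([]byte(key)), pos: i}
	}
	sort.Slice(refs, func(a, b int) bool {
		return bytes.Compare(refs[a].hash[:], refs[b].hash[:]) < 0
	})
	positions := make([]int, n)
	for i, r := range refs {
		positions[i] = r.pos
	}
	return positions
}

func main() {
	const n = 1000
	order := lookupOrder(n)
	fmt.Printf("by reference: %d lookups, %d of them non-sequential\n", n, countNonSequential(order))
	// With values embedded in the referencing records there is nothing to resolve.
	fmt.Println("embedded: 0 lookups in the second file")
}
```

In this model almost every lookup is a seek, which is the access pattern that embedding the values would remove, at the cost of larger records in the referencing file.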
