Mem pressure try 3 #43

Closed
moodyjon wants to merge 10 commits into lbryio:master from moodyjon:mem_pressure_try_3

Conversation

@moodyjon
Collaborator

moodyjon commented Jun 2, 2022

Based on work by @BrannonKing in the mem_pressure_try_2 and reduce_memory_pressure branches...

I think the most impactful change is reducing the time spent in the "Building the entire claim trie in RAM" phase. That phase is CPU-intensive, so time spent in GC reduces performance.

The second change allows the GOGC environment variable to take effect, overriding the hard-coded value (10%). Ten percent is quite low and leads to constant GC activity; opening up GOGC lets that tradeoff be tuned.
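
A minimal sketch of what the override amounts to, assuming the hard-coded default goes through debug.SetGCPercent (illustrative only, not lbcd's actual startup code):

```go
package main

import (
	"os"
	"runtime/debug"
)

func main() {
	// Apply the aggressive 10% default only when the user has not
	// set GOGC. If GOGC is present, the Go runtime already honored
	// it at startup, so leave the user's tradeoff in place.
	if _, ok := os.LookupEnv("GOGC"); !ok {
		debug.SetGCPercent(10)
	}
	// ...normal node startup would follow here.
}
```

With that in place, something like `env GOGC=400 ./lbcd` trades a larger heap for far fewer GC cycles.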

I don't think reducing GC burden accelerates the overall sync process. (I had hoped it might...) Other things (network/disk?) are the limit as long as there's enough CPU available: I tried running `env GOGC=400 ./lbcd`, and although CPU use is low, sync doesn't go faster. :-( Perhaps the way to accelerate sync would be syncing blocks from multiple peers.

Finally, I am not entirely comfortable with the Mutable/Immutable treap changes in database/... They are my best attempt to combine the original Immutable treap semantics with the way ffldb/transaction actually wants to use it. The changes were very successful in reducing allocations, but I'm still not sure they're safe, and they could prove fragile if the treap is used in different ways in the future.

Test data forthcoming...

moodyjon and others added 10 commits June 2, 2022 14:08

  • Return []Changes and a "closer" func to return them to Pool.
  • Update tests for LoadChanges new return values.
  • …tring entails lots of temporary allocations.
  • …Mutable
  • treap: recycle nodes consistently. Rework Immutable treap node recycling,
    attempting to make it safer in the presence of code that takes snapshots
    (dbCacheSnapshot) of the treap. Add special mutable PutM and DeleteM
    methods which DB transactions can use to apply changes more efficiently
    without creating lots of garbage memory. (See the sketch below.)
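A hedged sketch of the node-recycling idea from the last commit above, assuming a sync.Pool of spare nodes (names here are illustrative; the real database/internal/treap code differs in detail):

```go
package treapsketch

import "sync"

// treapNode stands in for the real treap node type.
type treapNode struct {
	key, value  []byte
	priority    int
	left, right *treapNode
}

// nodePool holds spare nodes so puts can reuse memory instead of
// allocating fresh nodes that soon become garbage.
var nodePool = sync.Pool{
	New: func() interface{} { return new(treapNode) },
}

// getNode fetches a recycled node and initializes it.
func getNode(key, value []byte, priority int) *treapNode {
	n := nodePool.Get().(*treapNode)
	n.key, n.value, n.priority = key, value, priority
	n.left, n.right = nil, nil
	return n
}

// recycleNode returns a node to the pool. This is only safe once no
// snapshot (e.g. a dbCacheSnapshot) can still reference the node;
// getting that condition right is the hazard described above.
func recycleNode(n *treapNode) {
	n.key, n.value, n.left, n.right = nil, nil, nil, nil
	nodePool.Put(n)
}
```

PutM/DeleteM build on the same condition: when the transaction knows no snapshot shares a node, it can mutate it in place instead of cloning the whole search path.
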
@coveralls

Pull Request Test Coverage Report for Build 2430460733

  • 163 of 345 (47.25%) changed or added relevant lines in 10 files are covered.
  • 5 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.09%) to 51.695%

Changes Missing Coverage               Covered Lines   Changed/Added Lines   %
blockchain/claimtrie.go                       0                 1            0.0%
claimtrie/node/node.go                        2                 3            66.67%
database/internal/treap/common.go            24                25            96.0%
claimtrie/node/noderepo/pebble.go            55                61            90.16%
database/ffldb/dbcache.go                    38                52            73.08%
database/internal/treap/mutable.go           10                26            38.46%
lbcd.go                                       0                19            0.0%
database/internal/treap/immutable.go         31               155            20.0%

Files with Coverage Reduction          New Missed Lines   %
claimtrie/node/noderepo/pebble.go             1            68.75%
database/ffldb/dbcache.go                     1            75.07%
peer/peer.go                                  3            75.76%
Totals Coverage Status
Change from base Build 2403229780: -0.09%
Covered Lines: 22517
Relevant Lines: 43557

💛 - Coveralls

@roylee17
Collaborator

roylee17 commented Jun 2, 2022

Good analysis and attempt :-)

You've managed to group the changes into logical commits; let's break them down into separate PRs as well.
Even small commits deserve their own PRs if they are logically distinct change sets:

  • Treap related changes
  • Memprofile support
  • GC override support
  • Claimtrie
  • ...

Thanks!

@moodyjon
Collaborator Author

moodyjon commented Jun 3, 2022

Syncing from 0. Killed after 1 hour of wall-clock time.

Attachments: before.log.txt, after.log.txt
Profile screenshots: before cpu, before mem allocs, after cpu, after mem allocs

@moodyjon
Collaborator Author

moodyjon commented Jun 3, 2022

Syncing from 984287. Killed after 1 hour of wall-clock time.

Attachments: before.log.txt, after.log.txt
Profile screenshots: before cpu, before mem allocs, after cpu, after mem allocs

@moodyjon
Collaborator Author

moodyjon commented Jun 3, 2022

node Clone() is a major new source of temporary allocations after 8f95946

It's most noticeable when syncing from 984287.

@roylee17
Collaborator

roylee17 commented Jun 3, 2022

> Syncing from 984287. Killed after 1 hour of wall-clock time.

The chain got much busier after 800K.
From your report, it seems to show some improvement: more blocks synced with less resource usage.

before: 4.5 GB memory at block 1,018,896
after: 4.3 GB memory at block 1,020,362

I'm running a full sync to the tip, which should give us even more noticeable numbers.

We'll follow up in #46

@moodyjon
Collaborator Author

moodyjon commented Jun 3, 2022

> before: 4.5 GB memory at block 1,018,896; after: 4.3 GB memory at block 1,020,362

Are you basing this on this log line?

`2022-06-03 08:22:57.082 [INF] MAIN: RAM: using 4.3 GB with 6.9 available, DISK: using 80.6 GB with 876.0 available`

I was not trying to reduce RSS or active heap memory, just the temporary allocations that become garbage.

I have the before/after.mem.heap files, though I only uploaded before/after.mem.allocs.
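
For context on heap vs. allocs, here is a minimal sketch of how the two profiles can be captured from Go (hedged: lbcd's actual memprofile plumbing may differ; only the output filenames are taken from this thread):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// "allocs" counts every allocation since startup (temporary
	// garbage shows up here); "heap" reflects live objects only,
	// which tracks RSS more closely.
	for name, path := range map[string]string{
		"allocs": "after.mem.allocs",
		"heap":   "after.mem.heap",
	} {
		f, err := os.Create(path)
		if err != nil {
			log.Fatal(err)
		}
		runtime.GC() // flush pending frees so the heap profile is current
		if err := pprof.Lookup(name).WriteTo(f, 0); err != nil {
			log.Fatal(err)
		}
		f.Close()
	}
}
```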

@moodyjon
Collaborator Author

Closing this as the changes were unbundled into individual pull requests.

moodyjon closed this Jul 11, 2022