Skip to content

Fix lease extension, and make it recursive#9956

Merged
stuhood merged 4 commits into
pantsbuild:masterfrom
stuhood:stuhood/lease-extension-fixes
Jun 4, 2020
Merged

Fix lease extension, and make it recursive#9956
stuhood merged 4 commits into
pantsbuild:masterfrom
stuhood:stuhood/lease-extension-fixes

Conversation

@stuhood
Copy link
Copy Markdown
Member

@stuhood stuhood commented Jun 4, 2020

Problem

As was misdiagnosed on #9942, extension of leases is currently broken due to Store::lease_all using a separate implementation of fn lease that is not aware of VersionedFingerprint and thus has a different key. This means that the leases being read in Store::shrink are different from the leases being extended in Store::lease_all.

Additionally, lease extension is not currently recursive into all values held in memory in the Graph, which might mean that a value held in memory in pantsd can become invalidated.

Solution

Fix the first issue by removing the duplicate implementation of fn lease, and making ShardedLmdb more directly responsible for lease management: add a test to cover the failing case.

In separate commits, adjust our lease extension of in-memory content to be recursive into Snapshots and ProcessResults.

Result

Fixes #9942.

@stuhood stuhood added this to the 1.29.x milestone Jun 4, 2020
@stuhood stuhood force-pushed the stuhood/lease-extension-fixes branch from 12caed5 to 40e1f71 Compare June 4, 2020 04:01
@stuhood stuhood marked this pull request as ready for review June 4, 2020 04:02
@stuhood
Copy link
Copy Markdown
Member Author

stuhood commented Jun 4, 2020

Commits are useful to review independently. Thanks!

stuhood added 4 commits June 3, 2020 21:07
… a different key. Centralize, and add a test.

[ci skip-jvm-tests]
…ecursive more parallel and more async/awaity.

[ci skip-jvm-tests]
# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]
@stuhood stuhood force-pushed the stuhood/lease-extension-fixes branch from 40e1f71 to e8c6a7a Compare June 4, 2020 04:08
let f_dbs = self.inner.file_dbs.clone()?;
let is_file = f_dbs.exists(fingerprint);

// TODO: Could technically use select to return slightly more quickly with the first
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent to fix this TODO at some point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly? It's preeeetty micro-optimizey, but might be worth doing the next time someone is in there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that this introduces a TODO that will never get done. Maybe just have the comment not say TODO if there is no intent to change later?

let executor = local.executor().clone();
let maybe_upload = local
.load_bytes_with(entry_type, digest, move |bytes| {
// NB: `load_bytes_with` runs on a spawned thread which we can safely block.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an assumption that could confuse a reader (if not for the comment). Should load_bytes_with be renamed to say it can block (e.g., load_bytes_with_blocking)? Have all other load_bytes_with call sites received a similar comment? Does Rust have a way to encode the "blockability" of a closure into the type? (probably not but I'm curious if it has)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an assumption that could confuse a reader (if not for the comment). Should load_bytes_with be renamed to say it can block (e.g., load_bytes_with_blocking)?

This is existing code that was only moved: see #9793. The called method also has a comment.

Does Rust have a way to encode the "blockability" of a closure into the type? (probably not but I'm curious if it has)

Good question: whee, colored functions.

No, it does not. On the "bright side", tokio will panic to let you know that you've blocked in the wrong place: so you'll know if you've done it wrong, but you don't get compiler assistance.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@stuhood stuhood merged commit 9333b09 into pantsbuild:master Jun 4, 2020
@stuhood stuhood deleted the stuhood/lease-extension-fixes branch June 4, 2020 20:32
stuhood added a commit that referenced this pull request Jun 5, 2020
As was misdiagnosed on #9942, extension of leases is currently broken due to `Store::lease_all` using a separate implementation of `fn lease` that is not aware of `VersionedFingerprint` and thus has a different key. This means that the leases being read in `Store::shrink` are different from the leases being extended in `Store::lease_all`.

Additionally, lease extension is not currently recursive into all values held in memory in the `Graph`, which might mean that a value held in memory in `pantsd` can become invalidated.

Fix the first issue by removing the duplicate implementation of `fn lease`, and making `ShardedLmdb` more directly responsible for lease management: add a test to cover the failing case.

In separate commits, adjust our lease extension of in-memory content to be recursive into `Snapshots` and `ProcessResults`.

Fixes #9942.

[ci skip-jvm-tests]
@stuhood stuhood removed the needs-cherrypick [CI] label Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Garbage collection of the store is overly eager

3 participants