    batch_boundary: BatchBoundary,
    current_ts: t.Optional[int] = None,
    ignore_ttl: bool = False,
) -> t.Optional[ExpiredSnapshotBatch]:
Was there a reason to make this t.Optional vs just returning an empty list if there are no expired snapshots like it did before?
Yes that is correct
def delete_expired_snapshots(
    state_sync: StateSync,
    snapshot_evaluator: SnapshotEvaluator,
IMHO, I don't think this belongs here. I don't believe State Sync should have anything to do with SnapshotEvaluator. Plus this creates a circular dependency between modules.
The logical relationship should be SnapshotEvaluator -> StateSync.
Anticipating your question - yes, I think putting cleanup_expired_views in here was a mistake.
Where do you want it located then?
perhaps it's time to create the core/janitor.py module?
Can we consider this out of scope for this PR? I'm open to following up if you think it is important but it doesn't seem like this should be required to get this change in.
sure. But I do think it's important
Batches the deletion of expired snapshots by paging over the candidates, using updated_at, name, and identifier as stable fields for the paging. Simple viz explaining what is going on (it only shows updated_at, but know that the paging key includes all 3 columns):
In that example, 3 batches are processed. Getting expired snapshots is limited, since it would be unbounded otherwise. Instead of passing the snapshots directly into the delete, the "boundary" is passed in. That becomes the "upper boundary" from which the delete looks backwards to perform the deletes. A limit is not needed there since the delete is already bounded. There is an underlying assumption that these upper boundaries are provided in paged order (in other words, the last batch isn't provided first, since that would remove the benefit of batching).
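A rough sketch of the paging idea, for illustration only. It is not the code in this PR: it works on an in-memory list of (updated_at, name, identifier) tuples instead of the state tables, treats a row as expired when updated_at is at or below a cutoff (a simplifying assumption), and the names Boundary, get_expired_batch, delete_up_to and delete_expired_in_batches are hypothetical.

```python
import typing as t
from dataclasses import dataclass

# Paging key, compared lexicographically: (updated_at, name, identifier).
RowKey = t.Tuple[int, str, str]


@dataclass(frozen=True)
class Boundary:
    """Hypothetical stand-in for a batch boundary: the last key of a page."""

    updated_at: int
    name: str
    identifier: str

    def key(self) -> RowKey:
        return (self.updated_at, self.name, self.identifier)


def get_expired_batch(
    rows: t.List[RowKey],
    cutoff_ts: int,
    after: t.Optional[Boundary],
    limit: int,
) -> t.Optional[Boundary]:
    """Return the upper boundary of the next page, or None if nothing is left.

    The read is limited so it never scans an unbounded set of candidates.
    Assumption: a row is expired when updated_at <= cutoff_ts.
    """
    candidates = sorted(
        r for r in rows
        if r[0] <= cutoff_ts and (after is None or r > after.key())
    )
    page = candidates[:limit]
    return Boundary(*page[-1]) if page else None


def delete_up_to(rows: t.List[RowKey], cutoff_ts: int, upper: Boundary) -> int:
    """Delete every expired row at or below the boundary; no limit is needed."""
    doomed = {r for r in rows if r[0] <= cutoff_ts and r <= upper.key()}
    rows[:] = [r for r in rows if r not in doomed]
    return len(doomed)


def delete_expired_in_batches(
    rows: t.List[RowKey], cutoff_ts: int, limit: int
) -> t.List[int]:
    """Page forward and delete batch by batch; returns the size of each batch."""
    sizes: t.List[int] = []
    boundary: t.Optional[Boundary] = None
    while True:
        # Boundaries are produced, and therefore consumed, in paged order.
        boundary = get_expired_batch(rows, cutoff_ts, boundary, limit)
        if boundary is None:
            break
        sizes.append(delete_up_to(rows, cutoff_ts, boundary))
    return sizes
```

With five expired rows and a limit of 2, this processes three batches. If the last boundary were handed to delete_up_to first, that single call would remove everything at once, which is why the boundaries have to arrive in paged order.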
This approach, compared to passing in the deleted snapshot ids directly, has the following benefits: