-
Notifications
You must be signed in to change notification settings - Fork 24
Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks #576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit introduces a mechanism to garbage collect (GC) replication requests
(rreqs) that may hang indefinitely, thereby consuming memory and disk resources
unnecessarily. These rreqs can enter a hanging state under several
circumstances, as outlined below:
1. Scenario with Delayed Commit:
- Follower F1 receives LSN 100 and DSN 104 from Leader L1 and takes longer
than the raft timeout to precommit/commit it.
- L1 resends LSN 100, causing F1 to fetch the data again. Since LSN 100 was
committed in a previous attempt, this log entry is skipped, leaving the
rreq hanging indefinitely.
2. Scenario with Leader Failure Before Data Completion:
- Follower F1 receives LSN 100 from L1, but before all data is fetched/pushed,
L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 with L2 as the new originator. F1 proceeds with the new
rreq and commits it, but the initial rreq from L1 hangs indefinitely as it
cannot fetch data from the new leader L2.
3. Scenario with Leader Failure After Data Write:
- Follower F1 receives data (DSN 104) from L1 and writes it. Before the log of
LSN 100 reaches F1, L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 to F1, and F1 fetches DSN 104 from L2, leaving the
original rreq hanging.
This garbage collection process cleans up based on DSN. Any rreqs in
`m_repl_key_req_map`, whose DSN is already committed (`rreq->dsn <
repl_dev->m_next_dsn`), will be GC'd. This is safe on the follower side, as the
follower updates `m_next_dsn` during commit. Any DSN below `cur_dsn` should
already be committed, implying that the rreq should already be removed from
`m_repl_key_req_map`.
On the leader side, since `m_next_dsn` is updated when sending out the proposal,
it is not safe to clean up based on `m_next_dsn`. Therefore, we explicitly skip
the leader in this GC process.
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
|
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #576 +/- ##
===========================================
+ Coverage 56.51% 67.28% +10.77%
===========================================
Files 108 109 +1
Lines 10300 10677 +377
Branches 1402 1459 +57
===========================================
+ Hits 5821 7184 +1363
+ Misses 3894 2801 -1093
- Partials 585 692 +107 ☔ View full report in Codecov by Sentry. |
Leader may send duplicate raft logs, if we localize them unconditionally duplicate data will be written to chunk during fetch_data. It is safe for us to skip those logs that already committed, there is no way those LSN can be over-written. Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
zhiteng
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
| // FIXME: Skipping proposer for now, the DSN in proposer increased in proposing stage, not when commit(). | ||
| // Need other mechanism. | ||
| if (rreq->is_proposer()) { | ||
| RD_LOGD("Skipping rreq=[{}] due to is_proposer, elapsed_time_sec{};", rreq->to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create lot of logs.
| std::vector< repl_req_ptr_t > expired_rreqs; | ||
|
|
||
| auto req_map_size = m_repl_key_req_map.size(); | ||
| RD_LOGW("m_repl_key_req_map size is {};", req_map_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to LOGI, this logging is helpful as we hit mem-leak twice around same place that in certain cases we dont remove rreq from m_repl_key_req_map. The first one is what you found and fixed that we forgot to remove in on_commit, this patch is the second time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better keep a metrics of this repl related map's of count and total memory usage.
| rreq->dsn(), cur_dsn, cur_dsn - rreq->dsn()); | ||
| // FIXME: Wait till the rreq expired is obviously safer, though as commited request will | ||
| // be removed from map in on_commit(), we probably don't need wait till expired. | ||
| if (rreq->is_expired()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If its already committed on follower, why check for expire. we can GC it immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can do that.
| if (rreq->is_expired()) { | ||
| expired_keys.push_back(key); | ||
| RD_LOGD("rreq=[{}] is expired, cleaning up; elapsed_time_sec{};", rreq->to_string(), | ||
| RD_LOGD("StateMachine: rreq=[{}] is expired, elapsed_time_sec{};", rreq->to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not adding to expired_rreqs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very risky to remove a rreq from state-machine as the time you check it , it might in the middle of "commit" or "pre-commit" which will causing NPE/assert.
As we ensure logs are added to state-machine after data written, I dont find a case where we can have a request expiring in state-machine , so I am intentionally to remove this for loop (as said in the FIXME), but trying to verify through logging to get confidence.
| m_repl_key_req_map.erase(removing_rreq->rkey()); | ||
| } | ||
| // 3. remove from state-machine | ||
| if (removing_rreq->has_state(repl_req_state_t::LOG_FLUSHED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are iterating, we delete or unlink from the same map. Is it safe ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I didnt get this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterate_repl_reqs is iterating through m_lsn_req_map and unlink is erasing it from m_lsn_req_map. Its not iterator so it looks safe.
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
JacksonYao287
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only a minor comment, other parts look good
| if (it->second == rreq) { | ||
| RD_LOG(DEBUG, "Raft channel: erase lsn {}, rreq {}", lsn, it->second->to_string()); | ||
| m_lsn_req_map.erase(lsn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there might a very small case that some change happens between line 229 and line 231.
my suggestion is using erase_if_equal instead
https://github.com/facebook/folly/blob/30a4e783a7618f17a5b24048625872e363068887/folly/concurrency/ConcurrentHashMap.h#L497
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
JacksonYao287
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
JacksonYao287
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! this a very nice step for sm long run
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
JacksonYao287
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for now, let`s revisit here if necessary in the future.
| // don't clean up proposer's request | ||
| continue; | ||
| } | ||
| if (rreq->dsn() < cur_dsn && rreq->is_expired()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can revisit here if we have better solution to accurately identify the garbage in the future.
for now, let`s go ahead and not block sm long run.
* Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks
This commit introduces a mechanism to garbage collect (GC) replication requests
(rreqs) that may hang indefinitely, thereby consuming memory and disk resources
unnecessarily. These rreqs can enter a hanging state under several
circumstances, as outlined below:
1. Scenario with Delayed Commit:
- Follower F1 receives LSN 100 and DSN 104 from Leader L1 and takes longer
than the raft timeout to precommit/commit it.
- L1 resends LSN 100, causing F1 to fetch the data again. Since LSN 100 was
committed in a previous attempt, this log entry is skipped, leaving the
rreq hanging indefinitely.
2. Scenario with Leader Failure Before Data Completion:
- Follower F1 receives LSN 100 from L1, but before all data is fetched/pushed,
L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 with L2 as the new originator. F1 proceeds with the new
rreq and commits it, but the initial rreq from L1 hangs indefinitely as it
cannot fetch data from the new leader L2.
3. Scenario with Leader Failure After Data Write:
- Follower F1 receives data (DSN 104) from L1 and writes it. Before the log of
LSN 100 reaches F1, L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 to F1, and F1 fetches DSN 104 from L2, leaving the
original rreq hanging.
This garbage collection process cleans up based on DSN. Any rreqs in
`m_repl_key_req_map`, whose DSN is already committed (`rreq->dsn <
repl_dev->m_next_dsn`), will be GC'd. This is safe on the follower side, as the
follower updates `m_next_dsn` during commit. Any DSN below `cur_dsn` should
already be committed, implying that the rreq should already be removed from
`m_repl_key_req_map`.
On the leader side, since `m_next_dsn` is updated when sending out the proposal,
it is not safe to clean up based on `m_next_dsn`. Therefore, we explicitly skip
the leader in this GC process.
Skipping localize raft logs we already committed.
Leader may send duplicate raft logs, if we localize them
unconditionally duplicate data will be written to chunk during
fetch_data.
It is safe for us to skip those logs that already committed,
there is no way those LSN can be over-written.
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
This commit introduces a mechanism to garbage collect (GC) replication requests (rreqs) that may hang indefinitely, thereby consuming memory and disk resources unnecessarily. These rreqs can enter a hanging state under several circumstances, as outlined below:
Scenario with Delayed Commit:
Scenario with Leader Failure Before Data Completion:
Scenario with Leader Failure After Data Write:
This garbage collection process cleans up based on DSN. Any rreqs in
m_repl_key_req_map, whose DSN is already committed (rreq->dsn < repl_dev->m_next_dsn), will be GC'd. This is safe on the follower side, as the follower updatesm_next_dsnduring commit. Any DSN belowcur_dsnshould already be committed, implying that the rreq should already be removed fromm_repl_key_req_map.On the leader side, since
m_next_dsnis updated when sending out the proposal, it is not safe to clean up based onm_next_dsn. Therefore, we explicitly skip the leader in this GC process.