Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks #576

xiaoxichen · 2024-10-30T23:51:09Z

This commit introduces a mechanism to garbage collect (GC) replication requests (rreqs) that may hang indefinitely, thereby consuming memory and disk resources unnecessarily. These rreqs can enter a hanging state under several circumstances, as outlined below:

Scenario with Delayed Commit:
- Follower F1 receives LSN 100 and DSN 104 from Leader L1 and takes longer than the raft timeout to precommit/commit it.
- L1 resends LSN 100, causing F1 to fetch the data again. Since LSN 100 was committed in a previous attempt, this log entry is skipped, leaving the rreq hanging indefinitely.
Scenario with Leader Failure Before Data Completion:
- Follower F1 receives LSN 100 from L1, but before all data is fetched/pushed, L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 with L2 as the new originator. F1 proceeds with the new rreq and commits it, but the initial rreq from L1 hangs indefinitely as it cannot fetch data from the new leader L2.
Scenario with Leader Failure After Data Write:
- Follower F1 receives data (DSN 104) from L1 and writes it. Before the log of LSN 100 reaches F1, L1 fails and L2 becomes the new leader.
- L2 resends LSN 100 to F1, and F1 fetches DSN 104 from L2, leaving the original rreq hanging.

This garbage collection process cleans up based on DSN. Any rreq in `m_repl_key_req_map` whose DSN is already committed (`rreq->dsn < repl_dev->m_next_dsn`) will be GC'd. This is safe on the follower side, because the follower updates `m_next_dsn` during commit: any DSN below `cur_dsn` (the current value of `m_next_dsn`) should already be committed, so its rreq should already have been removed from `m_repl_key_req_map`.

On the leader side, since m_next_dsn is updated when sending out the proposal, it is not safe to clean up based on m_next_dsn. Therefore, we explicitly skip the leader in this GC process.
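A minimal C++ sketch of the selection rule described above, assuming simplified, hypothetical types (`ReplReq`, `gc_repl_reqs`, a `std::map` keyed by a string) rather than HomeStore's actual `repl_req_ptr_t` and `m_repl_key_req_map`. It skips requests this node proposed, treats a DSN below the caller's `cur_dsn` snapshot as committed and, like the version later shown in review, also requires the request to have expired. Matching rreqs are collected first and erased afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a replication request (rreq).
struct ReplReq {
    std::string key;  // stand-in for the repl key used in m_repl_key_req_map
    int64_t dsn;      // data sequence number carried by the request
    bool proposer;    // true if this node originated (proposed) the request
    bool expired;     // stand-in for rreq->is_expired()
};
using ReplReqPtr = std::shared_ptr<ReplReq>;

// Hypothetical GC pass over a follower's key -> rreq map.
// cur_dsn is a snapshot of m_next_dsn, which a follower advances only on commit.
std::vector<ReplReqPtr> gc_repl_reqs(std::map<std::string, ReplReqPtr>& key_map, int64_t cur_dsn) {
    std::vector<ReplReqPtr> garbage;
    for (const auto& entry : key_map) {
        const ReplReqPtr& rreq = entry.second;
        // The proposer bumps its DSN when proposing, not on commit, so
        // dsn < cur_dsn does not prove the request committed: skip it.
        if (rreq->proposer) { continue; }
        // Committed requests should already be gone from the map, so a
        // leftover with a committed DSN is treated as leaked.
        if (rreq->dsn < cur_dsn && rreq->expired) { garbage.push_back(rreq); }
    }
    // Erase only after the scan so the loop never advances an invalidated iterator.
    for (const auto& rreq : garbage) { key_map.erase(rreq->key); }
    return garbage;
}
```

For example, on a follower whose `m_next_dsn` has advanced to 105, a stale rreq left behind by L1 for DSN 104 (as in the third scenario) is collected, while an in-flight rreq for DSN 105 and any request this node proposed are left alone.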

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

codecov-commenter · 2024-10-31T00:20:46Z

Codecov Report

Attention: Patch coverage is 51.35135% with 18 lines in your changes missing coverage. Please review.

Project coverage is 67.28%. Comparing base (1a0cef8) to head (8ccd8ee).
Report is 85 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lib/replication/repl_dev/raft_repl_dev.cpp | 45.45% | 17 Missing and 1 partial ⚠️ |

@@             Coverage Diff             @@
##           master     #576       +/-   ##
===========================================
+ Coverage   56.51%   67.28%   +10.77%     
===========================================
  Files         108      109        +1     
  Lines       10300    10677      +377     
  Branches     1402     1459       +57     
===========================================
+ Hits         5821     7184     +1363     
+ Misses       3894     2801     -1093     
- Partials      585      692      +107

The leader may send duplicate raft logs; if we localize them unconditionally, duplicate data will be written to the chunk during fetch_data. It is safe to skip logs that are already committed, because those LSNs can never be overwritten. Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
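A minimal sketch of the skip, assuming a hypothetical `LogEntry` type and a `last_committed_lsn` value supplied by the caller; the actual change lives in `raft_repl_dev.cpp` and may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a raft log entry that carries a replication request.
struct LogEntry {
    int64_t lsn;  // raft log sequence number
    int64_t dsn;  // data sequence number of the request in this entry
};

// Returns the entries from a (possibly resent) batch that still need to be localized.
// Entries at or below last_committed_lsn were committed earlier and can no longer be
// overwritten, so localizing them again would only fetch and write duplicate data.
std::vector<LogEntry> entries_to_localize(const std::vector<LogEntry>& batch, int64_t last_committed_lsn) {
    std::vector<LogEntry> pending;
    pending.reserve(batch.size());
    for (const LogEntry& e : batch) {
        if (e.lsn <= last_committed_lsn) { continue; }  // duplicate of an already-committed log
        pending.push_back(e);
    }
    return pending;
}
```

With a resent batch containing LSN 100 and LSN 101 and a last committed LSN of 100, only LSN 101 is localized.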

src/lib/replication/repl_dev/raft_repl_dev.cpp

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

zhiteng

lgtm

src/lib/replication/repl_dev/raft_repl_dev.cpp

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

sanebay · 2024-11-01T18:12:56Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+        // FIXME: Skipping proposer for now, the DSN in proposer increased in proposing stage, not when commit().
+        // Need other mechanism.
        if (rreq->is_proposer()) {
+            RD_LOGD("Skipping rreq=[{}] due to is_proposer, elapsed_time_sec{};", rreq->to_string(),


This will create a lot of logs.

sanebay · 2024-11-01T18:13:04Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+    std::vector< repl_req_ptr_t > expired_rreqs;
+
+    auto req_map_size = m_repl_key_req_map.size();
+    RD_LOGW("m_repl_key_req_map size is {};", req_map_size);


Remove the log.

Moved it to LOGI. This logging is helpful: we have hit a memory leak twice around the same place, where in certain cases we don't remove the rreq from m_repl_key_req_map. The first was the one you found and fixed (we forgot to remove it in on_commit); this patch addresses the second.

It would be better to keep metrics for the count and total memory usage of these replication-related maps.
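One way to act on the metrics suggestion, as a hypothetical sketch: a map wrapper that updates a count gauge and an approximate byte gauge on every insert and erase. `TrackedMap` and the caller-supplied per-entry byte estimate are illustrative assumptions, not an existing HomeStore or sisl metrics API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical map wrapper that maintains a count gauge and an approximate
// memory gauge alongside the map itself.
template <typename V>
class TrackedMap {
public:
    // bytes is the caller's estimate of the memory held by this entry.
    bool insert(const std::string& key, V value, std::size_t bytes) {
        std::lock_guard<std::mutex> lk(mtx_);
        const bool inserted = map_.emplace(key, Slot{std::move(value), bytes}).second;
        if (inserted) {
            count_.fetch_add(1);
            bytes_.fetch_add(bytes);
        }
        return inserted;
    }

    bool erase(const std::string& key) {
        std::lock_guard<std::mutex> lk(mtx_);
        auto it = map_.find(key);
        if (it == map_.end()) { return false; }
        count_.fetch_sub(1);
        bytes_.fetch_sub(it->second.bytes);
        map_.erase(it);
        return true;
    }

    // Reads do not take the lock, so a metrics reporter never contends with writers.
    std::size_t count() const { return count_.load(); }
    std::size_t bytes() const { return bytes_.load(); }

private:
    struct Slot {
        V value;
        std::size_t bytes;
    };
    std::mutex mtx_;
    std::unordered_map<std::string, Slot> map_;
    std::atomic<std::size_t> count_{0};
    std::atomic<std::size_t> bytes_{0};
};
```

Keeping the gauges as atomics lets a periodic reporter publish them cheaply, and a steadily growing count is exactly the signal that would have surfaced both leaks earlier.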

sanebay · 2024-11-01T18:13:31Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+                    rreq->dsn(), cur_dsn, cur_dsn - rreq->dsn());
+            // FIXME: Wait till the rreq expired is obviously safer, though as commited request will
+            //  be removed from map in on_commit(), we probably don't need wait till expired.
+            if (rreq->is_expired()) {


If it's already committed on the follower, why check for expiry? We can GC it immediately.

Yeah we can do that.

sanebay · 2024-11-02T00:59:29Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

        if (rreq->is_expired()) {
-            expired_keys.push_back(key);
-            RD_LOGD("rreq=[{}] is expired, cleaning up; elapsed_time_sec{};", rreq->to_string(),
+            RD_LOGD("StateMachine: rreq=[{}] is expired, elapsed_time_sec{};", rreq->to_string(),


Not adding to expired_rreqs?

It is very risky to remove an rreq from the state machine: at the time you check it, it might be in the middle of "commit" or "pre-commit", which would cause an NPE/assert.

Since we ensure logs are added to the state machine only after the data is written, I can't find a case where a request could expire in the state machine. So I intend to remove this for loop (as noted in the FIXME), but I'm verifying through logging first to gain confidence.

sanebay · 2024-11-02T01:00:55Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+            m_repl_key_req_map.erase(removing_rreq->rkey());
+        }
+        // 3. remove from state-machine
+        if (removing_rreq->has_state(repl_req_state_t::LOG_FLUSHED)) {


While we are iterating, we delete or unlink from the same map. Is it safe?

Sorry, I didn't get this.

iterate_repl_reqs iterates through m_lsn_req_map, and unlink erases from m_lsn_req_map by key rather than through the iterator, so it looks safe.
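Whether erasing during iteration is safe depends on the container. Concurrent maps such as folly's `ConcurrentHashMap` are designed to allow erasure while iterators are live, whereas for `std::map` and `std::unordered_map` erasing the element an iterator points to invalidates that iterator. A minimal standard-library sketch of the safe idiom, with a hypothetical function name `erase_below`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Erases every entry whose value is below threshold during a single scan.
// Works for std::map and std::unordered_map: erase(it) invalidates it but
// returns the iterator to the next element, so the loop continues from there
// instead of incrementing the erased iterator.
template <typename Map>
std::size_t erase_below(Map& m, int64_t threshold) {
    std::size_t removed = 0;
    for (auto it = m.begin(); it != m.end();) {
        if (it->second < threshold) {
            it = m.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

The alternative used by the GC pass in this PR, collecting victims into a vector first and erasing afterwards, sidesteps the question for any container type.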

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

JacksonYao287

Only a minor comment; other parts look good.

JacksonYao287 · 2024-11-04T02:33:54Z

src/lib/replication/repl_dev/raft_state_machine.cpp

+        if (it->second == rreq) {
+            RD_LOG(DEBUG, "Raft channel: erase lsn {},  rreq {}", lsn, it->second->to_string());
+            m_lsn_req_map.erase(lsn);


There might be a very small window in which something changes between line 229 and line 231 (between the comparison and the erase).
My suggestion is to use erase_if_equal instead:
https://github.com/facebook/folly/blob/30a4e783a7618f17a5b24048625872e363068887/folly/concurrency/ConcurrentHashMap.h#L497
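The race described above is a check-then-act window: another thread may re-link `lsn` to a newer rreq after the comparison but before the erase. A hypothetical mutex-based sketch of compare-and-erase, similar in spirit to the folly `erase_if_equal` linked above (`GuardedMap` is illustrative, not the folly type):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical mutex-protected map showing compare-and-erase semantics,
// similar in spirit to folly::ConcurrentHashMap::erase_if_equal.
template <typename K, typename V>
class GuardedMap {
public:
    void insert_or_assign(const K& k, V v) {
        std::lock_guard<std::mutex> lk(mtx_);
        map_.insert_or_assign(k, std::move(v));
    }

    // The comparison and the erase happen under one lock, so no other thread
    // can re-link k to a different value in between. Returns the number of
    // erased entries (0 or 1).
    std::size_t erase_if_equal(const K& k, const V& expected) {
        std::lock_guard<std::mutex> lk(mtx_);
        auto it = map_.find(k);
        if (it == map_.end() || !(it->second == expected)) { return 0; }
        map_.erase(it);
        return 1;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return map_.size();
    }

private:
    mutable std::mutex mtx_;
    std::unordered_map<K, V> map_;
};
```

For instance, if LSN 100 has been re-linked from an old rreq to a newer one, `erase_if_equal(100, old_rreq)` returns 0 and leaves the newer mapping intact.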

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

JacksonYao287

LG

JacksonYao287

LGTM! This is a very nice step for the sm long run.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

JacksonYao287

LGTM for now; let's revisit this if necessary in the future.

JacksonYao287 · 2024-11-06T09:32:49Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+            // don't clean up proposer's request
+            continue;
+        }
+        if (rreq->dsn() < cur_dsn && rreq->is_expired()) {


We can revisit this in the future if we find a better solution to accurately identify the garbage.
For now, let's go ahead and not block the sm long run.

* Implement GC_REPL_REQ Based on DSN to Prevent Resource Leaks
* Skipping localize raft logs we already committed.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

xiaoxichen requested review from sanebay and yamingk October 31, 2024 00:36

xiaoxichen force-pushed the debug branch from d3ea460 to a5ce211 October 31, 2024 03:42

xiaoxichen force-pushed the debug branch from a5ce211 to 94e5c87 October 31, 2024 03:48

xiaoxichen requested a review from JacksonYao287 October 31, 2024 07:13

zhiteng requested changes Oct 31, 2024

src/lib/replication/repl_dev/raft_repl_dev.cpp (outdated)

src/lib/replication/repl_dev/raft_repl_dev.cpp

xiaoxichen added 2 commits October 31, 2024 17:08

apply clang_format

37dc0e4

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

address comment

ae19ee5

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

zhiteng previously approved these changes Oct 31, 2024

JacksonYao287 reviewed Oct 31, 2024

src/lib/replication/repl_dev/raft_repl_dev.cpp

src/lib/replication/repl_dev/raft_repl_dev.cpp (outdated)

xiaoxichen dismissed zhiteng’s stale review via 5acaf16 November 1, 2024 07:04

Sanity check when unlink_lsn_to_req

aed083e

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

xiaoxichen force-pushed the debug branch from 5acaf16 to aed083e November 1, 2024 07:47

sanebay reviewed Nov 2, 2024

xiaoxichen added 2 commits November 2, 2024 10:07

address comment

79cdfa5

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

Merge branch 'master' into debug

5a69c40

JacksonYao287 reviewed Nov 4, 2024

use erase_if_equal

0c52055

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

xiaoxichen force-pushed the debug branch from 27e2821 to 0c52055 November 5, 2024 14:37

xiaoxichen added 2 commits November 5, 2024 08:25

Merge branch 'master' into debug

6439794

update connan

45d52fa

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

JacksonYao287 previously approved these changes Nov 6, 2024

xiaoxichen dismissed JacksonYao287’s stale review via 5ae91a2 November 6, 2024 03:09

xiaoxichen force-pushed the debug branch from 5ae91a2 to 45d52fa November 6, 2024 04:10

JacksonYao287 previously approved these changes Nov 6, 2024

DSN can be out of order

8ccd8ee

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

xiaoxichen dismissed JacksonYao287’s stale review via 8ccd8ee November 6, 2024 08:22

JacksonYao287 approved these changes Nov 6, 2024

xiaoxichen merged commit 50f42ff into eBay:master Nov 6, 2024

5 participants
