@adiholden
Contributor

fixes #5135
On the error path for incremental snapshot, the stream is finalized from the snapshot fiber by calling SliceSnapshot::FinalizeJournalStream. This results in the snapshot fiber calling join on itself, which triggers an assertion.

The error path of incremental snapshotting was hit because we sent an incorrect lsn to the replica flow: we first fetched the lsn of the shard and then migrated the connection. To fix this, we now migrate the connection before fetching the lsn.
@abhijat was able to figure out the problem with the flow; I followed up with a PR to fix the lsn and a simple test to reproduce it.
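
A rough sketch of why the ordering could matter. This is a toy model, not Dragonfly code. It assumes the shard's journal can keep advancing while the connection is being migrated, which the description above does not state explicitly. `ToyJournal`, `MigrateConnection`, `FetchThenMigrate` and `MigrateThenFetch` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a shard's journal; not Dragonfly's real type.
struct ToyJournal {
  uint64_t lsn = 0;
  uint64_t GetLsn() const { return lsn; }
  void Append() { ++lsn; }  // one journaled write advances the lsn by one
};

// Hypothetical stand-in for the connection migration.
// Assumption: the shard journals `writes_during_migration` entries
// while the migration is in flight.
void MigrateConnection(ToyJournal& journal, int writes_during_migration) {
  for (int i = 0; i < writes_during_migration; ++i) journal.Append();
}

// Old order: read the lsn, then migrate. The value returned can be stale
// by the time the connection lands on the shard.
uint64_t FetchThenMigrate(ToyJournal& journal, int writes) {
  uint64_t lsn = journal.GetLsn();
  MigrateConnection(journal, writes);
  return lsn;
}

// New order: migrate first, then read the lsn, so the value matches the
// journal state seen after the migration.
uint64_t MigrateThenFetch(ToyJournal& journal, int writes) {
  MigrateConnection(journal, writes);
  return journal.GetLsn();
}
```

Under this assumption, with a journal at lsn 10 and three writes arriving during the migration, the old order reports 10 while the journal is already at 13; the new order reports 13.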

Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden adiholden requested review from abhijat and romange May 27, 2025 18:49
std::make_error_code(errc::state_not_recoverable),
absl::StrCat("Partial sync was unsuccessful because entry #", lsn,
" was dropped from the buffer. Current lsn=", journal->GetLsn()));
FinalizeJournalStream(true);
Contributor

👏

@adiholden adiholden merged commit bc72fb3 into main May 28, 2025
10 checks passed
@adiholden adiholden deleted the fix_5135 branch May 28, 2025 07:00
romange pushed a commit that referenced this pull request May 28, 2025
fix server: fix partial sync flow

Signed-off-by: adi_holden <adi@dragonflydb.io>

Successfully merging this pull request may close these issues.

Dragonfly crash during replication: v1.30.0

5 participants