fix: capture head revision atomically with atext to prevent mismatched apply#7480
fix: capture head revision atomically with atext to prevent mismatched apply#7480JohnMcLear merged 6 commits intodevelopfrom
Conversation
Review Summary by QodoFix atomic capture of head revision to prevent mismatched apply errors
WalkthroughsDescription• Fixes race condition where concurrent edits caused revision mismatch • Captures head revision atomically with atext to ensure consistency • Prevents "mismatched apply" errors on client initialization • Adds regression test verifying CLIENT_VARS rev matches initialAttributedText Diagramflowchart LR
A["Client connects"] --> B["Capture atext at rev N"]
B --> C["Concurrent edits advance to rev N+3"]
C --> D["Old: capture rev N+3 separately"]
D --> E["Mismatch: text from N, rev N+3"]
E --> F["Next changeset fails"]
A --> G["New: capture atext AND headRev together"]
G --> H["Both from same revision N"]
H --> I["Consistent: text and rev both N"]
I --> J["Next changeset succeeds"]
File Changes1. src/node/handler/PadMessageHandler.ts
|
Code Review by Qodo
|
3fc7b74 to
5bec620
Compare
Review Summary by QodoFix race condition in CLIENT_VARS revision consistency
WalkthroughsDescription• Capture head revision atomically with atext to prevent race condition - Concurrent edits between atext snapshot and revision read caused mismatched apply errors - Now captures headRev at same time as atext for consistency • Flush missed revisions after socket joins pad room - Revisions appended during clientVars hook await window were missed by connecting client - Call updatePadClients after socket.join to deliver catch-up changesets • Add regression tests for CLIENT_VARS revision consistency - Verify rev matches initialAttributedText state - Test catch-up changesets during clientVars hook await window Diagramflowchart LR
A["Concurrent edits<br/>advance revision"] -->|"Previously: gap<br/>between reads"| B["Mismatched apply<br/>error"]
A -->|"Now: atomic capture"| C["headRev captured<br/>with atext"]
C --> D["CLIENT_VARS<br/>consistent"]
D --> E["socket.join<br/>then flush revisions"]
E --> F["Client receives<br/>all changesets"]
File Changes1. src/node/handler/PadMessageHandler.ts
|
Code Review by Qodo
1. No loadTesting reproduction test
|
Addresses Qodo review items 1, 2, 5 from https://github.com/ether/etherpad-lite/issues/comments/4194702740 : - Concern 1 (no loadTesting reproduction test): the suite now toggles settings.loadTest = true in before(), restores in after(). The middle test also pre-populates the pad with 20 revisions before connecting so we genuinely exercise a busy/loaded pad rather than a fresh one. - Concern 2 (no CLIENT_VARS / NEW_CHANGES delay test): the slow clientVars hook in the middle test now has explicit setTimeout delays before AND after the mid-handshake edits, so the race window between atext snapshot and CLIENT_VARS send is observably wide rather than relying on async scheduling alone. The test also collects post-handshake messages and asserts a NEW_CHANGES catch-up arrives when the pad advanced past the advertised rev. - Concern 5 (test doesn't validate rev): both rev-consistency tests use pad.getInternalRevisionAText(advertisedRev) and assert text and attribs match, not just `pad.text() === clientVars.text`. Concerns 3 (connect can miss revisions) and 4 (NaN timeDelta) were already addressed in earlier commits on this branch via the catch-up updatePadClients() call and the sessionInfo.time initialization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d apply When constructing CLIENT_VARS, pad.atext was captured at one point but pad.getHeadRevisionNumber() was called later. If concurrent edits advanced the revision between these two reads, the client received initialAttributedText from rev N but rev=N+3, causing "mismatched apply" errors when the next changeset arrived (expecting rev N+3 text). Now captures headRev at the same time as atext and uses the captured value consistently in CLIENT_VARS and sessionInfo. Fixes #4040 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
During handleClientReady(), the server awaits the clientVars hook before socket.join(). Any revisions appended during that await window are broadcast to existing room members but the connecting socket misses them. Call updatePadClients(pad) after joining to flush any such revisions. Also adds a regression test that injects a slow clientVars hook and verifies the connecting client receives catch-up changesets for edits that occurred during the hook await window. Fixes #4040 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Listen for messages during handshake to avoid missing NEW_CHANGES that arrive before the explicit waitForSocketEvent listener is attached. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The catch-up updatePadClients() call introduced in this PR could send NEW_CHANGES with timeDelta=NaN because sessionInfo.time was never set for new sessions. NaN poisons the client-side broadcast/timeslider currentTime tracking. Initialize sessionInfo.time to the timestamp of the snapshot revision before the catch-up flush, with a fallback to Date.now() if the revision date is unavailable. Also strengthens the regression tests: - Validate that initialAttributedText matches the pad AText at the EXACT advertised rev (not just the latest pad text), using pad.getInternalRevisionAText(rev). - Add a load test that hammers the pad with concurrent edits while multiple clients connect, asserting CLIENT_VARS consistency under the exact race condition the fix is targeting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous load test ran 'while (!stopLoad) await pad.setText(...)' in the background while the test connected clients. This saturated ueberDB's write queue and on shutdown the queued writes never drained, hanging the mocha process for the full 6h GitHub Actions job timeout. Replace it with a bounded approach: a clientVars hook lands 3 edits mid-handshake (deterministic, no background loop, no shutdown hang). Still exercises the exact race the fix targets — an edit advancing the rev after the atext snapshot but before CLIENT_VARS is sent — and asserts AText / rev consistency via getInternalRevisionAText. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses Qodo review items 1, 2, 5 from https://github.com/ether/etherpad-lite/issues/comments/4194702740 : - Concern 1 (no loadTesting reproduction test): the suite now toggles settings.loadTest = true in before(), restores in after(). The middle test also pre-populates the pad with 20 revisions before connecting so we genuinely exercise a busy/loaded pad rather than a fresh one. - Concern 2 (no CLIENT_VARS / NEW_CHANGES delay test): the slow clientVars hook in the middle test now has explicit setTimeout delays before AND after the mid-handshake edits, so the race window between atext snapshot and CLIENT_VARS send is observably wide rather than relying on async scheduling alone. The test also collects post-handshake messages and asserts a NEW_CHANGES catch-up arrives when the pad advanced past the advertised rev. - Concern 5 (test doesn't validate rev): both rev-consistency tests use pad.getInternalRevisionAText(advertisedRev) and assert text and attribs match, not just `pad.text() === clientVars.text`. Concerns 3 (connect can miss revisions) and 4 (NaN timeDelta) were already addressed in earlier commits on this branch via the catch-up updatePadClients() call and the sessionInfo.time initialization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
61e3748 to
6e44d98
Compare
|
I have some concerns w/ this PR RE Timings etc. but it's the best I can get it so far... |
Summary
pad.atextwas cloned at one point andpad.getHeadRevisionNumber()was called laterinitialAttributedTextfrom rev N butrev: N+3NEW_CHANGESarrived (for rev N+4), its changeset expected rev N+3 text but the client had rev N text, causing"mismatched apply: X / Y"headRevat the same time asatextand use it consistentlyRoot Cause
The client thinks it's at N+3 with rev N text. Next changeset for N+4 expects N+3 text → boom.
Test plan
revmatchesinitialAttributedTextstateFixes #4040
🤖 Generated with Claude Code