
fix(zero-cache): delay the replication-manager handoff for loadbalancer task registration #5250

Merged
darkgnotic merged 2 commits into main from darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration on Dec 2, 2025

darkgnotic commented Dec 2, 2025

Restore the original `replication-manager` behavior of delaying the replication stream takeover until the load balancer has had time to register the new task as a healthy target (i.e., after a minimum number of health checks). This fixes the temporary unreachability of the `replication-manager` that occurs when the handoff happens before the load balancer has recognized the new `replication-manager` as healthy.
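
To get a feel for how long such a delay must be, consider a load balancer that probes each target every `intervalMs` and marks it healthy after `healthyThreshold` consecutive passing probes (comparable to the health-check interval and healthy-threshold settings of an AWS target group). Assuming evenly spaced probes, the first one can arrive up to a full interval after startup, so waiting `healthyThreshold × intervalMs` plus a safety margin should cover registration. This is a hypothetical sketch: `LoadBalancerHealthCheck`, `takeoverDelayMs`, and `delayedTakeover` are illustrative names, not zero-cache APIs, and the worst-case formula is an assumption about probe timing.

```typescript
// Hypothetical sketch: names and config shape are illustrative only.
interface LoadBalancerHealthCheck {
  intervalMs: number; // time between consecutive health-check probes
  healthyThreshold: number; // consecutive passing probes before "healthy"
}

// Worst case, assuming evenly spaced probes: the first probe lands up to one
// interval after startup and (healthyThreshold - 1) more intervals follow,
// i.e. healthyThreshold * intervalMs in total, plus an optional margin.
function takeoverDelayMs(hc: LoadBalancerHealthCheck, marginMs = 0): number {
  return hc.healthyThreshold * hc.intervalMs + marginMs;
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Waits out the registration window, then takes over the replication stream.
async function delayedTakeover(
  hc: LoadBalancerHealthCheck,
  takeover: () => Promise<void>,
  marginMs = 0,
): Promise<void> {
  await sleep(takeoverDelayMs(hc, marginMs));
  await takeover();
}
```

With a 5-second interval, a threshold of 2, and a 1-second margin, `takeoverDelayMs` yields 11,000 ms. A fixed delay like this is simple, but it has to be kept in sync with the load balancer's health-check settings.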

This functionality was simplified away with the introduction of auto-discovery (#4335), which replaced the DNS and proxying component, but it was never restored when proxy-based routing was reintroduced in #4584 (proxy-based routing is now the recommended configuration).

This implementation is more compartmentalized than the original one: all of the logic is encapsulated in the `ChangeStreamerHttpService`, so the `ChangeStreamerService` itself is agnostic to the details of health checks and startup delays.
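
One way to picture this split: the streamer exposes only a method that runs (and thereby takes over) the replication stream, while an HTTP-facing wrapper answers the load balancer's health checks, counts them, and starts the streamer only after a minimum number has been seen. This is a minimal sketch under those assumptions; `ChangeStreamer`, `run()`, `handleHealthCheck()`, and `ChangeStreamerHttpServiceSketch` are hypothetical names and do not reflect zero-cache's actual classes or signatures.

```typescript
// Hypothetical sketch: names are illustrative and do not mirror
// zero-cache's real ChangeStreamerService/ChangeStreamerHttpService.

// The streamer knows nothing about load balancers: it only runs
// (and thereby takes over) the replication stream.
interface ChangeStreamer {
  run(): Promise<void>;
}

interface HealthResponse {
  status: number;
  body: string;
}

// The HTTP-facing layer owns health checks and the startup delay.
class ChangeStreamerHttpServiceSketch {
  private readonly streamer: ChangeStreamer;
  private readonly minHealthChecks: number;
  private healthChecks = 0;
  private release: () => void = () => {};
  private readonly gate: Promise<void>;

  constructor(streamer: ChangeStreamer, minHealthChecks: number) {
    this.streamer = streamer;
    this.minHealthChecks = minHealthChecks;
    // The executor runs synchronously, so `release` is set before use.
    this.gate = new Promise<void>(resolve => {
      this.release = () => resolve();
    });
    if (minHealthChecks <= 0) {
      this.release();
    }
  }

  // Health-check handler: reports healthy so the load balancer can
  // register this task, and counts each probe toward the handoff.
  handleHealthCheck(): HealthResponse {
    this.healthChecks++;
    if (this.healthChecks >= this.minHealthChecks) {
      this.release();
    }
    return {status: 200, body: 'OK'};
  }

  // Defers the replication-stream takeover until enough probes were seen.
  async start(): Promise<void> {
    await this.gate;
    await this.streamer.run();
  }
}
```

Because the streamer only sees `run()`, switching the gating strategy (for example, to a fixed delay, or to a probe count with a timeout) would touch only the wrapper.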

darkgnotic requested review from cesara and grgbkr December 2, 2025 00:21
vercel Bot commented Dec 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| replicache-docs | Ready | Preview | Comment | Dec 2, 2025 0:26am |
| zbugs | Ready | Preview | Comment | Dec 2, 2025 0:26am |

github-actions Bot commented Dec 2, 2025

🐰 Bencher Report

Branch: darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration
Testbed: Linux

| Benchmark | File Size Result, KB (Δ%) | Baseline, KB | Upper Boundary, KB (Limit %) |
|---|---|---|---|
| zero-package.tgz | 1,752.78 (+0.05%) | 1,751.87 | 1,786.91 (98.09%) |
| zero.js | 237.77 (0.00%) | 237.77 | 242.52 (98.04%) |
| zero.js.br | 65.69 (0.00%) | 65.69 | 67.01 (98.04%) |

🐰 View full continuous benchmarking report in Bencher

github-actions Bot commented Dec 2, 2025

🐰 Bencher Report

Branch: darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration
Testbed: self-hosted

🚨 1 Alert

| Benchmark | Measure (Units) | Result (Δ%) | Baseline | Lower Boundary (Limit %) |
|---|---|---|---|---|
| zpg: scan with one depth related | Throughput (ops/s) | 362.62 ops/s (-14.51%) | 424.17 ops/s | 378.26 ops/s (104.32%) |
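
Judging from the numbers in these reports, Limit % compares the result with its boundary: for a lower (throughput) boundary it matches boundary ÷ result, and an alert fires once that exceeds 100%, i.e. once the result drops below the boundary. The sketch below reproduces that arithmetic for the alerted benchmark. The formula is inferred from the reported values rather than taken from Bencher's documentation, and the printed figures may differ from the report in the last digit because the published values are rounded.

```typescript
// Formulas inferred from the reported values, not from Bencher's docs.
// Lower boundary (throughput): Limit % = boundary / result * 100.
function lowerBoundaryLimitPercent(result: number, boundary: number): number {
  return (boundary / result) * 100;
}

// Relative change of the result against the baseline, in percent.
function deltaPercent(result: number, baseline: number): number {
  return ((result - baseline) / baseline) * 100;
}

// A limit above 100% means the result crossed the boundary.
function isAlert(limitPercent: number): boolean {
  return limitPercent > 100;
}

// zpg: scan with one depth related, as reported in the alert above.
const result = 362.62;
const baseline = 424.17;
const boundary = 378.26;

const delta = deltaPercent(result, baseline);
const limit = lowerBoundaryLimitPercent(result, boundary);

console.log(`Δ%: ${delta.toFixed(2)}`);
console.log(`Limit %: ${limit.toFixed(2)}, alert: ${isAlert(limit)}`);
```

The same check explains why the other rows did not alert: for `zpg: (pk lookup)`, 749.21 ÷ 835.60 is about 89.66%, below 100%.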

All benchmark results:

| Benchmark | Throughput Result, ops/s (Δ%) | Baseline, ops/s | Lower Boundary, ops/s (Limit %) |
|---|---|---|---|
| zpg: (pk lookup) select * from track where id = 3163 | 835.60 (-7.56%) | 903.91 | 749.21 (89.66%) |
| zpg: (secondary index lookup) select * from track where album_id = 248 | 876.55 (-6.69%) | 939.37 | 806.02 (91.95%) |
| zpg: (table scan) select * from album | 708.37 (+0.03%) | 708.18 | 610.60 (86.20%) |
| zpg: OR with empty branch and limit | 877.58 (+5.87%) | 828.96 | 697.06 (79.43%) |
| zpg: OR with empty branch and limit with exists | 606.45 (-11.52%) | 685.41 | 558.16 (92.04%) |
| zpg: all playlists | 5.83 (+0.85%) | 5.78 | 5.64 (96.65%) |
| zpg: scan with one depth related | 362.62 (-14.51%) | 424.17 | 378.26 (104.32%) 🚨 |
| zql: (pk lookup) select * from track where id = 3163 | 131,034.83 (+3.62%) | 126,452.34 | 104,353.72 (79.64%) |
| zql: (secondary index lookup) select * from track where album_id = 248 | 1,582.60 (-27.29%) | 2,176.46 | 1,544.86 (97.62%) |
| zql: (table scan) select * from album | 6,523.84 (-6.22%) | 6,956.22 | 6,359.23 (97.48%) |
| zql: OR with empty branch and limit | 53,103.55 (-7.24%) | 57,251.13 | 40,131.28 (75.57%) |
| zql: OR with empty branch and limit with exists | 12,137.60 (-2.71%) | 12,475.93 | 9,911.23 (81.66%) |
| zql: all playlists | 4.65 (+3.02%) | 4.51 | 4.05 (86.99%) |
| zql: edit for limited query, inside the bound | 244,699.31 (+2.47%) | 238,804.65 | 220,156.39 (89.97%) |
| zql: edit for limited query, outside the bound | 272,831.63 (+7.92%) | 252,812.22 | 211,395.24 (77.48%) |
| zql: push into limited query, inside the bound | 120,245.19 (+1.65%) | 118,294.34 | 111,569.03 (92.78%) |
| zql: push into limited query, outside the bound | 463,634.31 (+0.20%) | 462,714.98 | 400,166.31 (86.31%) |
| zql: push into unlimited query | 371,597.73 (+0.37%) | 370,214.44 | 340,880.68 (91.73%) |
| zql: scan with one depth related | 511.91 (+2.47%) | 499.58 | 416.81 (81.42%) |
| zqlite: (pk lookup) select * from track where id = 3163 | 46,576.17 (-1.56%) | 47,314.39 | 40,711.20 (87.41%) |
| zqlite: (secondary index lookup) select * from track where album_id = 248 | 11,249.15 (-3.56%) | 11,664.17 | 10,279.58 (91.38%) |
| zqlite: (table scan) select * from album | 1,299.94 (-6.43%) | 1,389.24 | 1,271.16 (97.79%) |
| zqlite: OR with empty branch and limit | 18,623.45 (-4.34%) | 19,467.97 | 15,747.44 (84.56%) |
| zqlite: OR with empty branch and limit with exists | 5,457.20 (-4.77%) | 5,730.59 | 4,612.87 (84.53%) |
| zqlite: all playlists | 1.52 (+0.64%) | 1.51 | 1.41 (92.55%) |
| zqlite: edit for limited query, inside the bound | 127,618.84 (-0.08%) | 127,723.93 | 117,662.89 (92.20%) |
| zqlite: edit for limited query, outside the bound | 130,496.21 (-0.30%) | 130,887.82 | 119,870.89 (91.86%) |
| zqlite: push into limited query, inside the bound | 4,203.50 (-1.77%) | 4,279.31 | 4,116.88 (97.94%) |
| zqlite: push into limited query, outside the bound | 156,210.36 (+2.38%) | 152,576.70 | 132,911.72 (85.09%) |
| zqlite: push into unlimited query | 138,121.54 (+2.99%) | 134,116.62 | 122,930.62 (89.00%) |
| zqlite: scan with one depth related | 142.92 (-13.77%) | 165.74 | 131.64 (92.11%) |

github-actions Bot commented Dec 2, 2025

🐰 Bencher Report

Branch: darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration
Testbed: self-hosted

| Benchmark | Throughput Result, ops/s x 1e3 (Δ%) | Baseline, ops/s x 1e3 | Lower Boundary, ops/s x 1e3 (Limit %) |
|---|---|---|---|
| src/client/custom.bench.ts > big schema | 923.75 (+6.04%) | 871.12 | 652.32 (70.62%) |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 2.96 (-0.11%) | 2.96 | 2.81 (95.09%) |
| src/client/zero.bench.ts > pk compare > pk = N | 47.53 (+1.64%) | 46.76 | 41.87 (88.09%) |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 4.12 (-0.80%) | 4.16 | 3.86 (93.65%) |

darkgnotic added this pull request to the merge queue Dec 2, 2025
Merged via the queue into main with commit c25e810 Dec 2, 2025
17 of 18 checks passed
darkgnotic deleted the darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration branch December 2, 2025 00:36
darkgnotic added a commit that referenced this pull request Dec 2, 2025
…er task registration (#5250)

