fix(zero-cache): delay the replication-manager handoff for loadbalancer task registration #5250
Merged
darkgnotic merged 2 commits on Dec 2, 2025
| Branch | darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration |
| Testbed | Linux |
| Benchmark | File Size | Benchmark Result kilobytes (KB) (Result Δ%) | Upper Boundary kilobytes (KB) (Limit %) |
|---|---|---|---|
| zero-package.tgz | 📈 view plot 🚷 view threshold | 1,752.78 KB(+0.05%)Baseline: 1,751.87 KB | 1,786.91 KB (98.09%) |
| zero.js | 📈 view plot 🚷 view threshold | 237.77 KB(0.00%)Baseline: 237.77 KB | 242.52 KB (98.04%) |
| zero.js.br | 📈 view plot 🚷 view threshold | 65.69 KB(0.00%)Baseline: 65.69 KB | 67.01 KB (98.04%) |
| Branch | darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration |
| Testbed | self-hosted |
🚨 1 Alert
| Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Lower Boundary (Limit %) |
|---|---|---|---|---|
| zpg: scan with one depth related | Throughput operations / second (ops/s) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 362.62 ops/s(-14.51%)Baseline: 424.17 ops/s | 378.26 ops/s (104.32%) |
| Benchmark | Throughput | Benchmark Result operations / second (ops/s) (Result Δ%) | Lower Boundary operations / second (ops/s) (Limit %) |
|---|---|---|---|
| zpg: (pk lookup) select * from track where id = 3163 | 📈 view plot 🚷 view threshold | 835.60 ops/s(-7.56%)Baseline: 903.91 ops/s | 749.21 ops/s (89.66%) |
| zpg: (secondary index lookup) select * from track where album_id = 248 | 📈 view plot 🚷 view threshold | 876.55 ops/s(-6.69%)Baseline: 939.37 ops/s | 806.02 ops/s (91.95%) |
| zpg: (table scan) select * from album | 📈 view plot 🚷 view threshold | 708.37 ops/s(+0.03%)Baseline: 708.18 ops/s | 610.60 ops/s (86.20%) |
| zpg: OR with empty branch and limit | 📈 view plot 🚷 view threshold | 877.58 ops/s(+5.87%)Baseline: 828.96 ops/s | 697.06 ops/s (79.43%) |
| zpg: OR with empty branch and limit with exists | 📈 view plot 🚷 view threshold | 606.45 ops/s(-11.52%)Baseline: 685.41 ops/s | 558.16 ops/s (92.04%) |
| zpg: all playlists | 📈 view plot 🚷 view threshold | 5.83 ops/s(+0.85%)Baseline: 5.78 ops/s | 5.64 ops/s (96.65%) |
| zpg: scan with one depth related | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 362.62 ops/s(-14.51%)Baseline: 424.17 ops/s | 378.26 ops/s (104.32%) |
| zql: (pk lookup) select * from track where id = 3163 | 📈 view plot 🚷 view threshold | 131,034.83 ops/s(+3.62%)Baseline: 126,452.34 ops/s | 104,353.72 ops/s (79.64%) |
| zql: (secondary index lookup) select * from track where album_id = 248 | 📈 view plot 🚷 view threshold | 1,582.60 ops/s(-27.29%)Baseline: 2,176.46 ops/s | 1,544.86 ops/s (97.62%) |
| zql: (table scan) select * from album | 📈 view plot 🚷 view threshold | 6,523.84 ops/s(-6.22%)Baseline: 6,956.22 ops/s | 6,359.23 ops/s (97.48%) |
| zql: OR with empty branch and limit | 📈 view plot 🚷 view threshold | 53,103.55 ops/s(-7.24%)Baseline: 57,251.13 ops/s | 40,131.28 ops/s (75.57%) |
| zql: OR with empty branch and limit with exists | 📈 view plot 🚷 view threshold | 12,137.60 ops/s(-2.71%)Baseline: 12,475.93 ops/s | 9,911.23 ops/s (81.66%) |
| zql: all playlists | 📈 view plot 🚷 view threshold | 4.65 ops/s(+3.02%)Baseline: 4.51 ops/s | 4.05 ops/s (86.99%) |
| zql: edit for limited query, inside the bound | 📈 view plot 🚷 view threshold | 244,699.31 ops/s(+2.47%)Baseline: 238,804.65 ops/s | 220,156.39 ops/s (89.97%) |
| zql: edit for limited query, outside the bound | 📈 view plot 🚷 view threshold | 272,831.63 ops/s(+7.92%)Baseline: 252,812.22 ops/s | 211,395.24 ops/s (77.48%) |
| zql: push into limited query, inside the bound | 📈 view plot 🚷 view threshold | 120,245.19 ops/s(+1.65%)Baseline: 118,294.34 ops/s | 111,569.03 ops/s (92.78%) |
| zql: push into limited query, outside the bound | 📈 view plot 🚷 view threshold | 463,634.31 ops/s(+0.20%)Baseline: 462,714.98 ops/s | 400,166.31 ops/s (86.31%) |
| zql: push into unlimited query | 📈 view plot 🚷 view threshold | 371,597.73 ops/s(+0.37%)Baseline: 370,214.44 ops/s | 340,880.68 ops/s (91.73%) |
| zql: scan with one depth related | 📈 view plot 🚷 view threshold | 511.91 ops/s(+2.47%)Baseline: 499.58 ops/s | 416.81 ops/s (81.42%) |
| zqlite: (pk lookup) select * from track where id = 3163 | 📈 view plot 🚷 view threshold | 46,576.17 ops/s(-1.56%)Baseline: 47,314.39 ops/s | 40,711.20 ops/s (87.41%) |
| zqlite: (secondary index lookup) select * from track where album_id = 248 | 📈 view plot 🚷 view threshold | 11,249.15 ops/s(-3.56%)Baseline: 11,664.17 ops/s | 10,279.58 ops/s (91.38%) |
| zqlite: (table scan) select * from album | 📈 view plot 🚷 view threshold | 1,299.94 ops/s(-6.43%)Baseline: 1,389.24 ops/s | 1,271.16 ops/s (97.79%) |
| zqlite: OR with empty branch and limit | 📈 view plot 🚷 view threshold | 18,623.45 ops/s(-4.34%)Baseline: 19,467.97 ops/s | 15,747.44 ops/s (84.56%) |
| zqlite: OR with empty branch and limit with exists | 📈 view plot 🚷 view threshold | 5,457.20 ops/s(-4.77%)Baseline: 5,730.59 ops/s | 4,612.87 ops/s (84.53%) |
| zqlite: all playlists | 📈 view plot 🚷 view threshold | 1.52 ops/s(+0.64%)Baseline: 1.51 ops/s | 1.41 ops/s (92.55%) |
| zqlite: edit for limited query, inside the bound | 📈 view plot 🚷 view threshold | 127,618.84 ops/s(-0.08%)Baseline: 127,723.93 ops/s | 117,662.89 ops/s (92.20%) |
| zqlite: edit for limited query, outside the bound | 📈 view plot 🚷 view threshold | 130,496.21 ops/s(-0.30%)Baseline: 130,887.82 ops/s | 119,870.89 ops/s (91.86%) |
| zqlite: push into limited query, inside the bound | 📈 view plot 🚷 view threshold | 4,203.50 ops/s(-1.77%)Baseline: 4,279.31 ops/s | 4,116.88 ops/s (97.94%) |
| zqlite: push into limited query, outside the bound | 📈 view plot 🚷 view threshold | 156,210.36 ops/s(+2.38%)Baseline: 152,576.70 ops/s | 132,911.72 ops/s (85.09%) |
| zqlite: push into unlimited query | 📈 view plot 🚷 view threshold | 138,121.54 ops/s(+2.99%)Baseline: 134,116.62 ops/s | 122,930.62 ops/s (89.00%) |
| zqlite: scan with one depth related | 📈 view plot 🚷 view threshold | 142.92 ops/s(-13.77%)Baseline: 165.74 ops/s | 131.64 ops/s (92.11%) |
| Branch | darkgnotic/delay-replication-stream-takeover-for-loadbalancer-target-registration |
| Testbed | self-hosted |
| Benchmark | Throughput | Benchmark Result operations / second (ops/s) x 1e3 (Result Δ%) | Lower Boundary operations / second (ops/s) x 1e3 (Limit %) |
|---|---|---|---|
| src/client/custom.bench.ts > big schema | 📈 view plot 🚷 view threshold | 923.75 ops/s x 1e3(+6.04%)Baseline: 871.12 ops/s x 1e3 | 652.32 ops/s x 1e3 (70.62%) |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 📈 view plot 🚷 view threshold | 2.96 ops/s x 1e3(-0.11%)Baseline: 2.96 ops/s x 1e3 | 2.81 ops/s x 1e3 (95.09%) |
| src/client/zero.bench.ts > pk compare > pk = N | 📈 view plot 🚷 view threshold | 47.53 ops/s x 1e3(+1.64%)Baseline: 46.76 ops/s x 1e3 | 41.87 ops/s x 1e3 (88.09%) |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 📈 view plot 🚷 view threshold | 4.12 ops/s x 1e3(-0.80%)Baseline: 4.16 ops/s x 1e3 | 3.86 ops/s x 1e3 (93.65%) |
darkgnotic added a commit that referenced this pull request on Dec 2, 2025
…er task registration (#5250)

Restore the original `replication-manager` behavior of delaying the replication stream takeover so that the load balancer can register the task as a healthy target (i.e. after a minimum number of health checks). This fixes the temporary unreachability of the replication-manager when the handoff happens before the load balancer has recognized the new replication-manager as healthy.

This functionality was simplified away with the introduction of auto-discovery (#4335), since that replaced the DNS and proxying component, but it was never restored when proxy-based routing was reintroduced in #4584 (and is now the recommended configuration).

The new implementation is more compartmentalized than the original, encapsulating all of the logic in the `ChangeStreamerHttpService`, so that the `ChangeStreamerService` itself is agnostic to the details of health checks and startup delays.
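The idea of holding back the takeover until a minimum number of health checks has been observed can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `HandoffGate`, `onHealthCheck`, and the `takeover` callback are hypothetical names, and the real logic lives inside `ChangeStreamerHttpService`.

```typescript
// Hypothetical sketch: none of these names come from zero-cache.
// The gate counts health check requests and triggers the replication
// stream takeover exactly once, after `minHealthChecks` have been seen.
class HandoffGate {
  private seen = 0;
  private released = false;

  constructor(
    private readonly minHealthChecks: number,
    private readonly takeover: () => void,
  ) {}

  // Intended to be called from the HTTP handler that answers the
  // load balancer's health check requests (an assumption).
  onHealthCheck(): void {
    this.seen++;
    if (!this.released && this.seen >= this.minHealthChecks) {
      this.released = true;
      this.takeover();
    }
  }

  // True once the takeover callback has been invoked.
  get isReleased(): boolean {
    return this.released;
  }
}
```

Because the counting sits entirely in the HTTP-facing layer, the service that owns the replication stream only sees a single "start now" signal, which matches the separation of concerns described above. A production version would presumably also need a timeout or bypass for deployments without a load balancer; that is omitted here.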