feat(zero-cache)!: automatic replication-manager discovery / routing #4335
Merged
| Branch | darkgnotic/rep-mgr-discovery |
| Testbed | Linux |
| Benchmark (file size) | Result (KB) | Result Δ% | Baseline (KB) | Upper boundary (KB) | Limit % |
|---|---|---|---|---|---|
| zero-package.tgz | 1,158.75 | +0.09% | 1,157.73 | 1,180.88 | 98.13% |
| zero.js | 194.79 | 0.00% | 194.79 | 198.68 | 98.04% |
| zero.js.br | 54.60 | 0.00% | 54.60 | 55.69 | 98.04% |

| Branch | darkgnotic/rep-mgr-discovery |
| Testbed | Linux |
| Benchmark (throughput) | Result (ops/s) | Result Δ% | Baseline (ops/s) | Lower boundary (ops/s) | Limit % |
|---|---|---|---|---|---|
| src/client/custom.bench.ts > big schema | 339,168.11 | +345.28% | 76,169.69 | -110,558.45 | -32.60% |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 1,582.73 | +325.08% | 372.34 | -443.29 | -28.01% |
| src/client/zero.bench.ts > pk compare > pk = N | 31,774.00 | +198.54% | 10,643.00 | -3,271.53 | -10.30% |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 2,495.00 | +339.48% | 567.72 | -727.57 | -29.16% |

arv reviewed May 15, 2025

    }

    export function getPreferredIp(
      interfaces: NodeJS.Dict<NetworkInterfaceInfo[]>,

Contributor
NodeJS.Dict seems a bit odd to use here but 🤷🏼
Comment on lines +78 to +81

    // Check if start() was already called.
    if (this.#fastify.addresses().length === 0) {
      await this.start();
    }

Contributor
If start was already called, this will stop? Is that the intended behavior?
    ): Promise<string | null> {
      const result = await sql<{ownerAddress: string | null}[]>/*sql*/ `
        SELECT "ownerAddress" FROM ${sql(cdcSchema(shard))}."replicationState"`;
      return result[0].ownerAddress;

Contributor
At some point it would make sense to start using values() more.
    - await db`
    -   UPDATE ${db(schema)}."replicationConfig"
    + await sql`
    +   UPDATE ${sql(schema)}."replicationConfig"

Contributor
OOC, how do you get VSCode to syntax highlight these?
Contributor
Author
Sorry I missed this question!
I'm using this (on Matt's recommendation):
tjenkinson added a commit to tjenkinson/mono that referenced this pull request Jun 30, 2025
Adds a `changeStreamer.protocol` (`ZERO_CHANGE_STREAMER_PROTOCOL`) option, which can be set to `https`. Before rocicorp#4335 we were able to use an https URL.
github-merge-queue Bot pushed a commit that referenced this pull request Dec 2, 2025
…er task registration (#5250) Restore the original `replication-manager` behavior of delaying the replication stream takeover to allow the task to be registered as a healthy target by the load balancer (i.e. after a minimum number of health checks). This fixes the temporary unreachability of the replication-manager when the handoff happens before the load-balancer has recognized the new replication-manager as healthy. This original functionality was simplified away with the introduction of auto-discovery (#4335), since that replaced the dns and proxying component, but never restored when proxy-based routing was reintroduced in #4584 (and is now the recommended configuration). This new implementation is more compartmentalized than the original implementation, encapsulating all of the logic in the ChangeStreamerHttpService, so that the ChangeStreamerService itself is agnostic to the details of health checks and startup delays.
darkgnotic added a commit that referenced this pull request Dec 2, 2025
…er task registration (#5250) Restore the original `replication-manager` behavior of delaying the replication stream takeover to allow the task to be registered as a healthy target by the load balancer (i.e. after a minimum number of health checks). This fixes the temporary unreachability of the replication-manager when the handoff happens before the load-balancer has recognized the new replication-manager as healthy. This original functionality was simplified away with the introduction of auto-discovery (#4335), since that replaced the dns and proxying component, but never restored when proxy-based routing was reintroduced in #4584 (and is now the recommended configuration). This new implementation is more compartmentalized than the original implementation, encapsulating all of the logic in the ChangeStreamerHttpService, so that the ChangeStreamerService itself is agnostic to the details of health checks and startup delays.
Fargate / most multi-node configurations (e.g. host / awsvpc networking)

To update a multi-node configuration without disruption:

1. Update the `replication-manager`.
2. Update the `view-syncer`, replacing the `ZERO_CHANGE_STREAMER_URI=http://{host}` option with `ZERO_CHANGE_STREAMER_MODE=discover`.
3. Once the `view-syncer`s are rollback-safe, remove the internal load balancer that was previously used for `view-syncer` to `replication-manager` routing.

Single-node configurations

Single-node configurations are unaffected.
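
For the multi-node upgrade steps, the `view-syncer` change amounts to swapping one environment variable for another. The sketch below illustrates that swap on a plain key/value map. The helper name `migrateViewSyncerEnv` and the `OTHER_OPTION` entry are hypothetical; only the two `ZERO_CHANGE_STREAMER_*` option names come from this PR.

```typescript
type Env = Record<string, string>;

// Hypothetical helper (not part of zero-cache): returns a copy of a
// view-syncer environment in which the explicit URI option is dropped and
// discovery mode is enabled. All other options are passed through untouched.
function migrateViewSyncerEnv(env: Env): Env {
  const next: Env = {};
  for (const key of Object.keys(env)) {
    if (key !== 'ZERO_CHANGE_STREAMER_URI') {
      next[key] = env[key];
    }
  }
  next['ZERO_CHANGE_STREAMER_MODE'] = 'discover';
  return next;
}

// Example: "{host}" is the same placeholder used in the description above.
const beforeEnv: Env = {
  ZERO_CHANGE_STREAMER_URI: 'http://{host}',
  OTHER_OPTION: 'unchanged', // hypothetical unrelated option
};
const afterEnv = migrateViewSyncerEnv(beforeEnv);
console.log(afterEnv);
```

The helper returns a new map rather than mutating its input, so the old configuration stays available if a rollback is needed.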
Uncommon multi-node configurations

For container setups in which the process does not have access to the externally visible IP address or port (e.g. using Docker a.k.a. "bridge" mode networking), an external routing or proxying mechanism is still needed. In such configurations, add the `ZERO_CHANGE_STREAMER_ADDRESS={host}` option to the `replication-manager`, where `{host}` is the hostname that was formerly part of `ZERO_CHANGE_STREAMER_URI`, e.g.

Before:

- `view-syncer`: `ZERO_CHANGE_STREAMER_URI=http://internal-prod-repmgr-125468.us-east-1.elb.amazonaws.com`

After:

- `view-syncer`: `ZERO_CHANGE_STREAMER_MODE=discover`
- `replication-manager`: `ZERO_CHANGE_STREAMER_ADDRESS=internal-prod-repmgr-125468.us-east-1.elb.amazonaws.com`

Note: For configurations that continue to use an explicit load-balancing mechanism, the `replication-manager` health check should be configured on the `/keepalive` path, and not the root `/` path.

Feature

The discovery and routing of the `replication-manager` is now facilitated by the Postgres Change DB, using the same row-level locking mechanism used to enforce single-writer access to the change log.

This obviates the need for an external addressing or proxying mechanism such as Service Discovery, Service Connect, or an Internal Load Balancer.
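
To make the idea concrete, here is a minimal in-memory sketch of discovery through a shared ownership record. It is not the zero-cache implementation: `FakeChangeDB`, `takeOwnership`, and `discoverChangeStreamerURI` are hypothetical names, the addresses are made up, and the Postgres row-level lock is replaced by a plain object. Only the `ownerAddress` column name is taken from the code under review.

```typescript
// Sketch only: an in-memory stand-in for the change DB row that records which
// replication-manager currently owns the change log. In the real system this
// is a Postgres row guarded by a row-level lock; everything here except the
// "ownerAddress" name is a hypothetical illustration.
type ReplicationState = {ownerAddress: string | null};

class FakeChangeDB {
  private state: ReplicationState = {ownerAddress: null};

  // A replication-manager that takes over the change log records its own
  // address in the same row, so ownership and address change together.
  takeOwnership(address: string): void {
    this.state = {ownerAddress: address};
  }

  // A view-syncer in "discover" mode reads the current owner's address.
  getOwnerAddress(): string | null {
    return this.state.ownerAddress;
  }
}

// Hypothetical helper: turn the discovered address into a URI, or return
// null when no replication-manager has registered yet.
function discoverChangeStreamerURI(db: FakeChangeDB): string | null {
  const address = db.getOwnerAddress();
  return address === null ? null : `http://${address}`;
}

const changeDB = new FakeChangeDB();
console.log(discoverChangeStreamerURI(changeDB)); // no owner registered yet
changeDB.takeOwnership('10.0.1.23:4849'); // made-up address
console.log(discoverChangeStreamerURI(changeDB)); // URI of the first owner
changeDB.takeOwnership('10.0.1.99:4849'); // made-up address after a handoff
console.log(discoverChangeStreamerURI(changeDB)); // URI of the new owner
```

Because the address lives in the same record that establishes ownership, a reader never needs a separate service registry to find the current owner, which is the property the description attributes to the new design.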