
Conversation

@mkysel mkysel commented Oct 27, 2025

Shard gateway envelopes and add transactional auto-partitioned inserts with SAVEPOINT retries across V2 schema and APIs

Introduce a V2 sharded gateway envelope schema with partitioned meta/blob tables and a joined view, replace legacy queries with V2 selectors, and add db.InsertGatewayEnvelopeWithChecksTransactional/db.InsertGatewayEnvelopeWithChecksStandalone to auto-create partitions and retry inserts using SAVEPOINTs; update services, workers, indexers, and tests to use V2 params, views, and a configurable publish retry sleep.

📍Where to Start

Start with the insert flow in db.InsertGatewayEnvelopeAndIncrementUnsettledUsage and the new helpers db.InsertGatewayEnvelopeWithChecksTransactional and db.InsertGatewayEnvelopeWithChecksStandalone in gateway_envelope.go, then review the V2 schema and queries in 00021_sharded_gateway_envelopes.up.sql and envelopes_v2.sql.
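The mechanics those helpers implement can be pictured as a SAVEPOINT-guarded insert with a single retry. The Go sketch below is illustrative only: the wrapper name, the callback shape, and matching on the "no partition of relation" message are assumptions, not the actual code in gateway_envelope.go.

```go
// Minimal sketch of a SAVEPOINT-guarded insert with one retry, under assumed
// names; the real helpers live in gateway_envelope.go.
package db

import (
	"context"
	"database/sql"
	"strings"
)

// insertWithPartitionRetry tries the insert once; if the partition for this
// originator/sequence band does not exist yet, it rolls back to a SAVEPOINT,
// creates the partition, and retries exactly once.
func insertWithPartitionRetry(
	ctx context.Context,
	tx *sql.Tx,
	insert func(context.Context, *sql.Tx) error,
	ensureParts func(context.Context, *sql.Tx) error,
) error {
	if _, err := tx.ExecContext(ctx, "SAVEPOINT insert_envelope"); err != nil {
		return err
	}
	err := insert(ctx, tx)
	if err == nil {
		return nil
	}
	// Only the "missing partition" error is retryable here; anything else
	// aborts the surrounding transaction as usual.
	if !strings.Contains(err.Error(), "no partition of relation") {
		return err
	}
	// Roll back to the SAVEPOINT so the transaction stays usable, then
	// create the missing partition and retry once.
	if _, rbErr := tx.ExecContext(ctx, "ROLLBACK TO SAVEPOINT insert_envelope"); rbErr != nil {
		return rbErr
	}
	if err := ensureParts(ctx, tx); err != nil {
		return err
	}
	return insert(ctx, tx)
}
```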

Changes since #1279 opened

  • Cleaned up test implementation by removing debug logging and ignoring unused return values [0091da3]
  • Migrated database schema from V2-suffixed to base names for gateway envelope storage [ce375d2]
  • Updated SQLC query definitions and generated code to use non-V2 database objects [ce375d2]
  • Replaced V2-suffixed query parameter and return types throughout the application code [ce375d2]
  • Updated test helpers and mock implementations to use non-V2 types and query methods [ce375d2]
  • Fixed field initialization in publishWorker struct creation [32cb6c9]
  • Updated SQL table references in test files from 'gateway_envelopes_meta_v2' to 'gateway_envelopes_meta' [93bfb0f]

📊 Macroscope summarized 93bfb0f. 14 files reviewed, 29 issues evaluated, 27 issues filtered, 0 comments posted

🗂️ Filtered Issues

pkg/api/message/publish_worker.go — 0 comments posted, 4 evaluated, 4 filtered
  • line 114: Context cancellation cannot stop the worker once it begins processing a batch because the inner retry loop in publishWorker.start does not observe the context. When p.ctx is canceled during publishing, publishStagedEnvelope returns false (it checks p.ctx.Err() and returns false), causing the outer loop to keep retrying indefinitely. The outer select on p.ctx.Done() is not reached until the inner loop exits, which never happens after cancellation. This results in a stuck worker that cannot terminate gracefully on shutdown. [ Out of scope ]
  • line 117: Potential tight CPU loop on publish retry when sleepOnFailureTime is zero. The change replaces a fixed time.Sleep(time.Second) with time.Sleep(p.sleepOnFailureTime). If sleepOnFailureTime is zero (or very small), the inner retry loop in publishWorker.start will spin with minimal or no delay upon repeated failures, causing excessive CPU usage and preventing backoff under error conditions. This is reachable because sleepOnFailureTime is supplied by callers and is not validated (a guard is sketched after this list). [ Low confidence ]
  • line 141: Permanent validation failures in publishStagedEnvelope (e.g., topic parsing error, malformed payer envelope, signature recovery failure, fee calculation errors) cause the function to return false and the caller to retry indefinitely without changing any state. This leads to an infinite retry loop that permanently blocks processing of subsequent envelopes in the batch. Examples: [ Out of scope ]
  • line 217: Inconsistent handling of context cancellation in publishStagedEnvelope can cause the worker to never exit on cancellation. After the insert step, the function checks p.ctx.Err() and returns false (lines 217–219), which the caller interprets as a failure and retries indefinitely. Later, if the context is cancelled before or during the delete step, the function returns true (lines 229–231), signaling success and allowing progress. [ Out of scope ]
pkg/api/message/service.go — 0 comments posted, 4 evaluated, 4 filtered
  • line 95: NewReplicationAPIService starts the publish worker goroutine before attempting to start the subscribe worker, but if startSubscribeWorker fails, the function returns an error without stopping/cleaning up the already-started publish worker. This leaks the goroutine and any associated subscription resources, leaving background work running with no owner and potentially causing further side effects. To fix, ensure that on any subsequent failure after starting the publish worker, you stop/cancel the publish worker (and any resources it acquired) before returning. [ Previously rejected ]
  • line 362: Supplying both topics and originator_node_ids in message_api.EnvelopesQuery now silently ignores originator_node_ids whenever topics is non-empty. The new logic in Service.fetchEnvelopes prioritizes the topics branch (if len(query.GetTopics()) != 0 { ... return rows, nil }) and returns early, never applying the originator filter. Previously, a single combined SelectGatewayEnvelopes call accepted both filters. This is a contract change: callers that expect both filters to apply will receive envelopes filtered only by topics, which can lead to incorrect results. [ Already posted ]
  • line 372: Possible nil database handle: queries.New(s.store) is called with s.store across all branches of fetchEnvelopes. If s.store is nil at runtime, the resulting Queries will have a nil db and calling QueryContext inside the query methods will panic. There is no guard in fetchEnvelopes ensuring s.store is non-nil. [ Low confidence ]
  • line 387: Unsigned-to-signed cast for originator IDs in fetchEnvelopes: uint32 values from EnvelopesQuery.GetOriginatorNodeIds() are converted to int32 and stored in params.OriginatorNodeIds. If any originator node ID exceeds math.MaxInt32, this will wrap to a negative number and cause incorrect filtering in SelectGatewayEnvelopesByOriginators. [ Low confidence ]
pkg/db/gateway_envelope.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 66: Concurrent use of a single SQL transaction (sql.Tx) from multiple goroutines inside InsertGatewayEnvelopeAndIncrementUnsettledUsage is unsafe and can cause runtime errors or deadlocks. The function launches two goroutines that both call txQueries methods (IncrementUnsettledUsage and IncrementOriginatorCongestion) within the same transaction context. Per Go's database/sql contract, a sql.Tx is not safe for concurrent use across goroutines. This can lead to driver-level errors like "driver: bad connection", serialization failures, or blocked execution due to contention on the single pinned connection. [ Out of scope ]
pkg/db/types.go — 0 comments posted, 2 evaluated, 2 filtered
  • line 29: Potential integer overflow/truncation when converting uint32 node IDs and uint64 sequence IDs from the cursor to signed types used in SQL params. In SetVectorClockByTopics, SetVectorClockByOriginators, and SetVectorClockUnfiltered, nodeID is cast from uint32 to int32 and sequenceID from uint64 to int64. Similarly, in fetchEnvelopes, originator IDs are cast from uint32 to int32. If any nodeID > math.MaxInt32 or sequenceID > math.MaxInt64, these casts will wrap to negative or truncated values, causing incorrect filtering or vector clock behavior in queries (see the conversion guards sketched after this list). [ Previously rejected ]
  • line 55: Unsigned-to-signed cast for sequence IDs in vector clock setters: uint64 sequenceID values from the cursor are cast to int64 in SetVectorClockByTopics (lines 29–31), SetVectorClockByOriginators (lines 42–44), and SetVectorClockUnfiltered (lines 55–56). If a sequence ID exceeds math.MaxInt64, it will be truncated to a negative int64, corrupting the vector clock used in queries. [ Low confidence ]
pkg/indexer/app_chain/contracts/group_message_storer.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 72: StoreLog only validates that the client envelope payload type matches the topic kind via clientEnvelope.TopicMatchesPayload(), but it never verifies that the client envelope’s target topic identifier (the bytes after the kind) matches the on-chain GroupId from the MessageSent event. The code constructs topicStruct from msgSent.GroupId[:] and later stores to that topic, regardless of what topic identifier the client envelope carries. This can lead to storing an envelope under a topic derived from the event even if the envelope’s own target topic identifier differs. To preserve integrity, also check that clientEnvelope.TargetTopic().Bytes() (or identifier) matches topicStruct.Bytes()/msgSent.GroupId before storing; otherwise, reject the log. [ Low confidence ]
pkg/indexer/app_chain/contracts/identity_update_storer.go — 0 comments posted, 4 evaluated, 4 filtered
  • line 111: Misclassification of transient database errors as non-recoverable in StoreLog: errors from querier.GetLatestSequenceId are wrapped with re.NewNonRecoverableError(ErrGetLatestSequenceID, err) (lines 106–112). If the error is a transient database issue, returning a non-recoverable error will prevent retry and may lead to dropped events. Consider classifying database operation errors as recoverable (or propagate raw errors to be wrapped as recoverable at the outer level), consistent with other DB operations in this function. [ Out of scope ]
  • line 144: Misclassification of validation errors as non-recoverable may erroneously mark transient DB errors as non-retryable: StoreLog wraps all errors from validateIdentityUpdate with re.NewNonRecoverableError(ErrValidateIdentityUpdate, err) (lines 136–145). validateIdentityUpdate performs a DB query (SelectGatewayEnvelopesByTopics) and may return errors due to transient database issues. Treating these as non-recoverable will prevent retry, potentially dropping events. Consider distinguishing between validation failures (non-recoverable) and underlying IO/DB errors (recoverable). [ Out of scope ]
  • line 149: Potential nil pointer dereference: the code accesses associationState.StateDiff.NewMembers and associationState.StateDiff.RemovedMembers without checking whether associationState or associationState.StateDiff are non-nil. Since mlsvalidate.AssociationStateResult.StateDiff is a pointer, it can be nil (e.g., if there are no changes or the validator returns a state without a diff). Accessing a field on a nil pointer will panic. Add a guard such as if associationState == nil || associationState.StateDiff == nil { ... } before dereferencing. [ Out of scope ]
  • line 290: Defensive validation gap: validateIdentityUpdate passes identityUpdate.IdentityUpdate to MLSValidationService.GetAssociationStateFromEnvelopes without checking for nil. While the outer type assertion ensures the payload is an identity-update wrapper, the inner IdentityUpdate pointer can still be nil in protobuf-generated types. Passing nil may cause downstream logic to panic or misbehave if the service implementation assumes a non-nil value. Add a check like if identityUpdate.IdentityUpdate == nil { return nil, fmt.Errorf("identity update is nil") }. [ Out of scope ]
pkg/migrator/writer.go — 0 comments posted, 2 evaluated, 2 filtered
  • line 60: Silent integer overflow risk when casting env.OriginatorSequenceID() from uint64 to int64 for multiple DB parameters. OriginatorSequenceID is derived from a uint64 (originator sequence), but the code passes it to the database as int64 for InsertGatewayEnvelopeParams.OriginatorSequenceID (line 60) and again for IncrementUnsettledUsage.SequenceID (line 81) and UpdateMigrationProgress.LastMigratedID (line 89). If the sequence ID exceeds math.MaxInt64, the conversion will wrap to a negative number silently, resulting in incorrect keys/progress and potential data corruption or failed lookups. [ Low confidence ]
  • line 65: Silent integer overflow risk when casting expiry time from uint64 to int64. Expiry is built from env.UnsignedOriginatorEnvelope.Proto().GetExpiryUnixtime() (returns uint64) and cast to int64 (lines 65–67). If expiry exceeds math.MaxInt64, conversion will wrap negative, producing invalid expiry timestamps in the database. [ Previously rejected ]
pkg/mlsvalidate/service.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 110: Potential nil newUpdate passed to GetAssociationState, yielding a gRPC request with a nil element in NewUpdates. In GetAssociationStateFromEnvelopes, newUpdate is forwarded without a nil check (line 110). Callers like IdentityUpdateStorer.validateIdentityUpdate do not verify identityUpdate.IdentityUpdate != nil, so a nil can be passed under realistic conditions if the client envelope has the oneof wrapper set but inner IdentityUpdate is nil. This may cause marshalling errors or runtime panics in gRPC/protobuf when serializing a request containing a nil message. [ Low confidence ]
pkg/server/server.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 388: In startAPIServer, serviceRegistrationFunc creates and starts a CursorUpdater via metadata.NewCursorUpdater before constructing replicationService. If NewReplicationAPIService returns an error, the function returns that error without stopping the CursorUpdater. This leaks the cursor updater goroutine and its resources. Any subsequent failure in the registration function should clean up already-started background components (e.g., call CursorUpdater.Stop() or cancel its context) before returning. [ Previously rejected ]
pkg/sync/envelope_sink.go — 0 comments posted, 5 evaluated, 4 filtered
  • line 128: originatorID := int32(env.OriginatorNodeID()) may overflow if the originator node ID exceeds math.MaxInt32. In that case, the resulting originatorID becomes negative, which will be propagated to unsettled usage/congestion accounting and persisted to the database. Add a bounds check (reject IDs > MaxInt32, or change types to int64/uint32 through to storage). [ Low confidence ]
  • line 141: storeEnvelope converts expiry from uint64 (GetExpiryUnixtime()) to int64 with int64(expiry) and writes it to the database via queries.InsertGatewayEnvelopeParams{ Expiry: int64(expiry) }. If expiry exceeds math.MaxInt64, the conversion silently wraps to a negative int64. This can corrupt stored expiry and lead to incorrect behavior downstream. Add an upper-bound check (e.g., if expiry > math.MaxInt64 then clamp/reject) and decide policy for zero/negative values. [ Previously rejected ]
  • line 141: Behavior change: previously storeEnvelope only persisted an expiry when expiry > 0 (writing a SQL NULL otherwise). The new code always persists an Expiry value, including 0. This changes the external contract/semantics from “no expiry stored” (NULL) to “expiry = 0”, which many schemas or queries treat differently. If consumers differentiate NULL vs 0, this can cause incorrect behavior. To preserve parity, either keep NULL semantics for non-positive expiry or explicitly migrate downstream logic to treat 0 equivalently and document the change. [ Low confidence ]
  • line 146: MinutesSinceEpoch returns an int32, and storeEnvelope passes it through as MinutesSinceEpoch: utils.MinutesSinceEpoch(originatorTime). If originatorTime is far in the future (or far past), the minute count can overflow int32, truncating to an incorrect value. Since OriginatorNs comes from the envelope, a malformed/malicious envelope could trigger this. Add bounds checks or clamp to a safe range, and consider rejecting envelopes with unreasonable timestamps. [ Previously rejected ]
pkg/testutils/store.go — 0 comments posted, 4 evaluated, 3 filtered
  • line 53: Unsafe SQL string concatenation for database identifier. The code builds SQL statements using "CREATE DATABASE " + dbName and "DROP DATABASE " + dbName without quoting or validating dbName. If dbName contains characters that are not valid in unquoted PostgreSQL identifiers (e.g., hyphen, space) or contains SQL metacharacters, this can cause runtime SQL errors or even SQL injection in tests. At minimum, the name should be validated to match allowed identifier characters or wrapped with proper identifier quoting (e.g., using pgx.Identifier.Sanitize/pgx.Identifier or a helper to quote identifiers). This affects both the create and drop statements. [ Low confidence ]
  • line 57: Using raw string concatenation for DROP DATABASE cleanup also lacks IF EXISTS, so if the database was never created or was already dropped (e.g., partial failures or external interference), the cleanup will error and, due to require.NoError in the cleanup, abort remaining cleanups. Consider using DROP DATABASE IF EXISTS to make cleanup idempotent and robust. [ Low confidence ]
  • line 76: In NewDBs, an empty slice is created with zero capacity and appended to in a loop. While not a functional bug, for large count this can cause avoidable reallocations. Preallocating with make([]*sql.DB, 0, count) would prevent repeated allocations. Note: this is a performance micro-optimization and does not affect correctness. [ Code style ]
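Two themes recur across these filtered issues: unchecked unsigned-to-signed casts (node IDs, sequence IDs, expiry timestamps) and a caller-supplied retry sleep that is never validated. A minimal Go sketch of guards for both follows; the helper names (safeInt32, safeInt64, clampSleep) are hypothetical and not part of this PR.

```go
package db

import (
	"fmt"
	"math"
	"time"
)

// safeInt32 rejects uint32 values that would wrap negative when cast to
// int32, instead of silently producing an incorrect filter value.
func safeInt32(v uint32) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d overflows int32", v)
	}
	return int32(v), nil
}

// safeInt64 does the same for uint64 sequence IDs and expiry timestamps.
func safeInt64(v uint64) (int64, error) {
	if v > math.MaxInt64 {
		return 0, fmt.Errorf("value %d overflows int64", v)
	}
	return int64(v), nil
}

// clampSleep enforces a floor on a caller-supplied retry sleep so a zero
// value cannot turn the publish retry loop into a busy spin.
func clampSleep(d, floor time.Duration) time.Duration {
	if d < floor {
		return floor
	}
	return d
}
```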


mkysel commented Nov 4, 2025

Keep the Hot Path Fast by Handling Unusual Partition Errors Separately

In normal operation, all necessary partitions already exist, and the vast majority of inserts succeed immediately. Handling the “missing partition” case as a rare exception instead of building defensive partition-creation logic into every insert keeps the hot path as lean as possible:

Avoids extra round-trips and locks.
If we were to check for or create partitions preemptively before every insert, each write would need to touch metadata tables and possibly acquire DDL locks. That adds latency and contention to every single insert, even though missing partitions almost never happen in steady state.

Optimizes for the common case.
The “no partition of relation …” error occurs only when a new node or sequence band is first seen. By treating it as an exceptional path, we let the normal insert remain a single SQL call — the fastest possible path for the 99.99% case.

Isolates the slow, rare logic.
When the rare partition error does occur, we handle it immediately and locally:

  • detect the specific “no partition” message,
  • call EnsureGatewayParts in the same connection,
  • and retry once.
    This ensures forward progress without polluting the fast path with conditional checks or metadata lookups (a sketch of this shape follows below).

Improves scalability under load.
Systems ingesting millions of envelopes per node benefit from minimizing per-insert overhead. Every microsecond saved on the hot path compounds into tangible throughput gains, while the cold path for new partitions happens rarely enough to be negligible.

Maintains correctness and safety.
The retry path is still fully deterministic and idempotent: the partition is created exactly once, then subsequent inserts go straight through.
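Outside an explicit transaction, the same exception-path shape applies without SAVEPOINTs. A hedged sketch of what the standalone variant might look like; the names and the error-string check are assumptions (the real helper is db.InsertGatewayEnvelopeWithChecksStandalone):

```go
package db

import (
	"context"
	"database/sql"
	"strings"
)

// insertStandalone is a hypothetical sketch of the non-transactional path:
// the common case is a single INSERT; only on a missing-partition error do
// we pay for DDL, then retry exactly once.
func insertStandalone(
	ctx context.Context,
	dbConn *sql.DB,
	insert func(context.Context, *sql.DB) error,
	ensureParts func(context.Context, *sql.DB) error,
) error {
	err := insert(ctx, dbConn) // hot path: one SQL call
	if err == nil || !strings.Contains(err.Error(), "no partition of relation") {
		return err
	}
	// Cold path: create the missing partition (idempotent DDL), retry once.
	if err := ensureParts(ctx, dbConn); err != nil {
		return err
	}
	return insert(ctx, dbConn)
}
```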

@mkysel mkysel marked this pull request as ready for review November 4, 2025 01:25
@mkysel mkysel requested a review from a team as a code owner November 4, 2025 01:25
@mkysel mkysel changed the title from Sharded Gateway Envelopes to Partitioned Gateway Envelopes on Nov 4, 2025
mkysel commented Nov 4, 2025

There is no migration path. This assumes a DB wipe. Our testnet-dev DBs are unreadable, and we would never be able to migrate them anyway.

// This function runs inside a managed transaction created by RunInTxWithResult().
//
// Steps:
// 1. Calls InsertGatewayEnvelopeWithChecksTransactional() to insert the envelope,

Contributor commented:

I thought we were trying to avoid mixing DDL operations with DML workflows?

This at least handles the flow quite nicely: it only pays a performance penalty when partitions are missing, and it handles rollbacks cleanly. But still, it has a bit of an ick to it.

  1. Makes performance harder to reason about (some inserts take much longer than others)
  2. Scatters any errors in this flow across the logs of normal writes (maybe we can help address that by emitting a specific metric on these failures; a sketch follows below)

The alternative is to have some worker pre-creating the partitions, which has its own problems and complexities. So IDK
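On point 2, a counter dedicated to the cold path would make on-the-fly partition creation observable without grepping normal write logs. A sketch using Prometheus client_golang; the metric name and labels are assumptions, not existing instrumentation in this PR:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical counter: surfaces inserts that hit a missing partition and
// triggered DDL, instead of burying them in the logs of normal writes.
var partitionCreateOnInsert = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "xmtpd_gateway_partition_create_on_insert_total",
		Help: "Inserts that hit a missing partition and triggered DDL.",
	},
	[]string{"outcome"}, // "created" or "failed"
)

// recordPartitionCreate would be called from the cold path after the
// partition-creation attempt.
func recordPartitionCreate(err error) {
	if err != nil {
		partitionCreateOnInsert.WithLabelValues("failed").Inc()
		return
	}
	partitionCreateOnInsert.WithLabelValues("created").Inc()
}
```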

Collaborator Author (mkysel) commented:

yeah. Perfect intuition here.

I don't like the background worker option. The worker then has to listen to the registry. It also has to run frequently enough to pre-fill it. And it's a nightmare for tests, since some of them create random originators. And there are special originators, such as 10-13, which are not even in the registry, and you have to remember they exist.

Collaborator Author (mkysel) commented:

the error flow is not so bad. If it fails with "missing partition", it will create the partition and retry, without surfacing any errors or printing any logs.

We could, of course, at least log the fact that we did create a new partition for a given nodeid/seq-range.

If the DDL fails, we might see the error in rather unexpected places. But the DDL is super simple, with IF NOT EXISTS, so it should be pretty safe (see the sketch below).
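For reference, the idempotent DDL shape being described might look like the sketch below. The partition naming scheme, the two-column range bounds, and the band width are assumptions for illustration; the actual statements live in EnsureGatewayParts.

```go
package db

import (
	"context"
	"database/sql"
	"fmt"
)

// ensureGatewayParts is a hypothetical sketch of the idempotent DDL this
// thread refers to: CREATE TABLE IF NOT EXISTS makes the concurrent retry
// path safe even if two inserts race to create the same partition.
func ensureGatewayParts(ctx context.Context, tx *sql.Tx, nodeID int32, seqID int64) error {
	const bandSize = int64(1_000_000) // assumed sequence band width
	lo := (seqID / bandSize) * bandSize
	hi := lo + bandSize
	ddl := fmt.Sprintf(
		`CREATE TABLE IF NOT EXISTS gateway_envelopes_meta_%d_%d
		 PARTITION OF gateway_envelopes_meta
		 FOR VALUES FROM (%d, %d) TO (%d, %d)`,
		nodeID, lo, nodeID, lo, nodeID, hi,
	)
	_, err := tx.ExecContext(ctx, ddl)
	return err
}
```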

Contributor commented:

I'm alright with saying this is the least-bad option. Agree the worker would have its own ick. At least this should be consistent and reliable.

Collaborator Author (mkysel) commented:

👍

@mkysel mkysel requested a review from neekolas November 4, 2025 21:08

@neekolas (Contributor) left a comment:

This looks great. Let's start wiping things

@mkysel mkysel merged commit a320a21 into main Nov 4, 2025
12 checks passed


Development

Successfully merging this pull request may close these issues.

  • Error querying for DB subscription: ERROR: canceling statement due to statement timeout (SQLSTATE 57014)
  • Enhance SelectGatewayEnvelopes performance
