Skip to content

Conversation

@fbac
Copy link
Collaborator

@fbac fbac commented Oct 23, 2025

Standardize payer report logging and route workers and store to utils.PayerReportMainLoggerName and utils.PayerReportStoreLoggerName in server.go and store.go

This PR standardizes logging keys and message casing across payer report and blockchain admin components, and wires payer report workers and store to named sub-loggers.

  • Route payer report workers to utils.PayerReportMainLoggerName and the store to utils.PayerReportStoreLoggerName in server.NewReplicationServer and payerreport.NewStore (server.go, store.go)

  • Unify admin log field keys and "no update needed" messaging in settlement/app chain admins (settlement_chain_admin.go, app_chain_admin.go)

  • Convert multiple logs to lowercase phrasing and standardized field names across indexer, gateway examples, stress tools, and watchers

  • Add a structured zap error for semver parsing in replication main (main.go) and a deferred debug duration log in blockchain.WaitForTransaction (client.go)

📍Where to Start

Start with logger wiring in server.NewReplicationServer in server.go, then review the store logger change in payerreport.NewStore in store.go.


📊 Macroscope summarized 0948726. 20 files reviewed, 48 issues evaluated, 44 issues filtered, 0 comments posted

🗂️ Filtered Issues

cmd/xmtpd-cli/commands/node_registry.go — 0 comments posted, 2 evaluated, 2 filtered
  • line 341: The handler uses context.Background() for all downstream blockchain operations (setupNodeRegistryAdmin, setupNodeRegistryCaller, and the subsequent calls). In a CLI command, this prevents cancellation (e.g., on SIGINT) and can lead to indefinite hangs if the RPC calls block or the network is slow/unresponsive. The Cobra command provides a context via RunE that should be used to allow cancellation propagation. Consider threading a cancellable context from the command (cmd.Context() or a context with a timeout) into the handler and subsequent calls. [ Low confidence ]
  • line 351: The handler logs "set new max canonical size" unconditionally after a successful call to admin.SetMaxCanonical, but SetMaxCanonical may return nil in the "no update needed" case (it internally checks err.IsNoChange() and returns nil). In that case, no change is applied, yet the handler will still log that a new value was set, producing misleading output. Consider either having SetMaxCanonical return an explicit changed/no-change indicator, or suppressing the "set new" log unless the update is actually applied (e.g., by relying on an event or return value). [ Low confidence ]
pkg/api/message/subscribe_worker.go — 0 comments posted, 3 evaluated, 3 filtered
  • line 49: Global listener determination occurs before topic validation, using raw query.Topics and query.OriginatorNodeIds lengths. If query.Topics is non-empty but all entries are invalid (and originators empty), l.isGlobal is left as false, preventing the listener from being treated as global even though it has no valid keys. This mismatches expected semantics and contributes to the unregistered listener condition. [ Out of scope ]
  • line 49: newListener can return a listener that is neither global nor keyed, resulting in a listener that is never registered anywhere and whose channel is never closed. This happens when query.Topics contains one or more entries but all are invalid per topic.ParseTopic, and query.OriginatorNodeIds is empty. In this case l.isGlobal remains false (because it is decided based on raw input lengths before validation), l.topics stays empty (all invalid topics were skipped), and l.originators stays empty. The caller (listen) then returns the created channel without registering the listener, resulting in a leaked, never-closed channel and no delivery. The root cause is the global determination at the top of newListener based on unvalidated inputs and lack of fallback when validation yields no keys. [ Out of scope ]
  • line 64: newListener allows both l.topics and l.originators to be populated when the query supplies both filters. However, downstream listen only registers the listener on one dimension, preferring topics over originators (else if len(l.topics) > 0 { ... } else if len(l.originators) > 0 { ... }). This silently ignores the originators filter when topics are present, creating a contract asymmetry between the input query and the actual registration and potentially unexpected routing semantics. [ Out of scope ]
pkg/authn/claims.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 29: Possible nil pointer dereference on logger in NewClaimValidator. The function calls logger.Debug(...) (lines 29-32) without checking whether logger is nil. Since NewRegistryVerifier passes a *zap.Logger provided by callers and there is no guard ensuring non-nil, a nil logger would cause a runtime panic at the Debug call. Add a nil-check or ensure non-nil logger is enforced by callers before invoking NewClaimValidator. [ Out of scope ]
pkg/blockchain/app_chain_admin.go — 0 comments posted, 16 evaluated, 16 filtered
  • line 324: Passing a potentially nil a.client to ExecuteTransaction can lead to a nil pointer dereference inside ExecuteTransaction when it calls client.BalanceAt. This method does not validate that a.client is non-nil before passing it. [ Low confidence ]
  • line 326: Possible nil pointer dereference of a.groupMessageBroadcaster when invoking UpdateMaxPayloadSize and ParseMaxPayloadSizeUpdated. The method does not validate that a.groupMessageBroadcaster is non-nil before use. If appChainAdmin is constructed with a nil groupMessageBroadcaster, calling a.groupMessageBroadcaster.UpdateMaxPayloadSize(opts) or a.groupMessageBroadcaster.ParseMaxPayloadSizeUpdated(*log) will panic. [ Low confidence ]
  • line 329: Potential panic by dereferencing log in the event parser closures: ParseMaxPayloadSizeUpdated(*log) / ParseMinPayloadSizeUpdated(*log) dereference the *types.Log pointer without nil check. If any entry in receipt.Logs is nil, this will panic. While go-ethereum typically provides non-nil log pointers, the code does not defensively guard against nil. [ Code style ]
  • line 334: Possible nil pointer dereference of a.logger when calling a.logger.Error or a.logger.Info. The method does not validate a.logger before use; a nil logger will cause a panic. [ Low confidence ]
  • line 362: Passing a potentially nil a.client to ExecuteTransaction can cause a nil pointer dereference inside ExecuteTransaction. No local validation is present. [ Low confidence ]
  • line 364: Possible nil pointer dereference of a.groupMessageBroadcaster in UpdateGroupMessageMinPayloadSize when invoking UpdateMinPayloadSize and ParseMinPayloadSizeUpdated. No nil check is performed before use. [ Low confidence ]
  • line 367: Potential panic by dereferencing log in the event parser closure: ParseMinPayloadSizeUpdated(*log) without nil check. If receipt.Logs contains a nil entry, this will panic. While go-ethereum typically provides non-nil log pointers, the code does not defensively guard against nil. [ Code style ]
  • line 372: Possible nil pointer dereference of a.logger in UpdateGroupMessageMinPayloadSize when calling a.logger.Error or a.logger.Info. No guard exists. [ Low confidence ]
  • line 400: Passing a potentially nil a.client to ExecuteTransaction can cause a nil pointer dereference inside ExecuteTransaction. No guard exists here. [ Low confidence ]
  • line 402: Possible nil pointer dereference of a.identityUpdateBroadcaster when invoking UpdateMaxPayloadSize and ParseMaxPayloadSizeUpdated. No nil check is performed. [ Low confidence ]
  • line 405: Potential panic by dereferencing log in the event parser closure: ParseMaxPayloadSizeUpdated(*log) without nil check in UpdateIdentityUpdateMaxPayloadSize. If a nil entry exists in receipt.Logs, this will panic. [ Code style ]
  • line 410: Possible nil pointer dereference of a.logger when logging in UpdateIdentityUpdateMaxPayloadSize. No nil guard exists. [ Low confidence ]
  • line 438: Passing a potentially nil a.client to ExecuteTransaction can cause a nil pointer dereference inside ExecuteTransaction. No guard in this method. [ Low confidence ]
  • line 440: Possible nil pointer dereference of a.identityUpdateBroadcaster when invoking UpdateMinPayloadSize and ParseMinPayloadSizeUpdated. No nil guard is present. [ Low confidence ]
  • line 443: Potential panic by dereferencing log in the event parser closure: ParseMinPayloadSizeUpdated(*log) without nil check in UpdateIdentityUpdateMinPayloadSize. If a nil entry exists in receipt.Logs, this will panic. [ Code style ]
  • line 448: Possible nil pointer dereference of a.logger when logging within UpdateIdentityUpdateMinPayloadSize. No nil guard exists. [ Low confidence ]
pkg/blockchain/client.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 207: In WaitForTransaction, any error returned by client.TransactionReceipt that is not ethereum.NotFound is collapsed into ErrTxFailed (return nil, ErrTxFailed). This misclassifies transport, provider, or context-related errors (e.g., RPC connectivity issues, rate limits, or context.DeadlineExceeded) as transaction failures. Upstream callers (e.g., ExecuteTransaction) treat ErrTxFailed specially by attempting to trace revert reasons, which can be incorrect and wasteful when the transaction did not actually fail but the call itself did. It also discards the original error detail, hampering diagnosis. [ Low confidence ]
pkg/blockchain/settlement_chain_admin.go — 0 comments posted, 7 evaluated, 7 filtered
  • line 291: None of the update methods validate that s.client and s.logger are non-nil before passing them to ExecuteTransaction. Inside ExecuteTransaction, client.BalanceAt and logger.Debug/Info are invoked unconditionally. If s.client or s.logger are nil, this will cause a panic. The risk exists at each call site where s.client and s.logger are passed: UpdateDistributionManagerProtocolFeesRecipient (line 291), UpdatePayerRegistryMinimumDeposit (line 330), UpdatePayerRegistryWithdrawLockPeriod (line 368), UpdatePayerReportManagerProtocolFeeRate (line 406), and UpdateNodeRegistryAdmin (line 442). [ Out of scope ]
  • line 293: UpdateDistributionManagerProtocolFeesRecipient does not validate that s.distributionManager is non-nil before dereferencing it in both the transaction function (s.distributionManager.UpdateProtocolFeesRecipient) and the event parser (s.distributionManager.ParseProtocolFeesRecipientUpdated). If s.distributionManager is nil, these calls will panic at runtime. [ Out of scope ]
  • line 311: In UpdateDistributionManagerProtocolFeesRecipient, the handling of ErrNoChange is inconsistent with the other update methods. When ExecuteTransaction returns a BlockchainError that IsNoChange(), the method logs the "no update needed" message but still returns the error. This causes callers (e.g., CLI handlers) to treat a no-op as a failure, unlike the other methods which return nil on no-change. This inconsistency can produce meaningfully incorrect behavior and user messaging parity issues across admin operations. [ Out of scope ]
  • line 332: UpdatePayerRegistryMinimumDeposit does not validate that s.payerRegistry is non-nil before dereferencing it in s.payerRegistry.UpdateMinimumDeposit and s.payerRegistry.ParseMinimumDepositUpdated. If s.payerRegistry is nil, these calls will panic. [ Out of scope ]
  • line 370: UpdatePayerRegistryWithdrawLockPeriod does not validate that s.payerRegistry is non-nil before dereferencing it in s.payerRegistry.UpdateWithdrawLockPeriod and s.payerRegistry.ParseWithdrawLockPeriodUpdated. If s.payerRegistry is nil, these calls will panic. [ Out of scope ]
  • line 408: UpdatePayerReportManagerProtocolFeeRate does not validate that s.payerReportManager is non-nil before dereferencing it in s.payerReportManager.UpdateProtocolFeeRate and s.payerReportManager.ParseProtocolFeeRateUpdated. If s.payerReportManager is nil, these calls will panic. [ Out of scope ]
  • line 444: UpdateNodeRegistryAdmin does not validate that s.nodeRegistry is non-nil before dereferencing it in s.nodeRegistry.UpdateAdmin and s.nodeRegistry.ParseAdminUpdated. If s.nodeRegistry is nil, these calls will panic. [ Out of scope ]
pkg/gateway/examples/jwt/main.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 81: jwtIdentityFn is invoked in main with publicKey defined as []byte, and the identity function’s Keyfunc returns that []byte to the JWT parser while requiring an ECDSA signing method. For ECDSA, github.com/golang-jwt/jwt/v5 expects a key type compatible with *ecdsa.PublicKey, not a raw []byte. Returning a []byte will cause jwt.ParseWithClaims to fail verification for all tokens, making authentication always fail at runtime. This manifests from the use at WithIdentityFn(jwtIdentityFn(publicKey)) with a []byte public key. [ Out of scope ]
pkg/indexer/common/log_handler.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 59: In IndexLogs, when event.Removed is true (indicating a reorged/removed log), the code invokes contract.HandleLog(ctx, event) but then continues to the normal storage path and may attempt to store the removed log via contract.StoreLog. This conflates reorg handling and normal storage, risking incorrect persistence of removed logs or double-application of effects. A continue after handling removed events is likely required to avoid storing reorged logs. [ Low confidence ]
pkg/indexer/settlement_chain/contracts/payer_registry_storer.go — 0 comments posted, 3 evaluated, 3 filtered
  • line 109: parsedEvent.Amount.Int64() silently truncates/overflows when the on-chain amount exceeds the range of int64, and then currency.FromMicrodollars multiplies by 1e6, which can further overflow int64. This leads to incorrect or negative amount values being recorded in the ledger or classified as invalid. On-chain event fields like Amount are typically uint256 and can be greater than math.MaxInt64, making this path reachable. The conversion at amount := currency.FromMicrodollars(currency.MicroDollar(parsedEvent.Amount.Int64())) must validate range and convert using big integers to avoid data loss and overflow. [ Out of scope ]
  • line 151: parsedEvent.Amount.Int64() in handleWithdrawalRequested can silently truncate/overflow for on-chain uint256 amounts larger than int64, and the subsequent multiplication by 1e6 in FromMicrodollars can overflow int64. This results in incorrect ledger withdrawal amounts or misclassification as invalid events. Use safe big integer to picodollar conversion with range checks before narrowing to int64. [ Out of scope ]
  • line 193: parsedEvent.Amount.Int64() in handleUsageSettled risks silent truncation/overflow when the on-chain Amount exceeds int64. The subsequent FromMicrodollars multiplication by 1e6 can also overflow int64. This can corrupt settlement amounts in the ledger or incorrectly flag events as invalid. Implement safe conversion from *big.Int microdollars to picodollars with bounds checking, and reject or cap values exceeding int64 capacity. [ Out of scope ]
pkg/indexer/settlement_chain/contracts/payer_report_manager_storer.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 115: Brittle error classification using substring matching may swallow unrelated errors or fail to recognize the intended sentinel. In StoreLog, when setReportSubmitted returns an error, the code checks if strings.Contains(err.Error(), ErrReportAlreadyExists) to decide whether to suppress the error and continue. Using a substring on the formatted error message is unsafe: (1) an unrelated error message that happens to contain "report already present in database" will be incorrectly treated as non-fatal and suppressed; (2) if the underlying NonRecoverableError changes its formatting/wrapping, .Error() might not contain the exact substring and the already-exists condition would incorrectly bubble up as an error. This can cause incorrect behavior (skipping real errors or surfacing expected idempotent conditions as failures). Prefer a typed/sentinel error with errors.Is or a structured discriminator on the RetryableError (e.g., a method or error code), or match against a well-known exported error variable rather than string contents. [ Low confidence ]
pkg/metrics/docs/generator.go — 0 comments posted, 2 evaluated, 2 filtered
  • line 87: dumpToMarkdown writes Markdown table rows directly using fmt.Sprintf("| %s|%s| %s |%s |\n", m.Name, m.Type, desc, m.File) at line 87 without escaping Markdown-sensitive characters in desc (and potentially m.Name/m.File). If m.Description contains the pipe character |, backticks, or newlines, the generated table becomes malformed or breaks rendering. The code should escape or sanitize table cell content (e.g., replace | with \|, backticks with escaped equivalents, and normalize newlines) to ensure valid Markdown output for all metric descriptions. [ Low confidence ]
  • line 91: dumpToMarkdown writes to MARKDOWN_OUTPUT ("doc/metrics_catalog.md") without ensuring that the parent directory (doc/) exists. On a fresh checkout or in environments where the doc directory is absent, os.WriteFile at line 91 will fail with a "no such file or directory" error, causing log.Fatalf to terminate the program. The code should create the directory tree (e.g., using os.MkdirAll(filepath.Dir(MARKDOWN_OUTPUT), 0o755)) before attempting to write the file. [ Low confidence ]
pkg/payerreport/verifier.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 125: IsValidReport returns true for empty reports (where StartSequenceID == EndSequenceID) without validating that PayersMerkleRoot equals the canonical hash for an empty set. The comment explicitly states the merkle root "must always be the hash of an empty set", but the implementation skips any check and accepts potentially invalid merkle roots. This can lead to incorrect acceptance/attestation of malformed empty reports contrary to the stated contract. [ Out of scope ]
pkg/server/server.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 175: NewReplicationServer starts multiple long-running components (metrics server, indexer, migrator, and potentially API server) but if a subsequent step fails, it returns an error without stopping previously started components. This violates paired cleanup and can leave goroutines and listeners running when construction fails. Concrete examples: metrics server is started at metrics.NewMetricsServer (lines 175-184) and indexer is started via s.indx.StartIndexer() (lines 233-239). If a later section (e.g., starting API server at lines 265-276 or building payer report workers at lines 339-356) fails, the function returns early without stopping the already started metrics server (s.metrics), indexer (s.indx), or migrator (s.migratorServer). This creates partial initialization with no cleanup in error paths, violating required invariants like single paired cleanup and no leaks. [ Previously rejected ]
pkg/stress/stress.go — 0 comments posted, 4 evaluated, 3 filtered
  • line 81: Nil logger will cause a runtime panic when calling methods like logger.Info/logger.Error. StressIdentityUpdates accepts *zap.Logger without a guard and uses it at multiple points (e.g., first at logger.Info("starting transaction", ...)). If a caller passes nil, the method call will dereference a nil pointer and panic. Add a non-nil check at function entry or default to zap.NewNop(). [ Low confidence ]
  • line 83: Nil ctx leads to a panic in CastSendCommand.Run when it does context.WithTimeout(ctx, 30*time.Second). StressIdentityUpdates passes its ctx directly to cs.Run(ctx) without guarding against nil. If the caller provides a nil context.Context, this causes a panic in the callee. Add a non-nil context guard (e.g., default to context.Background() if ctx == nil) or document and enforce non-nil at entry. [ Low confidence ]
  • line 119: Division by zero when n == 0 causes a runtime panic and invalid metrics. Specifically, avgDuration := totalDuration / time.Duration(n) will panic with integer divide-by-zero, and success_rate := float64(successCount)/float64(n) will produce NaN (or panic depending on context) when n is zero. The function accepts any int for n with no guards, and the loop will not run when n == 0, making this path reachable. Add an explicit guard for n <= 0 to return early with a defined outcome or compute metrics based on successes only. [ Previously rejected ]
pkg/utils/log.go — 0 comments posted, 1 evaluated, 1 filtered
  • line 116: Misconfigured zap encoder keys: EncoderConfig.NameKey is set to "caller" (line 116), while EncodeCaller is set (line 117) but CallerKey is not configured. This causes the logger "name" (e.g., xmtpd.api.publish-worker) to be emitted under the field key caller, and actual caller information will not be included at all because CallerKey is empty. This mismatches intended semantics, can confuse log parsing/analytics that expect caller to be a file:line, and contradicts the documented guidance for readable name chains. Correct configuration should use distinct keys, e.g., set NameKey to "logger" (or similar) and set CallerKey to "caller" if caller info is desired. [ Out of scope ]

@fbac fbac requested a review from a team as a code owner October 23, 2025 07:57
@graphite-app
Copy link

graphite-app bot commented Oct 23, 2025

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • Queue - adds this PR to the back of the merge queue
  • Hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

@fbac fbac force-pushed the 10-23-fix_payer_report_logger branch from 7eec0ae to 0948726 Compare October 23, 2025 10:40
@fbac fbac merged commit e42ff0c into main Oct 23, 2025
11 checks passed
@fbac fbac deleted the 10-23-fix_payer_report_logger branch October 23, 2025 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants