-
Notifications
You must be signed in to change notification settings - Fork 39
Indexer metrics + indexer refactors #639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request introduces a new timer mechanism in the blockchain log streamer's Changes
Sequence Diagram(s)sequenceDiagram
participant RS as RpcLogStreamer
participant T as Timer
participant BN as Blockchain Node
participant L as Logger
RS->>T: Start timer for maximum disconnect time
RS->>BN: Request logs
alt Logs received or re-org detected
RS->>T: Reset timer
RS->>L: Log successful operation or warning (if re-org)
else No logs / timeout condition
T->>RS: Trigger timeout event
RS->>L: Emit fatal shutdown log
end
sequenceDiagram
participant IL as IndexLogs
participant R as Retry Function
participant F as Operation Function
participant L as Logger
IL->>R: Invoke retry with operation, sleep duration, & contract address
loop Retry attempts
R->>F: Execute storage operation
alt Operation succeeds
R->>IL: Return success result
else Operation fails with retryable error
R->>L: Emit metric for retryable error
Note over R: Wait for sleep duration before retrying
end
end
R->>IL: Return error after maximum retries
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 golangci-lint (1.62.2)Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24) Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/indexer/storer/identityUpdate.go (1)
107-109: Logging new members is clear, but confirm verbosity.
While logging new members is helpful, consider whetherInfois the appropriate level if these updates occur frequently.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
pkg/blockchain/rpcLogStreamer.go(5 hunks)pkg/indexer/indexer.go(5 hunks)pkg/indexer/indexer_test.go(5 hunks)pkg/indexer/storer/error.go(1 hunks)pkg/indexer/storer/groupMessage.go(5 hunks)pkg/indexer/storer/identityUpdate.go(11 hunks)pkg/metrics/indexer.go(2 hunks)pkg/metrics/metrics.go(1 hunks)pkg/testutils/config.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Test (Node)
- GitHub Check: Build pre-baked anvil-xmtpd
🔇 Additional comments (54)
pkg/metrics/metrics.go (1)
72-74: New metrics added for better observability.The addition of these three metrics enhances the observability of the indexer component:
indexerMaxBlocklikely tracks the highest block processedindexerCurrentBlockLagprovides visibility into how far behind the indexer is from the chain headindexerCountRetryableStorageErrorshelps monitor storage failures that can be retriedThese metrics align well with the PR's objective of adding metrics related to issue #568.
pkg/testutils/config.go (1)
22-22: MaxChainDisconnectTime configuration added for watchContract.This new configuration parameter sets a reasonable default timeout (10 seconds) for blockchain disconnect scenarios in test environments. This directly supports the PR objective of refactoring the self-termination functionality of
watchContract.pkg/indexer/storer/groupMessage.go (1)
46-46: Error handling improved with explicit error types.The error handling has been refactored to use more specific error types:
NewUnrecoverableLogStorageErrorfor parsing, validation, and marshaling errorsNewRetryableLogStorageErrorfor database operations that might succeed on retryThis change improves code readability and error classification, supporting the metrics for tracking retryable errors.
Also applies to: 54-54, 65-67, 73-73, 82-82, 88-88, 100-100, 111-111
pkg/indexer/indexer.go (4)
9-9: Added metrics import for error tracking.The addition of the metrics import supports the emission of metrics for retryable storage errors, aligning with the PR objective to add metrics.
107-107: Contract address parameter added for metrics tracking.Adding the contract address as a parameter to
indexLogsallows for more granular tracking of errors by contract, enhancing observability.Also applies to: 137-137
343-354: Retry logic refactored for better maintainability.The old retry loop structure has been replaced with a dedicated
retryfunction, which:
- Improves code readability by centralizing retry logic
- Provides a clearer control flow
- Enables metric emission for retryable errors
This is a good refactoring that maintains the same functionality while making the code more maintainable.
359-377: Well-designed retry function with metrics integration.The new
retryfunction:
- Takes the necessary context (logger, sleep duration, address)
- Uses a clean loop structure for retries
- Handles error classification via the
ShouldRetry()method- Integrates metrics emission for retryable errors
This implementation is clean, maintainable, and supports the metrics objectives of the PR.
pkg/indexer/storer/identityUpdate.go (12)
6-6: Import is appropriate for new error handling logic.
This import is required for the recently introducederrors.Asusage.
59-59: Switching toUnrecoverableLogStorageErroris correct.
Creating a distinct error type clarifies that parse failures cannot be retried.
69-69: Proper use ofUnrecoverableLogStorageError.
Indicating an irrecoverable condition for sequence ID retrieval aligns with the new categorization.
119-119: Correctly marks insertion errors as retryable.
Database insertion failures often merit a retry if they are transient in nature.
147-147: Consistent retryable error usage for revocation failures.
Revocation logic can also fail for temporary DB reasons. This is aligned with the new approach.
161-161:UnrecoverableLogStorageErrorfor originator envelope errors.
Building envelopes is typically logic-based rather than transient, thus not retryable.
171-171: Same pattern for signing envelope failures.
Signing errors also likely indicate a permanent issue.
177-177: Unrecoverable on marshalling originator envelope.
Marshalling failures often signal malformed data rather than a transient condition.
187-188: Switch to retryable error for gateway envelope insertion.
This is correct if an insertion failure might be due to a temporary DB condition.
198-198: Retryable insertion error for blockchain messages.
Storing blockchain messages could fail due to transient issues, fitting the retryable pattern well.
206-207: Leveragingerrors.Asfor custom error interface.
This pattern neatly distinguishes internal errors from system/database exceptions.
211-212: Default to retryable for unknown errors.
A safe fallback approach if the error type is not recognized.pkg/indexer/indexer_test.go (9)
6-6: Introducedsyncpackage.
WaitGroupusage replaces arbitrary sleep for more robust test synchronization.
45-47: Set upWaitGroupfor two asynchronous calls.
Appropriately increments the counter to trackStoreLogandUpdateLatestBlockcompletions.
51-53: Signalingwg.Done()inside the mock run function.
Ensures that the test knows exactly when the mock call completes.
78-89: Replaces sleep withWaitGroupand channel signaling.
This approach is less error-prone and more deterministic than usingtime.Sleep.
118-120:WaitGroupusage inTestIndexLogsRetryableError.
Consistent approach for tracking asynchronous test operations.
127-127: Properly invokingwg.Done()afterStoreLog.
Ensures that the test waits for the storer mock call to finish.
129-133: Simulating a retryable error and then an unrecoverable error.
This is a realistic test scenario that aligns perfectly with the new error infrastructure.
147-148: PassingtestContractparameter.
Verifies that the contract reference is correctly assigned in the test.
150-161: WaitGroup-based synchronization with channel closure.
Again, a robust replacement for fixed sleeps in tests.pkg/indexer/storer/error.go (8)
8-9: NewUnrecoverableLogStorageErrorstruct.
Helps differentiate permanent failures from transient ones.
12-14: Implements theError()method properly.
Exposes the wrapped error’s message as expected.
16-18:ShouldRetry()returning false for unrecoverable errors.
Clearly encodes that these errors cannot be retried.
20-21: Constructor forUnrecoverableLogStorageError.
Simple and clear factory function.
24-26:RetryableLogStorageErrorstruct for transient errors.
Separating concerns for retry logic is a smart design move.
28-28:Error()implementation for retryable errors.
Directly wraps the underlying error message.
32-34:ShouldRetry()set to true for retryable scenario.
Ensures that callers can differentiate between permanent vs. temporary conditions.
36-37: Constructor forRetryableLogStorageError.
Maintains parallelism with the unrecoverable error constructor.pkg/metrics/indexer.go (7)
26-32: Good addition of max block metric for tracking indexer progress.This gauge metric will help monitor the maximum block on the chain to be processed, providing better visibility into the indexer's progress.
34-40: Great addition of block lag metric.The block lag metric provides valuable information about how far behind the indexer is, which is critical for monitoring and alerting on indexer health.
42-48: Good implementation of storage error tracking.This counter will help monitor and analyze retryable storage errors, which aligns with improving the indexer's error handling and reliability.
72-75: Proper type safety improvement from int to uint64.Changing the block parameter from
inttouint64is a good improvement as block numbers are always positive and this aligns with Ethereum's native block number representation.
77-80: Good implementation of max block emission function.This function complements the new
indexerMaxBlockmetric and follows the established pattern for the metrics emitter functions.
87-90: Well-designed block lag emission function.This implementation correctly follows the pattern of other metric emitters and will be useful for tracking indexing progress.
92-95: Good implementation of error counter increment.This function provides a clean way to track retryable storage errors by contract address, which will be valuable for debugging and monitoring.
pkg/blockchain/rpcLogStreamer.go (11)
132-134: Good implementation of self-termination timer.The timer implementation is a cleaner approach than using manual timestamp tracking. The
defer timer.Stop()ensures proper cleanup.
140-144: Improved shutdown handling for prolonged disconnections.Using a timer-based approach with a fatal log provides a more graceful and deterministic way to handle excessive disconnect times rather than panicking.
151-156: Better log level for blockchain reorganization.Changing from Info to Warn is appropriate as a reorg is an exceptional event that operators should be aware of. The timer reset ensures proper tracking of disconnect time after a reorg.
169-170: Good timer reset after successful operation.Properly resetting the timer after successfully getting logs ensures accurate tracking of disconnect time.
171-174: Simplified no-logs handling logic.The simplified conditional is cleaner and more maintainable than the previous implementation, while preserving the same behavior.
201-202: Good integration of max block metric.This properly ties the blockchain client's block number to the metrics system, providing visibility into chain progress.
205-210: Well-implemented block lag metrics.The code correctly handles both the case when the indexer is caught up (lag = 0) and when it's lagging behind, providing accurate metrics.
226-227: Proper updating of current block metric.Using the
toBlockvalue for the current block metric is correct as it represents the highest block that has been fully processed in the current iteration.
229-230: Clear next block calculation.The code clearly defines the next block to process as one past the current range's end.
240-242: Good type consistency for block numbers.Changing from
int64touint64for block numbers provides better type consistency throughout the codebase and aligns with Ethereum's native block number representation.Also applies to: 250-252
215-220:Details
❓ Verification inconclusive
Verify error handling for FilterLogs call.
The error handling for the FilterLogs call looks good, but you might want to consider specific handling for different types of RPC errors that might occur, such as rate limiting or temporary network issues.
🏁 Script executed:
#!/bin/bash # Check how RPC errors are handled across the codebase rg -A 10 -B 5 "FilterLogs.*err\s*!=" --glob "*.go"Length of output: 52
ACTION: Review and potentially refine error handling for the FilterLogs RPC call
The current implementation wraps the call in
metrics.MeasureGetLogs, and while this provides basic error propagation, it's unclear if different RPC error cases—such as rate limiting or transient network errors—are handled distinctly. Please verify manually whether:
- Specialized error handling for these scenarios is needed at this call site or within
MeasureGetLogs- The broader codebase already addresses such distinctions elsewhere
If such differentiation would improve robustness, consider adding explicit checks for these error types.
Adds all metrics from #568 On top of that: - refactors self-termination watchContract - mutes loud debug logs - uses waitgroups in some tests to speed them up - some other driveby refactors Fixes #568 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> - **New Features** - Introduced enhanced monitoring that displays new metrics for key blockchain processing statistics, including maximum block processed, block lag, and storage error counts. - Added a configurable disconnect timeout to help ensure a more graceful recovery during prolonged network interruptions. - **Refactor** - Streamlined error handling and retry processes to improve overall system stability and reliability during blockchain operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Adds all metrics from #568
On top of that:
Fixes #568
Summary by CodeRabbit
New Features
Refactor