Expose metrics for connected node versions #698

mkysel · 2025-04-04T19:07:07Z

Add version tracking metrics to `ValidateVersionClaimIsCompatible` method in `ClaimValidator` to expose metrics for connected node versions

Implements connection metrics tracking by adding new Prometheus metrics and modifying the authentication flow. The changes center on:

- Adds new Prometheus metrics in `api.go` for tracking node connections and requests by version
- Modifies the `JWTVerifier` interface in `interface.go` to return a `CloseFunc` for connection cleanup
- Updates `ValidateVersionClaimIsCompatible` in `claims.go` to track metrics using `metrics.EmitNewConnectionRequestVersion`
- Implements connection tracking in the authentication interceptors in `auth.go`

📍Where to Start

Start with the ValidateVersionClaimIsCompatible method in claims.go, which implements the core version tracking functionality and returns the new CloseFunc.
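As a rough illustration of the pattern described above, here is a minimal sketch of a verifier that returns a cleanup function alongside its result. Only the names `CloseFunc` and `emptyClose` come from the PR; `verify`, its parameters, and the `openConnections` map are hypothetical stand-ins, and the real code parses the version from a JWT claim and records it in Prometheus metrics.

```go
package main

import "errors"

// CloseFunc releases whatever a successful verification acquired.
type CloseFunc func()

// emptyClose is a no-op returned on error paths so callers never receive nil.
func emptyClose() {}

// openConnections is a hypothetical stand-in for the per-version gauge.
// It is not safe for concurrent use; a real metric would be.
var openConnections = map[string]int{}

// verify is a hypothetical, simplified verifier. It takes the node ID and
// version directly instead of extracting them from a token.
func verify(nodeID uint32, version string) (uint32, CloseFunc, error) {
	if version == "" {
		return 0, emptyClose, errors.New("missing version claim")
	}
	openConnections[version]++
	// The returned closure undoes the increment when the caller is done.
	return nodeID, func() { openConnections[version]-- }, nil
}
```

Because every path returns a non-nil function, a caller can invoke or defer the result without checking it first.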

Macroscope summarized 397e495.

Summary by CodeRabbit

New Features
- Enhanced connection monitoring with version-specific metrics, allowing improved tracking of incoming connections and requests.
- Token verification now returns a cleanup function, so connection tracking is released automatically once a request or stream finishes.
Tests
- Revised test suites now verify the improved resource management and metric tracking behavior.

coderabbitai · 2025-04-04T19:07:16Z

Walkthrough

This pull request updates several authentication-related components by modifying method signatures to include a new cancellation function type, CloseFunc. The changes affect JWT verification, claims validation, interceptor logic, and corresponding tests and mocks. Additionally, new metrics are introduced to track incoming connection requests by version, along with a helper tracker structure for managing connection counts. Each modified component now returns a cancellation function (or emptyClose in error cases) to ensure proper cleanup of resources.

Changes

| File(s) | Change Summary |
| --- | --- |
| `pkg/authn/claims.go`, `pkg/authn/interface.go`, `pkg/authn/verifier.go`, `pkg/mocks/authn/mock_JWTVerifier.go` | Updated method signatures to include new `CloseFunc` return value; added `emptyClose()` for cleanup; adjusted version claim validation with metrics emission and connection tracker usage. |
| `pkg/authn/claims_test.go`, `pkg/authn/verifier_test.go`, `pkg/interceptors/server/auth_test.go` | Modified test cases to capture the additional cancellation function from `Verify` calls and added `defer cancel()` for proper resource cleanup. |
| `pkg/interceptors/server/auth.go` | Revised interceptor internal logic in `Unary` and `Stream` methods to retrieve and defer the cancellation function from `Verify` calls. |
| `pkg/metrics/api.go`, `pkg/metrics/metrics.go` | Introduced new version-specific metrics collectors and defined the `IncomingConnectionTracker` with corresponding `Open` and `Close` methods; added `EmitNewConnectionRequestVersion` to log connection requests by version. |
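The tracker and the request counter summarized in the table could look roughly like the sketch below. This is an assumption about the shape, not the PR's code: `versionMetrics`, `connectionTracker`, and their methods are hypothetical, and plain maps guarded by a mutex stand in for the Prometheus gauge and counter that the PR registers.

```go
package main

import "sync"

// versionMetrics is a hypothetical in-memory stand-in for the two metrics:
// a gauge of open connections and a counter of connection requests.
type versionMetrics struct {
	mu       sync.Mutex
	open     map[string]int // behaves like a gauge: goes up and down
	requests map[string]int // behaves like a counter: only goes up
}

func newVersionMetrics() *versionMetrics {
	return &versionMetrics{open: map[string]int{}, requests: map[string]int{}}
}

// emitRequest counts one connection request for the given version.
func (m *versionMetrics) emitRequest(version string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[version]++
}

// connectionTracker pairs an increment of the gauge with its later decrement.
type connectionTracker struct {
	metrics *versionMetrics
	version string
}

// Open marks one more live connection for the tracker's version.
func (t *connectionTracker) Open() {
	t.metrics.mu.Lock()
	defer t.metrics.mu.Unlock()
	t.metrics.open[t.version]++
}

// Close marks one fewer live connection for the tracker's version.
func (t *connectionTracker) Close() {
	t.metrics.mu.Lock()
	defer t.metrics.mu.Unlock()
	t.metrics.open[t.version]--
}
```

Keeping the counter separate from the gauge means the total number of requests per version stays visible even after the connections that made them have closed.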

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant A as AuthInterceptor
    participant V as Verifier
    participant M as Metrics/Tracker

    C->>A: Send request with token
    A->>V: Verify(token)
    V-->>A: (nodeId, cancel, error)
    alt Verification Failed
        A->>C: Return error
        note right of A: defer cancel() called
    else Verification Succeeded
        A->>M: Emit new connection request (by version)
        M-->>A: Tracker opened (increment metric)
        A->>C: Process request
        note right of A: defer cancel() to close tracker
    end
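The interceptor side of the diagram can be sketched without gRPC as a wrapper that verifies a token, defers the cleanup, and then runs the handler. Everything here is illustrative: `withAuth`, `verifier`, and `countingVerifier` are hypothetical names, and the real interceptors in `auth.go` operate on gRPC unary and stream handlers.

```go
package main

import "errors"

// CloseFunc releases whatever a successful verification acquired.
type CloseFunc func()

// verifier mirrors the shape of the updated Verify signature.
type verifier interface {
	Verify(token string) (uint32, CloseFunc, error)
}

// withAuth is a hypothetical stand-in for an auth interceptor. The cleanup
// is deferred right away, which is safe because Verify never returns nil.
func withAuth(v verifier, token string, handler func(nodeID uint32) error) error {
	nodeID, closeFn, err := v.Verify(token)
	defer closeFn()
	if err != nil {
		return err
	}
	return handler(nodeID)
}

// countingVerifier is a hypothetical test double that tracks open connections.
type countingVerifier struct{ open int }

func (c *countingVerifier) Verify(token string) (uint32, CloseFunc, error) {
	if token == "" {
		return 0, func() {}, errors.New("empty token")
	}
	c.open++
	return 1, func() { c.open-- }, nil
}
```

With this shape the connection counts as open for exactly as long as the handler runs, which for a stream would be the lifetime of the stream.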

Possibly related PRs

JWT Version Checks #364: Modifications to the Verify method’s return types and version claim handling; aligns with changes in JWT processing.
Update to newest protos #422: Adjustments in the pkg/mocks/authn/mock_JWTVerifier.go signature to include CloseFunc, mirroring updates in the main PR.
Improve version checking system #506: Changes to the ValidateVersionClaimIsCompatible method in ClaimValidator with updated return types and control flow modifications.

Suggested reviewers

fbac

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2
Failed executing command with error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

pkg/authn/claims.go (1)
41-65:

❓ Verification inconclusive

Validate the correct usage of the returned CloseFunc.

The approach of returning emptyClose on errors avoids nil checks in callers. Good use of a no-op function.

Creating and opening a tracker, then returning a closing function is a solid pattern for pairing resource allocation and deallocation.

Confirm that all callers of ValidateVersionClaimIsCompatible correctly invoke the returned CloseFunc to prevent leaking resources.

Consider adding a test to ensure that when an error is returned, subsequent calls on the CloseFunc do not cause issues. This can surface any edge cases around resource management.

🏁 Script executed:
#!/bin/bash
# Description: Search for all usage sites of ValidateVersionClaimIsCompatible and confirm they invoke the returned CloseFunc.
rg -A 5 "ValidateVersionClaimIsCompatible(.*)"
Length of output: 791

Action Required: Verify that all call sites properly invoke the returned CloseFunc

The implementation in pkg/authn/claims.go properly creates a cleanup function—even on error—using a no-op (emptyClose), which is a good design choice to avoid nil checks.

The pattern of resource allocation (tracker creation/opening) paired with returning a closure for cleanup is sound.

However, our grep results show that in pkg/authn/verifier.go the call to ValidateVersionClaimIsCompatible is simply returned, meaning the actual invocation of the CloseFunc depends on the caller further up the chain. Please verify that every consumer of this API eventually calls the returned CloseFunc to prevent resource leaks.

Additionally, consider adding a test case to cover scenarios where an error is returned—ensure that subsequent invocations of the CloseFunc (including repeated calls on the no-op function) do not cause issues.
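One way to make repeated invocations harmless, sketched under the assumption that a double `Close` would otherwise decrement the gauge twice, is to wrap the cleanup in `sync.Once`. The source does not say the PR does this; `onceClose` is a hypothetical helper.

```go
package main

import "sync"

// CloseFunc releases whatever a successful verification acquired.
type CloseFunc func()

// emptyClose is the no-op used on error paths; calling it any number of
// times has no effect.
func emptyClose() {}

// onceClose is a hypothetical helper that wraps f so only the first call
// runs it. Later calls return without doing anything.
func onceClose(f CloseFunc) CloseFunc {
	var once sync.Once
	return func() { once.Do(f) }
}
```

A test along the lines suggested in the review would call the returned function twice and assert that the underlying cleanup ran once.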
pkg/metrics/api.go (1)
26-26: Fix incorrect help text for the counter metric.

The help text for apiNodeConnectionRequestsByVersionCounter incorrectly states "Number of incoming node connections by version" which is the same as the gauge metric. This should be updated to reflect that it counts connection requests rather than active connections.
-		Help: "Number of incoming node connections by version",
+		Help: "Number of incoming node connection requests by version",

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69011f9 and 397e495.

📒 Files selected for processing (10)

pkg/authn/claims.go (2 hunks)
pkg/authn/claims_test.go (3 hunks)
pkg/authn/interface.go (1 hunks)
pkg/authn/verifier.go (5 hunks)
pkg/authn/verifier_test.go (8 hunks)
pkg/interceptors/server/auth.go (2 hunks)
pkg/interceptors/server/auth_test.go (4 hunks)
pkg/metrics/api.go (2 hunks)
pkg/metrics/metrics.go (1 hunks)
pkg/mocks/authn/mock_JWTVerifier.go (4 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

pkg/mocks/authn/mock_JWTVerifier.go (1)

pkg/authn/interface.go (1)

CloseFunc (3-3)

⏰ Context from checks skipped due to timeout of 90000ms (6)

GitHub Check: Push Docker Images to GitHub Packages (xmtpd-cli)
GitHub Check: Push Docker Images to GitHub Packages (xmtpd)
GitHub Check: Build pre-baked anvil-xmtpd
GitHub Check: Upgrade Tests
GitHub Check: Test (Node)
GitHub Check: Code Review

🔇 Additional comments (30)

pkg/authn/interface.go (2)

3-3: Introducing a dedicated CloseFunc is a clear approach.
By defining a function type for resource cleanup and returning a no-op on error paths, you remove the need for nil checks in callers. Just ensure that any returned CloseFunc fully closes or releases its associated resources.

6-6:

✅ Verification successful

Be mindful of downstream breakage from the changed signature.
Changing the signature to Verify(tokenString string) (uint32, CloseFunc, error) may break existing callers. Verify that each invocation of Verify properly handles the new CloseFunc return value.

Run this script to locate and confirm that all calls handle the CloseFunc:

🏁 Script executed:
#!/bin/bash
rg -A 3 $'Verify\\(.*\\)'
Length of output: 6602

Downstream Usage Verified: Updated Verify Signature Handled Correctly

All invocations of Verify across the codebase have been updated to properly handle the new CloseFunc return value. The grep results confirm that:

In the mocks (e.g., pkg/mocks/authn/mock_JWTVerifier.go), the return value now supplies a placeholder (func() {}) for CloseFunc.

In test files (e.g., pkg/interceptors/server/auth_test.go, pkg/authn/verifier_test.go, and pkg/authn/claims_test.go), callers capture cancel and appropriately defer its execution.

The changes in pkg/authn/interface.go and corresponding production code (e.g., pkg/interceptors/server/auth.go) consistently unpack and handle the new signature.

It appears that no additional modifications are needed to manage the CloseFunc return, and downstream breakage risks have been mitigated.
pkg/metrics/metrics.go (1)

87-88: New version-based metrics look good.
Registering apiIncomingNodeConnectionByVersionGauge and apiNodeConnectionRequestsByVersionCounter will help track node versions effectively. Ensure you’ve documented these metrics in your monitoring guide for easy reference.

pkg/authn/claims.go (1)

6-7: New metrics import is consistent with the code changes.
Importing metrics is appropriate now that the function emits version metrics and instantiates a connection tracker.

pkg/interceptors/server/auth_test.go (4)

39-39: LGTM: Updated mock expectation to match new method signature.

The mock expectation now includes an empty function as the second return value to match the updated Verify method signature.

74-74: LGTM: Consistency with the new method signature.

The mock expectation for the invalid token case correctly includes the new function value among the return values.

134-134: LGTM: Updated mock expectation in stream interceptor test.

The mock expectations for the stream interceptor tests have been correctly updated to match the new method signature.

159-159: LGTM: Consistent implementation across all test cases.

The invalid token case for stream interceptor has been properly updated to match the new method signature.

pkg/authn/claims_test.go (3)

57-58: LGTM: Properly handling the new CloseFunc return value.

Good implementation of capturing and deferring the cancellation function to ensure resources are properly cleaned up after verification.

120-121: LGTM: Consistent resource cleanup pattern.

The test correctly captures and defers the cancel function, maintaining a consistent pattern across all verification calls.

206-207: LGTM: Proper resource cleanup in validation tests.

The test correctly handles the new cancel function return value with appropriate deferred execution.

pkg/interceptors/server/auth.go (4)

98-99: LGTM: Updated Unary interceptor to handle the new return value.

The Unary interceptor now correctly extracts the cancellation function from the Verify method call.

106-106: LGTM: Proper resource cleanup with defer.

Good practice to defer the cancellation function to ensure resources are properly cleaned up after verification.

129-130: LGTM: Updated Stream interceptor for consistent behavior.

The Stream interceptor has been properly updated to handle the new return value from the Verify method.

137-138: LGTM: Consistent resource cleanup pattern in Stream interceptor.

Matching the pattern in the Unary interceptor, the cancellation function is properly deferred for cleanup.

pkg/authn/verifier.go (9)

26-26: LGTM: Useful empty closure function.

This empty function serves as a placeholder for cases where a CloseFunc is required but no actual cleanup is needed.

46-46: LGTM: Updated method signature to include CloseFunc.

The Verify method now returns a CloseFunc as its second return value, consistent with the PR's objective of exposing metrics for connected node versions.

55-56: LGTM: Consistent error handling with emptyClose.

All error paths now return the emptyClose function alongside the error, ensuring that callers can safely defer the returned function without nil checks.

Also applies to: 58-59, 62-63

65-68: LGTM: Propagating the CloseFunc from validateClaims.

The method now correctly propagates both the CloseFunc and error from validateClaims, maintaining proper resource management across the validation chain.

70-73: LGTM: Consistent error handling for subject node ID.

The error path for getSubjectNodeId now correctly returns emptyClose as the second return value.

75-75: LGTM: Updated return values to include the CloseFunc.

The success path now returns the nodeId and the CloseFunc obtained from validateClaims.

118-122: LGTM: Updated validateClaims to return a CloseFunc.

The validateClaims method signature has been updated to return a CloseFunc, with emptyClose returned for type-related errors.

126-127: LGTM: Consistent error handling for invalid tokens.

This is consistent with the other error paths, returning emptyClose alongside the error.

144-145: LGTM: Simplified return in getSubjectNodeId.

The method now directly returns nodeId without unnecessary type conversion, improving code readability.

pkg/authn/verifier_test.go (1)

81-82: Good implementation of cancellation pattern.

The addition of a cancel function return value from Verify and deferring it immediately is a good practice for resource cleanup. This pattern is consistently implemented across all test cases.

pkg/metrics/api.go (3)

15-29: The new metrics for tracking node connections by version look good.

Adding these metrics will provide valuable insights into the distribution of versions connecting to the system.

51-69: Well-designed connection tracker implementation.

The IncomingConnectionTracker implementation follows good design patterns by encapsulating the version-specific metrics operations. The Open/Close methods make the code more readable and maintainable.

71-74: Good helper function for emitting connection request metrics.

The EmitNewConnectionRequestVersion function provides a clean API for incrementing the connection request counter.

pkg/mocks/authn/mock_JWTVerifier.go (2)

24-58: Updated mock implementation for the new Verify signature.

The mock has been correctly updated to handle the new CloseFunc return type in the Verify method. The implementation handles all the different return value scenarios appropriately.

78-86: Helper method signatures updated consistently.

The Return and RunAndReturn methods have been correctly updated to match the new signature of the Verify method.

pkg/authn/verifier.go

mkysel added 2 commits April 4, 2025 15:02

Expose metrics for connected node versions

ccd91f9

have naked counter too

397e495

mkysel requested a review from a team as a code owner April 4, 2025 19:07

coderabbitai bot reviewed Apr 4, 2025


fbac approved these changes Apr 7, 2025

mkysel merged commit d67ba88 into main Apr 7, 2025
9 checks passed

mkysel deleted the mkysel/incoming-version-metrics branch April 7, 2025 18:49
