fix: improve fallback tx error handling #3770

Merged
skosito merged 5 commits into develop from improve-fallback-tx-error-handling
Mar 28, 2025
@skosito skosito commented Mar 27, 2025

Description

Extended the Solana examples a bit to enable the e2e test: zeta-chain/protocol-contracts-solana#97

The idea is to scan the error logs for "Program <PROGRAM_ID> invoked" entries and check whether any program is invoked after the gateway:

  • If only the gateway is invoked, we just skip NonceMismatch errors; any other error requires fallback (e.g. token transfer, regular transfer, etc.).
  • If something else is invoked after the gateway, fallback is needed. The error message from that program is not considered, so it can be anything, including NonceMismatch. The e2e test is extended with this case.
  • If the connected program calls back into the gateway, that is reentrancy and it will fail, but the gateway invoke might appear again in the logs, so this should cover that scenario as well.

We should probably look into other Solana Go libraries; this one only gives us an error string, so we have to parse it ourselves to figure out what happened.
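The log-scanning idea described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the helper name `invokedAfterTarget` and the regular expression are assumptions, and it presumes the Solana runtime's `Program <ID> invoke [depth]` log lines appear verbatim somewhere in the error string.

```go
package main

import "regexp"

// invokeLogRe matches log entries such as "Program <PROGRAM_ID> invoke [1]"
// and captures the program ID. The exact pattern is an assumption.
var invokeLogRe = regexp.MustCompile(`Program (\w+) invoke`)

// invokedAfterTarget is a hypothetical helper. It reports whether any
// program invocation is logged after the first invocation of
// targetProgramID in errStr.
func invokedAfterTarget(errStr, targetProgramID string) bool {
	seenTarget := false
	for _, m := range invokeLogRe.FindAllStringSubmatch(errStr, -1) {
		if seenTarget {
			// Any later invoke counts, including the target itself
			// (the reentrancy case).
			return true
		}
		if m[1] == targetProgramID {
			seenTarget = true
		}
	}
	return false
}
```

Because the scan returns true on any invoke entry that follows the first gateway entry, a repeated gateway invoke is treated the same way as an invoke of a connected program, while programs invoked before the gateway are ignored.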

How Has This Been Tested?

  • Tested CCTX in localnet
  • Tested in development environment
  • Go unit tests
  • Go integration tests
  • Tested via GitHub Actions

Summary by CodeRabbit

  • Documentation

    • Updated the changelog to reflect improved fallback transaction error handling.
  • Bug Fixes

    • Enhanced the error handling in transactions to trigger fallback processing under additional conditions for improved reliability.
  • Tests

    • Refined test conditions to validate more specific revert triggers.
    • Added tests to verify accurate detection of error conditions based on invocation sequences.

@skosito skosito added the SOLANA_TESTS Run make start-solana-test label Mar 27, 2025

coderabbitai bot commented Mar 27, 2025

Walkthrough

The pull request integrates several changes across different modules. It updates the changelog with a note on fallback transaction error handling, amends a test payload for a specific revert condition, introduces a new function to analyze error log sequences in Solana contracts, and adds corresponding tests. Additionally, the error handling in the Solana signer’s broadcastOutbound method is modified to use the new function for determining when to execute a fallback transaction.

Changes

File(s) and change summary:

  • changelog.md: Added a new entry in the "Fixes" section regarding improved fallback transaction error handling with a reference to PR 3770.
  • e2e/e2etests/test_solana_withdraw_and_call_revert_with_call.go: Updated the test payload from "revert" to "revert NonceMismatch", clarifying the reason for the revert condition.
  • pkg/contracts/solana/instruction.go, pkg/contracts/solana/instruction_test.go: Introduced the new function ProgramInvokedAfterTargetInErrStr using regex to check program invocation order in error logs, and added tests validating its behavior with multiple sub-tests.
  • zetaclient/chains/solana/signer/signer.go: Modified the broadcastOutbound method to remove the explicit "NonceMismatch" check, now leveraging the new function to decide on using a fallback transaction when a program is invoked after the gateway.

Sequence Diagram(s)

sequenceDiagram
    participant S as Signer
    participant N as Solana Network
    participant C as Contracts (Error Checker)

    S->>N: Broadcast transaction
    N-->>S: Return error response
    S->>C: Invoke ProgramInvokedAfterTargetInErrStr(errMsg, targetProgram)
    C-->>S: Return true/false based on error log scan
    alt Fallback condition met
        S->>N: Broadcast fallback transaction
    else
        S->>S: Handle error without fallback
    end
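
The flow in the diagram can be sketched in Go roughly as below. Everything here is hypothetical and simplified: `broadcastFunc`, `broadcastWithFallback`, and `errSampleFailure` are illustrative names, transactions are plain strings, and the fallback decision is injected as a function rather than taken from the real signer.

```go
package main

import "errors"

// broadcastFunc is a hypothetical stand-in for submitting a transaction
// to the Solana network.
type broadcastFunc func(tx string) error

// errSampleFailure is a made-up error, included only so the sketch can be
// exercised without a network.
var errSampleFailure = errors.New("sample: Error processing Instruction")

// broadcastWithFallback broadcasts tx. If that fails and needsFallback
// accepts the error message, it broadcasts fallbackTx instead; otherwise
// it returns the original error for the caller to handle.
func broadcastWithFallback(
	broadcast broadcastFunc,
	tx, fallbackTx string,
	needsFallback func(errMsg string) bool,
) error {
	err := broadcast(tx)
	if err == nil {
		return nil
	}
	if fallbackTx != "" && needsFallback(err.Error()) {
		return broadcast(fallbackTx)
	}
	return err
}
```

Injecting the decision as a function keeps the broadcasting logic testable on its own, which may be one way to approach the limited test coverage noted for the signer.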

Suggested labels

bug, zetaclient, chain:solana

Suggested reviewers

  • gartnera
  • lumtis
  • kingpinXD

codecov bot commented Mar 27, 2025

Codecov Report

Attention: Patch coverage is 80.39216% with 10 lines in your changes missing coverage. Please review.

Project coverage is 64.40%. Comparing base (53143b3) to head (c93d83b).
Report is 1 commits behind head on develop.

Files with missing lines (patch %, lines):

  • zetaclient/chains/solana/signer/fallback_tx.go: 82.00%, 6 missing and 3 partials ⚠️
  • zetaclient/chains/solana/signer/signer.go: 0.00%, 1 missing ⚠️
Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3770      +/-   ##
===========================================
+ Coverage    64.37%   64.40%   +0.03%     
===========================================
  Files          462      463       +1     
  Lines        32915    32961      +46     
===========================================
+ Hits         21188    21229      +41     
- Misses       10755    10757       +2     
- Partials       972      975       +3     
Files with missing lines (coverage Δ):

  • zetaclient/chains/solana/signer/signer.go: 10.86% <0.00%> (+0.08%) ⬆️
  • zetaclient/chains/solana/signer/fallback_tx.go: 82.00% <82.00%> (ø)

@skosito skosito marked this pull request as ready for review March 27, 2025 15:44
@skosito skosito requested a review from a team as a code owner March 27, 2025 15:44

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/contracts/solana/instruction_test.go (1)

82-197: Good test coverage, but could benefit from some refinements.

The test function Test_ProgramInvokedAfterTargetInErrStr provides comprehensive coverage for the ProgramInvokedAfterTargetInErrStr function, covering key scenarios including:

  • No program invoked after the target gateway program
  • A different program invoked after the target gateway program
  • The gateway program invoked again after the initial invocation

I recommend the following improvements for better maintainability and robustness:

+// targetGatewayProgramID is the Solana program ID used for testing
+const targetGatewayProgramID = "94U5AHQMKkV5txNJ17QPXWoh474PheGou6cNP2FEuL1d"

 func Test_ProgramInvokedAfterTargetInErrStr(t *testing.T) {
-	t.Run("no program invoked after gateway", func(t *testing.T) {
+	// Define test cases to avoid repetition of test structure
+	testCases := []struct {
+		name          string
+		errorStr      string
+		targetProgram string
+		expected      bool
+	}{
+		{
+			name:          "no program invoked after gateway",
+			targetProgram: targetGatewayProgramID,
+			expected:      false,
+			errorStr: `(*jsonrpc.RPCError)(0x400233b920)({
 		// ...error string content...
+			})`,
+		},
+		{
+			name:          "program invoked after gateway",
+			targetProgram: targetGatewayProgramID,
+			expected:      true,
+			errorStr: `(*jsonrpc.RPCError)(0x40019dc210)({
 		// ...error string content...
+			})`,
+		},
+		{
+			name:          "gateway invoked after gateway",
+			targetProgram: targetGatewayProgramID,
+			expected:      true,
+			errorStr: `(*jsonrpc.RPCError)(0x40019dc210)({
 		// ...error string content...
+			})`,
+		},
+		{
+			name:          "empty error string",
+			targetProgram: targetGatewayProgramID,
+			expected:      false,
+			errorStr:      "",
+		},
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			invoked := contracts.ProgramInvokedAfterTargetInErrStr(tc.errorStr, tc.targetProgram)
+			require.Equal(t, tc.expected, invoked)
+		})
+	}

Consider adding a comment to explain the purpose of these tests and the structure of the error strings, e.g.:

+// Test_ProgramInvokedAfterTargetInErrStr verifies the ProgramInvokedAfterTargetInErrStr function
+// which analyzes Solana JSON-RPC error logs to determine if a program was invoked after
+// a target program. This is used for transaction fallback decisions.
 func Test_ProgramInvokedAfterTargetInErrStr(t *testing.T) {
changelog.md (1)

21-21: Changelog Entry for PR 3770 – Fallback TX Error Handling:
The new entry is clear and properly formatted, aligning with the other changelog items in the Fixes section. It succinctly indicates the improvement in fallback transaction error handling. Consider whether additional context (such as mentioning that this change alters how errors from a fallback scenario are differentiated based on program invocation) might further aid future readers.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52a57c6 and 358bb2d.

⛔ Files ignored due to path filters (2)
  • contrib/localnet/solana/connected.so is excluded by !**/*.so
  • contrib/localnet/solana/connected_spl.so is excluded by !**/*.so
📒 Files selected for processing (5)
  • changelog.md (1 hunks)
  • e2e/e2etests/test_solana_withdraw_and_call_revert_with_call.go (1 hunks)
  • pkg/contracts/solana/instruction.go (2 hunks)
  • pkg/contracts/solana/instruction_test.go (1 hunks)
  • zetaclient/chains/solana/signer/signer.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.go`: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

  • e2e/e2etests/test_solana_withdraw_and_call_revert_with_call.go
  • pkg/contracts/solana/instruction.go
  • zetaclient/chains/solana/signer/signer.go
  • pkg/contracts/solana/instruction_test.go
🧠 Learnings (1)
zetaclient/chains/solana/signer/signer.go (1)
Learnt from: gartnera
PR: zeta-chain/node#3632
File: zetaclient/chains/solana/signer/signer.go:304-304
Timestamp: 2025-03-27T14:00:41.939Z
Learning: The Solana signer implementation in zetaclient/chains/solana/signer/signer.go has limited test coverage, particularly for the transaction broadcasting logic with fallback scenarios. Adding this coverage has been acknowledged as a potential future improvement outside the scope of immediate fixes.
🧬 Code Definitions (2)
zetaclient/chains/solana/signer/signer.go (1)
pkg/contracts/solana/instruction.go (1)
  • ProgramInvokedAfterTargetInErrStr (589-612)
pkg/contracts/solana/instruction_test.go (1)
pkg/contracts/solana/instruction.go (1)
  • ProgramInvokedAfterTargetInErrStr (589-612)
🪛 GitHub Check: codecov/patch
zetaclient/chains/solana/signer/signer.go

[warning] 347-353: zetaclient/chains/solana/signer/signer.go#L347-L353
Added lines #L347 - L353 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: start-solana-test / e2e
🔇 Additional comments (3)
e2e/e2etests/test_solana_withdraw_and_call_revert_with_call.go (1)

54-54: More specific test case with NonceMismatch error handling.

The test payload has been expanded to include "NonceMismatch", which aligns with the enhanced error handling in the broadcastOutbound function that now differentiates between regular errors and NonceMismatch errors.

pkg/contracts/solana/instruction.go (1)

587-612: Good implementation of program invocation analysis.

This function provides a necessary utility to determine if other programs were invoked after a specific target program, which enables more precise error handling in Solana transactions. The implementation uses a clean regex approach to identify program invocations in error logs.

zetaclient/chains/solana/signer/signer.go (1)

347-353: Improved fallback transaction handling logic for program invocations.

❓ Verification inconclusive

The conditional logic now handles two distinct scenarios for using fallback transactions:

  1. When a program is invoked after the gateway, regardless of error type
  2. When an error occurs that is not a NonceMismatch error
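
Read together, the two scenarios amount to a single predicate. The sketch below is an assumption about its shape, not the signer's actual code: `needsFallback` is a hypothetical name, and detecting NonceMismatch by substring match is inferred from the earlier explicit "NonceMismatch" check mentioned in the change summary.

```go
package main

import "strings"

// needsFallback is a hypothetical predicate combining the two scenarios:
// fallback is used when a program was invoked after the gateway, or when
// the error is anything other than NonceMismatch.
func needsFallback(invokedAfterGateway bool, errStr string) bool {
	if invokedAfterGateway {
		// The error may come from the connected program, so its text
		// is not inspected, even if it says NonceMismatch.
		return true
	}
	// Only the gateway ran: a NonceMismatch here is skipped rather than
	// answered with a fallback transaction.
	return !strings.Contains(errStr, "NonceMismatch")
}
```

The only combination that skips the fallback is a NonceMismatch with no program invoked after the gateway.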

This enhancement allows for more robust handling of Solana transaction failures.

This implementation lacks test coverage. Consider adding unit tests specifically for this error handling logic to ensure it behaves as expected in various scenarios.


🏁 Script executed:

#!/bin/bash
# Check if there are tests covering the broadcastOutbound function in the Solana signer
rg -A 10 -B 10 "Test.*broadcastOutbound" --type go

Length of output: 52


Enhanced Fallback Transaction Handling – Test Coverage Needed

The updated conditional logic in zetaclient/chains/solana/signer/signer.go properly refines the fallback transaction mechanism for Solana. It differentiates between cases where a program is invoked after the gateway—ensuring that the fallback is used regardless of the error—and when NonceMismatch errors should be bypassed to accommodate multiple relay attempts.

However, our investigation indicates that there is currently no unit test covering this logic (as verified by the absence of matching tests for broadcastOutbound). To bolster confidence in these changes, please add dedicated tests that simulate:

  • An error message containing "Error processing Instruction" with a valid fallback transaction, including scenarios where a program is invoked after the gateway.
  • A case where the error is a "NonceMismatch" and the fallback transaction should not be applied when no post-target program invocation is detected.

Once these tests are in place, we can ensure the robustness of error handling in production.

@lumtis lumtis left a comment

Wondering, do we have a case where we produce a NonceMismatch from signers in E2E tests?

Is it something that could be reproduced?

skosito commented Mar 27, 2025

> Wondering, do we have a case where we produce a NonceMismatch from signers in E2E tests?
>
> Is it something that could be reproduced?

The reported issue is reproduced in the e2e test in this repo: the connected program reverts with a NonceMismatch error.

lumtis commented Mar 27, 2025

>> Wondering, do we have a case where we produce a NonceMismatch from signers in E2E tests?
>> Is it something that could be reproduced?
>
> The reported issue is reproduced in the e2e test in this repo: the connected program reverts with a NonceMismatch error.

The test checks that the false positive is handled, but it doesn't check the actual NonceMismatch from ZetaClient, which should be retried and not reverted?

skosito commented Mar 27, 2025

>>> Wondering, do we have a case where we produce a NonceMismatch from signers in E2E tests?
>>> Is it something that could be reproduced?
>>
>> The reported issue is reproduced in the e2e test in this repo: the connected program reverts with a NonceMismatch error.
>
> The test checks that the false positive is handled, but it doesn't check the actual NonceMismatch from ZetaClient, which should be retried and not reverted?

Those happen constantly, since we have 2 relayers locally that both submit txs, so that path is implicitly tested by Solana outbounds working.
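
The two-relayer point can be illustrated with a toy model. This is an assumption-heavy sketch and not the gateway program: it assumes the gateway keeps a single nonce, rejects any call whose nonce differs, and increments the nonce on success. `toyGateway`, `execute`, and `errNonceMismatch` are hypothetical names.

```go
package main

import "errors"

// errNonceMismatch is a hypothetical sentinel mirroring the gateway's
// NonceMismatch error.
var errNonceMismatch = errors.New("NonceMismatch")

// toyGateway models only the assumed nonce check of the gateway.
type toyGateway struct {
	nonce uint64
}

// execute accepts a call only if it carries the current nonce, then
// advances the nonce so the same call cannot be applied twice.
func (g *toyGateway) execute(nonce uint64) error {
	if nonce != g.nonce {
		return errNonceMismatch
	}
	g.nonce++
	return nil
}
```

Under this model, when two relayers submit the same outbound, the first call succeeds and the second fails with NonceMismatch even though the outbound was processed, which is why such an error coming from the gateway alone would not warrant a fallback transaction.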

@ws4charlie ws4charlie left a comment

looks good

@skosito skosito added this pull request to the merge queue Mar 28, 2025
Merged via the queue into develop with commit 1f09edc Mar 28, 2025
46 checks passed
@skosito skosito deleted the improve-fallback-tx-error-handling branch March 28, 2025 12:26