
Enhanced MPC Cluster Resilience and Error Handling #91

Merged
anhthii merged 8 commits into fystack:dev from nann-cheng:dev
Aug 21, 2025

@nann-cheng commented on Aug 18, 2025

Enhanced MPC Cluster Resilience and Error Handling

Summary

This PR introduces significant improvements to the MPC (Multi-Party Computation) cluster's resilience and error handling capabilities. The changes focus on making the system more robust when dealing with node disconnections, rejoin scenarios, and ensuring proper error handling during key generation and signing operations.

Key Changes

ECDH Service Refactoring

  • Refactored the ECDH service so that it tolerates nodes disconnecting and rejoining
  • Improved key exchange session management with better cleanup and state handling
  • Enhanced registry functionality with more robust node management
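
To illustrate why a rejoin needs explicit handling, here is a minimal sketch using Go's standard `crypto/ecdh` package. It assumes X25519 keys and that a rejoining node generates a fresh key pair; the PR does not state either, and the helper names `newKeyPair` and `rejoinDemo` are hypothetical. Under those assumptions, a shared secret derived before the disconnect no longer matches the one derived after the rejoin, so it must not be reused.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// newKeyPair generates a fresh X25519 key pair.
// The curve choice is an assumption made for this sketch.
func newKeyPair() (*ecdh.PrivateKey, error) {
	return ecdh.X25519().GenerateKey(rand.Reader)
}

// rejoinDemo reports (1) whether two peers derive the same secret, and
// (2) whether that secret still matches after one peer rejoins with a
// new key pair.
func rejoinDemo() (bool, bool, error) {
	alice, err := newKeyPair()
	if err != nil {
		return false, false, err
	}
	bob, err := newKeyPair()
	if err != nil {
		return false, false, err
	}

	// Both sides derive the secret from their own private key and the
	// other side's public key.
	aliceSecret, err := alice.ECDH(bob.PublicKey())
	if err != nil {
		return false, false, err
	}
	bobSecret, err := bob.ECDH(alice.PublicKey())
	if err != nil {
		return false, false, err
	}
	agree := bytes.Equal(aliceSecret, bobSecret)

	// Bob disconnects and rejoins with a new key pair.
	bobRejoined, err := newKeyPair()
	if err != nil {
		return false, false, err
	}
	freshSecret, err := alice.ECDH(bobRejoined.PublicKey())
	if err != nil {
		return false, false, err
	}
	staleStillValid := bytes.Equal(aliceSecret, freshSecret)

	return agree, staleStillValid, nil
}

func main() {
	agree, stale, err := rejoinDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("peers agree before rejoin:", agree)
	fmt.Println("old secret still valid after rejoin:", stale)
}
```

If the rejoining node kept its original key pair instead, the old secret would remain valid, so whether cleanup is required depends on how the service manages keys.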

Enhanced Error Handling

  • Added cluster readiness validation: key generation now returns an error when the cluster is not ready
  • Implemented majority node validation for signing operations, returning an error when too few nodes are available
  • Added context cancellation checks so that operations do not run on cancelled contexts
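
The three checks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the error values, the function names (`validateKeygen`, `validateSigning`, `guard`), and the simple-majority rule `total/2 + 1` are all hypothetical, and the actual consumers may use a different quorum (for example a signing threshold) and different error types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real ones live in pkg/event/types.go
// and may be named and shaped differently.
var (
	ErrClusterNotReady = errors.New("cluster not ready")
	ErrNotEnoughNodes  = errors.New("not enough ready nodes to sign")
)

// requiredMajority returns the smallest node count that is a strict
// majority of total (assumed quorum rule for this sketch).
func requiredMajority(total int) int {
	return total/2 + 1
}

// validateKeygen requires every node to be ready before key generation.
func validateKeygen(ready, total int) error {
	if ready < total {
		return ErrClusterNotReady
	}
	return nil
}

// validateSigning requires a majority of nodes to be ready before signing.
func validateSigning(ready, total int) error {
	if need := requiredMajority(total); ready < need {
		return fmt.Errorf("%w: have %d, need %d", ErrNotEnoughNodes, ready, need)
	}
	return nil
}

// guard refuses to run check if ctx is already cancelled or expired.
func guard(ctx context.Context, check func() error) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	return check()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	fmt.Println(guard(ctx, func() error { return validateKeygen(2, 3) }))  // cluster not ready
	fmt.Println(guard(ctx, func() error { return validateSigning(1, 3) })) // not enough ready nodes to sign: have 1, need 2
	fmt.Println(guard(ctx, func() error { return validateSigning(2, 3) })) // <nil>

	cancel()
	fmt.Println(guard(ctx, func() error { return validateSigning(3, 3) })) // context canceled
}
```

Wrapping the sentinel with `%w` lets callers match it with `errors.Is` while the message still carries the node counts.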

Improved Robustness

  • Fixed ECDH bugs that occurred when a node rejoined, by cleaning the ECDH cache keys
  • Enhanced node identity management with better state tracking
  • Streamlined main application logic by moving complexity to appropriate service layers
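
As a rough picture of what cache key cleaning on rejoin can look like, the sketch below keeps per-peer key material in a mutex-protected map and drops the entry when the peer disconnects. The `peerCache` type and its methods are hypothetical and are not taken from `pkg/mpc/registry.go`; the real registry may store its state elsewhere and track more than one key per peer.

```go
package main

import (
	"fmt"
	"sync"
)

// peerCache is a hypothetical in-memory store of per-peer key material.
// A peer counts as ready only while it has an entry.
type peerCache struct {
	mu   sync.Mutex
	keys map[string][]byte
}

func newPeerCache() *peerCache {
	return &peerCache{keys: make(map[string][]byte)}
}

// Store records the key negotiated with peerID, replacing any old entry.
func (c *peerCache) Store(peerID string, key []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[peerID] = key
}

// OnDisconnect removes the entry for peerID so that a later rejoin
// cannot pick up stale key material.
func (c *peerCache) OnDisconnect(peerID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.keys, peerID)
}

// Get returns the key for peerID and whether one is cached.
func (c *peerCache) Get(peerID string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key, ok := c.keys[peerID]
	return key, ok
}

// ReadyCount returns how many peers currently have a cached key.
func (c *peerCache) ReadyCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.keys)
}

func main() {
	cache := newPeerCache()
	cache.Store("node1", []byte{0x01})
	fmt.Println("ready peers:", cache.ReadyCount())

	cache.OnDisconnect("node1")
	_, ok := cache.Get("node1")
	fmt.Println("node1 cached after disconnect:", ok)

	// On rejoin, a new exchange stores fresh key material.
	cache.Store("node1", []byte{0x02})
	fmt.Println("ready peers:", cache.ReadyCount())
}
```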

Files Modified

  • cmd/mpcium/main.go - Simplified main application logic
  • pkg/event/types.go - Added new error types for better error classification
  • pkg/eventconsumer/keygen_consumer.go - Enhanced key generation with cluster readiness checks
  • pkg/eventconsumer/sign_consumer.go - Improved signing with majority validation and error handling
  • pkg/identity/identity.go - Better identity state management
  • pkg/mpc/key_exchange_session.go - Refactored for improved session handling
  • pkg/mpc/node.go - Simplified node management logic
  • pkg/mpc/registry.go - Significantly enhanced registry with better node lifecycle management

Impact

  • Improved system stability during node disconnection/reconnection scenarios
  • Better error reporting for cluster state issues
  • Reduced likelihood of hanging operations through proper context handling
  • Enhanced debugging capabilities with more descriptive error messages

Breaking Changes

None. All changes are backward-compatible improvements to existing functionality.

@anhthii changed the title from "fix rejoining ecdh bug and add ecdh cache key cleaning" to "Enhanced MPC Cluster Resilience and Error Handling" on Aug 21, 2025
@anhthii merged commit acec1fa into fystack:dev on Aug 21, 2025
14 checks passed