@ratkokostov7 ratkokostov7 commented Jul 24, 2025

During AWS RabbitMQ maintenance windows, applications experienced connection recovery failures, with symptoms including:

  • AlreadyClosedException: connection is already closed
  • TopologyRecoveryException: Caught an exception while recovering channel
  • Duplicate consumers created on queues after recovery
  • Service disruptions during planned maintenance

Root Cause: Manual recovery logic in shutdown callbacks was racing with RabbitMQ's built-in auto-recovery mechanism, causing conflicts when both tried to recreate channels and consumers simultaneously.
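The root cause above can be illustrated with a small toy model. This is a sketch only: it is plain Java, it does not use the RabbitMQ client, and it is not the code changed in this PR. `ModelQueue`, `consumersAfterRecovery`, and the consumer tags are hypothetical names invented for the illustration, and the two recovery paths run sequentially here rather than racing on separate threads. It shows only why two independent recovery paths that each re-subscribe leave a queue with a duplicate consumer. In the real RabbitMQ Java client, built-in recovery is controlled by `ConnectionFactory.setAutomaticRecoveryEnabled` and `ConnectionFactory.setTopologyRecoveryEnabled`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: not the RabbitMQ client and not the code changed in this PR. */
public class RecoveryRaceSketch {

    /** Hypothetical stand-in for a broker queue that tracks its registered consumers. */
    static final class ModelQueue {
        private final List<String> consumers = new ArrayList<>();

        void subscribe(String consumerTag) {
            consumers.add(consumerTag);
        }

        void dropAll() {
            consumers.clear();
        }

        int consumerCount() {
            return consumers.size();
        }
    }

    /**
     * Simulates one forced connection loss followed by recovery.
     *
     * @param manualRecoveryInCallback whether the shutdown callback also re-subscribes
     * @return number of consumers on the queue once recovery has finished
     */
    static int consumersAfterRecovery(boolean manualRecoveryInCallback) {
        ModelQueue queue = new ModelQueue();
        queue.subscribe("initial");

        // The connection is force-closed, so the broker drops the consumer.
        queue.dropAll();

        // Path 1: the shutdown callback runs its own recovery logic.
        if (manualRecoveryInCallback) {
            queue.subscribe("manual-recovery");
        }

        // Path 2: built-in auto-recovery restores the consumer it had recorded.
        queue.subscribe("auto-recovery");

        return queue.consumerCount();
    }

    public static void main(String[] args) {
        System.out.println("manual + auto: " + consumersAfterRecovery(true));
        System.out.println("auto only:     " + consumersAfterRecovery(false));
    }
}
```

With both paths active the model ends up with two consumers on the queue; with only the built-in path it ends up with one, which matches the duplicate-consumer symptom listed above.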

How I tested the changes:
Environment: A 3-node RabbitMQ cluster behind an HAProxy load balancer, running in Docker containers, to simulate AWS MQ Multi-AZ maintenance scenarios.
Test Scenario: Simulated AZ failures by terminating connections with a custom maintenance message (CONNECTION_FORCED - Node was put into maintenance mode) and monitored consumer recovery behavior during the simulated maintenance windows.
Application setup: One service using the extension ran as two instances on different ports (8080 and 8081) to simulate Kubernetes pod behavior, which allowed testing concurrent connection recovery across multiple application instances.

Results:

Before Changes:

  • Duplicate consumers created on queues
  • Inconsistent queue state during recovery
  • Service disruption and downtime during node transitions
  • AlreadyClosedException and recovery errors in logs
  • Application instability during maintenance windows

After Changes:

  • One consumer per queue maintained consistently
  • Queues remain stable throughout maintenance
  • Clean automatic recovery without errors
  • Seamless maintenance windows with no application impact

@ctomc ctomc merged commit a2fad06 into iris-events:main Jul 24, 2025
2 checks passed