Skip to content

StackExchange.Redis GetStatus() returns multiple masters on failover in Sentinel cluster #1910

@thestonehead

Description

@thestonehead

Hi. I didn't see a similar question here or anywhere.
I was trying to see why it took a long time ( ~ 30s ) for an application to start working with Redis successfully again after a failover is triggered (debug sleep 30). Sentinel and Redis instances show the correct replication info (a new master) within 5-6 seconds (as expected with failover timeout set to 5s).
However, I created a simple console application which reads a string from Redis in a loop just to see what's going on and this confuses me and I'd be grateful if someone explained it to me.
Setup is 3 Redis instances and 5 Sentinel instances. I have changed IPs and hostnames so that security from my company doesn't haunt me.

// Before failover

13:40:44.5727: 10.100.100.100:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5727: 10.100.100.100:6379: int ops=57, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5727: 10.100.100.100:6379: Circular op-count snapshot; int: 0+19+6+7+5+6+7+6+1=57 (5,70 ops/s; spans 10s); sub:
13:40:44.5727: 10.100.100.300:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5727: 10.100.100.300:6379: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5728: 10.100.100.300:6379: Circular op-count snapshot; int: 0+10=10 (1,00 ops/s; spans 10s); sub:
13:40:44.5728: 10.100.100.200:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5728: 10.100.100.200:6379: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5728: 10.100.100.200:6379: Circular op-count snapshot; int: 0+10=10 (1,00 ops/s; spans 10s); sub:
13:40:44.5728: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: 0s ago

StackExchange.Redis is conected to three Redis instances and the first one is master. All good.

// Failover starts happening

SocketClosed (ReadEndOfStream, last-recv: 0) on 10.100.100.200:6379/Interactive, Idle/MarkProcessed, last: GET, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 13s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.2.88.56325, SocketClosed

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5797ms elapsed, timeout is 5000ms), command=GET, next: GET RedisFailoverTest.TestKey, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.100.100.100:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: LocalComputerHostname, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767), v: 2.2.88.56325 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

RedisConnectionRestored: 10.100.100.200:6379, , None

SocketClosed (ReadEndOfStream, last-recv: 0) on 10.100.100.300:6379/Interactive, Idle/MarkProcessed, last: PING, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.2.88.56325, SocketClosed

RedisConnectionRestored: 10.100.100.300:6379, , None

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5813ms elapsed, timeout is 5000ms), command=GET, next: GET RedisFailoverTest.TestKey, inst: 0, qu: 0, qs: 4, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.100.100.100:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: LocalComputerHostname, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767), v: 2.2.88.56325 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

This is pretty much what one expects, StackExchange.Redis reconnects to live instances and the commands that were in the queue fail with timeout. Still all good.

// After failover

13:41:02.7888: 10.100.100.100:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.100:6379: int ops=62, qu=0, qs=5, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.100:6379: Circular op-count snapshot; int: 60+1+1=62 (0,20 ops/s; spans 10s); sub:
13:41:02.7888: 10.100.100.300:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.300:6379: int ops=22, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.300:6379: Circular op-count snapshot; int: 21+1=22 (0,10 ops/s; spans 10s); sub:
13:41:02.7888: 10.100.100.200:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.200:6379: int ops=25, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.200:6379: Circular op-count snapshot; int: 20+1+2+2=25 (0,50 ops/s; spans 10s); sub:
13:41:02.7888: Sync timeouts: 0; async timeouts: 5; fire and forget: 0; last heartbeat: 0s ago

This puzzles me. After failover, both the new master and the old master are marked as "master" and only the one slave that remained a slave is marked as "replica".
Is this expected or a bug? How does it work? Am I doing something wrong?

After ~20s command to Redis fails on the "wrong" master, it gets reconnected and from that point forward it's marked as replica.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions