Hi. I didn't see a similar question here or anywhere.
I was trying to see why it took a long time ( ~ 30s ) for an application to start working with Redis successfully again after a failover is triggered (debug sleep 30). Sentinel and Redis instances show the correct replication info (a new master) within 5-6 seconds (as expected with failover timeout set to 5s).
However, I created a simple console application which reads a string from Redis in a loop just to see what's going on and this confuses me and I'd be grateful if someone explained it to me.
Setup is 3 Redis instances and 5 Sentinel instances. I have changed IPs and hostnames so that security from my company doesn't haunt me.
// Before failover
13:40:44.5727: 10.100.100.100:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5727: 10.100.100.100:6379: int ops=57, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5727: 10.100.100.100:6379: Circular op-count snapshot; int: 0+19+6+7+5+6+7+6+1=57 (5,70 ops/s; spans 10s); sub:
13:40:44.5727: 10.100.100.300:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5727: 10.100.100.300:6379: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5728: 10.100.100.300:6379: Circular op-count snapshot; int: 0+10=10 (1,00 ops/s; spans 10s); sub:
13:40:44.5728: 10.100.100.200:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:40:44.5728: 10.100.100.200:6379: int ops=10, qu=0, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:40:44.5728: 10.100.100.200:6379: Circular op-count snapshot; int: 0+10=10 (1,00 ops/s; spans 10s); sub:
13:40:44.5728: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: 0s ago
StackExchange.Redis is conected to three Redis instances and the first one is master. All good.
// Failover starts happening
SocketClosed (ReadEndOfStream, last-recv: 0) on 10.100.100.200:6379/Interactive, Idle/MarkProcessed, last: GET, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 13s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.2.88.56325, SocketClosed
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5797ms elapsed, timeout is 5000ms), command=GET, next: GET RedisFailoverTest.TestKey, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.100.100.100:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: LocalComputerHostname, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767), v: 2.2.88.56325 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
RedisConnectionRestored: 10.100.100.200:6379, , None
SocketClosed (ReadEndOfStream, last-recv: 0) on 10.100.100.300:6379/Interactive, Idle/MarkProcessed, last: PING, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.2.88.56325, SocketClosed
RedisConnectionRestored: 10.100.100.300:6379, , None
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5813ms elapsed, timeout is 5000ms), command=GET, next: GET RedisFailoverTest.TestKey, inst: 0, qu: 0, qs: 4, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.100.100.100:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: LocalComputerHostname, IOCP: (Busy=0,Free=1000,Min=12,Max=1000), WORKER: (Busy=3,Free=32764,Min=12,Max=32767), v: 2.2.88.56325 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
This is pretty much what one expects, StackExchange.Redis reconnects to live instances and the commands that were in the queue fail with timeout. Still all good.
// After failover
13:41:02.7888: 10.100.100.100:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.100:6379: int ops=62, qu=0, qs=5, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.100:6379: Circular op-count snapshot; int: 60+1+1=62 (0,20 ops/s; spans 10s); sub:
13:41:02.7888: 10.100.100.300:6379: Standalone v6.2.5, replica; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.300:6379: int ops=22, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.300:6379: Circular op-count snapshot; int: 21+1=22 (0,10 ops/s; spans 10s); sub:
13:41:02.7888: 10.100.100.200:6379: Standalone v6.2.5, master; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: n/a
13:41:02.7888: 10.100.100.200:6379: int ops=25, qu=0, qs=0, qc=0, wr=0, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0
13:41:02.7888: 10.100.100.200:6379: Circular op-count snapshot; int: 20+1+2+2=25 (0,50 ops/s; spans 10s); sub:
13:41:02.7888: Sync timeouts: 0; async timeouts: 5; fire and forget: 0; last heartbeat: 0s ago
This puzzles me. After failover, both the new master and the old master are marked as "master" and only the one slave that remained a slave is marked as "replica".
Is this expected or a bug? How does it work? Am I doing something wrong?
After ~20s command to Redis fails on the "wrong" master, it gets reconnected and from that point forward it's marked as replica.
Hi. I didn't see a similar question here or anywhere.
I was trying to see why it took a long time ( ~ 30s ) for an application to start working with Redis successfully again after a failover is triggered (debug sleep 30). Sentinel and Redis instances show the correct replication info (a new master) within 5-6 seconds (as expected with failover timeout set to 5s).
However, I created a simple console application which reads a string from Redis in a loop just to see what's going on and this confuses me and I'd be grateful if someone explained it to me.
Setup is 3 Redis instances and 5 Sentinel instances. I have changed IPs and hostnames so that security from my company doesn't haunt me.
// Before failover
StackExchange.Redis is conected to three Redis instances and the first one is master. All good.
// Failover starts happening
This is pretty much what one expects, StackExchange.Redis reconnects to live instances and the commands that were in the queue fail with timeout. Still all good.
// After failover
This puzzles me. After failover, both the new master and the old master are marked as "master" and only the one slave that remained a slave is marked as "replica".
Is this expected or a bug? How does it work? Am I doing something wrong?
After ~20s command to Redis fails on the "wrong" master, it gets reconnected and from that point forward it's marked as replica.