Closed
Labels
deprecated/question: Questions should happened in GitHub Discussions
Description
@merlimat we had to replace the current leader of our local ZK in a cluster, and this seems to have caused a LOT of issues in the cluster.
The brokers seem to have all shut down at the same time, leaving the cluster unable to handle traffic until they were restarted.
Also, many consumers seem to have been reset to an earlier point in time, generating a huge backlog.
We see these logs before the broker apparently shut down:
March 22nd 2017, 15:31:43.156 2017-03-22 18:31:43,155 - INFO - [main-SendThread(ip-10-64-102-223.ec2.internal:2181):ClientCnxn$SendThread@1158] - Unable to read additional data from server sessionid 0x35a384d476e093b, likely server has closed socket, closing socket connection and attempting reconnect
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - WARN - [main-EventThread:LeaderElectionService$1@111] - Got something wrong on watch: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.258 2017-03-22 18:31:43,258 - WARN - [main-EventThread:LeaderElectionService$1@92] - Type of the event is [None] and path is [null]
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,258 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:1 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - [main-EventThread:ZooKeeperDataCache@131] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - [main-EventThread:ZooKeeperSessionWatcher@87] - Received zookeeper notification, eventType=None, eventState=Disconnected
March 22nd 2017, 15:31:43.259 2017-03-22 18:31:43,259 - INFO - [main-EventThread:ZooKeeperCache@346] - [State:CONNECTED Timeout:30000 sessionid:0x35a384d476e093b local:null remoteserver:null lastZxid:17254516425 xid:1622313 sent:1622313 recv:1791041 queuedpkts:2 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:Disconnected type:None path:null
March 22nd 2017, 15:31:43.587 2017-03-22 18:31:43,586 - INFO - [main-SendThread(ip-10-64-102-117.ec2.internal:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server ip-10-64-102-117.ec2.internal/10.64.102.117:2181. Will not attempt to authenticate using SASL (unknown error)
A couple of questions:
1 - Why do brokers shut down when the ZK leader disconnects?
2 - Why would this affect the consumers' backlog? We're running a branch that persists individualDeletedMessages.
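To make question 1 concrete, here is a minimal sketch of the behaviour I would have expected from a session watcher: tolerate a `Disconnected` event while the session can still be re-established, and shut down only once the session is `Expired` or the disconnection has outlasted the session timeout (30000 ms in the logs above). This is not the broker's actual code. `SessionPolicySketch`, its enums, and its methods are hypothetical names made up for illustration, and the `State` enum only mirrors the ZooKeeper connection states by name so the example runs without the ZooKeeper client library.

```java
import java.time.Duration;

/** Hypothetical sketch; none of these names come from Pulsar or ZooKeeper. */
final class SessionPolicySketch {

    /** Mirrors the SyncConnected / Disconnected / Expired states seen in watch events. */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    enum Action { KEEP_RUNNING, SHUT_DOWN }

    private final long sessionTimeoutMs;
    private long disconnectedSinceMs = -1; // -1 means "currently connected"

    SessionPolicySketch(Duration sessionTimeout) {
        this.sessionTimeoutMs = sessionTimeout.toMillis();
    }

    /** Feed one connection-state event observed at time nowMs. */
    Action onEvent(State state, long nowMs) {
        switch (state) {
            case SYNC_CONNECTED:
                disconnectedSinceMs = -1; // reconnected in time, forget the outage
                return Action.KEEP_RUNNING;
            case DISCONNECTED:
                if (disconnectedSinceMs < 0) {
                    disconnectedSinceMs = nowMs; // remember when the outage started
                }
                return check(nowMs);
            case EXPIRED:
            default:
                return Action.SHUT_DOWN; // session is gone, ephemeral nodes are lost
        }
    }

    /** Periodic check: shut down only if the outage has lasted a full session timeout. */
    Action check(long nowMs) {
        boolean timedOut = disconnectedSinceMs >= 0
                && nowMs - disconnectedSinceMs >= sessionTimeoutMs;
        return timedOut ? Action.SHUT_DOWN : Action.KEEP_RUNNING;
    }

    public static void main(String[] args) {
        SessionPolicySketch policy = new SessionPolicySketch(Duration.ofMillis(30_000));
        System.out.println(policy.onEvent(State.DISCONNECTED, 0));
        System.out.println(policy.onEvent(State.SYNC_CONNECTED, 400));
        System.out.println(policy.onEvent(State.EXPIRED, 500));
    }
}
```

Under a policy like this, a leader replacement where clients reconnect to another ZK server within the session timeout would not take the brokers down, which is why the simultaneous shutdown surprised us.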