Fix Router race condition and use default broker service name for invalid priority#5050
| tierConfig.getTierToBrokerMap().values(), | ||
| tierConfig.getDefaultBrokerServiceName() | ||
| ) | ||
| tierConfig.getDefaultBrokerServiceName() |
Logging here might be helpful.
TieredBrokerSelectorStrategy doesn't have a logger. Should I add a logger to this class?
Is the log for queries that don't fall within the configured min-max priority range? That log might end up being too noisy. In many clusters, cluster operators and user teams are different, and users might send such queries.
| @@ -50,18 +49,10 @@ public Optional<String> getBrokerServiceName(TieredBrokerConfig tierConfig, Quer | |||
| if (priority < minPriority) { | |||
Two branches of this if block are the same now.
combined the two branches into one
| ) | ||
| tierConfig.getDefaultBrokerServiceName() | ||
| ); | ||
| } else if (priority >= maxPriority) { |
Not sure if it should be >= or >
hmm, I guess it should be >?
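The out-of-range check discussed here can be sketched as follows. This is only an illustration, not the code from the PR: the class and method names are hypothetical, java.util.Optional stands in for whatever Optional type the real method returns, and the sketch assumes the answer to the question above is that maxPriority is inclusive (so the comparison is >).

```java
import java.util.Optional;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class PriorityRangeSketch
{
  /**
   * Returns the default broker service name when the priority lies outside
   * [minPriority, maxPriority] (both bounds assumed inclusive), and an empty
   * Optional when the priority is in range and this strategy has no opinion.
   */
  static Optional<String> selectForOutOfRange(
      int priority,
      int minPriority,
      int maxPriority,
      String defaultBrokerServiceName
  )
  {
    // Both out-of-range branches return the same value, so they are combined.
    if (priority < minPriority || priority > maxPriority) {
      return Optional.of(defaultBrokerServiceName);
    }
    return Optional.empty();
  }
}
```

With >= instead of >, a query whose priority equals maxPriority would also be routed to the default broker service, which is the difference the two comments above are weighing.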
| { | ||
| ImmutableList<Server> currNodes = nodes; | ||
| int index = roundRobinIndex.getAndIncrement(); |
getAndIncrement() will produce negative results once the index overflows. I suggest replacing it with a method that wraps around currNodes.size() explicitly, using compareAndSet(), for sanity.
Or allow the overflow, but make sure the chosen index is positive (use absolute value).
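The "allow the overflow" option can be sketched as below. This is not the code that was merged; the class name and the initial-value constructor are hypothetical and exist only to make the overflow easy to exercise. Note that a plain Math.abs() is not quite enough, because Math.abs(Integer.MIN_VALUE) is still negative, so the sketch uses Math.floorMod() instead. It assumes currNodes is non-empty.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a picker that lets the counter overflow freely.
final class OverflowTolerantPicker<T>
{
  private final AtomicInteger roundRobinIndex;

  OverflowTolerantPicker(int initialIndex)
  {
    this.roundRobinIndex = new AtomicInteger(initialIndex);
  }

  T pick(List<T> currNodes)
  {
    // getAndIncrement() is atomic, so each caller sees a distinct raw value;
    // the value wraps to Integer.MIN_VALUE after Integer.MAX_VALUE.
    int raw = roundRobinIndex.getAndIncrement();
    // floorMod always returns a value in [0, size) for a positive size,
    // even when raw is negative. Throws ArithmeticException if size is 0.
    return currNodes.get(Math.floorMod(raw, currNodes.size()));
  }
}
```

One trade-off of this approach: when the list size is not a power of two, the rotation is slightly uneven at the moment the counter wraps (the same node can be picked twice in a row), which is harmless for load balancing but differs from a strict round robin.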
Added 0.11.0 milestone due to race condition fix.
| } | ||
| return currNodes.get(roundRobinIndex++); | ||
| return currNodes.get(index); |
Now this index could be incorrect, because it was set during a previous call of this method.
I added index %= currNodes.size() back to fix this, is there a better way to do it?
I suggest returning to this version: ab61f83, and just using nextIndex instead of index below in the code. It's not a problem if indexing starts from 1 instead of 0. Alternatively, the initial value of roundRobinIndex could be set to -1.
Ok, I wasn't really sure about using nextIndex (for the name's sake) but I will revert the change to that and set the initial value of roundRobinIndex to -1.
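The approach agreed on here (a compareAndSet() loop that returns nextIndex, with roundRobinIndex starting at -1) might look roughly like the sketch below. It is a reconstruction from the review comments and diff fragments, not the merged code; the class name and the null return for an empty list are assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a compare-and-set based round-robin picker.
final class CasRoundRobinSketch<T>
{
  // Starts at -1 so that the first pick returns the element at index 0.
  private final AtomicInteger roundRobinIndex = new AtomicInteger(-1);

  T pick(List<T> currNodes)
  {
    int size = currNodes.size();
    if (size == 0) {
      return null; // assumption: callers treat null as "no node available"
    }
    while (true) {
      int index = roundRobinIndex.get();
      int nextIndex = index + 1;
      // Wrap explicitly; this also covers int overflow and a list that
      // shrank since the previous call.
      if (nextIndex < 0 || nextIndex >= size) {
        nextIndex = 0;
      }
      // Only the thread that wins the CAS uses nextIndex; losers retry
      // with a fresh read, so no thread ever indexes past the list end.
      if (roundRobinIndex.compareAndSet(index, nextIndex)) {
        return currNodes.get(nextIndex);
      }
    }
  }
}
```

Because the value used for currNodes.get() is computed and bounds-checked inside the same call, it cannot be stale in the way the earlier index could be.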
| while (true) { | ||
| int nextIndex = index + 1; | ||
| if (nextIndex < 0) nextIndex %= currNodes.size(); |
It won't help: the result of the mod operation when the first argument is negative is zero or negative.
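A small demonstration of the point above: in Java, the % operator takes the sign of its left operand, so applying it to a negative nextIndex cannot yield a valid list index unless the result happens to be zero. The class name is hypothetical; Math.floorMod() is shown only for contrast.

```java
// Hypothetical demo class; only the arithmetic matters here.
final class RemainderSignDemo
{
  static int remainder(int a, int n)
  {
    return a % n; // sign follows the dividend a
  }

  static int floorMod(int a, int n)
  {
    return Math.floorMod(a, n); // sign follows the divisor n
  }

  public static void main(String[] args)
  {
    System.out.println(remainder(-7, 3));                // -1
    System.out.println(remainder(Integer.MIN_VALUE, 3)); // -2
    System.out.println(remainder(-6, 3));                // 0
    System.out.println(floorMod(-7, 3));                 // 2
    System.out.println(floorMod(Integer.MIN_VALUE, 3));  // 1
  }
}
```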
| int index = roundRobinIndex.get(); | ||
| int nextIndex = index + 1; | ||
| while (true) { |
Could you please extract this block as a method? That would avoid repeating the
int index = roundRobinIndex.get();
int nextIndex = index + 1;
block.
| int nextIndex = index + 1; | ||
| while (true) { | ||
| if (nextIndex < 0 || nextIndex >= currNodes.size()) { |
ok, will remove that
👍
CI failed because http://www.us.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz is no longer available. #5060 includes a fix.
…alid priority (apache#5050)
* use default brokerServiceName when priority is not valid
* use AtomicInteger for NodesHolder.roundRobinIndex
* revert inspectionProfiles change
* adjust TieredBrokerHostSelectorTest
* combine if statements and ensure index does not become negative
* set next index with mod if overflows
* fix codestyle
* use nextIndex
* extract the while loop to a method
…alid priority (#5050) (#5066)
* use default brokerServiceName when priority is not valid
* use AtomicInteger for NodesHolder.roundRobinIndex
* revert inspectionProfiles change
* adjust TieredBrokerHostSelectorTest
* combine if statements and ensure index does not become negative
* set next index with mod if overflows
* fix codestyle
* use nextIndex
* extract the while loop to a method
This PR suggests two changes:
* Fix a race condition in TieredBrokerHostSelector, specifically in NodesHolder.pick(), where multiple threads can access roundRobinIndex. For example, two threads can simultaneously access roundRobinIndex when roundRobinIndex == currNodes.size() - 1, but then one of them gets an index-out-of-bounds error when calling currNodes.get(roundRobinIndex++), because the other thread has already incremented roundRobinIndex by 1. I have encountered this situation in production.
* Use DefaultBrokerServiceName instead of the last value of tierToBrokerMap when the priority is out of range, since, as I understand it, the default value is the one used whenever there is no specific broker service name for the given query. What is the reason behind using the last entry of tierToBrokerMap when the priority is out of range?