Add follower read support to TiDB #11347
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
const (
	// ReplicaReadLeader stands for 'read from leader'.
	ReplicaReadLeader ReplicaReadType = iota
I prefer using 1 << iota here, so we can use bit operations to easily check whether a type is set.
I thought about this initially, but I doubt whether it is reasonable to read from all kinds of replicas. Reading from different types of replicas may have different latency characteristics and place a different burden on the leader. We can discuss this further.
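To make the two options in this thread concrete, here is a minimal sketch contrasting the sequential `iota` style from the diff with the `1 << iota` bit-flag style the reviewer suggests. Only `ReplicaReadType` and `ReplicaReadLeader` appear in the diff above; `ReplicaReadFollower`, `ReplicaReadFlags`, the `Flag*` constants, and `Has` are hypothetical names used for illustration.

```go
package main

import "fmt"

// ReplicaReadType follows the name in the diff. ReplicaReadFollower is an
// assumed second value, not shown in the quoted context.
type ReplicaReadType byte

// Sequential style: values are 0, 1, ... and exactly one applies at a time.
const (
	ReplicaReadLeader ReplicaReadType = iota
	ReplicaReadFollower
)

// ReplicaReadFlags is a hypothetical bit-flag alternative: each value owns
// one bit, so several replica kinds can be combined with |.
type ReplicaReadFlags byte

const (
	FlagLeader   ReplicaReadFlags = 1 << iota // 1
	FlagFollower                              // 2
	FlagLearner                               // 4
)

// Has reports whether every bit in f is also set in s.
func (s ReplicaReadFlags) Has(f ReplicaReadFlags) bool { return s&f == f }

func main() {
	allowed := FlagLeader | FlagFollower
	fmt.Println(allowed.Has(FlagFollower), allowed.Has(FlagLearner))
}
```

The flag style only pays off if a read may legitimately target several replica kinds at once, which is exactly the point the author questions in the reply above.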
Codecov Report
@@ Coverage Diff @@
## master #11347 +/- ##
=========================================
Coverage 81.727% 81.727%
=========================================
Files 434 434
Lines 94878 94878
=========================================
Hits 77541 77541
Misses 11879 11879
Partials 5458 5458
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.
I have made some changes according to what we agreed on in our last discussion. Please take a look, thanks. @overvenus @5kbpers
	nextIdx := (currentPeerIdx + 1) % len(rs.stores)
	newRegionStore := rs.clone()
	newRegionStore.workStoreIdx = int32(nextIdx)
	newRegionStore.initFollowers()
When switchNextPeer is called, workStoreIdx may point to a follower, so initFollowers does not do what it is supposed to do.
Isn't it trying to predict that the next peer is going to be the new leader? If that's the case, workStoreIdx will actually point to the new leader, which makes non-replica reads work.
When a request fails, we change workStoreIdx and then access the follower to wake up the hibernated region.
It's not always the leader.
If workStoreIdx points to a follower, it may be the only valid follower; since we avoid accessing it, all requests will be sent to the real leader.
It works, but the code is misleading.
Ok, I see. Looks like simply removing initFollowers here will work, am I right?
	workStoreIdx int32    // point to current work peer in meta.Peers and work store in stores(same idx)
	stores       []*Store // stores in this region
	storeFails   []uint32 // snapshots of store's fail, need reload when `storeFails[curr] != stores[cur].fail`
	followers    []int32  // followers' index in this region
We can just use a leader index instead.
When we are trying to find a follower, we can just skip the leader.
That way it will have to check the current store index every single time. Wouldn't that be even more cumbersome?
The followers array doesn't carry any extra information, so we can remove it to save memory.
And the computation cost of avoiding workStoreIdx is very small.
We are using the seed passed in here. Which follower are we going to use when it points to the leader? Whichever follower it chooses as a fallback, how do we make sure it can really balance the load? In addition, arrays for a million regions only cost dozens of MB. Compared to other places, the footprint of this array is actually negligible.
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
	followerIdx++
	if followerIdx == numPeers {
		followerIdx = 0
	}
}
Brilliant, let's do it this way.
Actually, followerIdx == numPeers will always be false.
This will do:
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
	followerIdx++
}
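The snippet agreed on above can be wrapped into a small runnable sketch. The function name `followerIndex` and its parameter names are hypothetical; the arithmetic is the same: the seed is reduced to one of `numPeers-1` slots, and any slot at or past the leader's index is shifted up by one so the leader is never chosen.

```go
package main

import "fmt"

// followerIndex maps a seed onto one of the numPeers-1 non-leader peers.
// It assumes numPeers >= 2 and 0 <= leaderIdx < numPeers.
func followerIndex(seed uint32, leaderIdx, numPeers int) int {
	idx := int(seed % uint32(numPeers-1)) // in [0, numPeers-2]
	if idx >= leaderIdx {
		idx++ // skip over the leader's slot; the result is at most numPeers-1
	}
	return idx
}

func main() {
	const numPeers, leaderIdx = 5, 2
	for seed := uint32(0); seed < 8; seed++ {
		fmt.Printf("seed=%d -> peer %d\n", seed, followerIndex(seed, leaderIdx, numPeers))
	}
}
```

Because the modulo result never exceeds `numPeers-2`, the increment can reach at most `numPeers-1`, which is why the wrap-around check in the first version was unnecessary.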
coocood left a comment
If a follower store fails, the region info will still contain that failed store. It will not be removed unless we remove it manually or wait tens of minutes for PD to schedule another follower into the region and remove the failed one.
So we will keep sending follower reads to the failed store.
Another problem is that the region cache assumes the failed store is the leader. If a request to a follower fails, the region cache will switch workStoreIdx and drop the region cache, which makes leader reads fail.
For the first issue, it seems the old way of storing a followers array makes more sense: when a follower fails, it can be removed from the array of valid followers. Without such an array, every time we try to get a follower we need to check storeFails to skip failed ones. As for the second issue, we can skip switching peers if the current failure was caused by a follower read, if I understand it correctly.
@sunxiaoguang If storeFail doesn't match for a follower read, we should choose another follower and avoid invalidating the cache.
I made some changes to try a different follower if the selected one has failed. Please take a look, thanks.
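As a rough illustration of "try a different follower if the selected one has failed", the sketch below walks the follower slots starting from the seeded position and falls back to the leader when every follower is marked failed. It is not the code from this PR: `pickFollower` is a hypothetical name, and the `failed []bool` slice is a simplification of the `storeFails` snapshot comparison described in the struct comments.

```go
package main

import "fmt"

// pickFollower returns the index of a healthy follower, starting the search
// at the slot chosen by seed. failed[i] reports whether store i is considered
// failed (a simplified stand-in for comparing storeFails snapshots).
// If no follower is usable, the leader index is returned as a fallback.
func pickFollower(seed uint32, leaderIdx int, failed []bool) int {
	n := len(failed)
	if n < 2 {
		return leaderIdx // no followers at all
	}
	start := int(seed % uint32(n-1))
	for i := 0; i < n-1; i++ {
		idx := (start + i) % (n - 1) // follower slot in [0, n-2]
		if idx >= leaderIdx {
			idx++ // translate the slot into a peer index that skips the leader
		}
		if !failed[idx] {
			return idx
		}
	}
	return leaderIdx
}

func main() {
	failed := []bool{false, false, true, false, false}
	fmt.Println(pickFollower(1, 0, failed))
}
```

Falling back to the leader rather than invalidating the cached region matches the reviewer's request above, although the real implementation may handle the fallback differently.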
@sunxiaoguang |
	}
})

var replicaRead kv.ReplicaReadType
How about adding a metric for ReplicaReadType so that follower read time is observable in Grafana?
Sure, let me add it.
I was about to add a new label replica_type to those duration metrics, but I'm not sure if it is the right way to do it. Any suggestions? Thanks.
It looks like many metrics are populated statically in code with specific labels. Adding a new label would make that harder to do.
// BeginWithStartTS begins a transaction with startTS.
func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
-	txn, err := newTikvTxnWithStartTS(s, startTS)
+	txn, err := newTikvTxnWithStartTS(s, startTS, s.nextReplicaReadSeed())
This will make the ReadSeed change for each txn in the same session, which seems to conflict with #11347 (comment).
Making it consistent across txn and coprocessor seems complicated, and the discussion was more efficient over instant messaging, so we discussed it in a WeChat group and agreed that we can use different policies for coprocessor and txn. Sorry for not giving the context here.
func (s *tikvStore) GetSnapshot(ver kv.Version) (kv.Snapshot, error) {
-	snapshot := newTiKVSnapshot(s, ver)
+	snapshot := newTiKVSnapshot(s, ver, s.nextReplicaReadSeed())
LGTM
/run-all-tests
cherry pick pingcap#11347 to 3.1 Fix code conflicts from 4.0 to 3.1. Users can use tidb_replica_read session variable to choose reading from leader or follower. To make it consistent with existing behavior, leader will be used by default unless follower is explicitly specified otherwise. Add a session scope variable tidb_replica_read to specify if TiDB should read data from leader or follower.
Signed-off-by: Xiaoguang Sun sunxiaoguang@zhihu.com
What problem does this PR solve?
Add replica read support to TiDB. Users can use tidb_replica_read session variable to choose reading from leader or follower. To make it consistent with existing behavior, leader will be used by default unless follower is explicitly specified otherwise.
What is changed and how it works?
Add a session scope variable tidb_replica_read to specify if TiDB should read data from leader or follower.
Check List
Tests
Will add later
Will add later
Code changes
Side effects
NA
Related changes