Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.tikv</groupId>
<artifactId>tikv-client-java</artifactId>
<version>3.3.0-SNAPSHOT</version>
<version>3.3.4-SNAPSHOT</version>
<packaging>jar</packaging>
<name>TiKV Java Client</name>
<description>A Java Client for TiKV</description>
Expand Down
17 changes: 15 additions & 2 deletions src/main/java/org/tikv/common/region/RegionManager.java
Original file line number Diff line number Diff line change
Expand Up @@ -228,12 +228,25 @@ public Pair<TiRegion, TiStore> getRegionStorePairByKey(

TiStore store = null;
if (storeType == TiStoreType.TiKV) {
Peer peer = region.getCurrentReplica();
store = getStoreById(peer.getStoreId(), backOffer);
// check from the first replica in case it recovers
List<Peer> replicaList = region.getReplicaList();
for (int i = 0; i < replicaList.size(); i++) {
Peer peer = replicaList.get(i);
store = getStoreById(peer.getStoreId(), backOffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make replicas not balance ? As we always try the replica 0 at first.

How about start from getCurrentReplica(), then if the current replica is not available, get next one by getNextReplica and retry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two problems if we getNextReplica.

  1. code will trap into endless loop when there is no available store.
  2. assume we set follower,leader, the replica list will begin with the followers then the leaders. If we getNextReplica, we may miss the recover follower and request the leader

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, the original code always select the first one. also has the issue of balance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. code will trap into endless loop when there is no available store.

It is not difficult to be solved by loop no more than replicaList.size() times.

  1. assume we set follower,leader, the replica list will begin with the followers then the leaders. If we getNextReplica, we may miss the recover follower and request the leader

Why we should not "miss the recover follower and request the leader" ? If we don't want to read on leader, we can set the replica selector as "followers only". Otherwise, if "leader and follower" is used, it should be OK to read on leader.

Besides, the original code always select the first one. also has the issue of balance

I also feel strange that why getNextReplica is unused, and except which replicaIdx is changed nowhere. Any idea @iosmanthus ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The customer wants a "fallback" strategy. They want to read followers first to isolate dump traffic. But if all the followers are down, fallback to leaders as a cover.

So if "follower and leader" is specified, it means use followers first, and then leaders. The order matters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stale read enabled ? If not, when all followers are down, the leader is also unavailable (neither readable nor writable as well, see https://www.pingcap.com/blog/lease-read/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I think we take some terms wrong, it's not follower, it's learner, who will not attend voting. And yes, this is for stale read. The normal TiKV nodes will still take leader and follower and handle online traffic. And there will be extra learner nodes dedicated for offline traffic. When learners are down, goes for the normal nodes.

if (store.isReachable()) {
// update replica's index
region.setReplicaIdx(i);
break;
}
logger.info("Store {} is unreachable, try to get the next replica", peer.getStoreId());
}
} else {
List<TiStore> tiflashStores = new ArrayList<>();
for (Peer peer : region.getLearnerList()) {
TiStore s = getStoreById(peer.getStoreId(), backOffer);
if (!s.isReachable()) {
continue;
}
for (Metapb.StoreLabel label : s.getStore().getLabelsList()) {
if (label.getKey().equals(storeType.getLabelKey())
&& label.getValue().equals(storeType.getLabelValue())) {
Expand Down
8 changes: 8 additions & 0 deletions src/main/java/org/tikv/common/region/TiRegion.java
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,14 @@ public Peer getNextReplica() {
return getCurrentReplica();
}

public void setReplicaIdx(int idx) {
replicaIdx = idx;
}

public List<Peer> getReplicaList() {
return replicaList;
}

private boolean isLeader(Peer peer) {
return getLeader().equals(peer);
}
Expand Down