Bug Report
1. TxnKV scan loses data when the table has more than one region
2. Minimal reproduce step (Required)
(1) Prepare a TiDB table that has two regions.
For example, the table contains 200000 rows in total:
mysql> select count(1) from tikv_client_test;
+----------+
| count(1) |
+----------+
| 200000 |
+----------+
1 row in set (0.08 sec)
Table regions:
mysql> show table tikv_client_test regions \G;
*************************** 1. row ***************************
REGION_ID: 15097
START_KEY: t_172_
END_KEY: t_172_r_535012
LEADER_ID: 15100
LEADER_STORE_ID: 5
PEERS: 15098, 15099, 15100
SCATTERING: 0
WRITTEN_BYTES: 0
READ_BYTES: 0
APPROXIMATE_SIZE(MB): 1
APPROXIMATE_KEYS: 0
*************************** 2. row ***************************
REGION_ID: 2
START_KEY: t_172_r_535012
END_KEY:
LEADER_ID: 66
LEADER_STORE_ID: 4
PEERS: 3, 66, 76
SCATTERING: 0
WRITTEN_BYTES: 63913588
READ_BYTES: 0
APPROXIMATE_SIZE(MB): 78
APPROXIMATE_KEYS: 51395
2 rows in set (0.01 sec)
(2) Use TxnKV to scan the table.
Code:
Note: when endKey (in the code below) is set to a value larger than 535012 (the END_KEY of REGION_ID 15097 in this table), the data loss reproduces reliably.
public class TikvScanTemplateRepeat {
    public static void main(String[] args) throws Exception {
        TiConfiguration conf = TiConfiguration.createDefault("****");
        TiSession session = TiSession.create(conf);
        KVClient scanClient = session.createKVClient();
        long startTs = session.getTimestamp().getVersion();
        final String database = "****";
        final String tableName = "tikv_client_test";
        long tableId = session.getCatalog().getTable(database, tableName).getId();
        long startPos = 2L; // the minimum key of the table
        ByteString startKey = RowKey.toRowKey(tableId, startPos).toByteString();
        // When endKey is larger than 535012 (the END_KEY of REGION_ID 15097 in this table),
        // the data loss reproduces reliably.
        ByteString endKey = RowKey.toRowKey(tableId, Long.MAX_VALUE).toByteString();
        int totalSize = 0;
        try {
            while (true) {
                final List<Kvrpcpb.KvPair> segment =
                    scanClient.scan(startKey, endKey, startTs);
                if (segment.isEmpty()) {
                    break;
                }
                System.out.println("scan segment size:" + segment.size());
                totalSize += segment.size();
                // Resume from the key right after the last one returned.
                startKey =
                    RowKey.toRawKey(segment.get(segment.size() - 1).getKey())
                        .next()
                        .toByteString();
            }
        } finally {
            scanClient.close();
            session.close();
        }
        System.out.println("scan total size: " + totalSize);
    }
}
Result:
scan total size: 138082
The table contains 200000 rows in total; however, the scan returns only 138082 rows.
3. What did you see instead (Required)
While debugging, I found that the cause is in:
org.tikv.common.operation.iterator.ScanIterator
function: cacheLoadFails()
When scanning REGION_ID 15097, currentCache.size() is 10240 (which is controlled by the config tikv.grpc.scan_batch_size).
if (currentCache.size() < limit) {
    startKey = curRegionEndKey;
    lastKey = Key.toRawKey(curRegionEndKey);
} else if (currentCache.size() > limit) {
    throw new IndexOutOfBoundsException(
        "current cache size = "
            + currentCache.size()
            + ", larger than "
            + conf.getScanBatchSize());
} else {
    // Start new scan from exact next key in current region
    lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
    startKey = lastKey.next().toByteString();
}
Because currentCache.size() (10240) is smaller than limit, the first branch is taken and startKey is set to the END_KEY of REGION_ID 15097 (curRegionEndKey).
The next scan then starts from the new startKey (curRegionEndKey), which loses the data between currentCache.get(currentCache.size() - 1) and the END_KEY of REGION_ID 15097.
The relevant source is: https://github.com/tikv/client-java/blob/v3.2.0/src/main/java/org/tikv/common/operation/iterator/ScanIterator.java#L94
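To illustrate what I think is happening, here is a minimal, self-contained sketch. It is a toy model, not the client-java code: plain long values stand in for row keys, and the class name, SPLIT, KEY_COUNT and the batch size of 5 are all made up for the illustration. It assumes that limit stays larger than the batch size, as it appears to in my reproduction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain longs stand in for row keys; nothing here calls client-java.
class BuggyScanDemo {
    static final long SPLIT = 12;     // hypothetical end key of the first region
    static final long KEY_COUNT = 20; // keys 0..19 exist; the second region ends at 20

    // End key of the region that contains `key`.
    static long regionEnd(long key) {
        return key < SPLIT ? SPLIT : KEY_COUNT;
    }

    // One scan request: at most batchSize keys, never crossing a region boundary.
    static List<Long> scanRegion(long start, int batchSize) {
        List<Long> batch = new ArrayList<>();
        for (long k = start; k < regionEnd(start) && batch.size() < batchSize; k++) {
            batch.add(k);
        }
        return batch;
    }

    // Mirrors the branch quoted above: the batch is compared with `limit`.
    static List<Long> scanAll(int batchSize, int limit) {
        List<Long> result = new ArrayList<>();
        long start = 0;
        while (start < KEY_COUNT) {
            List<Long> batch = scanRegion(start, batchSize);
            result.addAll(batch);
            if (batch.size() < limit) {
                start = regionEnd(start);                // jump to the region end key
            } else {
                start = batch.get(batch.size() - 1) + 1; // continue after the last key
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Batch size 5, limit far larger than the batch size.
        System.out.println(scanAll(5, Integer.MAX_VALUE));
        // prints [0, 1, 2, 3, 4, 12, 13, 14, 15, 16]
    }
}
```

In this model every region contributes only its first batch: keys 5 to 11 and 17 to 19 are never requested, which matches the kind of gap I see in the first region of the real table.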
4. What did you expect to see? (Required)
(1) Could you please tell me what the intent of this design is? Or is it a bug?
if (currentCache.size() < limit) {
    startKey = curRegionEndKey;
    lastKey = Key.toRawKey(curRegionEndKey);
} else if (currentCache.size() > limit) {
    throw new IndexOutOfBoundsException(
        "current cache size = "
            + currentCache.size()
            + ", larger than "
            + conf.getScanBatchSize());
} else {
    // Start new scan from exact next key in current region
    lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
    startKey = lastKey.next().toByteString();
}
(2) Maybe, in this situation, startKey should be set as follows (comparing against conf.getScanBatchSize() instead of limit):
if (currentCache.size() < conf.getScanBatchSize()) {
    startKey = curRegionEndKey;
    lastKey = Key.toRawKey(curRegionEndKey);
} else if (currentCache.size() > conf.getScanBatchSize()) {
    throw new IndexOutOfBoundsException(
        "current cache size = "
            + currentCache.size()
            + ", larger than "
            + conf.getScanBatchSize());
} else {
    // Start new scan from exact next key in current region
    lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
    startKey = lastKey.next().toByteString();
}
(3) If it's not a bug, is there a good way to scan all the data when the table has more than one region?
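As a sanity check of the comparison proposed in (2), here is a small self-contained sketch that tries it on a toy model for many batch sizes. Again, this is not client-java code: longs stand in for keys, and the class name, the regionEnds layout and the range of batch sizes are hypothetical values chosen only for the check.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: longs stand in for keys; regionEnds is a hypothetical region layout.
class FixedScanCheck {
    // Smallest region end key that is greater than `key`.
    static long regionEnd(long key, long[] regionEnds) {
        for (long end : regionEnds) {
            if (key < end) {
                return end;
            }
        }
        throw new IllegalArgumentException("key beyond last region: " + key);
    }

    // Scan keys [0, last region end), comparing each batch with the batch size itself.
    static List<Long> scanAll(long[] regionEnds, int batchSize) {
        long total = regionEnds[regionEnds.length - 1];
        List<Long> result = new ArrayList<>();
        long start = 0;
        while (start < total) {
            long end = regionEnd(start, regionEnds);
            List<Long> batch = new ArrayList<>();
            for (long k = start; k < end && batch.size() < batchSize; k++) {
                batch.add(k);
            }
            result.addAll(batch);
            if (batch.size() < batchSize) {
                start = end;                             // region exhausted, move on
            } else {
                start = batch.get(batch.size() - 1) + 1; // more keys may remain here
            }
        }
        return result;
    }

    // True if the scan returns 0, 1, ..., total - 1 in order, each exactly once.
    static boolean returnsEveryKeyOnce(long[] regionEnds, int batchSize) {
        List<Long> got = scanAll(regionEnds, batchSize);
        long total = regionEnds[regionEnds.length - 1];
        if (got.size() != total) {
            return false;
        }
        for (int i = 0; i < got.size(); i++) {
            if (got.get(i).longValue() != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] regionEnds = {12, 20, 47}; // three regions: [0,12), [12,20), [20,47)
        for (int batchSize = 1; batchSize <= 50; batchSize++) {
            if (!returnsEveryKeyOnce(regionEnds, batchSize)) {
                throw new AssertionError("lost or duplicated keys at batch size " + batchSize);
            }
        }
        System.out.println("all keys returned for every batch size tried");
    }
}
```

In this model no key is lost or duplicated, including when a region holds an exact multiple of the batch size. That says nothing about other callers of cacheLoadFails() that may rely on limit, so I would still appreciate a maintainer's view.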
5. What are your Java Client and TiKV versions? (Required)
- Client Java: 3.2.0
- TiKV: 5.1
I'm looking forward to your reply, thank you so much!
