TxnKV scan loses data when the table has more than one region  #600

@xieyi888

Bug Report

1. TxnKV scan loses data when the table has more than one region

2. Minimal reproduce step (Required)

(1) Prepare a TiDB table that has two regions.

For example, the table contains 200000 rows in total:

mysql> select count(1) from tikv_client_test;
+----------+
| count(1) |
+----------+
|   200000 |
+----------+
1 row in set (0.08 sec)

The table's regions:

mysql> show table tikv_client_test  regions \G;
*************************** 1. row ***************************
           REGION_ID: 15097
           START_KEY: t_172_
             END_KEY: t_172_r_535012
           LEADER_ID: 15100
     LEADER_STORE_ID: 5
               PEERS: 15098, 15099, 15100
          SCATTERING: 0
       WRITTEN_BYTES: 0
          READ_BYTES: 0
APPROXIMATE_SIZE(MB): 1
    APPROXIMATE_KEYS: 0
*************************** 2. row ***************************
           REGION_ID: 2
           START_KEY: t_172_r_535012
             END_KEY:
           LEADER_ID: 66
     LEADER_STORE_ID: 4
               PEERS: 3, 66, 76
          SCATTERING: 0
       WRITTEN_BYTES: 63913588
          READ_BYTES: 0
APPROXIMATE_SIZE(MB): 78
    APPROXIMATE_KEYS: 51395
2 rows in set (0.01 sec)

(2) Use TxnKV to scan the table.

Code:
Note: when endKey (in the following code) is set larger than 535012 (the END_KEY of REGION_ID 15097 in this table), the data loss reproduces reliably.

public class TikvScanTemplateRepeat {
    public static void main(String[] args) throws Exception {
        TiConfiguration conf = TiConfiguration.createDefault("****");
        TiSession session = TiSession.create(conf);
        KVClient scanClient = session.createKVClient();
        long startTs = session.getTimestamp().getVersion();

        final String database = "****";
        final String tableName = "tikv_client_test";
        long tableId = session.getCatalog().getTable(database, tableName).getId();

        long startPos = 2L; // the minimum key of the table
        ByteString startKey = RowKey.toRowKey(tableId, startPos).toByteString();
        // When endKey is larger than 535012 (the END_KEY of REGION_ID 15097 in this table),
        // the data loss reproduces reliably.
        ByteString endKey = RowKey.toRowKey(tableId, Long.MAX_VALUE).toByteString();

        int totalSize = 0;
        try {
            while (true) {
                final List<Kvrpcpb.KvPair> segment =
                        scanClient.scan(startKey, endKey, startTs);

                if (segment.isEmpty()) {
                    break;
                }
                System.out.println("scan segment size:" + segment.size());
                totalSize += segment.size();
                startKey =
                        RowKey.toRawKey(segment.get(segment.size() - 1).getKey())
                                .next()
                                .toByteString();
            }
        } finally {
            scanClient.close();
            session.close();
        }
        System.out.println("scan total size: " + totalSize);
    }
}

Result:
scan total size: 138082

The table contains 200000 rows in total, but the scan returned only 138082 rows.

3. What did you see instead (Required)

While debugging, I found that the cause is in:
org.tikv.common.operation.iterator.ScanIterator
function: cacheLoadFails()

When scanning REGION_ID 15097, currentCache holds 10240 entries (the size is controlled by the config tikv.grpc.scan_batch_size).

      if (currentCache.size() < limit) {
        startKey = curRegionEndKey;
        lastKey = Key.toRawKey(curRegionEndKey);
      } else if (currentCache.size() > limit) {
        throw new IndexOutOfBoundsException(
            "current cache size = "
                + currentCache.size()
                + ", larger than "
                + conf.getScanBatchSize());
      } else {
        // Start new scan from exact next key in current region
        lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
        startKey = lastKey.next().toByteString();
      }

startKey would be set to the END_KEY of REGION_ID 15097 (curRegionEndKey).
The scan then continues from the new startKey (curRegionEndKey), which loses the data between currentCache.get(currentCache.size() - 1) and the END_KEY of REGION_ID 15097.

The relevant source is: https://github.com/tikv/client-java/blob/v3.2.0/src/main/java/org/tikv/common/operation/iterator/ScanIterator.java#L94
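
A simplified, self-contained model of the cursor-advance logic quoted above may make the failure mode easier to see. It is not the client-java implementation: the in-memory TreeSet store, the fixed split point SPLIT, and the names ScanAdvanceSketch, regionEnd, scanRegion, countScanned, and keys are all assumptions made for illustration. The threshold parameter stands in for whatever value the cache size is compared against, and the sketch models a single pass of one iterator, not the outer loop of the reproduction program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only; none of these names come from client-java.
public class ScanAdvanceSketch {
    // Hypothetical layout: region A = [0, SPLIT), region B = [SPLIT, +inf).
    static final long SPLIT = 50;

    // Exclusive end key of the region that contains `key`.
    static long regionEnd(long key) {
        return key < SPLIT ? SPLIT : Long.MAX_VALUE;
    }

    // One batched scan: at most batchSize keys >= start, never crossing the region end.
    static List<Long> scanRegion(TreeSet<Long> store, long start, int batchSize) {
        List<Long> batch = new ArrayList<>();
        for (long key : store.subSet(start, regionEnd(start))) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(key);
        }
        return batch;
    }

    // Walks the key space from 0 and counts the keys returned.
    // A batch smaller than `threshold` is treated as "region exhausted".
    // Assumes threshold >= 1.
    static int countScanned(TreeSet<Long> store, int batchSize, int threshold) {
        int total = 0;
        long start = 0;
        while (true) {
            List<Long> batch = scanRegion(store, start, batchSize);
            total += batch.size();
            if (batch.size() < threshold) {
                long end = regionEnd(start);
                if (end == Long.MAX_VALUE) {
                    return total; // the last region is finished
                }
                start = end; // jump to the start of the next region
            } else {
                start = batch.get(batch.size() - 1) + 1; // resume right after the last key
            }
        }
    }

    // Builds a store holding the keys 0 .. n-1.
    static TreeSet<Long> keys(long n) {
        TreeSet<Long> store = new TreeSet<>();
        for (long k = 0; k < n; k++) {
            store.add(k);
        }
        return store;
    }

    public static void main(String[] args) {
        TreeSet<Long> store = keys(100);
        // Threshold far above the batch size: every full batch looks "short".
        System.out.println(countScanned(store, 10, Integer.MAX_VALUE)); // prints 20
        // Threshold equal to the batch size: a full batch resumes after its last key.
        System.out.println(countScanned(store, 10, 10)); // prints 100
    }
}
```

In this model, a threshold much larger than the batch size makes every full batch look like the end of its region, so only the first batch of each region is returned. With the threshold equal to the batch size, all 100 keys are returned.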

4. What did you expect to see? (Required)

(1) Could you please tell me the intent of this design? Or is it a bug?

      if (currentCache.size() < limit) {
        startKey = curRegionEndKey;
        lastKey = Key.toRawKey(curRegionEndKey);
      } else if (currentCache.size() > limit) {
        throw new IndexOutOfBoundsException(
            "current cache size = "
                + currentCache.size()
                + ", larger than "
                + conf.getScanBatchSize());
      } else {
        // Start new scan from exact next key in current region
        lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
        startKey = lastKey.next().toByteString();
      }

(2) Maybe, in this situation, startKey should be set like this, comparing against conf.getScanBatchSize() instead of limit:

      if (currentCache.size() < conf.getScanBatchSize()) {
        startKey = curRegionEndKey;
        lastKey = Key.toRawKey(curRegionEndKey);
      } else if (currentCache.size() > conf.getScanBatchSize()) {
        throw new IndexOutOfBoundsException(
            "current cache size = "
                + currentCache.size()
                + ", larger than "
                + conf.getScanBatchSize());
      } else {
        // Start new scan from exact next key in current region
        lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
        startKey = lastKey.next().toByteString();
      }

(3) If it's not a bug, is there a good way to scan all the data when the table has more than one region?

5. What are your Java Client and TiKV versions? (Required)

  • Client Java: 3.2.0
  • TiKV: 5.1

I'm looking forward to your reply, thank you so much!

Labels: type/bug (Something isn't working)