Skip to content

PDClient and RegionStoreClient are slow down massively in heavy workload. #639

@iosmanthus

Description

@iosmanthus

Bug Report

1. Describe the bug

Every request will create a RegionStoreClient by asking the RegionManager about the region info before it is sent to TiKV servers.

public synchronized RegionStoreClient build(
ByteString key, TiStoreType storeType, BackOffer backOffer) throws GrpcException {
Pair<TiRegion, TiStore> pair =
regionManager.getRegionStorePairByKey(key, storeType, backOffer);
return build(pair.first, pair.second, storeType);
}

However, this method is declared synchronized, which might block the entire client every time it launches a request. It's safe to remove the synchronized keyword since the underlying code of RegionManager is already synced.

Another code path that suffers from the synchronized is:

public synchronized void updateLeaderOrForwardFollower(BackOffer backOffer) {

This code is executed when there is a request error or response error while interacting with PD servers. Under heavy retry, this code path might be slowed down by the lock. It is also safe to remove the synchronized keyword since the PD server is not required to update to the latest one, we might use AtomicReference to wrap and update the pdClientWrapper. Even if we get a stale leader in this code path, there is a thread that constantly updates the PD leader every 10 seconds.

2. Minimal reproduce step (Required)

  1. Launch a workload with 32 threads.
  2. Kill all the PD servers and TiKV servers
  3. You will find two types of slow logs that are blocked in the code paths above.

3. What are your Java Client and TiKV versions? (Required)

  • Client Java: master
  • TiKV: any version

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions