BE do_tablet_meta_checkpoint retain _meta_lock for a long time #2419

@liutang123

Describe the bug
After stream load has been running for some time, publishing transaction versions becomes slower and slower.

To Reproduce
We have a cluster with 3 FE and 8 BE, version: 0.11, OS: CentOS 7
FE: CPU cores: 16; Mem: 32
BE: CPU cores: 40; Mem: 128; Disk: 3.6T * 12

This problem occurs under heavy stream load:

  • More than 20M messages per minute.
  • 20 clients send data to Doris.
  • Each client sends 250K messages per batch.

Expected behavior
After stream load has been running for some time, publishing transaction versions becomes very slow.
By adding some logs, I found that Tablet::do_tablet_meta_checkpoint holds the _meta_lock for a long time:
I1210 17:47:23.343849 191992 tablet.cpp:1270] 26239 do_tablet_meta_checkpoint retain _meta_lock cost: 1570 s, check_rowset_meta cost: 1569 s, remove cost: 0 s, all_rs_metas: 39973

Profiling with perf also shows that RocksDB spends much of its time in Get:
[image: perf screenshot, not preserved]
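
For illustration only: the log above reports 1569 s in check_rowset_meta for 39973 rowset metas, which is about 39 ms per entry if the time is spread evenly, all of it spent while _meta_lock is held. One common way to shorten such a hold time is to copy the list under the lock and run the slow lookups after releasing it. The sketch below shows that pattern under stated assumptions: `TabletSketch`, `checkpoint`, `slow_lookup`, and `meta_lock_is_free` are hypothetical names, not Doris code, and this is not necessarily the fix adopted for this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tablet; NOT the real Doris Tablet class.
class TabletSketch {
public:
    explicit TabletSketch(std::vector<std::string> rowset_ids)
            : _rowset_ids(std::move(rowset_ids)) {}

    // `slow_lookup` stands in for a slow per-rowset metadata read
    // (for example a RocksDB Get). Returns the ids it reported as present.
    std::vector<std::string> checkpoint(
            const std::function<bool(const std::string&)>& slow_lookup) {
        std::vector<std::string> snapshot;
        {
            // Hold the lock only long enough to copy the id list.
            std::lock_guard<std::mutex> guard(_meta_lock);
            snapshot = _rowset_ids;
        }
        // The slow lookups run with the lock released, so other threads
        // that need _meta_lock are not blocked behind them.
        std::vector<std::string> found;
        for (const std::string& id : snapshot) {
            if (slow_lookup(id)) {
                found.push_back(id);
            }
        }
        return found;
    }

    // Helper for demonstration: true if the lock can be taken right now.
    bool meta_lock_is_free() {
        if (!_meta_lock.try_lock()) {
            return false;
        }
        _meta_lock.unlock();
        return true;
    }

private:
    std::mutex _meta_lock;
    std::vector<std::string> _rowset_ids;
};
```

The trade-off is that the snapshot can go stale while the lookups run, so real code would have to re-check the affected entries under the lock before modifying any metadata.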
