Skip to content

Conversation

@morningman
Copy link
Contributor

backport #24491

…use it at sql-cache. (apache#24491)

Now FE does not record the update time of hms tbl's partitons, so the sql cache may be hit even the hive table's partitions have changed. This pr add a field to record the partition update time, and use it when enable sql-cache.
The cache will be missed if any partition has changed at hive side.

Use System.currentTimeMillis() but not the event time of hms event because we would better keep the same measurement with the schemaUpdateTime of external table. Add this value to ExternalObjectLog and let slave FEs replay it because it is better to keep the same value with all FEs, so the sql-cache can be hit by the querys through different FEs.
@xiaokang
Copy link
Contributor

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.7 seconds
stream load tsv: 555 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 30.0 seconds inserted 10000000 Rows, about 333K ops/s
storage size: 17164460399 Bytes

@xiaokang xiaokang merged commit 51eee3e into apache:branch-2.0 Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants