[Enhancement](sql-cache) Add partition update time for hms table and use it at sql-cache. #24491
run buildall
ced8fb8 to
3aa5afb
Compare
3aa5afb to
8420019
Compare
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
xinyiZzz
left a comment
LGTM
…use it at sql-cache.
0ad7aae to
72885f3
Compare
xinyiZzz
left a comment
LGTM
morningman
left a comment
LGTM
…use it at sql-cache. (apache#24491) Currently FE does not record the update time of an hms table's partitions, so the sql cache may be hit even after the hive table's partitions have changed. This PR adds a field to record the partition update time and uses it when sql-cache is enabled. The cache will be missed if any partition has changed on the hive side. System.currentTimeMillis() is used rather than the event time of the hms event, so that the value is measured the same way as the schemaUpdateTime of the external table. The value is added to ExternalObjectLog and replayed by the slave FEs so that all FEs hold the same value and the sql-cache can be hit by queries sent through different FEs.
…use it at sql-cache. (#24491) (#25382) Currently FE does not record the update time of an hms table's partitions, so the sql cache may be hit even after the hive table's partitions have changed. This PR adds a field to record the partition update time and uses it when sql-cache is enabled. The cache will be missed if any partition has changed on the hive side. System.currentTimeMillis() is used rather than the event time of the hms event, so that the value is measured the same way as the schemaUpdateTime of the external table. The value is added to ExternalObjectLog and replayed by the slave FEs so that all FEs hold the same value and the sql-cache can be hit by queries sent through different FEs. Co-authored-by: Xiangyu Wang <dut.xiangyu@gmail.com>
Proposed changes
Related PRs: #23391, #21873
This PR mainly makes the following changes:
1. Rename `ExternalTable.getLatestUpdateTime()` to `ExternalTable.getUpdateTime()`. `TableIf` already has a method called `getUpdateTime()` with the same meaning, so the two methods are merged to avoid ambiguity.
2. Rename the field `ExternalTable.lastestUpdateTime` to `schemaUpdateTime`. The default implementation of `ExternalTable.getUpdateTime()` simply returns `schemaUpdateTime`, because `schemaUpdateTime` is the timestamp taken after the schema of an external table is loaded.
3. Add a field named `partitionUpdateTime` to `HMSExternalTable` and update it when processing hms partition events. `HMSExternalTable` overrides `getUpdateTime()` to return the maximum of `schemaUpdateTime` and `partitionUpdateTime`. With the hms event listener enabled, `partitionUpdateTime` is refreshed when partitions are added, deleted, or altered.

Currently `FE` does not record the update time of an hms table's partitions, so the sql cache may be hit even after the hive table's partitions have changed. This PR adds a field to record the partition update time and uses it when sql-cache is enabled. The cache will be missed if any partition has changed on the hive side.

`System.currentTimeMillis()` is used rather than the event time of the hms event, so that the value is measured the same way as the `schemaUpdateTime` of the external table. The value is added to `ExternalObjectLog` and replayed by the slave `FE`s so that all `FE`s hold the same value and the sql-cache can be hit by queries sent through different `FE`s.

I have tested this with the following steps:
1. Run the query twice (`lh_test_p` is a hive partitioned table). The second run hits the sql cache.
2. Check the last update time of this hive table.
3. Wait some time for the hms events to be processed, then execute `select count(0) from hive_safe_lycc.test.lh_test_p;` again. The cache is missed, and the last update time has changed too.
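The update-time logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class names `ExternalTableSketch`, `HmsExternalTableSketch` and `SqlCacheSketch`, the `setPartitionUpdateTime` setter, and the cache lookup rule are all assumptions made for the example. Only the field names `schemaUpdateTime` and `partitionUpdateTime`, the method `getUpdateTime()`, and the "return the max of the two" rule come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ExternalTable: remembers when the schema was loaded.
class ExternalTableSketch {
    protected volatile long schemaUpdateTime;

    ExternalTableSketch(long schemaUpdateTime) {
        this.schemaUpdateTime = schemaUpdateTime;
    }

    // Default behaviour: the table's update time is the schema load time.
    long getUpdateTime() {
        return schemaUpdateTime;
    }
}

// Hypothetical stand-in for HMSExternalTable: also tracks partition changes.
class HmsExternalTableSketch extends ExternalTableSketch {
    private volatile long partitionUpdateTime;

    HmsExternalTableSketch(long schemaUpdateTime) {
        super(schemaUpdateTime);
    }

    // Assumed setter: called with one timestamp when a partition event is
    // processed, and with that same timestamp again when other FEs replay it.
    void setPartitionUpdateTime(long timestampMillis) {
        this.partitionUpdateTime = timestampMillis;
    }

    // The update time is the later of the schema and partition timestamps.
    @Override
    long getUpdateTime() {
        return Math.max(schemaUpdateTime, partitionUpdateTime);
    }
}

// Hypothetical cache: an entry is valid only while the table has not been
// updated since the entry was stored.
class SqlCacheSketch {
    private static final class Entry {
        final long tableUpdateTime;
        final String result;

        Entry(long tableUpdateTime, String result) {
            this.tableUpdateTime = tableUpdateTime;
            this.result = result;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String sql, ExternalTableSketch table, String result) {
        entries.put(sql, new Entry(table.getUpdateTime(), result));
    }

    // Returns the cached result, or null on a miss.
    String get(String sql, ExternalTableSketch table) {
        Entry entry = entries.get(sql);
        if (entry == null || entry.tableUpdateTime < table.getUpdateTime()) {
            return null;
        }
        return entry.result;
    }
}
```

In this sketch, a query cached before a partition event is served from the cache until `setPartitionUpdateTime` is called with a later timestamp, after which the lookup misses. Passing the same timestamp to every FE, instead of letting each FE read its own clock, is what keeps `getUpdateTime()` identical across FEs.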
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...