[feat](hive) add catalog level partition cache property #50724
run buildall
TPC-H: Total hot run time: 34618 ms
TPC-DS: Total hot run time: 189623 ms
ClickBench: Total hot run time: 28.79 s
Pull Request Overview
This PR adds support for configuring the Hive partition cache at the catalog level via a new property (`partition.cache.ttl-second`) and updates existing cache-related configurations. Key changes include:
- Removing a drop command in one of the Hive test suites.
- Introducing new tests and modifying existing ones for both file meta cache and partition cache.
- Updating cache initialization logic and configuration validations in the FE code.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| regression-test/suites/external_table_p0/hive/test_hive_star_qualifier.groovy | Commenting out a drop catalog command in the test suite. |
| regression-test/suites/external_table_p0/hive/test_hive_meta_cache.groovy | Adding new tests for invalid TTL values and proper caching behavior. |
| regression-test/suites/external_table_p0/export/hive_read/orc/test_hive_read_orc.groovy | Changing test suite tags to include additional external docker tags. |
| regression-test/data/external_table_p0/hive/test_hive_meta_cache.out | Updating generated expected output for the meta cache tests. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java | Changing method access modifiers and updating cache initialization logic. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSExternalCatalog.java | Introducing new constants and validations for the partition cache TTL property. |
| fe/fe-common/src/main/java/org/apache/doris/common/Config.java | Updating configuration descriptions to include an additional table type. |
Comments suppressed due to low confidence (2)
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:134
- The access level of init() has been changed from private to public. Please ensure that this change is intended and document its external usage if applicable.
public void init() {
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:173
- The method setNewFileCache() has been changed from public to private; ensure that this does not affect any intended external calls or testing, and update the documentation if necessary.
private void setNewFileCache() {
run buildall
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 33648 ms
TPC-DS: Total hot run time: 191154 ms
ClickBench: Total hot run time: 29.15 s
PR approved by at least one committer and no changes requested.
### What problem does this PR solve? Problem Summary: Support enabling or disabling the Hive partition cache at the catalog level for Hive catalogs. Previously, a user who wanted to disable the Hive partition cache could only set `max_hive_partition_table_cache_num=0` in fe.conf and restart FE, and this config affects all catalogs. This PR adds a new catalog property, `partition.cache.ttl-second`. If it is set to 0, the Hive partition cache is disabled, so when a new partition is added, Doris reads it immediately.
### What problem does this PR solve? Problem Summary: Same as #50724, but for the schema cache: support enabling or disabling the schema cache at the catalog level for all kinds of external catalogs. Previously, a user who wanted to disable the schema cache could only set `max_external_schema_cache_num=0` in fe.conf and restart FE, and this config affects all catalogs. This PR adds a new catalog property, `schema.cache.ttl-second`. If it is set to 0, the schema cache is disabled, so when a schema changes, Doris reads the new schema immediately.
What problem does this PR solve?
Problem Summary:
Support enabling or disabling the Hive partition cache at the catalog level for Hive catalogs.
Previously, a user who wanted to disable the Hive partition cache could only set
`max_hive_partition_table_cache_num=0` in fe.conf and restart FE. This config affects all catalogs.
This PR adds a new catalog property, `partition.cache.ttl-second`.
If it is set to 0, the Hive partition cache is disabled, so when a new partition is added,
Doris reads the new partition immediately.
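A minimal sketch of how the new property might be set when creating a Hive catalog. The catalog name `hive_ctl` and the metastore URI are placeholders chosen for illustration and are not taken from this PR; only `partition.cache.ttl-second` and its value `0` come from the description above.

```sql
-- Hypothetical catalog name and metastore address; adjust for your environment.
CREATE CATALOG hive_ctl PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    -- 0 disables the Hive partition cache for this catalog only,
    -- so newly added partitions are visible without a refresh.
    'partition.cache.ttl-second' = '0'
);
```

Unlike setting `max_hive_partition_table_cache_num=0` in fe.conf, this should leave the partition cache of other catalogs untouched and does not require an FE restart.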
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)