[feature-wip](multi-catalog) support automatic sync hive metastore events #15401
fe/fe-core/src/main/java/org/apache/doris/datasource/HMSExternalCatalog.java
        LOG.info("Event id not updated when pulling events on catalog [{}]", hmsExternalCatalog.getName());
        return null;
    }
    return client.getNextNotification(lastSyncedEventId, Config.hms_events_batch_size_per_rpc, null);
Should we use `currentEventId` here instead of `lastSyncedEventId`?
`currentEventId` is the latest event ID in the metastore. If we passed `currentEventId` to `getNextNotification`, we would never get any events, because the call only returns events that come after the given ID.
OK
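To make the point in this thread concrete, here is a minimal sketch of the "events after a given ID" behaviour. `fetchAfter` is a hypothetical in-memory stand-in, not the real metastore client; it only assumes that event IDs are consecutive and that a fetch returns events with IDs strictly greater than the one passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class EventCursorDemo {
    // Hypothetical stand-in for the notification log: events have IDs 1..currentEventId,
    // and a fetch returns at most maxEvents IDs strictly greater than afterEventId.
    public static List<Long> fetchAfter(long afterEventId, long currentEventId, int maxEvents) {
        List<Long> ids = new ArrayList<>();
        for (long id = afterEventId + 1; id <= currentEventId && ids.size() < maxEvents; id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        long lastSyncedEventId = 7;  // last event this catalog has already applied
        long currentEventId = 10;    // newest event the metastore has recorded

        // Fetching after the last synced ID yields the pending events.
        System.out.println(fetchAfter(lastSyncedEventId, currentEventId, 500)); // [8, 9, 10]

        // Fetching after the current ID yields nothing, which is the problem described above.
        System.out.println(fetchAfter(currentEventId, currentEventId, 500));    // []
    }
}
```

Under these assumptions, `currentEventId` is only useful for checking whether anything new exists; the cursor for the fetch itself has to be `lastSyncedEventId`.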
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/event/MetastoreEventFactory.java
morningman left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
I found that just
This is a WIP PR, not finished.
…ents (apache#15401)
Poll the metastore at a given frequency for create/alter/drop events on databases, tables, and partitions. By observing such events, we can take the appropriate action on the catalog (refresh/invalidate/add/remove) so that it reflects the latest information available in the metastore. We keep track of the last synced event ID in each polling iteration so that the next batch can be requested appropriately.
…using meta cache and add parameters to the Hive catalog. (#39239)

before #15401 and #38244

## Proposed changes

1. Add the parameter `hive.enable_hms_events_incremental_sync` to the Hive catalog. It switches the catalog to reading Hive notification events. The default is taken from `enable_hms_events_incremental_sync` in fe.conf (default is false).
2. Add the parameter `hive.hms_events_batch_size_per_rpc` to the Hive catalog. It sets the number of notification events the catalog reads per request. The default is taken from `hms_events_batch_size_per_rpc` in fe.conf (default is 500).
3. Append an HMS event notification case.
4. Remove the `use_meta_cache` setting in the catalog that is forced to true.

Example:

```
create catalog if not exists catalog_name properties (
    "type" = "hms",
    'hive.metastore.uris' = 'thrift://externalEnvIp:hms_port',
    "hive.enable_hms_events_incremental_sync" = "true",
    "hive.hms_events_batch_size_per_rpc" = "1000"
);
```
Proposed changes
Issue Number: close #xxx
Problem summary
Poll the metastore at a given frequency for create/alter/drop events on databases, tables, and partitions. By observing such events, we can take the appropriate action on the catalog (refresh/invalidate/add/remove) so that it reflects the latest information available in the metastore. We keep track of the last synced event ID in each polling iteration so that the next batch can be requested appropriately.
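The polling scheme described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `Event`, `EventSource`, `actionFor`, and the mapping from event type to action are all hypothetical, and the in-memory event list stands in for the real metastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HmsEventPollerSketch {
    /** Hypothetical, simplified event: an ID plus a type such as "CREATE_TABLE". */
    public static final class Event {
        final long id;
        final String type;
        Event(long id, String type) { this.id = id; this.type = type; }
    }

    /** Hypothetical event source standing in for the metastore client. */
    public interface EventSource {
        long currentEventId();
        List<Event> nextEvents(long afterEventId, int maxEvents);
    }

    private final EventSource source;
    private final int batchSize;
    private long lastSyncedEventId;
    private final List<String> actions = new ArrayList<>();

    public HmsEventPollerSketch(EventSource source, int batchSize, long startEventId) {
        this.source = source;
        this.batchSize = batchSize;
        this.lastSyncedEventId = startEventId;
    }

    /** One polling iteration; returns the number of events applied. */
    public int pollOnce() {
        if (source.currentEventId() == lastSyncedEventId) {
            return 0; // event ID not updated, nothing to pull
        }
        List<Event> batch = source.nextEvents(lastSyncedEventId, batchSize);
        for (Event e : batch) {
            actions.add(actionFor(e.type) + "@" + e.id);
            lastSyncedEventId = e.id; // advance the cursor so the next batch starts after it
        }
        return batch.size();
    }

    /** Assumed mapping from event type to cache action, for illustration only. */
    public static String actionFor(String type) {
        if (type.startsWith("CREATE")) return "add";
        if (type.startsWith("DROP")) return "remove";
        if (type.startsWith("ALTER")) return "refresh";
        return "invalidate";
    }

    /** Runs the poller over a fixed three-event log and returns the actions taken. */
    public static List<String> runDemo(int batchSize) {
        final List<Event> log = Arrays.asList(
                new Event(1, "CREATE_TABLE"),
                new Event(2, "ALTER_TABLE"),
                new Event(3, "DROP_PARTITION"));
        EventSource source = new EventSource() {
            public long currentEventId() { return log.size(); }
            public List<Event> nextEvents(long afterEventId, int maxEvents) {
                List<Event> out = new ArrayList<>();
                for (Event e : log) {
                    if (e.id > afterEventId && out.size() < maxEvents) out.add(e);
                }
                return out;
            }
        };
        HmsEventPollerSketch poller = new HmsEventPollerSketch(source, batchSize, 0);
        while (poller.pollOnce() > 0) {
            // keep polling until the cursor catches up with the current event ID
        }
        return poller.actions;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(2)); // [add@1, refresh@2, remove@3]
    }
}
```

With a batch size of 2, the sketch needs two iterations to drain three events, which shows why the cursor has to persist between iterations rather than being reset to the newest ID.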