@zddr zddr commented Dec 27, 2022

Proposed changes

Issue Number: close #xxx

Problem summary

Poll the metastore at a given frequency for create/alter/drop events on databases, tables, and partitions. By observing such events, we can take the appropriate action (refresh/invalidate/add/remove) on the catalog's cached metadata so that it reflects the latest information available in the metastore. We keep track of the last synced event ID in each polling iteration so that the next batch can be requested from the right position.
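
The polling loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MetastoreEvent`, `EventSource`, `InMemoryEventSource`, `EventPoller`, and the mapping from event type to action are all hypothetical names and assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type; real metastore notifications carry more fields.
class MetastoreEvent {
    final long id;
    final String type;
    final String table;

    MetastoreEvent(long id, String type, String table) {
        this.id = id;
        this.type = type;
        this.table = table;
    }
}

// Hypothetical stand-in for a metastore client.
interface EventSource {
    // Returns at most maxEvents events whose id is greater than lastEventId.
    List<MetastoreEvent> nextEvents(long lastEventId, int maxEvents);
}

// In-memory event log, used only to make the sketch runnable.
class InMemoryEventSource implements EventSource {
    private final List<MetastoreEvent> log = new ArrayList<>();

    void append(MetastoreEvent event) {
        log.add(event);
    }

    @Override
    public List<MetastoreEvent> nextEvents(long lastEventId, int maxEvents) {
        List<MetastoreEvent> batch = new ArrayList<>();
        for (MetastoreEvent event : log) {
            if (event.id > lastEventId && batch.size() < maxEvents) {
                batch.add(event);
            }
        }
        return batch;
    }
}

class EventPoller {
    private final EventSource source;
    private final int batchSize;
    private final List<String> actions = new ArrayList<>();
    private long lastSyncedEventId;

    EventPoller(EventSource source, int batchSize, long startEventId) {
        this.source = source;
        this.batchSize = batchSize;
        this.lastSyncedEventId = startEventId;
    }

    // One polling iteration: fetch the next batch, apply it, remember the position.
    int pollOnce() {
        List<MetastoreEvent> batch = source.nextEvents(lastSyncedEventId, batchSize);
        for (MetastoreEvent event : batch) {
            actions.add(actionFor(event) + " " + event.table);
            lastSyncedEventId = event.id; // the next poll starts after this event
        }
        return batch.size();
    }

    // Assumed mapping from event type to cache action, for illustration only.
    static String actionFor(MetastoreEvent event) {
        switch (event.type) {
            case "CREATE_TABLE": return "add";
            case "DROP_TABLE": return "remove";
            case "ALTER_TABLE": return "refresh";
            default: return "invalidate";
        }
    }

    long lastSyncedEventId() {
        return lastSyncedEventId;
    }

    List<String> actions() {
        return actions;
    }
}
```

In a real deployment `pollOnce()` would be called by a scheduler at the configured frequency; the sketch records the chosen actions in a list instead of touching an actual cache.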

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

hello-stephen commented Dec 27, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.14 seconds
load time: 642 seconds
storage size: 17123143046 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230103035153_clickbench_pr_72556.html

@zddr zddr closed this Dec 29, 2022
@zddr zddr reopened this Dec 30, 2022
LOG.info("Event id not updated when pulling events on catalog [{}]", hmsExternalCatalog.getName());
return null;
}
return client.getNextNotification(lastSyncedEventId, Config.hms_events_batch_size_per_rpc, null);

Should we use currentEventId here instead of lastSyncedEventId?

The currentEventId represents the latest event ID in the metastore. getNextNotification only returns events that come after the ID it is given, so if we passed currentEventId we would never get any events.

OK
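
To make the point of this thread concrete, here is a small sketch. `FakeNotificationLog` is a hypothetical in-memory stand-in for the metastore notification log, not a Hive or Doris class; it assumes only the behavior described above, namely that a fetch returns events with an ID strictly greater than the one passed in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the metastore notification log.
class FakeNotificationLog {
    private final List<Long> eventIds = new ArrayList<>();

    void append(long eventId) {
        eventIds.add(eventId);
    }

    // ID of the most recent event, or 0 when the log is empty.
    long currentEventId() {
        return eventIds.isEmpty() ? 0L : eventIds.get(eventIds.size() - 1);
    }

    // Events with an ID strictly greater than lastEventId, at most maxEvents of them.
    List<Long> nextNotification(long lastEventId, int maxEvents) {
        List<Long> result = new ArrayList<>();
        for (long id : eventIds) {
            if (id > lastEventId && result.size() < maxEvents) {
                result.add(id);
            }
        }
        return result;
    }
}

class FetchPositionDemo {
    public static void main(String[] args) {
        FakeNotificationLog log = new FakeNotificationLog();
        for (long id = 1; id <= 5; id++) {
            log.append(id);
        }
        long lastSyncedEventId = 2; // position reached by the previous poll

        // Fetching after the latest ID can never return anything.
        System.out.println(log.nextNotification(log.currentEventId(), 10));
        // Fetching after the last synced ID returns the unprocessed events.
        System.out.println(log.nextNotification(lastSyncedEventId, 10));
    }
}
```

In this sketch, comparing currentEventId with lastSyncedEventId is only useful as a cheap check for whether anything new exists; the fetch itself has to start from lastSyncedEventId.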

@zddr zddr changed the title from "[WIP]support automatic sync hive metastore events" to "[HMSEvent]support automatic sync hive metastore events" Jan 3, 2023
@zddr zddr changed the title from "[HMSEvent]support automatic sync hive metastore events" to "[feature]support automatic sync hive metastore events" Jan 3, 2023
@morningman morningman changed the title from "[feature]support automatic sync hive metastore events" to "[feature](multi-catalog) support automatic sync hive metastore events" Jan 3, 2023
@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 3, 2023

github-actions bot commented Jan 3, 2023

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 893f5f9 into apache:master Jan 3, 2023

github-actions bot commented Jan 3, 2023

PR approved by anyone and no changes requested.

dutyu commented Jan 4, 2023

I found that only DropTableEvent is handled. Is there a plan to handle other kinds of events, such as CreateTable and AlterTable events?

@morningman morningman changed the title from "[feature](multi-catalog) support automatic sync hive metastore events" to "[feature-wip](multi-catalog) support automatic sync hive metastore events" Jan 4, 2023
@morningman

> I found that only DropTableEvent is handled. Is there a plan to handle other kinds of events, such as CreateTable and AlterTable events?

This is a WIP PR; it is not finished yet.

eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Jan 5, 2023
…ents (apache#15401)
morningman pushed a commit that referenced this pull request Jan 16, 2023
…ents (#15401)
@zddr zddr deleted the hmsevent branch March 28, 2024 02:30
morningman pushed a commit that referenced this pull request Aug 23, 2024
…using meta cache and add parameters to the Hive catalog. (#39239)

before #15401  and #38244 

## Proposed changes

1. Add the parameter `hive.enable_hms_events_incremental_sync` to the
Hive catalog. It controls whether the catalog reads Hive notification
events. If it is not set, the catalog falls back to the value of
`enable_hms_events_incremental_sync` in fe.conf (false by default).

2. Add the parameter `hive.hms_events_batch_size_per_rpc` to the Hive
catalog. It sets how many notification events the catalog reads per
RPC. If it is not set, the catalog falls back to the value of
`hms_events_batch_size_per_rpc` in fe.conf (500 by default).

3. Add an HMS event notification test case.

4. Remove the logic that forces the catalog's `use_meta_cache` setting
to true.

Example:
```
create catalog if not exists catalog_name properties (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://externalEnvIp:hms_port",
    "hive.enable_hms_events_incremental_sync" = "true",
    "hive.hms_events_batch_size_per_rpc" = "1000"
);
```
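
The fallback from a catalog property to the fe.conf value could look roughly like the sketch below. `HmsEventProps` and its static fields are hypothetical stand-ins for the FE configuration and are not taken from the Doris code base; only the two property names and the defaults (false and 500) come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that resolves catalog-level settings with a global fallback.
class HmsEventProps {
    static final String ENABLE_KEY = "hive.enable_hms_events_incremental_sync";
    static final String BATCH_SIZE_KEY = "hive.hms_events_batch_size_per_rpc";

    // Stand-ins for the fe.conf values (assumed names, defaults as described above).
    static boolean feEnableIncrementalSync = false;
    static int feBatchSizePerRpc = 500;

    // The catalog property wins when present; otherwise use the fe.conf value.
    static boolean isIncrementalSyncEnabled(Map<String, String> catalogProps) {
        String value = catalogProps.get(ENABLE_KEY);
        return value == null ? feEnableIncrementalSync : Boolean.parseBoolean(value);
    }

    static int batchSizePerRpc(Map<String, String> catalogProps) {
        String value = catalogProps.get(BATCH_SIZE_KEY);
        return value == null ? feBatchSizePerRpc : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(ENABLE_KEY, "true");
        props.put(BATCH_SIZE_KEY, "1000");
        System.out.println(isIncrementalSyncEnabled(props)); // catalog-level override
        System.out.println(batchSizePerRpc(new HashMap<>())); // falls back to fe.conf
    }
}
```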
morningman pushed a commit to morningman/doris that referenced this pull request Aug 23, 2024
…using meta cache and add parameters to the Hive catalog. (apache#39239)
dataroaring pushed a commit that referenced this pull request Aug 26, 2024
…using meta cache and add parameters to the Hive catalog. (#39239)

Labels

approved, area/multi-catalog, dev/1.2.2-merged, reviewed