Skip to content

Conversation

@hubgeter
Copy link
Contributor

@hubgeter hubgeter commented Oct 10, 2024

Proposed changes

Because ExternalTable will initialize the previously uninitialized table when getCachedRowCount(), which is unnecessary. So for the uninitialized table, we directly return -1.
This will increase the speed of our query information_schema.tables.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@hubgeter
Copy link
Contributor Author

run buildall

1 similar comment
@hubgeter
Copy link
Contributor Author

run buildall

@hubgeter hubgeter force-pushed the opt_infomation_schema_tables branch from 90d6893 to c3535c9 Compare October 10, 2024 14:18
@hubgeter
Copy link
Contributor Author

run buildall

// ExternalTable.getRowCount(), but this is not very meaningful and time-consuming.
// The getCachedRowCount() function is only used when `show table` and querying `information_schema.tables`.
if (!checkInitialized()) {
return -2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will fix it

return false;
}

public boolean checkInitialized() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need this method, we can use check objectCreated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will remove this method.

private boolean enableHmsEventsIncrementalSync = false;

//for "type" = "hms" , but is iceberg table.
HiveCatalog icebergHiveCatalog;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HiveCatalog icebergHiveCatalog;
private HiveCatalog icebergHiveCatalog;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will fix it

}

public Catalog getIcebergHiveCatalog() {
if (icebergHiveCatalog == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method may be called concurrently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for prevent this, i will move icebergHiveCatalog init to initLocalObjectsImpl()

HiveCatalog hiveCatalog = new HiveCatalog();
hiveCatalog.setConf(conf);

if (props.containsKey(HMSExternalCatalog.BIND_BROKER_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is missing?

Copy link
Contributor Author

@hubgeter hubgeter Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this part and removed the else part. As it seemed to me that the two were duplicates. The else part just initialize hivecatalog use uriproperty.

@hubgeter hubgeter force-pushed the opt_infomation_schema_tables branch from c3535c9 to 3d68a83 Compare October 11, 2024 06:40
@hubgeter hubgeter force-pushed the opt_infomation_schema_tables branch from 3d68a83 to 7526b02 Compare October 11, 2024 06:42
@hubgeter
Copy link
Contributor Author

run buildall

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Oct 11, 2024
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Comment on lines +673 to +681
HiveCatalog hiveCatalog = new org.apache.iceberg.hive.HiveCatalog();
hiveCatalog.setConf(externalCatalog.getConfiguration());

Map<String, String> catalogProperties = externalCatalog.getProperties();
String metastoreUris = catalogProperties.getOrDefault(HMSProperties.HIVE_METASTORE_URIS, "");
catalogProperties.put(CatalogProperties.URI, metastoreUris);

hiveCatalog.initialize(name, catalogProperties);
return hiveCatalog;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is same as IcebergHMSExternalCatalog.initCatalog ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes . so now IcebergHMSExternalCatalog.initCatalog() method use this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morningman morningman merged commit 127ac8d into apache:master Oct 13, 2024
qzsee pushed a commit to qzsee/incubator-doris that referenced this pull request Oct 16, 2024
…ount when reading ExternalTable (apache#41659)

## Proposed changes
Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
hubgeter added a commit to hubgeter/doris that referenced this pull request Oct 16, 2024
…ount when reading ExternalTable (apache#41659)

Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
hubgeter added a commit to hubgeter/doris that referenced this pull request Oct 16, 2024
…ount when reading ExternalTable (apache#41659)

## Proposed changes
Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
yiguolei pushed a commit that referenced this pull request Oct 17, 2024
…ount when reading ExternalTable (#41659) (#41959)

bp #41659 
## Proposed changes

Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
morningman pushed a commit that referenced this pull request Oct 18, 2024
…ount when reading ExternalTable (#41659) (#41962)

bp #41659 

## Proposed changes
Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
@gavinchou gavinchou mentioned this pull request Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants