-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improvement](jdbc catalog) Optimize JdbcCatalog case mapping stability #41510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
morningman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add need to add more FE unit test to test all these logics
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/IdentifierMapping.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/DefaultIdentifierMapping.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/DefaultIdentifierMapping.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/DefaultIdentifierMapping.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/DefaultIdentifierMapping.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/JdbcExternalCatalog.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/DefaultIdentifierMapping.java
Outdated
Show resolved
Hide resolved
1f71caf to
a833914
Compare
|
run buildall |
a833914 to
a310c8f
Compare
|
run buildall |
a310c8f to
1b1af81
Compare
|
run buildall |
4483c21 to
ac6a485
Compare
|
run buildall |
d440a07 to
313cb95
Compare
72c6f9a to
9185dca
Compare
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java
Outdated
Show resolved
Hide resolved
| String remoteColumnName = jdbcCatalog.getRemoteColumnNames(this.dbName, this.name, column.getName()); | ||
| remoteColumnNames.put(column.getName(), remoteColumnName); | ||
| } | ||
| if (!remoteColumnNames.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the remoteColumnNames is always not empty
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java
Show resolved
Hide resolved
| dbName -> Optional.ofNullable( | ||
| buildDbForInit(dbName, Util.genIdByName(name, dbName), logType, true)), | ||
| (key, value, cause) -> value.ifPresent(v -> v.setUnInitialized(invalidCacheInInit))); | ||
| ignored -> remoteToLocalPairs.stream().map(Pair::value).collect(Collectors.toList()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not right.
If you do this, the dbnames cache will always be remoteToLocalPairs, even if there are new dbs added.
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java
Show resolved
Hide resolved
| @SerializedName(value = "name") | ||
| protected String name; | ||
| @SerializedName(value = "remoteName") | ||
| protected String remoteName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remoteName maybe null when upgrading from old version. How to handle it?
| protected long dbId; | ||
| protected boolean objectCreated; | ||
| protected ExternalCatalog catalog; | ||
| protected ExternalDatabase db; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also mark dbName as deprecated?
| @SerializedName(value = "name") | ||
| protected String name; | ||
| @SerializedName(value = "remoteName") | ||
| protected String remoteName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remoteName maybe null when upgrading from old version. How to handle it?
| public boolean tableExist(SessionContext ctx, String dbName, String tblName) { | ||
| makeSureInitialized(); | ||
| return jdbcClient.isTableExist(dbName, tblName); | ||
| ExternalTable tbl = Objects.requireNonNull(this.getDbNullable(dbName)).getTableNullable(tblName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to make sure tbl is always not null?
fe/fe-core/src/main/java/org/apache/doris/datasource/mapping/IdentifierMapping.java
Outdated
Show resolved
Hide resolved
| import com.google.common.collect.Sets; | ||
| import com.google.gson.annotations.SerializedName; | ||
| import com.google.gson.internal.LinkedTreeMap; | ||
| import jdk.internal.joptsimple.internal.Strings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import, use guava
690a196 to
af1edc6
Compare
| } | ||
| } catch (RuntimeException e) { | ||
| // Handle "Found conflicting" exception explicitly | ||
| if (e.getMessage().contains("Found conflicting")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better define "Found conflicting" as a static field.
and use it here and in RuntimeException msg
| LOG.error(e.getMessage()); | ||
| throw e; // Rethrow to let the caller handle this critical issue | ||
| } else { | ||
| LOG.warn("Failed to check db {} exist in remote system, ignore it.", localDbName, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this can be ignored?
Add comment in code
|
Hi, I found a bug witch can be fixed in this pr, can you add a test case like #45521 in |
af1edc6 to
1e38576
Compare
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 39494 ms |
TPC-DS: Total hot run time: 188766 ms |
ClickBench: Total hot run time: 31.32 s |
|
run buildall |
…ty (apache#41510) 1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing 2. Save the remote name in ExternalDatabase/ExternalTable object for get 3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself 4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive. 5. Since Doris's column names are case-insensitive, column names will be checked at any time 6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive. 7. include/exclude_database_lists The remote name must be used 8. meta_names_mapping is used to resolve the conflicts mentioned above 9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping 10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss 11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
…ty (apache#41510) 1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing 2. Save the remote name in ExternalDatabase/ExternalTable object for get 3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself 4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive. 5. Since Doris's column names are case-insensitive, column names will be checked at any time 6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive. 7. include/exclude_database_lists The remote name must be used 8. meta_names_mapping is used to resolve the conflicts mentioned above 9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping 10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss 11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
…ty (apache#41510) 1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing 2. Save the remote name in ExternalDatabase/ExternalTable object for get 3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself 4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive. 5. Since Doris's column names are case-insensitive, column names will be checked at any time 6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive. 7. include/exclude_database_lists The remote name must be used 8. meta_names_mapping is used to resolve the conflicts mentioned above 9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping 10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss 11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
…ty (apache#41510) 1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing 2. Save the remote name in ExternalDatabase/ExternalTable object for get 3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself 4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive. 5. Since Doris's column names are case-insensitive, column names will be checked at any time 6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive. 7. include/exclude_database_lists The remote name must be used 8. meta_names_mapping is used to resolve the conflicts mentioned above 9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping 10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss 11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
### What problem does this PR solve? Related PR: #41510 Problem Summary: We should use remote name to get statistics
### What problem does this PR solve? Related PR: #41510 Problem Summary: We should use remote name to get statistics
### What problem does this PR solve? Related PR: #41510 Problem Summary: In this pr#41510, namesCache changed from `LoadingCache<String, List<String>>` to `LoadingCache<String, List<Pair<String, String>>>`, but the `remove` method of this cache did not change, causing the hms event case to fail.
### What problem does this PR solve? Related PR: #41510 Problem Summary: In this pr#41510, namesCache changed from `LoadingCache<String, List<String>>` to `LoadingCache<String, List<Pair<String, String>>>`, but the `remove` method of this cache did not change, causing the hms event case to fail.
…rom image (#47603) ### What problem does this PR solve? Issue Number: close #xxx Related PR: #41510 Problem Summary: In PR #41510, we added several fields to External Catalog. However, we only handled the upgrade scenario for EditLog but not for Image. This causes Catalogs rebuilt from Image to miss these fields, resulting in NullPointerException during queries. This PR fixes this issue. Specifically: 1. Added null check and initialization for fields in gsonPostProcess 2. Ensured consistent behavior between EditLog replay and Image deserialization 3. Added proper logging for better troubleshooting
…rom image (apache#47603) Issue Number: close #xxx Related PR: apache#41510 Problem Summary: In PR apache#41510, we added several fields to External Catalog. However, we only handled the upgrade scenario for EditLog but not for Image. This causes Catalogs rebuilt from Image to miss these fields, resulting in NullPointerException during queries. This PR fixes this issue. Specifically: 1. Added null check and initialization for fields in gsonPostProcess 2. Ensured consistent behavior between EditLog replay and Image deserialization 3. Added proper logging for better troubleshooting
…rom image (apache#47603) ### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#41510 Problem Summary: In PR apache#41510, we added several fields to External Catalog. However, we only handled the upgrade scenario for EditLog but not for Image. This causes Catalogs rebuilt from Image to miss these fields, resulting in NullPointerException during queries. This PR fixes this issue. Specifically: 1. Added null check and initialization for fields in gsonPostProcess 2. Ensured consistent behavior between EditLog replay and Image deserialization 3. Added proper logging for better troubleshooting
…rror after catalog creation apache#45521" due to apache#41510 has fixed this bug. This reverts commit 9f72b3d534f1b6eb53b5198cd3f0f757e71d35f8.
…rom image (apache#47603) Issue Number: close #xxx Related PR: apache#41510 Problem Summary: In PR apache#41510, we added several fields to External Catalog. However, we only handled the upgrade scenario for EditLog but not for Image. This causes Catalogs rebuilt from Image to miss these fields, resulting in NullPointerException during queries. This PR fixes this issue. Specifically: 1. Added null check and initialization for fields in gsonPostProcess 2. Ensured consistent behavior between EditLog replay and Image deserialization 3. Added proper logging for better troubleshooting
…rom image (apache#47603) ### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#41510 Problem Summary: In PR apache#41510, we added several fields to External Catalog. However, we only handled the upgrade scenario for EditLog but not for Image. This causes Catalogs rebuilt from Image to miss these fields, resulting in NullPointerException during queries. This PR fixes this issue. Specifically: 1. Added null check and initialization for fields in gsonPostProcess 2. Ensured consistent behavior between EditLog replay and Image deserialization 3. Added proper logging for better troubleshooting
…rror after catalog creation apache#45521" due to apache#41510 has fixed this bug. This reverts commit 9f72b3d534f1b6eb53b5198cd3f0f757e71d35f8.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)