[fix](multi-catalog) Fix bug: "Can not create a Path from an empty string" #49382
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
6302cde to 0f587fd
run buildall
Pull Request Overview
This PR addresses a bug in the HiveMetaStoreCache related to incorrect path parsing caused by using the wrong overload of FileInputFormat.setInputPaths.
- Fixes a bug by replacing a call to finalLocation.get() with finalLocation.getPath() to avoid unwanted comma splitting.
- Adds a comment to clarify the choice of the correct overload.
Files not reviewed (3)
- docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run74.hql: Language not supported
- regression-test/data/external_table_p0/hive/test_hive_partitions.out: Language not supported
- regression-test/suites/external_table_p0/hive/test_hive_partitions.groovy: Language not supported
Comments suppressed due to low confidence (1)
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:414
- Confirm that finalLocation.getPath() avoids the comma-splitting issue: verify that it returns a valid Path object, as expected by the Path overload of FileInputFormat.setInputPaths.
FileInputFormat.setInputPaths(jobConf, finalLocation.getPath());
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…ring" (apache#49382)

Problem Summary:

In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is used to set input paths. However, this function splits paths using commas, which is not the expected behavior. As a result, when partition values contain commas, it leads to incorrect path parsing and potential errors.

```java
public static void setInputPaths(JobConf conf, String commaSeparatedPaths) {
  setInputPaths(conf, StringUtils.stringToPath(
      getPathStrings(commaSeparatedPaths)));
}
```

To prevent FileInputFormat.setInputPaths from splitting paths by commas, we use another overloaded version of the method. Instead of passing a comma-separated string, we explicitly pass a Path object, ensuring that partition values containing commas are handled correctly.

```java
public static void setInputPaths(JobConf conf, Path... inputPaths) {
  Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
  StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
  for (int i = 1; i < inputPaths.length; i++) {
    str.append(StringUtils.COMMA_STR);
    path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
    str.append(StringUtils.escapeString(path.toString()));
  }
  conf.set(org.apache.hadoop.shaded.org.apache.hadoop.mapreduce.lib.input.
      FileInputFormat.INPUT_DIR, str.toString());
}
```
… an empty string" (#49382) (#49641)
What problem does this PR solve?
Problem Summary:
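In HiveMetaStoreCache, input paths are set with FileInputFormat.setInputPaths. The overload that takes a String treats its argument as a comma-separated list and splits it on commas, which is not the expected behavior for a single partition location. As a result, when a partition value contains a comma, the location is parsed into several incorrect paths, and an empty segment can trigger the error "Can not create a Path from an empty string".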
Release note
To keep FileInputFormat.setInputPaths from splitting the location on commas, the fix calls the overload that takes Path objects. Passing a Path instead of a comma-separated string means the location is treated as a single path, so partition values containing commas are handled correctly.
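The difference between the two overloads can be illustrated without Hadoop on the classpath. The sketch below is a simplified stand-in, not Hadoop's code: the class `CommaSplitSketch`, the methods `splitOnCommas` and `escapeAndJoin`, and the sample location are all hypothetical names chosen for illustration. It assumes the String overload splits on every comma (the real one also honors curly-brace globs) and that the Path overload escapes backslashes and commas before joining.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free illustration of the two behaviors.
class CommaSplitSketch {

    // Stand-in for the String overload: every comma starts a new path.
    static List<String> splitOnCommas(String commaSeparatedPaths) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            if (commaSeparatedPaths.charAt(i) == ',') {
                parts.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(commaSeparatedPaths.substring(start));
        return parts;
    }

    // Stand-in for the Path... overload: escape each path, then join with commas,
    // so a comma inside a path is never mistaken for a separator.
    static String escapeAndJoin(String... paths) {
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(paths[i].replace("\\", "\\\\").replace(",", "\\,"));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String location = "/warehouse/t/p=a,b"; // assumed partition value "a,b"
        System.out.println(splitOnCommas(location)); // [/warehouse/t/p=a, b]
        System.out.println(escapeAndJoin(location)); // /warehouse/t/p=a\,b
        // A trailing comma leaves an empty segment, the kind of input that
        // cannot be turned into a path.
        System.out.println(splitOnCommas("/warehouse/t/p=a,").get(1).isEmpty()); // true
    }
}
```

In this sketch the split version turns one partition location into two unrelated paths, while the escaped version keeps it as a single entry, which is the behavior the fix relies on.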