Skip to content

Conversation

@suxiaogang223
Copy link
Contributor

@suxiaogang223 suxiaogang223 commented Mar 28, 2025

bp #49382

…ring" (apache#49382)

Problem Summary:
In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is
used to set input paths. However, this function splits paths using
commas, which is not the expected behavior. As a result, when partition
values contain commas, it leads to incorrect path parsing and potential
errors.
```java
  public static void setInputPaths(JobConf conf, String org.apache.hadoop.shaded.com.aSeparatedPaths) {
    setInputPaths(conf, StringUtils.stringToPath(
                        getPathStrings(org.apache.hadoop.shaded.com.aSeparatedPaths)));
  }
```
To prevent FileInputFormat.setInputPaths from splitting paths by commas,
we use another overloaded version of the method. Instead of passing a
comma-separated string, we explicitly pass a Path object, ensuring that
partition values containing commas are handled correctly.
```java
  public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
      str.append(StringUtils.COMMA_STR);
      path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set(org.apache.hadoop.shaded.org.apache.hadoop.mapreduce.lib.input.
      FileInputFormat.INPUT_DIR, str.toString());
  }
```
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Copy link
Contributor Author

run buildall

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dataroaring dataroaring merged commit a6f8b57 into apache:branch-3.0 Apr 22, 2025
21 of 23 checks passed
@suxiaogang223 suxiaogang223 deleted the fix_hive_partition_30 branch July 10, 2025 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants