Skip to content

Conversation

@suxiaogang223
Copy link
Contributor

@suxiaogang223 suxiaogang223 commented Mar 28, 2025

bp #49382

…ring" (apache#49382)

Problem Summary:
In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is
used to set input paths. However, this function splits paths using
commas, which is not the expected behavior. As a result, when partition
values contain commas, it leads to incorrect path parsing and potential
errors.
```java
  public static void setInputPaths(JobConf conf, String org.apache.hadoop.shaded.com.aSeparatedPaths) {
    setInputPaths(conf, StringUtils.stringToPath(
                        getPathStrings(org.apache.hadoop.shaded.com.aSeparatedPaths)));
  }
```
To prevent FileInputFormat.setInputPaths from splitting paths by commas,
we use another overloaded version of the method. Instead of passing a
comma-separated string, we explicitly pass a Path object, ensuring that
partition values containing commas are handled correctly.
```java
  public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
      str.append(StringUtils.COMMA_STR);
      path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set(org.apache.hadoop.shaded.org.apache.hadoop.mapreduce.lib.input.
      FileInputFormat.INPUT_DIR, str.toString());
  }
```
@suxiaogang223 suxiaogang223 requested a review from yiguolei as a code owner March 28, 2025 11:24
@Thearas
Copy link
Contributor

Thearas commented Mar 28, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 29, 2025
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 94986fc into apache:branch-2.1 Mar 29, 2025
23 checks passed
@suxiaogang223 suxiaogang223 deleted the fix_hive_partition_21 branch July 10, 2025 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants