Skip to content

Conversation

@westonpace
Copy link
Member

@westonpace westonpace commented Mar 3, 2023

Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri. However, this isn't very useful if the user has multiple URIs or if they already have a filesystem. This method allows that same URI handling to be used even if the user already has a filesystem.

What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

Are these changes tested?

Yes, via unit tests

Are there any user-facing changes?

There is a new API method but no changes to any existing APIs

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Mar 3, 2023
@github-actions
Copy link

github-actions bot commented Mar 3, 2023

@github-actions
Copy link

github-actions bot commented Mar 3, 2023

⚠️ GitHub issue #34386 has been automatically assigned in GitHub to PR creator.

@lidavidm
Copy link
Member

lidavidm commented Mar 7, 2023

Oh neat, thank you. I'll try to take a look at this soon

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 17, 2023
@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
@lidavidm
Copy link
Member

Not sure why this was unceremoniously closed. Weston, do you think you would still have time to look at this? It would be useful in #35034

@westonpace
Copy link
Member Author

Yes, I will try and look at this. I think it still needs some changes.

@westonpace
Copy link
Member Author

I unified the various paths using a helper method in util_internal.cc. This highlighted one small difference between the various implementations. The s3 and gcs filesystems remove any trailing slash in the path while the hdfs and local filesystems do not. I have opened #35399 for any discussion on that difference but, for the moment, I have changed all the filesystems to always remove the trailing slash for consistency.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels May 3, 2023
@lidavidm
Copy link
Member

lidavidm commented May 3, 2023

Hmm, I suppose the trailing slash could change listing behavior in S3.

@westonpace
Copy link
Member Author

Hmm, I suppose the trailing slash could change listing behavior in S3.

S3 was already removing the trailing slash so there is no change to the S3 behavior. This only affected the local filesystem.

@lidavidm
Copy link
Member

lidavidm commented May 3, 2023

Ah, I got that mixed up. LGTM then

@westonpace westonpace force-pushed the feature/GH-34386--path-from-urireset branch from da56f29 to c89e9ca Compare May 4, 2023 15:42
Comment on lines +137 to +142
#ifdef _WIN32
if (preserve_root && key.size() == 3 && key[1] == ':' && key[0] != '/') {
// If the user gives us C:/ then don't return C:
return key;
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh fun.

@westonpace
Copy link
Member Author

There are a few errors but they seem to be unrelated. I'm going to merge this.

@westonpace westonpace merged commit 7ca7724 into apache:main May 4, 2023
@ursabot
Copy link

ursabot commented May 5, 2023

Benchmark runs are scheduled for baseline = febd0ff and contender = 7ca7724. 7ca7724 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️2.27% ⬆️0.58%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.96% ⬆️0.93%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 7ca77241 ec2-t3-xlarge-us-east-2
[Finished] 7ca77241 test-mac-arm
[Finished] 7ca77241 ursa-i9-9960x
[Finished] 7ca77241 ursa-thinkcentre-m75q
[Finished] febd0ff1 ec2-t3-xlarge-us-east-2
[Finished] febd0ff1 test-mac-arm
[Finished] febd0ff1 ursa-i9-9960x
[Finished] febd0ff1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented May 5, 2023

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: apache#34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: apache#34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Handle URI paths for filesystems?

4 participants