GH-41493: [C++][S3] Add a new option to check existence before CreateDir #41822
Conversation
@pitrou @westonpace Could you please review when you have time?
cpp/src/arrow/filesystem/s3fs.cc (Outdated)
I'm curious, don't you want to do the checks by walking up instead of walking down?
For example, when CreateDir("a/b/c") is called, it would seem more logical to first check for a/b/c's existence, then a/b if it doesn't exist, etc.
I just reflected on it, and my two cents is to keep the walking-down logic. I'm doing the check inside the creation step: I can only build a wall from the bottom up, otherwise the world is upside down :) So I must check `a`, then `a/b`, then `a/b/c`.
When CreateDir("a/b/c") is called in non-recursive mode, `a/b/c`'s existence is checked immediately.
What I'm proposing is (example in the case of `a/b/c`):
- check which ancestors exist by walking up the directory tree: first `a/b/c`, then `a/b`... until you find the first existing ancestor;
- create the missing descendants by walking down the directory tree: for example, if you just found that `a` exists, create `a/b`, then `a/b/c`.
The idea is that, most of the time, almost the entire directory chain will already exist, especially if your workload is writing into a deeply partitioned dataset. So doing the directory checks from leaf to root should issue fewer requests and incur less latency.
More formally, let's call n the depth of the path given to CreateDir, and m the number of directories missing along that path.
- with the current approach, we're calling `HeadObject` O(n) times and `PutObject` O(m) times;
- with my proposal, we're calling `HeadObject` O(m) times and `PutObject` O(m) times.
Assuming that m is on average much smaller than n (m would be 0 or 1 most of the time), the approach I'm proposing should be much more efficient, and it would never be less efficient anyway.
Does that make sense or am I missing something?
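To make the proposal concrete, here is a minimal standalone sketch of the walk-up/walk-down strategy. This is not Arrow's actual implementation: `DirectoryExists` and `CreateEmptyDir` are hypothetical stand-ins for one `HeadObject` and one `PutObject` request, and the in-memory set simulates the bucket contents.

```cpp
// Sketch only: simulates the proposed CreateDir strategy against an
// in-memory "bucket" instead of real S3 requests.
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Directory keys that already exist in the simulated bucket.
std::set<std::string> existing = {"a", "a/b"};

bool DirectoryExists(const std::string& path) {  // stands in for one HeadObject
  return existing.count(path) > 0;
}

void CreateEmptyDir(const std::string& path) {   // stands in for one PutObject
  existing.insert(path);
  std::cout << "PutObject " << path << "/\n";
}

void CreateDirRecursive(const std::string& full_path) {
  // Build the ancestor chain, e.g. "a/b/c" -> {"a", "a/b", "a/b/c"}.
  std::vector<std::string> chain;
  for (size_t slash = full_path.find('/'); ;
       slash = full_path.find('/', slash + 1)) {
    if (slash == std::string::npos) {
      chain.push_back(full_path);
      break;
    }
    chain.push_back(full_path.substr(0, slash));
  }

  // Walk *up* from the leaf until the first existing ancestor is found:
  // roughly O(m) HeadObject calls when m trailing components are missing.
  size_t first_missing = chain.size();
  while (first_missing > 0 && !DirectoryExists(chain[first_missing - 1])) {
    --first_missing;
  }

  // Walk *down* and create only the missing components: O(m) PutObject calls.
  for (size_t i = first_missing; i < chain.size(); ++i) {
    CreateEmptyDir(chain[i]);
  }
}

int main() {
  // With "a" and "a/b" already present, only "a/b/c" triggers a PutObject,
  // after existence checks on "a/b/c" and "a/b".
  CreateDirRecursive("a/b/c");
}
```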
@pitrou Could you please review again?
In this round:
- Added the above logic.
- Ensured my tests are working, as I forgot to call `MakeFileSystem()`.
- Added missing tests for the non-recursive CreateDir scenario.
pitrou left a comment:
Thanks for submitting this @HaochengLIU! Here are some comments.
pitrou left a comment:
Thanks for the update @HaochengLIU. This looks good to me, bar some minor comments.
@bkietz Will we have to support this new option in S3 URIs as well?
Whoa, this is insane!
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a44b537. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
I have a use case where thousands of jobs write Hive-partitioned Parquet files daily to the same bucket via the S3FS filesystem. The gist is that a lot of keys are being created at the same time, so jobs hit the `AWS Error SLOW_DOWN during Put Object operation: The object exceeded the rate limit for object mutation operations (create, update, and delete). Please reduce your request rate.` error frequently throughout the day, since the code creates directories pessimistically.
What changes are included in this PR?
Add a new S3Option to check the existence of a directory before creating it in CreateDir. It is disabled by default. When enabled, CreateDir checks whether the directory already exists before creating it, so the create operation is only performed when necessary. Although this can issue more I/O calls, it avoids hitting the cloud vendor's put-object rate limit.
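For illustration, here is a rough sketch of how such an option might be enabled on `S3Options` before building the filesystem. The exact field name is not shown in this PR description, so `check_directory_existence_before_creation` below is an assumption, as is the example region.

```cpp
// Sketch only: enabling the new behaviour when constructing an S3FileSystem.
#include <arrow/filesystem/s3fs.h>
#include <arrow/result.h>

#include <memory>

arrow::Result<std::shared_ptr<arrow::fs::S3FileSystem>> MakeFs() {
  arrow::fs::S3Options options = arrow::fs::S3Options::Defaults();
  options.region = "us-east-1";  // example region
  // Opt in to checking directory existence before issuing PutObject calls.
  // (Assumed field name; per the description, the option is off by default.)
  options.check_directory_existence_before_creation = true;
  return arrow::fs::S3FileSystem::Make(options);
}
```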
Are these changes tested?
Added test cases for when the flag is set to true. Off the top of my head, I don't know how to verify in these tests that the existence checks actually happen, but in our production environment we have very similar code and it has worked well.
Are there any user-facing changes?