ARROW-13685: [C++] Cannot write dataset to S3FileSystem if bucket already exists #11136
The tests do spawn minio, so maybe it would be possible to also ensure/check if
@kszucs Any chance you'd be able to take a look at this integration test? It works, but I'm not sure if you have any suggestions for doing it better.
@lidavidm I considered running mc programmatically, but since I have to run several different commands, and there is quite a bit of boilerplate in s3_test_util.h, I worried it would end up more complex than a solution like this (using a nightly build job). It would also add one more step for anyone wanting to run the tests during local development.
pitrou left a comment
Thanks for doing this. The fix looks fine. Just a couple questions about the new CI tests.
ci/scripts/integration_minio.sh (outdated)
```
# Run Arrow tests relying on limited permissions user
python -mpytest ${source_dir}/ci/scripts/integration_minio.py
```
I am not entirely sure an integration test is warranted for this (after all, we're just checking a single regression), though I have no strong opinion. @jorisvandenbossche What do you think?
I agree it is a bit of overhead for a single regression. My thinking is that it will be a good starting point that can be extended if we later run into issues that require either running against a real S3 instance/configuration or specific permissions.
Ah, fair enough.
@github-actions crossbow submit test-conda-python-minio
Revision: 1c4461ac3c13553a180ec9fb9a007a000a0aefd9 Submitted crossbow builds: ursacomputing/crossbow @ actions-846
pitrou left a comment
+1, thank you.
Couldn't we spin up another, properly configured minio server from the pytest test suite and just exercise the regression test on it? That way the test would always run, with no need for additional scripts, a docker-compose service, or a crossbow task. This would work similarly to what we already do in conftest, but with additional configuration.
Yes, this was David's point too. My initial reluctance was that it would require anyone who wants to run the test to have the
1c4461a to a9d2287
Per everyone's suggestions, I have moved the test from a standalone test to a built-in test that is skipped if
python/pyarrow/tests/test_fs.py (outdated)
```
@@ -298,6 +300,99 @@ def subtree_s3fs(request, s3fs):
    )

__minio_limited_policy = """{
```
A single leading underscore should be sufficient for all of the "protected" variables and functions.
Thanks! I changed these to single underscore.
python/pyarrow/tests/test_fs.py (outdated)
```
# These commands create a limited user with a specific
# policy and create a sample bucket for that user to
# write to
__run_mc_command(mcdir, 'admin', 'policy', 'add',
```
I wonder, could we use the minio python client instead of subprocess calls to mc?
I don't think the python client exposes admin operations like adding policies or users (at least, not that I can tell). Unfortunately, set_bucket_policy is not sufficient because the s3:CreateBucket operation only makes sense as a user permission.
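To make the distinction concrete, here is a sketch of an IAM-style user policy (the JSON format that `mc admin policy` accepts) that grants object access but deliberately omits `s3:CreateBucket`. The action list in the PR's `ci/etc/minio-no-create-bucket-policy.json` is not shown in this thread, so the one below is an assumption.

```python
import json

# Hypothetical user policy: listing plus object read/write on every
# bucket, but no "s3:CreateBucket".
LIMITED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
            ],
            "Resource": ["arn:aws:s3:::*"],
        }
    ],
}


def allowed_actions(policy):
    # Collect every action granted by an "Allow" statement.
    actions = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow":
            actions.update(stmt["Action"])
    return actions


# Serialized form, suitable for writing to the file given to mc.
POLICY_JSON = json.dumps(LIMITED_POLICY, indent=2)
```

A user with this policy can write into `existing-bucket` but is denied any `CreateBucket` call, which is exactly the situation the regression test needs.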
…y instead of relying on CreateBucket to do so, as CreateBucket can fail for permission-denied reasons
…st suite instead of a dedicated integration test.
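The first commit message above summarizes the fix: check whether the bucket exists before trying to create it, because `CreateBucket` can be denied even when the bucket is already there. A simplified model of that control flow, written in Python against a hypothetical in-memory client (`bucket_exists`, `create_bucket`) rather than Arrow's C++ code or the AWS SDK:

```python
class PermissionDenied(Exception):
    """Stand-in for an S3 AccessDenied error."""


class BucketAlreadyExists(Exception):
    """Stand-in for an S3 BucketAlreadyOwnedByYou / BucketAlreadyExists error."""


class FakeS3Client:
    # `can_create` models whether the user's policy grants s3:CreateBucket.
    def __init__(self, buckets, can_create):
        self.buckets = set(buckets)
        self.can_create = can_create

    def bucket_exists(self, name):
        # Analogous to a HeadBucket request.
        return name in self.buckets

    def create_bucket(self, name):
        # Authorization is checked before existence, as a real server would.
        if not self.can_create:
            raise PermissionDenied(name)
        if name in self.buckets:
            raise BucketAlreadyExists(name)
        self.buckets.add(name)


def ensure_bucket(client, name):
    # New behaviour: only ask for CreateBucket when the bucket is missing.
    if not client.bucket_exists(name):
        client.create_bucket(name)


def ensure_bucket_old(client, name):
    # Old behaviour (simplified): always try CreateBucket and tolerate
    # "already exists", which still fails with permission denied.
    try:
        client.create_bucket(name)
    except BucketAlreadyExists:
        pass
```

With a limited client, `ensure_bucket(client, "existing-bucket")` succeeds while `ensure_bucket_old` raises `PermissionDenied`, mirroring the failure in the issue title.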
a173cba to 249a499
I kind of forgot about this. I rebased and fixed a lint error. Assuming CI passes, I will merge this tomorrow.
…s not to be confused with mc.exe, which is the Windows message compiler
…eady exists
I still need to add a regression test. I've been able to test by configuring my server with the minio client (mc). It would probably be easiest to create a crossbow test for this situation. Current steps:
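```shell
# Register the local MinIO server under the alias "myminio" (default admin credentials)
mc alias set myminio http://localhost:9000 minioadmin minioadmin
# Create a policy that does not allow creating buckets
mc admin policy add myminio/ no-create-buckets ci/etc/minio-no-create-bucket-policy.json
# Create a limited user and attach that policy to it
mc admin user add myminio/ limited limited123
mc admin policy set myminio no-create-buckets user=limited
# As admin, create the bucket the limited user will write into
mc mb myminio/existing-bucket
```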
Then, in Python:
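```python
import pyarrow.fs as fs

# Connect as the limited user, which has no s3:CreateBucket permission
filesystem = fs.S3FileSystem(access_key='limited', secret_key='limited123', endpoint_override='http://localhost:9000')
# The bucket already exists, so creating a directory in it should not require creating a bucket
filesystem.create_dir('existing-bucket/foo')  # This line fails without the change
```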
Closes apache#11136 from westonpace/bugfix/ARROW-13685-cannot-write-to-s3-if-bucket-exists
Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Hi team, @westonpace is it possible to open this thread back up? Our environment is tightly locked down and I cannot grant my application the necessary
@JonnyWaffles It's probably best to open a new JIRA ticket for that request. |