Skip to content

Conversation

@westonpace
Copy link
Member

I still need to add a regression test. I've been able to test by configuring my server with minio client. I think it'd probably be easiest to create a crossbow test for this situation. Current steps:

mc alias set myminio http://localhost:9000 minioadmin minioadmin
mc admin policy add myminio/ no-create-buckets ci/etc/minio-no-create-bucket-policy.json
mc admin user add myminio/ limited limited123
mc admin policy set myminio no-create-buckets user=limited
mc mb myminio/existing-bucket

Then, in python:

import pyarrow.fs as fs
filesystem = fs.S3FileSystem(access_key='limited', secret_key='limited123', endpoint_override='http://localhost:9000')
filesystem.create_dir('existing-bucket/foo') # This line fails without the change

@westonpace westonpace marked this pull request as draft September 11, 2021 03:34
@github-actions
Copy link

@lidavidm
Copy link
Member

The tests do spawn minio - maybe it would be possible to also ensure/check if mc is on the path and run the test that way?

@westonpace westonpace marked this pull request as ready for review September 16, 2021 10:54
@westonpace westonpace requested a review from pitrou September 16, 2021 10:54
@westonpace
Copy link
Member Author

@kszucs Any chance you'd be able to take a look at this integration test? It works but I'm not sure if you have any suggestions for doing it better.

@westonpace
Copy link
Member Author

@lidavidm I considered running mc programmatically but since I have to run several different commands and there is quite a bit of boilerplate in s3_test_util.h I worried it would end up being more complex than a solution like this (using a nightly build job). Also, it would add one more step for anyone wanting to run tests for local development.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. The fix looks fine. Just a couple questions about the new CI tests.


# Run Arrow tests relying on limited permissions user

python -mpytest ${source_dir}/ci/scripts/integration_minio.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure an integration is deserved for this (after all we're just checking a single regression), though I have no strong opinion. @jorisvandenbossche What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is a bit of overhead for a single regression. My thinking is that it will be a good starting point that can be extended in the future in case we run into future issues that either require running against a real S3 instance / configuration or require specific permissions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough.

@pitrou
Copy link
Member

pitrou commented Sep 20, 2021

@github-actions crossbow submit test-conda-python-minio

@github-actions
Copy link

Revision: 1c4461ac3c13553a180ec9fb9a007a000a0aefd9

Submitted crossbow builds: ursacomputing/crossbow @ actions-846

Task Status
test-conda-python-minio Github Actions

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you.

@kszucs
Copy link
Member

kszucs commented Sep 20, 2021

Couldn't we spin up another properly configured minio server from the pytest test suite and just exercise the regression test on it? That way we would always run that test, so no need for additional scripts, docker-compose service and crossbow task.

Similarly like we already do in conftest, but with additional configuration.

@westonpace
Copy link
Member Author

Couldn't we spin up another properly configured minio server from the pytest test suite and just exercise the regression test on it? That way we would always run that test, so no need for additional scripts, docker-compose service and crossbow task.

@kszucs

Yes, this was David's point too. My initial reluctance was that it would require anyone that wants to run the test to have the mc binary in addition to the minio binary but, in retrospect, that may not be too much of an additional burden. I'll try spinning up a version that works this way.

@westonpace westonpace force-pushed the bugfix/ARROW-13685-cannot-write-to-s3-if-bucket-exists branch from 1c4461a to a9d2287 Compare September 21, 2021 23:26
@westonpace
Copy link
Member Author

Per everyone's suggestions I have moved the test from a standalone test to a builtin test that is skipped if mc is not present.

@westonpace westonpace requested review from kszucs and pitrou September 22, 2021 03:07
@@ -298,6 +300,99 @@ def subtree_s3fs(request, s3fs):
)


__minio_limited_policy = """{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single leading underscore should be sufficient for all of the "protected" variables and functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I changed these to single underscore.

# These commands create a limited user with a specific
# policy and creates a sample bucket for that user to
# write to
__run_mc_command(mcdir, 'admin', 'policy', 'add',
Copy link
Member

@kszucs kszucs Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, could we use the minio python client instead of subprocess calls to mc?

https://docs.min.io/docs/python-client-api-reference.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the python client exposes admin operations like adding policies or users (at least, not that I can tell). Unfortunately, set_bucket_policy is not sufficient because the s3:CreateBucket operation only makes sense as a user permission.

@westonpace westonpace force-pushed the bugfix/ARROW-13685-cannot-write-to-s3-if-bucket-exists branch from a173cba to 249a499 Compare September 30, 2021 11:42
@westonpace
Copy link
Member Author

I kind of forgot about this. I rebased and fixed a lint error. Assuming CI passes I will merge this tomorrow.

…s not to be confused with mc.exe which is the Windows message compiler
@westonpace westonpace closed this in 700ac1e Oct 1, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
…eady exists

I still need to add a regression test.  I've been able to test by configuring my server with minio client.  I think it'd probably be easiest to create a crossbow test for this situation.  Current steps:

```
mc alias set myminio http://localhost:9000 minioadmin minioadmin
mc admin policy add myminio/ no-create-buckets ci/etc/minio-no-create-bucket-policy.json
mc admin user add myminio/ limited limited123
mc admin policy set myminio no-create-buckets user=limited
mc mb myminio/existing-bucket
```

Then, in python:

```
import pyarrow.fs as fs
filesystem = fs.S3FileSystem(access_key='limited', secret_key='limited123', endpoint_override='http://localhost:9000')
filesystem.create_dir('existing-bucket/foo') # This line fails without the change
```

Closes apache#11136 from westonpace/bugfix/ARROW-13685-cannot-write-to-s3-if-bucket-exists

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace deleted the bugfix/ARROW-13685-cannot-write-to-s3-if-bucket-exists branch January 6, 2022 08:15
@JonnyWaffles
Copy link

Hi team, @westonpace is it possible to open this thread back up? Our environment is tightly locked down and I cannot grant my application the necessary s3:ListBucket resource permissions. Is it possible to add a keyword argument to skip this check behavior entirely, so I can write my data similar to if I use the pq.write_table functionality?

@westonpace
Copy link
Member Author

@JonnyWaffles It's probably best to open a new JIRA ticket for that request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants