
Conversation

@kerneltime
Contributor

What changes were proposed in this pull request?

For operations that are not read-only, we should set the thread-local S3 Auth.
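
For context, a minimal sketch of the thread-local pattern this description refers to; the names below (S3AuthInfo, S3AuthContext, handlePutObject) are illustrative placeholders, not the exact Ozone identifiers:

```java
import java.util.Objects;

/** Illustrative holder for the S3 identity parsed from the signed request. */
final class S3AuthInfo {
  final String accessId;
  final String signature;
  final String stringToSign;

  S3AuthInfo(String accessId, String signature, String stringToSign) {
    this.accessId = Objects.requireNonNull(accessId);
    this.signature = Objects.requireNonNull(signature);
    this.stringToSign = Objects.requireNonNull(stringToSign);
  }
}

/** Thread-local carrier so the RPC layer can attach the auth to outgoing OM requests. */
final class S3AuthContext {
  private static final ThreadLocal<S3AuthInfo> CURRENT = new ThreadLocal<>();

  static void set(S3AuthInfo auth) { CURRENT.set(auth); }
  static S3AuthInfo get() { return CURRENT.get(); }
  static void clear() { CURRENT.remove(); }
}

/** How a write (non read-only) handler is expected to use the carrier. */
final class WritePathSketch {
  void handlePutObject(S3AuthInfo auth, Runnable submitToOm) {
    S3AuthContext.set(auth);   // make the S3 identity visible to the RPC layer
    try {
      submitToOm.run();        // outgoing OM requests can now carry the real S3 user
    } finally {
      S3AuthContext.clear();   // don't leak identity across pooled threads
    }
  }
}
```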

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6868

How was this patch tested?

Before the patch:
This was manually tested; we need to improve our secure S3 testing and will file a separate JIRA to improve our test cases. This is an urgent fix that needs to get in.

mc: <ERROR> Failed to copy `/etc/hosts`. Insufficient permissions to access this file `https://localhost:9879/bucket/key`

After the patch, the object could be uploaded to S3.

kerneltime changed the title from "HDDS-6868 Add S3Auth information to thread local" to "HDDS-6868. Add S3Auth information to thread local" on Jun 17, 2022
@adoroszlai
Contributor

Thanks @kerneltime for working on this.

This was manually tested; we need to improve our secure S3 testing and will file a separate JIRA to improve our test cases. This is an urgent fix that needs to get in.

Not adding tests and saying "manually tested" is OK if automating those tests would be disproportionately more effort than the fix itself (debugging and implementing it). But then please explicitly list manual test steps so that others can verify the same way.

I believe our acceptance tests cover S3 operations reasonably well, in both unsecure and secure environments. The HDDS-6868 description shows that the bug happens even with a very basic "upload a file" step, which is definitely covered by the tests. However, those tests are executed as an admin user (testuser). I tried testuser2 (which I think is non-admin), but the tests still passed. Another problem might be that in docker-based environments all servers run as the same user (hadoop).

I would like to understand the specific environmental difference that exposes this bug.

@kerneltime
Contributor Author

kerneltime commented Jun 17, 2022

I have been trying to root-cause why the tests pass. I think I know why the s3g user gets treated as root; I will continue debugging further, and hopefully this is just an issue with the docker environment. More details tomorrow after I try changing the test environment. Right now, it looks like the UGI setup ends up sneaking in root as the short username, which leads to ACL checks passing.
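
To make the suspected failure mode concrete, an illustrative sketch of why an ACL check keyed on the caller's UGI short name can pass in the docker environment even when no S3 identity was attached; the ADMINS set and isAllowed are hypothetical, not actual Ozone code:

```java
import java.util.Set;
import org.apache.hadoop.security.UserGroupInformation;

/** Illustrative only: shows how a privileged short user name can mask the missing S3 auth. */
public final class AclCheckSketch {
  private static final Set<String> ADMINS = Set.of("root", "hadoop");

  static boolean isAllowed(UserGroupInformation caller) {
    // With no S3 identity attached, the check falls back to the RPC caller's UGI.
    // If that UGI's short name resolves to an admin-like user (as in the docker
    // test env), the ACL check passes even though the real S3 user was never sent.
    return ADMINS.contains(caller.getShortUserName());
  }

  public static void main(String[] args) {
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
    System.out.println(isAllowed(ugi)); // true -> the acceptance tests keep passing
  }
}
```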

The manual tests were run using mc as the AWS CLI, with the systest user, in a real cluster with Ranger.
We should merge this fix in, and I will post a fix for our tests.

kerneltime force-pushed the HDDS-6868 branch 2 times, most recently from 8947586 to b8bbeb0 on June 20, 2022 06:48
@adoroszlai
Contributor

@kerneltime Thanks for tweaking the patch. However, submitRequestDirectlyToOM is still not fixed.

@kerneltime
Contributor Author

@kerneltime Thanks for tweaking the patch. However, submitRequestDirectlyToOM is still not fixed.

Not sure why it was skipped originally; it is fixed now.
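
For illustration, a minimal sketch of why both submit paths need the same treatment; apart from submitRequestDirectlyToOM, the names here are hypothetical stand-ins rather than the real Ozone methods:

```java
/** Illustrative sketch: every path that sends an OM request must attach the thread-local auth. */
final class OmSubmitSketch {
  private static final ThreadLocal<String> S3_AUTH = new ThreadLocal<>();

  static String attachS3Auth(String omRequest) {
    String auth = S3_AUTH.get();
    return auth == null ? omRequest : omRequest + " [s3auth=" + auth + "]";
  }

  String submitWithRetryProxy(String omRequest) {
    return send(attachS3Auth(omRequest));   // path that already carried the auth
  }

  String submitRequestDirectlyToOM(String omRequest) {
    return send(attachS3Auth(omRequest));   // the path the review caught as still missing it
  }

  private String send(String omRequest) {
    return omRequest; // stand-in for the actual RPC call
  }
}
```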

kerneltime force-pushed the HDDS-6868 branch 2 times, most recently from 8b08cb1 to 7f33406 on June 21, 2022 12:31
@neils-dev
Contributor

I believe our acceptance tests cover S3 operations reasonably well, in both unsecure and secure environments. The HDDS-6868 description shows that the bug happens even with a very basic "upload a file" step, which is definitely covered by the tests. However, those tests are executed as an admin user (testuser). I tried testuser2 (which I think is non-admin), but the tests still passed. Another problem might be that in docker-based environments all servers run as the same user (hadoop).

I would like to understand the specific environmental difference that exposes this bug.

I would like to understand the problem better as well, since the upload scenario passes our acceptance tests and I cannot see clearly why the error occurs otherwise. It would be great to describe the exact difference between the conditions in our CI tests and the conditions under which the error is seen; that would also help us update our CI workflow if needed.

@kerneltime
Contributor Author

I believe our acceptance tests cover S3 operations reasonably well, in both unsecure and secure environments. The HDDS-6868 description shows that the bug happens even with a very basic "upload a file" step, which is definitely covered by the tests. However, those tests are executed as an admin user (testuser). I tried testuser2 (which I think is non-admin), but the tests still passed. Another problem might be that in docker-based environments all servers run as the same user (hadoop).
I would like to understand the specific environmental difference that exposes this bug.

I would like to understand the problem better as well, since the upload scenario passes our acceptance tests and I cannot see clearly why the error occurs otherwise. It would be great to describe the exact difference between the conditions in our CI tests and the conditions under which the error is seen; that would also help us update our CI workflow if needed.

I have root-caused the issue with the smoke tests and am working on a fix for it; it will take a couple of days to iron out all the tests.

@kerneltime
Contributor Author

submitRequestDirectlyToOM

Thanks, @adoroszlai, for the thorough review; I missed that initially. Posting an update.

errose28 merged commit c634a87 into apache:master on Jun 22, 2022
errose28 added a commit to errose28/ozone that referenced this pull request Jun 23, 2022
* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...
adoroszlai added a commit to adoroszlai/ozone that referenced this pull request Jun 24, 2022
duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022