Skip to content

Conversation

@smengcl
Copy link
Contributor

@smengcl smengcl commented Nov 8, 2021

https://issues.apache.org/jira/browse/HDDS-5891

  1. BucketManagerImpl#getBucketInfo can now throw VOLUME_NOT_FOUND if the parent volume doesn't exist, and will only throw BUCKET_NOT_FOUND if the parent volume exists but the bucket doesn't. Before this change it throws BUCKET_NOT_FOUND regardless of whether parent volume exists. Related prior comment: HDDS-5279. OFS mkdir -p does not work when Volume is not pre-created #2412 (comment)
  1. With the patch, BasicRootedOzoneClientAdapterImpl#getBucket will no longer attempt to create the volume if the volume exists. Previously, it can't distinguish VOLUME_NOT_EXIST from BUCKET_NOT_EXIST due to the API limitation mentioned in (1). This fixes the issue mentioned in the jira title, where getBucket can fail if a non-privileged user tries to mkdir -p over a new bucket but fail because the user would fail the volume CREATE permission check -- which doesn't make sense to be checked in the first place in this case.

Testing

  • Some integration/acceptance tests might be broken as a result of (1). Actually the CI goes fine as soon as I fixed the get from VolumeTable logic. In the code logic the volume key was missing the leading /.
  • Added a unit test for the change in behavior in BucketManagerImpl#getBucketInfo in OM, that it could throw VOLUME_NOT_FOUND now if the parent volume doesn't exist (previously getBucketInfo would throw BUCKET_NOT_FOUND in this case).
  • Added a new integration test for the scenario in the jira description. To verify that a regular user will be able to create a bucket with OFS mkdir -p now even though this user don't have volume CREATE permission on the parent volume.

… when volume exists due to volume create ACL check
@fapifta
Copy link
Contributor

fapifta commented Nov 8, 2021

Hi @smengcl,
thank you for working on this forward.

Please help me understand what is the aim here, as I am not sure why are we doing the getBucket call this way. In my head, the algorithm is simply this:

  • get the volume from the objectStore
  • if volume exists get the bucket from the volume
  • if volume does not exist, and if we should create it create it
  • if the bucket does not exist create it

This is better reflected by #2815, the one @dombizita created, based on our earlier discussions with her.

I would like to understand why we are using the ObjectStore's client proxy to directly do OzoneManager protocol calls, when we can get the volume, and from the volume we can get the bucket later on, is there a good reason to turn to the underlying API I don't see, or we can even get rid of using the OzoneManager protocol proxy directly?

On the other hand, I pretty much agree that the BucketManager#getBucketInfo(volName, bucketName) method should throw a VOLUME_NOT_FOUND in case the volume does not exist, that is a much clearer cause in this case.

@smengcl
Copy link
Contributor Author

smengcl commented Nov 8, 2021

Hi @smengcl, thank you for working on this forward.

Please help me understand what is the aim here, as I am not sure why are we doing the getBucket call this way. In my head, the algorithm is simply this:

  • get the volume from the objectStore
  • if volume exists get the bucket from the volume
  • if volume does not exist, and if we should create it create it
  • if the bucket does not exist create it

This is better reflected by #2815, the one @dombizita created, based on our earlier discussions with her.

I would like to understand why we are using the ObjectStore's client proxy to directly do OzoneManager protocol calls, when we can get the volume, and from the volume we can get the bucket later on, is there a good reason to turn to the underlying API I don't see, or we can even get rid of using the OzoneManager protocol proxy directly?

On the other hand, I pretty much agree that the BucketManager#getBucketInfo(volName, bucketName) method should throw a VOLUME_NOT_FOUND in case the volume does not exist, that is a much clearer cause in this case.

Thanks @fapifta for the comment. Thanks @dombizita for #2815 .

This is better reflected by #2815, the one @dombizita created, based on our earlier discussions with her.

Yes #2815 is fine and it is better for compatibility (concern for compatibility is why I posted my PR #2814 in draft). #2814 and #2815 can be seen as two different approaches to the same problem.

I would like to understand why we are using the ObjectStore's client proxy to directly do OzoneManager protocol calls

The reason I used a single proxy.getBucketDetails call when writing OFS was to reduce unnecessary round trip between the client and OM (getVolume then getBucket adds an extra round trip). There were some discussion about the extra round trip offline and IIRC in the original OFS PR as well.

I am totally fine with #2815 in order to unblock other work, and improve on this later. Not sure if adding an extra round trip would really add meaningful perf impact in the first place, happy to discuss about that.

@fapifta
Copy link
Contributor

fapifta commented Nov 10, 2021

Hmm, looking into it further, that perf impact might be huge, as if we do a getVolumeInfo first, then a getBucketInfo based on the volume, and if we use this in almost all operations as we do now, then that is a burden we do not want to take on for sure.

So even though the other solution for me seems easier to understand, it hurts with the performance tradeoff...

If we are concerned about compatibility, we need to use getBucketDetails(vol, buck) in the initial try catch, as otherwise, we are causing performance degradation of a lot of unrelated calls, but if we do not want to change the behaviour of BucketManager, we need to change how we handle errors, and there get volume first, and create only if it does not exist I believe...

On the other hand, I would leave the change of BucketManager still on the table, as if volume does not exists, it would be the correct behaviour to throw a volume not found, instead of the bucket not found or any other resultcode in the OMException. If you check, createBucket already does it this way, and if getBucketInfo(vol, buck) switches to this behaviour, your solution is the one I would vote for...

@smengcl smengcl marked this pull request as ready for review November 11, 2021 01:28
@smengcl
Copy link
Contributor Author

smengcl commented Nov 12, 2021

Hi @fapifta , thanks for the comment. I have updated the PR. It is ready for review. Would you like to take a look?

@smengcl smengcl requested a review from fapifta November 12, 2021 19:44
Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @smengcl,

thank you for working this out, and adding the tests.

Unfortunate fact that the BucketManager test is ignored, so won't be running, can we fix that by removing the @ignore from the test, and add it to just the methods that are failing, and leave the green tests, and the new test as active running test cases?
As I checked, testGetBucketInfo and testCreateBucketWithoutVolume are green, and testGetBucketInfoForInvalidBucket is relevant to this change, and can be fixed, other tests I haven't checked.

I have created HDDS-5986 to fix the TestRootedOzoneFileSystem initialization problem along with some test failures that will come up after the fix.

Also please find one comment inline about an other minor change.

@smengcl
Copy link
Contributor Author

smengcl commented Nov 14, 2021

Thanks @fapifta for filing HDDS-5986. I have already filed HDDS-5969 earlier that has a patch file attached to it to address the same issue so I have marked HDDS-5986 as duplicated.

Copy link
Contributor

@fapifta fapifta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice... I should have search jira before creating the ticket :)

If CI is green then I think we are good to go here.

Change-Id: Ic38e582cc96c859087b98baad2467af37ba7bdbe
@smengcl
Copy link
Contributor Author

smengcl commented Nov 16, 2021

Thanks @fapifta for the +1.

Finally got a green CI run after 6+ retriggers. master branch tests are quite a bit unstable lately.

Merging in a minute.

@smengcl smengcl merged commit b4a785c into apache:master Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants