HDDS-8437. [FSO] S3A compatibility - dfs -mkdir creates a zero byte file instead of a directory #5124

mladjan-gadzic · 2023-07-27T12:26:13Z

What changes were proposed in this pull request?

Content length is not reliable measure to determine wheter dir should be created. In this case Hadoop does not provide any content, but because body is still chunk signed, length is not 0 but 86. Because of this, zero byte file is created instead of dir.

Proposed solution is to have additional check wheter key path has trailing slash. If it has, dir will be created, otherwise file.

Info on the Marker Retention Property:

In order for Hadoop to be 100% compatible with FSO layout, it is necessary to set property:

<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>keep</value>
</property>

The following shows how the marker.retention property is needed for FSO to be compatible with Hadoop's S3A fs. It is here for reference purposes but understanding it isn't necessary to review this PR.

To demonstrate this, let's use simple case such as:

create dir a1
create dir b1 under a1

Dir a1 can be safely created. Listing of keys on Ozone side would show key as a1/ with size of 0. Attempt to create b1 as a1/b1/ would succeed on Ozone's side. Listing of keys on it would show keys such as a1/ and a1/b1/ both with size of 0. Issue happens on Hadoop side because issued command would hang (never finish) with warning logs such as:

2023-07-26 13:53:33,937 WARN impl.MultiObjectDeleteSupport: Bulk delete operation failed to delete all objects; failure count = 1
2023-07-26 13:53:33,938 WARN impl.MultiObjectDeleteSupport: InternalError: a1/: Directory is not empty. Key:a1

The reason for this is that Hadoop tries to delete a1/ key and just leave a1/b1/ which is alright for OBS, but FSO does not work like that. Because of this above-mentioned propery is necesarry in order for Hadoop to be compatible with FSO over S3a. After setting above property there are no warning logs, but instead something like this:

2023-07-26 22:30:43,950 INFO impl.DirectoryPolicyImpl: Directory markers will be kept

Full logs from Hadoop side:

ubuntu@ip-172-31-30-88:~/hadoop-3.3.4$ bin/hdfs dfs -Dfs.s3a.access.key=1 -Dfs.s3a.secret.key=1 -Dfs.s3a.endpoint=http://localhost:9878/ -Dfs.s3a.path.style.access=true -mkdir s3a://fso/c1/b1
2023-07-26 22:29:55,358 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2023-07-26 22:29:55,507 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2023-07-26 22:29:55,507 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
2023-07-26 22:30:43,950 INFO impl.DirectoryPolicyImpl: Directory markers will be kept
2023-07-26 22:30:49,531 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2023-07-26 22:30:49,531 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2023-07-26 22:30:49,532 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8437

How was this patch tested?

manual test
For info on steps how to reproduce the issue check linked Apache Jira.

errose28

Thanks for working on the S3a compatibility @mladjan-gadzic.

My understanding of IOUtils#toString is that it is going to do a full buffer copy and load the result into memory. I don't think we can do this for all data coming into s3g put. Can we just say if ozone.s3g.fso.directory.creation is enabled and a file is written with a trailing slash to an FSO bucket, we will treat it as a directory? Another more complicated option may be to seek past the auth header information in the stream to see if any content remains after that, but I'm not very familiar with the chunked layout authorization header format to know if this is feasible.

errose28 · 2023-08-14T23:28:19Z

Also please check the failing unit test which looks related.

kerneltime · 2023-08-15T01:04:54Z

There is no notion of the directory in S3; can we elaborate on the exact ask here? There is a reverse request to skip directory objects on listing for FSO buckets in S3. If we create directories for zero-byte objects in S3G over FSO, what should the behavior be for listing directories?

mladjan-gadzic · 2023-08-15T07:47:25Z

Thanks for working on the S3a compatibility @mladjan-gadzic.

My understanding of IOUtils#toString is that it is going to do a full buffer copy and load the result into memory. I don't think we can do this for all data coming into s3g put. Can we just say if ozone.s3g.fso.directory.creation is enabled and a file is written with a trailing slash to an FSO bucket, we will treat it as a directory? Another more complicated option may be to seek past the auth header information in the stream to see if any content remains after that, but I'm not very familiar with the chunked layout authorization header format to know if this is feasible.

Thanks @errose28 for taking the time to review this PR! I will try suggestions and let you know how it went.

mladjan-gadzic · 2023-08-15T07:48:34Z

Also please check the failing unit test which looks related.

It is due to IOUtils#toString not creating copy but consuming InputStream. Hopefully this will be fixed with suggestions from #5124 (review)

mladjan-gadzic · 2023-08-15T08:10:58Z

There is no notion of the directory in S3; can we elaborate on the exact ask here? There is a reverse request to skip directory objects on listing for FSO buckets in S3. If we create directories for zero-byte objects in S3G over FSO, what should the behavior be for listing directories?

Directories are already being created by default for zero-byte files over FSO bucket layout. Please check this PR for more context. ozone.s3g.fso.directory.creation property was introduced to help handle this in case there is no need for directory creation.

Current PR is only extension on previous to be compatible with S3a (specifically dfs mkdirs command).

I am not familiar with request to skip directory objects on listing. Could you please provide more context how current PR breaks it?

errose28 · 2023-08-15T23:40:37Z

Hi @mladjan-gadzic. @kerneltime and I had some offline discussion about these changes and this is what we found:

It looks like Ozone did not actually implement multi chunk signature support in HDDS-693 (5 years ago), only "initial support" to quote the jira. I don't see any code verifying these signatures, just skipping over them.
Based on the AWS docs it looks like we can check the x-amz-decoded-content-length header to get the length of the data excluding the signature. The length that was being checked previously came from the Content-Length header.
Going forward we probably need a design doc and parent jira for s3a over FSO. What we are aiming to accomplish is for these two streams of communication to produce identical results:
1. FS client <-> S3a <-> s3 gateway <-> FSO bucket
2. FS client <-> FSO bucket

This requirement to make the s3 gateway "transparent" will require some more planning like Ritesh mentioned instead of just ad-hoc patches as issues arise.

errose28 · 2023-08-29T17:16:18Z

Hi @mladjan-gadzic have you had a chance to see if using the x-amz-decoded-content-length header resolves the issue? We may be able to find someone to help finish this PR if you do not have the bandwidth right now.

mladjan-gadzic · 2023-08-29T19:35:27Z

Hi @mladjan-gadzic have you had a chance to see if using the x-amz-decoded-content-length header resolves the issue? We may be able to find someone to help finish this PR if you do not have the bandwidth right now.

Hi @errose28. Yes, using mentioned header resolves the issue. Thanks for the suggestion. I also did round of manual tests to confirm it works.

errose28

Thanks for the update @mladjan-gadzic. Just left some minor comments but the approach looks good overall.

errose28 · 2023-08-29T20:46:12Z

hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java

+          bucket.getBucketLayout() == BucketLayout.FILE_SYSTEM_OPTIMIZED;
+
+      String amzDecodedLength =
+          headers.getHeaderString("x-amz-decoded-content-length");


Let's define "x-amz-decoded-content-length" in S3Consts where we have other x-amz header strings defined.

errose28 · 2023-08-29T20:49:45Z

hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java

+  private static boolean hasTrailingSlash(String keyPath) {
+    return keyPath.charAt(keyPath.length() - 1) == '/';
+  }


Looks like this method is not used anymore, I think we can remove it.

errose28 · 2023-08-29T20:51:47Z

hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java

+  private void createDirectory(String volumeName,
+      String bucketName, String keyPath) throws IOException {
+    getClientProtocol()
+        .createDirectory(volumeName, bucketName, keyPath);
+  }


In the current change set this helper method is only called once. I think it can be removed.

mladjan-gadzic · 2023-08-29T22:08:35Z

Thanks for the update @mladjan-gadzic. Just left some minor comments but the approach looks good overall.

Thanks @errose28 for the quick review!

errose28

Thanks for the updates and testing @mladjan-gadzic. LGTM.

errose28 · 2023-08-29T22:24:10Z

I just approved the CI on this PR. Once it is green I can merge it.

mladjan-gadzic · 2023-08-30T08:53:48Z

I just approved the CI on this PR. Once it is green I can merge it.

I run tests that failed and it succeeded both locally and on remote. If you can, please re-run.

errose28 · 2023-08-30T17:22:11Z

Test failure looks like it is being fixed in #5217, which should be merged soon. I will wait for that to go in before merging this.

errose28 · 2023-09-05T18:33:20Z

Looks like #5217 is still being worked on. Since this is a known issue that is failing multiple times, I will go ahead and merge this.

mladjan-gadzic · 2023-09-06T09:18:36Z

@errose28 thank you for reviewing this!

… zero byte file instead of a directory (apache#5124) (cherry picked from commit 1f57ff6) Change-Id: Idbb33446be3b5faee0b129f54b86fa01bdc88af8

Create directory if body is empty

15ea2c7

errose28 self-requested a review July 31, 2023 16:06

Merge remote-tracking branch 'upstream/master' into HDDS-8437

750158e

umamaheswararao requested a review from kerneltime August 14, 2023 16:33

errose28 reviewed Aug 14, 2023

View reviewed changes

Create dir if key path has trailing slash

b2440b6

Revert non necessary change

43dbf72

Uze x-amz-decoded-content-length header instead of trailing slash

208012e

errose28 reviewed Aug 29, 2023

View reviewed changes

Address review comments

8730ee1

errose28 approved these changes Aug 29, 2023

View reviewed changes

errose28 merged commit 1f57ff6 into apache:master Sep 5, 2023

HDDS-8437. [FSO] S3A compatibility - dfs -mkdir creates a zero byte file instead of a directory #5124

HDDS-8437. [FSO] S3A compatibility - dfs -mkdir creates a zero byte file instead of a directory #5124

Uh oh!

Conversation

mladjan-gadzic commented Jul 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Info on the Marker Retention Property:

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

errose28 left a comment

Choose a reason for hiding this comment

Uh oh!

errose28 commented Aug 14, 2023

Uh oh!

kerneltime commented Aug 15, 2023

Uh oh!

mladjan-gadzic commented Aug 15, 2023

Uh oh!

mladjan-gadzic commented Aug 15, 2023

Uh oh!

mladjan-gadzic commented Aug 15, 2023

Uh oh!

errose28 commented Aug 15, 2023

Uh oh!

errose28 commented Aug 29, 2023

Uh oh!

mladjan-gadzic commented Aug 29, 2023

Uh oh!

errose28 left a comment

Choose a reason for hiding this comment

Uh oh!

errose28 Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

mladjan-gadzic Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

errose28 Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

mladjan-gadzic Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

errose28 Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

mladjan-gadzic Aug 29, 2023

Choose a reason for hiding this comment

Uh oh!

mladjan-gadzic commented Aug 29, 2023

Uh oh!

errose28 left a comment

Choose a reason for hiding this comment

Uh oh!

errose28 commented Aug 29, 2023

Uh oh!

mladjan-gadzic commented Aug 30, 2023

Uh oh!

errose28 commented Aug 30, 2023

Uh oh!

errose28 commented Sep 5, 2023

Uh oh!

mladjan-gadzic commented Sep 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mladjan-gadzic commented Jul 27, 2023 •

edited

Loading