Skip to content

Conversation

@mladjan-gadzic
Copy link
Contributor

@mladjan-gadzic mladjan-gadzic commented Jul 27, 2023

What changes were proposed in this pull request?

Content length is not reliable measure to determine wheter dir should be created. In this case Hadoop does not provide any content, but because body is still chunk signed, length is not 0 but 86. Because of this, zero byte file is created instead of dir.

Proposed solution is to have additional check wheter key path has trailing slash. If it has, dir will be created, otherwise file.

Info on the Marker Retention Property:

In order for Hadoop to be 100% compatible with FSO layout, it is necessary to set property:

<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>keep</value>
</property>

The following shows how the marker.retention property is needed for FSO to be compatible with Hadoop's S3A fs. It is here for reference purposes but understanding it isn't necessary to review this PR.

To demonstrate this, let's use simple case such as:

  1. create dir a1
  2. create dir b1 under a1

Dir a1 can be safely created. Listing of keys on Ozone side would show key as a1/ with size of 0. Attempt to create b1 as a1/b1/ would succeed on Ozone's side. Listing of keys on it would show keys such as a1/ and a1/b1/ both with size of 0. Issue happens on Hadoop side because issued command would hang (never finish) with warning logs such as:

2023-07-26 13:53:33,937 WARN impl.MultiObjectDeleteSupport: Bulk delete operation failed to delete all objects; failure count = 1
2023-07-26 13:53:33,938 WARN impl.MultiObjectDeleteSupport: InternalError: a1/: Directory is not empty. Key:a1

The reason for this is that Hadoop tries to delete a1/ key and just leave a1/b1/ which is alright for OBS, but FSO does not work like that. Because of this above-mentioned propery is necesarry in order for Hadoop to be compatible with FSO over S3a. After setting above property there are no warning logs, but instead something like this:

2023-07-26 22:30:43,950 INFO impl.DirectoryPolicyImpl: Directory markers will be kept

Full logs from Hadoop side:

ubuntu@ip-172-31-30-88:~/hadoop-3.3.4$ bin/hdfs dfs -Dfs.s3a.access.key=1 -Dfs.s3a.secret.key=1 -Dfs.s3a.endpoint=http://localhost:9878/ -Dfs.s3a.path.style.access=true -mkdir s3a://fso/c1/b1
2023-07-26 22:29:55,358 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2023-07-26 22:29:55,507 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2023-07-26 22:29:55,507 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
2023-07-26 22:30:43,950 INFO impl.DirectoryPolicyImpl: Directory markers will be kept
2023-07-26 22:30:49,531 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2023-07-26 22:30:49,531 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2023-07-26 22:30:49,532 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8437

How was this patch tested?

  • manual test
    For info on steps how to reproduce the issue check linked Apache Jira.

@errose28 errose28 self-requested a review July 31, 2023 16:06
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on the S3a compatibility @mladjan-gadzic.

My understanding of IOUtils#toString is that it is going to do a full buffer copy and load the result into memory. I don't think we can do this for all data coming into s3g put. Can we just say if ozone.s3g.fso.directory.creation is enabled and a file is written with a trailing slash to an FSO bucket, we will treat it as a directory? Another more complicated option may be to seek past the auth header information in the stream to see if any content remains after that, but I'm not very familiar with the chunked layout authorization header format to know if this is feasible.

@errose28
Copy link
Contributor

Also please check the failing unit test which looks related.

@kerneltime
Copy link
Contributor

There is no notion of the directory in S3; can we elaborate on the exact ask here? There is a reverse request to skip directory objects on listing for FSO buckets in S3. If we create directories for zero-byte objects in S3G over FSO, what should the behavior be for listing directories?

@mladjan-gadzic
Copy link
Contributor Author

Thanks for working on the S3a compatibility @mladjan-gadzic.

My understanding of IOUtils#toString is that it is going to do a full buffer copy and load the result into memory. I don't think we can do this for all data coming into s3g put. Can we just say if ozone.s3g.fso.directory.creation is enabled and a file is written with a trailing slash to an FSO bucket, we will treat it as a directory? Another more complicated option may be to seek past the auth header information in the stream to see if any content remains after that, but I'm not very familiar with the chunked layout authorization header format to know if this is feasible.

Thanks @errose28 for taking the time to review this PR! I will try suggestions and let you know how it went.

@mladjan-gadzic
Copy link
Contributor Author

Also please check the failing unit test which looks related.

It is due to IOUtils#toString not creating copy but consuming InputStream. Hopefully this will be fixed with suggestions from #5124 (review)

@mladjan-gadzic
Copy link
Contributor Author

There is no notion of the directory in S3; can we elaborate on the exact ask here? There is a reverse request to skip directory objects on listing for FSO buckets in S3. If we create directories for zero-byte objects in S3G over FSO, what should the behavior be for listing directories?

Directories are already being created by default for zero-byte files over FSO bucket layout. Please check this PR for more context. ozone.s3g.fso.directory.creation property was introduced to help handle this in case there is no need for directory creation.

Current PR is only extension on previous to be compatible with S3a (specifically dfs mkdirs command).

I am not familiar with request to skip directory objects on listing. Could you please provide more context how current PR breaks it?

@errose28
Copy link
Contributor

Hi @mladjan-gadzic. @kerneltime and I had some offline discussion about these changes and this is what we found:

  1. It looks like Ozone did not actually implement multi chunk signature support in HDDS-693 (5 years ago), only "initial support" to quote the jira. I don't see any code verifying these signatures, just skipping over them.
  2. Based on the AWS docs it looks like we can check the x-amz-decoded-content-length header to get the length of the data excluding the signature. The length that was being checked previously came from the Content-Length header.
  3. Going forward we probably need a design doc and parent jira for s3a over FSO. What we are aiming to accomplish is for these two streams of communication to produce identical results:
    1. FS client <-> S3a <-> s3 gateway <-> FSO bucket
    2. FS client <-> FSO bucket
  • This requirement to make the s3 gateway "transparent" will require some more planning like Ritesh mentioned instead of just ad-hoc patches as issues arise.

@errose28
Copy link
Contributor

Hi @mladjan-gadzic have you had a chance to see if using the x-amz-decoded-content-length header resolves the issue? We may be able to find someone to help finish this PR if you do not have the bandwidth right now.

@mladjan-gadzic
Copy link
Contributor Author

Hi @mladjan-gadzic have you had a chance to see if using the x-amz-decoded-content-length header resolves the issue? We may be able to find someone to help finish this PR if you do not have the bandwidth right now.

Hi @errose28. Yes, using mentioned header resolves the issue. Thanks for the suggestion. I also did round of manual tests to confirm it works.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @mladjan-gadzic. Just left some minor comments but the approach looks good overall.

bucket.getBucketLayout() == BucketLayout.FILE_SYSTEM_OPTIMIZED;

String amzDecodedLength =
headers.getHeaderString("x-amz-decoded-content-length");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define "x-amz-decoded-content-length" in S3Consts where we have other x-amz header strings defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 329 to 331
private static boolean hasTrailingSlash(String keyPath) {
return keyPath.charAt(keyPath.length() - 1) == '/';
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this method is not used anymore, I think we can remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 333 to 337
private void createDirectory(String volumeName,
String bucketName, String keyPath) throws IOException {
getClientProtocol()
.createDirectory(volumeName, bucketName, keyPath);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current change set this helper method is only called once. I think it can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mladjan-gadzic
Copy link
Contributor Author

Thanks for the update @mladjan-gadzic. Just left some minor comments but the approach looks good overall.

Thanks @errose28 for the quick review!

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates and testing @mladjan-gadzic. LGTM.

@errose28
Copy link
Contributor

I just approved the CI on this PR. Once it is green I can merge it.

@mladjan-gadzic
Copy link
Contributor Author

I just approved the CI on this PR. Once it is green I can merge it.

I run tests that failed and it succeeded both locally and on remote. If you can, please re-run.

@errose28
Copy link
Contributor

Test failure looks like it is being fixed in #5217, which should be merged soon. I will wait for that to go in before merging this.

@errose28
Copy link
Contributor

errose28 commented Sep 5, 2023

Looks like #5217 is still being worked on. Since this is a known issue that is failing multiple times, I will go ahead and merge this.

@errose28 errose28 merged commit 1f57ff6 into apache:master Sep 5, 2023
@mladjan-gadzic
Copy link
Contributor Author

@errose28 thank you for reviewing this!

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
… zero byte file instead of a directory (apache#5124)

(cherry picked from commit 1f57ff6)
Change-Id: Idbb33446be3b5faee0b129f54b86fa01bdc88af8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants