Skip to content

Add S3Handle#33

Merged
sbesson merged 55 commits intoome:masterfrom
manics-archive:s3
Nov 29, 2018
Merged

Add S3Handle#33
sbesson merged 55 commits intoome:masterfrom
manics-archive:s3

Conversation

@manics
Copy link
Copy Markdown
Member

@manics manics commented Oct 4, 2018

These are the commits from #32 with additional commits on top.

It should be possible to read a file off AWS S3, e.g. ./showinf s3://s3.us-east-1.amazonaws.com/czi.starfish.data.public/browse/raw/20180820/merfish_u2os/fov_57.tif if you don't mind waiting

Initialization took 2172.948s
54.351s elapsed (3019.5ms per plane)

Also includes primitive caching: BF_REMOTE_CACHE_ROOTDIR=$HOME/tmp/ BF_DEVEL=true BF_FLAGS="-cp $PWD/../../ome-common-java/target/ome-common-5.3.7-SNAPSHOT.jar:$PWD/bioformats_package.jar:$PWD" ./showinf s3://s3.us-east-1.amazonaws.com/czi.starfish.data.public/browse/raw/20180820/merfish_u2os/fov_57.tif though it still attempts to check for the existence of many other remote files

Initialization took 12.244s
2.402s elapsed (133.44444ms per plane)

Example full timings (including the time taken before the "Initialization" time is calculated) for s3://s3.us-east-1.amazonaws.com/czi.starfish.data.public/browse/raw/20180820/merfish_u2os/fov_57.tif:

  • Cache enabled, not cached: 1m19s
  • Cache enabled, cached: 32s
  • Cache disabled: 24m22s

Examples using a local minio server with the same file s3://INTERNAL.HOST/test.bucket/fov_57.tif

  • Cache enabled, not cached: 7s
  • Cache enabled, cached: 3s
  • Cache disabled: 1m48s

Conclusions so far:

  • Streaming a file is not practical due to the reader, probably due to the reader seeking backwards and therefore causing the stream to reset to the beginning
  • The initialisation (or pre-initialisation) steps which involve checking the existence of many other related files is very time-consuming, in this case approximately the same time as it takes to download the the 144MB file

@manics manics changed the title Add S3Handle (rebased and modified from #32) WIP: Add S3Handle (rebased and modified from #32) Oct 4, 2018
@snoopycrimecop
Copy link
Copy Markdown
Member

snoopycrimecop commented Oct 4, 2018

Conflicting PR. Removed from build OME-FILES-CPP-DEV-merge-push-superbuild#1069. See the console output for more details.
Possible conflicts:

--conflicts Conflict resolved in build BIOFORMATS-push#369. See the console output for more details.

Comment thread src/main/java/loci/common/Location.java Outdated
Comment thread src/main/java/loci/common/Location.java
@manics
Copy link
Copy Markdown
Member Author

manics commented Oct 15, 2018

Rebased on top of current master (6.0.0-SNAPSHOT) a449298

joshmoore and others added 20 commits October 24, 2018 17:09
URLs starting with `s3://` will now be passed to the
S3Handle. `./showinfo s3://bucket/path/file.tiff` then
opens the file as expected.
Changes include:
 - url may now be null even if isURL is not
 - Location(parent, child) is now the primary constructor
 - several special cases for s3 at the moment

Remaining TODOs:
 - untangle isDirectory/isFile/exists calls
 - fix list()
 - handle http:// similarly to s3://
 - add caching for various internal calls like exists()
 - remove openConnection wherever possible
 - handle absolute paths (cc: Seb)
In order to allow specifying non-AWS buckets,
endpoints and similar information need to be
encoded in the URL.
bucket.with.dots.is.indistinguishable.from.server.endpoint
Reduces the amount of magic in S3Handle

Also enables S3HandleTest
@manics manics reopened this Nov 5, 2018
Include the underlying Minio exception if these was one
It's misleading since we don't know whether it's found or not this early
@sbesson
Copy link
Copy Markdown
Member

sbesson commented Nov 7, 2018

Restarted Travis which was reported false positives. With most of the comments addressed and the decision on minio made, what are the next steps before integrating this @manics ?

@manics
Copy link
Copy Markdown
Member Author

manics commented Nov 7, 2018

I could add more tests, but that could come now, or in a future PR. I think it's time to test this more heavily to find all related issues in the BF and OMERO stack.

@manics
Copy link
Copy Markdown
Member Author

manics commented Nov 13, 2018

Last commit adds some read/seek S3Handle tests

@manics
Copy link
Copy Markdown
Member Author

manics commented Nov 15, 2018

Last commit caches exists and length for URLs only.

@sbesson
Copy link
Copy Markdown
Member

sbesson commented Nov 28, 2018

@manics: is this ready for a final round of review and integration into a formal 6.0.0 development milestone?

@manics
Copy link
Copy Markdown
Member Author

manics commented Nov 28, 2018

Yes

private URI uri;
private File file;

class URLLocationProperties {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer-term, this should likely be at least a static inner class if not a top-level class.

@joshmoore
Copy link
Copy Markdown
Member

👍 on the new commits. We’ll need a review of all the new inner classes in the Location & Handle hierarchies, but happy to do that as a follow on.

@sbesson
Copy link
Copy Markdown
Member

sbesson commented Nov 29, 2018

Captured all the potential API reviews as https://trello.com/c/DEaOjION/93-ome-common-location-handle. Merging and I will cut a second mileston release of ome-common. Thanks for driving this

@sbesson sbesson merged commit 5c2e266 into ome:master Nov 29, 2018
@manics manics deleted the s3 branch November 29, 2018 17:39
@sbesson sbesson mentioned this pull request Jan 14, 2019
@manics manics restored the s3 branch February 5, 2019 10:27
@manics manics deleted the s3 branch February 5, 2019 10:29
@imagesc-bot
Copy link
Copy Markdown

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/creating-imagereader-without-setid-irods-s3/43807/5

@imagesc-bot
Copy link
Copy Markdown

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/bioformats-image-reader-reading-from-file-a-url-s3/44840/7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants