8x speed-up by buffering of InputStream during reading of uncompressed files #49
Merged
josemduarte merged 1 commit into rcsb:master on Sep 3, 2019
Collaborator
|
Excellent, Sebastian!
…On Mon, Jul 29, 2019, 3:21 PM Jose Manuel Duarte ***@***.***> wrote:
***@***.**** approved this pull request.
Fantastic, thank you!
|
Member
|
@JonStargaryen do you know if this is still relevant when running under a Java 11+ JRE? If so, I'd like to merge and release a new bugfix version as soon as possible. |
Member
Author
|
@josemduarte Yeah, it's still an issue on Java 11. I didn't run it with warm-up iterations or repeated runs, but the trend is clear. Here are the times to read the archive:
It's probably a good idea to release a new version on Maven before releasing BioJava. |
Member
|
Thanks, @JonStargaryen! I'll go ahead and make a new release today. |
I ran some benchmarks for the BinaryCIF project/format, and in comparison the Java implementation of the MMTF codec was surprisingly slow, especially when processing uncompressed (non-gzipped) files. Benchmark details are in the RCSB-internal `ciftools-performance` repo.
Wrapping the input in a `BufferedInputStream` with a buffer size of 65536 bytes improves performance drastically: traversing the current 154k structures takes 70 s (versus 10 minutes with the current code). For comparison, read times for BinaryCIF and mmCIF parsing are given (which should be slower due to higher overhead).
A performance increase for gzipped files can be expected by using a `GZIPInputStream` with an equally sized buffer of 65536 bytes (in contrast to the default buffer of 512 bytes).
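The buffering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the class and method names (`BufferedReadSketch`, `openUncompressed`, `openGzipped`, `readAll`) are hypothetical, and it assumes the files are opened with `Files.newInputStream`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; names are illustrative only.
class BufferedReadSketch {
    // Buffer size mentioned in the description (64 KiB).
    static final int BUFFER_SIZE = 65536;

    // Uncompressed file: wrap the raw stream so small reads hit the buffer, not the OS.
    static InputStream openUncompressed(Path path) throws IOException {
        return new BufferedInputStream(Files.newInputStream(path), BUFFER_SIZE);
    }

    // Gzipped file: pass the buffer size to GZIPInputStream instead of relying on its 512-byte default.
    static InputStream openGzipped(Path path) throws IOException {
        return new GZIPInputStream(Files.newInputStream(path), BUFFER_SIZE);
    }

    // Drains the stream into a byte array and closes it.
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Decoders that issue many small reads (for example, one field at a time) tend to benefit most from this kind of wrapping, since each unbuffered read can otherwise turn into a separate system call.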