8x speed-up by buffering of InputStream during reading of uncompressed files by sbittrich · Pull Request #49 · rcsb/mmtf-java

sbittrich · 2019-07-29T22:00:35Z

I ran some benchmarks for the BinaryCIF project/format, and in comparison the Java implementation of the MMTF codec was surprisingly slow, especially when processing uncompressed (non-gzipped) files. Benchmark details are in the RCSB-internal ciftools-performance repo.

Wrapping the input in a BufferedInputStream with a buffer size of 65536 bytes improves performance drastically: traversing the current 154k structures takes about 70 s, compared with about 10 minutes with the current code.
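As a rough illustration of the idea (not the actual diff of this pull request), the following sketch wraps a file stream in a BufferedInputStream with the 65536-byte buffer mentioned above. The class and method names (`BufferedReadSketch`, `openBuffered`, `readAll`) are hypothetical and chosen only for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of mmtf-java.
final class BufferedReadSketch {
    // Buffer size quoted in this pull request.
    static final int BUFFER_SIZE = 65536;

    /** Opens a file so that small reads are served from an in-memory buffer. */
    static InputStream openBuffered(Path path) throws IOException {
        return new BufferedInputStream(Files.newInputStream(path), BUFFER_SIZE);
    }

    /** Reads the stream one byte at a time, the access pattern that gains most from buffering. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

Without the wrapper, each small read on the raw file stream can turn into a separate call to the underlying file; with it, most reads are answered from the buffer.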

For comparison, read times for BinaryCIF and mmCIF parsing are given (which should be slower due to higher overhead). A similar gain can be expected for gzipped files by constructing the GZIPInputStream with an equally sized buffer of 65536 bytes instead of its default of 512 bytes.
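The gzip suggestion could look roughly like the sketch below, which uses the two-argument `GZIPInputStream(InputStream, int)` constructor to set the buffer size. This is an assumption about how the change might be applied, not code from mmtf-java; `GzipBufferSketch` and its methods are hypothetical names, and the `gzip` helper exists only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of mmtf-java.
final class GzipBufferSketch {
    // Same buffer size as proposed for the uncompressed case.
    static final int BUFFER_SIZE = 65536;

    /** Wraps a compressed stream, replacing the default 512-byte buffer with a larger one. */
    static InputStream openGzip(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed, BUFFER_SIZE);
    }

    /** Compresses a byte array in memory; used here only to build test input. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        }
        return bytes.toByteArray();
    }

    /** Decompresses everything from an in-memory gzip payload. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = openGzip(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Whether this helps as much as in the uncompressed case would need to be measured; the pull request only states that an improvement can be expected.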

coveralls · 2019-07-29T22:04:17Z

Coverage increased (+0.02%) to 81.926% when pulling 7960934 on JonStargaryen:master into 59287c5 on rcsb:master.

josemduarte

Fantastic, thank you!

pwrose · 2019-07-30T08:15:34Z

Excellent, Sebastian!


josemduarte · 2019-09-02T18:33:20Z

@JonStargaryen do you know if this is still relevant when running under a Java 11+ JRE?

If so I'd like to merge and release a new bugfix as soon as possible.

sbittrich · 2019-09-03T17:43:03Z

@josemduarte Yeah, it's still an issue on Java 11. I didn't run it with warm-up iterations or repetitions, but the trend is clear.

Here are the times to read the archive:

Benchmark	Time (s/op)
MMTF explicitly buffered	71.851
MMTF current impl	547.395

Probably a good idea to release a new version on Maven before releasing BioJava.

josemduarte · 2019-09-03T17:56:01Z

Thanks, @JonStargaryen!

I'll go ahead and make a new release today.

buffering of InputStream during reading

7960934

josemduarte approved these changes Jul 29, 2019

josemduarte merged commit 8c59b29 into rcsb:master Sep 3, 2019

josemduarte merged 1 commit into rcsb:master from sbittrich:master
