adding H5MDWriter to H5MD.py by edisj · Pull Request #3189 · MDAnalysis/mdanalysis

edisj · 2021-03-24T23:40:28Z

Changes made in this Pull Request:

adds class H5MDWriter to H5MD.py

This is a continuation from #2787 and #2869. I closed #2869 because I tried rebasing to current develop and ended up making a mess of things, so it was easier to make a new pull request.

At the moment, the writer has all of the functionality it needs. It writes all of the data from the timestep (positions, velocities, etc.), and it writes everything in the ts.data dictionary other than 'time', 'step', and 'dt' into the 'observables' group. It can write datasets with chunked and/or compressed configurations, and files can be opened with MPI drivers.
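
The rule for the 'observables' group can be sketched in a few lines. This is only an illustration, assuming ts.data behaves like a plain dictionary; `RESERVED_KEYS` and `split_observables` are hypothetical names and are not taken from the PR.

```python
# Keys that the description says are NOT written to 'observables'.
RESERVED_KEYS = frozenset({"time", "step", "dt"})


def split_observables(ts_data):
    """Return the entries of a ts.data-like dict that would go to 'observables'."""
    return {key: value for key, value in ts_data.items()
            if key not in RESERVED_KEYS}


# Stand-in for ts.data; the values are made up for illustration.
example_data = {"time": 0.5, "step": 10, "dt": 0.05, "temperature": 300.0}
observables = split_observables(example_data)
print(observables)
```

Only the 'temperature' entry survives the filter here; how the real writer stores each entry in the file is not shown.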

The current to-do list is:

~~Benchmark the writer on multiple nodes with different configurations (chunked, compression)~~
Have the code reviewed and clean things up
Figure out what to do with the shape argument for the step and time datasets when the number of frames written isn't known by the writer. This could be solved by chunking, but I'm not sure what the default chunk size should be for these datasets.

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

pep8speaks · 2021-03-24T23:40:32Z

Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/coordinates/H5MD.py:

Line 107:80: E501 line too long (94 > 79 characters)
Line 973:80: E501 line too long (101 > 79 characters)
Line 974:80: E501 line too long (83 > 79 characters)
Line 1335:80: E501 line too long (80 > 79 characters)

In the file testsuite/MDAnalysisTests/coordinates/test_h5md.py:

Line 262:9: E128 continuation line under-indented for visual indent
Line 415:9: E128 continuation line under-indented for visual indent
Line 416:9: E128 continuation line under-indented for visual indent
Line 509:80: E501 line too long (80 > 79 characters)

In the file testsuite/MDAnalysisTests/coordinates/test_writer_api.py:

Line 55:80: E501 line too long (89 > 79 characters)
Line 84:80: E501 line too long (89 > 79 characters)
Line 110:80: E501 line too long (89 > 79 characters)

Comment last updated at 2021-08-21 10:28:58 UTC

package/MDAnalysis/coordinates/H5MD.py

IAlibay · 2021-04-22T18:01:36Z

@edisj @orbeckst what's the target milestone for this? (i.e. is this a long term goal or are you planning 2.0/2.1?)

orbeckst · 2021-04-22T18:04:46Z

It's not crucial for 2.0 and could go into 2.1 (at this point there will likely be a citation for a SciPy paper); if @edisj wants to work double-time to attempt to squeeze it into a 2.0 then I won't object, though.

edisj · 2021-04-22T18:50:21Z

Hi @IAlibay , I'll push to have it done by 2.0 (May 10th, right?). I'm still testing some things with HPC benchmarks before updating this PR

IAlibay · 2021-04-22T19:19:08Z

> Hi @IAlibay , I'll push to have it done by 2.0 (May 10th, right?).

So unless I misunderstood our workshop targets, the 10th is really the day it needs to be on conda-forge. I think we'd have to put down the code freeze at the latest the Friday before, i.e. May 7th.

edisj · 2021-06-06T07:58:46Z

Hey @orbeckst , ready for review! :) (also @IAlibay if you wanted to take a look)

I've made a lot of changes over the last few days and think it's complete other than maybe a couple exceptions to add and documentation to edit. I'll finish up the tests for codecov tomorrow

There are a few parts of the code I'm not sure about, I'll open some discussions...

codecov · 2021-06-06T08:06:51Z

Codecov Report

Merging #3189 (2f4038c) into develop (bd0ed5d) will increase coverage by 0.08%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #3189      +/-   ##
===========================================
+ Coverage    93.70%   93.79%   +0.08%     
===========================================
  Files          177      177              
  Lines        22990    23206     +216     
  Branches      3247     3299      +52     
===========================================
+ Hits         21542    21765     +223     
+ Misses        1397     1390       -7     
  Partials        51       51

Impacted Files	Coverage Δ
package/MDAnalysis/coordinates/base.py	`96.06% <ø> (+0.71%)`	⬆️
package/MDAnalysis/coordinates/H5MD.py	`97.53% <100.00%> (+3.32%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

IAlibay

Sorry only had time to have a quick glance over, so it's mostly superficial comments.

I'll probably wait until you've added the extra tests before I re-review.

package/MDAnalysis/coordinates/H5MD.py

IAlibay · 2021-06-07T13:42:13Z

package/MDAnalysis/coordinates/H5MD.py

 import MDAnalysis as mda
+import numpy as np
 from . import base, core
 from .base import Timestep


I don't think you're using Timestep anywhere right? (it's all just inherited from ReaderBase and WriterBase)

@richardjgowers added that in #3132 but I think it can be removed, unless there was a specific reason for it

Wow I didn't realize I had to hit "submit review" for my comments to show up, I thought I had already replied to your review comments sorry @IAlibay :(

If it's not used explicitly, remove it.

package/MDAnalysis/coordinates/H5MD.py

IAlibay · 2021-06-07T13:51:21Z

package/MDAnalysis/coordinates/H5MD.py

+        self.h5md_file = None
+
+        # check which datasets are to be written
+        self.has_positions = kwargs.get('positions', False)


I know it's something we do for the other writers (although I'm trying to change this for the NetCDF writer), so I'm kinda asking to get @orbeckst's opinion on this.

There's a lot of kwargs here, and it's not immediately clear to me that they have user-facing documentation. Would it be worth not making these kwargs but instead named optional arguments?

Yes, that's a good idea. We still need the catch-all **kwargs for anything else that we do not care about but having explicit kwarg names in the signature (+ defaults) is good.
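
A minimal sketch of the pattern agreed on here: named optional arguments with defaults in the signature, plus a catch-all `**kwargs`. `SketchWriter`, its parameter names, and the default values are assumptions chosen for illustration; they are not the signature that ended up in H5MD.py.

```python
class SketchWriter:
    """Hypothetical writer showing explicit keyword arguments with defaults."""

    def __init__(self, filename, n_atoms, positions=True, velocities=True,
                 forces=True, **kwargs):
        self.filename = filename
        self.n_atoms = n_atoms
        # Named options are visible in the signature and in help().
        self.has_positions = positions
        self.has_velocities = velocities
        self.has_forces = forces
        # Anything else is accepted and kept aside, but otherwise ignored.
        self._ignored = dict(kwargs)


w = SketchWriter("out.h5md", n_atoms=10, velocities=False, unrelated_option=1)
print(w.has_positions, w.has_velocities, w._ignored)
```

Compared with `kwargs.get('positions', False)`, the options and their defaults can be documented once in the signature, while unknown keywords still pass through without raising.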

package/MDAnalysis/coordinates/H5MD.py

IAlibay

Having a deeper look but here are some initial comments from a quick scan.

testsuite/MDAnalysisTests/data/coordinates/create_h5md_data.py

package/MDAnalysis/coordinates/H5MD.py

IAlibay

--> review through to line 1240 of H5MD.py

Sorry I'm going to have to break up my reviews into chunks. Here is the first part of my review, I'll follow up with the rest probably tomorrow at this rate.

package/MDAnalysis/coordinates/H5MD.py

IAlibay

Second half of review.

package/MDAnalysis/coordinates/H5MD.py

testsuite/MDAnalysisTests/coordinates/test_h5md.py

orbeckst

Minor things & mostly agreeing with @IAlibay (except that I think `i` can stay ;-) ).

You'll have to ask @IAlibay how fast you need to be in order to try and slip this into 2.0.0.

package/MDAnalysis/coordinates/H5MD.py

testsuite/MDAnalysisTests/coordinates/test_h5md.py

IAlibay · 2021-08-20T23:27:52Z

I can try to re-review a bit later today, although if you think all my comments were addressed please do go ahead with the merge @orbeckst

orbeckst · 2021-08-20T23:31:27Z

Your point about "check exception messages" was a good one — I am currently watching @edisj find out which exceptions are actually triggered .... so my comments had been addressed, yours should be done in the next hour.

edisj · 2021-08-20T23:58:25Z

Hi @IAlibay , I just pushed my final changes, all comments should be addressed. All error tests now have a match. I hope there's still time to make it into 2.0.0...
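
For context, a "match" on an error test checks the exception message as well as the exception type. The sketch below imitates that idea with the standard library only; `raises_with_match` and `write_frames` are hypothetical helpers, and the error message is invented for the example rather than copied from the PR.

```python
import re


def raises_with_match(exc_type, pattern, func, *args, **kwargs):
    """Return True if func raises exc_type with a message matching pattern."""
    try:
        func(*args, **kwargs)
    except exc_type as err:
        # Search for the pattern anywhere in the string form of the exception.
        return re.search(pattern, str(err)) is not None
    return False


def write_frames(n_frames):
    """Hypothetical stand-in for a writer call that validates its input."""
    if n_frames is None:
        raise ValueError("n_frames must be given for contiguous datasets")
    return n_frames


print(raises_with_match(ValueError, "n_frames must be given", write_frames, None))
print(raises_with_match(ValueError, "wrong message", write_frames, None))
```

A test that only checks for `ValueError` would pass in both cases; checking the message confirms that the intended exception was the one triggered.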

If you have time to review, there are a couple of decisions made while talking to @orbeckst that might need a little justification:

Removed the has_* setters from the writer because we realized that in order to actually write a test to activate them, we had to write code that wouldn't ever be used for an H5MD trajectory, so the setter itself was pointless.
Passed defaults (compression, compression_opts, driver) from the input H5MD file into H5MDReader.Writer(), but intentionally left out chunks. This is because H5MDWriter chooses (1, n_atoms, 3) by default, and accidentally choosing a bad chunk shape can be really bad for performance. I added a detailed note in the docstring of H5MDReader.Writer() to explain the situation and to say how to manually set the chunk shape if you want to.
EDIT: more reasoning: if the original file was written with a poor chunk layout (which, it turns out, can easily happen due to h5py's auto-chunking), then H5MDReader.Writer() will at the very least not propagate a bad chunk shape, which is good if the user isn't aware of chunking. But in cases where an advanced user knows how to use chunking, the option to set it manually is still there (and documented).
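
To give a rough sense of why the chunk shape matters for frame-by-frame access, here is a back-of-the-envelope sketch. It assumes that a chunked dataset is read in whole chunks and ignores any chunk caching; the function names, the system size, and the "poor" chunk shape are all made up for illustration.

```python
import math


def chunks_touched_per_frame(dataset_shape, chunk_shape):
    """Count the chunks that overlap a single frame (one index along axis 0)."""
    # Along the frame axis, a single frame always falls inside exactly one chunk.
    count = 1
    for dim, chunk in zip(dataset_shape[1:], chunk_shape[1:]):
        count *= math.ceil(dim / chunk)
    return count


def values_read_per_frame(dataset_shape, chunk_shape):
    """Values that must be read (as whole chunks) to obtain one frame."""
    chunk_size = math.prod(chunk_shape)
    return chunks_touched_per_frame(dataset_shape, chunk_shape) * chunk_size


n_frames, n_atoms = 1000, 5000
shape = (n_frames, n_atoms, 3)

per_frame = (1, n_atoms, 3)      # the (1, n_atoms, 3) default named above
across_frames = (100, 50, 3)     # hypothetical layout that spans many frames

print(values_read_per_frame(shape, per_frame))
print(values_read_per_frame(shape, across_frames))
```

With these made-up numbers, the per-frame layout reads exactly one frame's worth of values, while the second layout reads 100 times as much to recover the same frame.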

Thanks again for your review :D, if there's anything else that remains let me know

IAlibay

Thanks @edisj I'm just having a quick look through now. Just doing this quick pre-review comment to include this duecredit thing - not sure if it needs testing or not ?

package/MDAnalysis/coordinates/H5MD.py

IAlibay

Just the one comment from a quick scan though. I'll directly approve now since it's just a documentation style thing.

package/MDAnalysis/coordinates/H5MD.py

edisj · 2021-08-21T05:23:06Z

@IAlibay @orbeckst just merged with current develop for good measure, but I think the final points regarding the duecredit test and the backticks are resolved

richardjgowers

Looks good, just a couple doc quibbles that can slide

package/MDAnalysis/coordinates/H5MD.py

richardjgowers · 2021-08-21T07:24:53Z

package/MDAnalysis/coordinates/H5MD.py

+            W.write(u)
+
+To write an H5MD file with contiguous datasets, you must specify the
+number of frames to be written and set ``chunks=False``:


Maybe a quick one line hint as to why someone would want to do this

There's a note for it on line 948, although not sure if it fully covers what you're asking @richardjgowers ?

package/MDAnalysis/coordinates/base.py

IAlibay · 2021-08-21T10:33:31Z

Seems to be everything, I'll merge once CI returns green - thanks for all the work @edisj 🎉

orbeckst · 2021-08-21T15:59:52Z

Congratulations @edisj — PR with >200 comments and what turned out to be one of the more complicated formats.

Given that this could be considered, together with the SciPy paper, the capstone of your NSF REU experience, would you want to write a blog post for https://www.mdanalysis.org/blog/ where you summarize your experience and achievements?

edisj added 3 commits March 17, 2021 16:32

new branch, initial commit with working rough draft of writer

2f5f033

Merge branch 'develop' of https://github.com/MDAnalysis/mdanalysis in…

a945c15

…to issue2866-h5mdwriter

writer with all functions

c064581

edisj mentioned this pull request Mar 24, 2021

adding H5MD-format writer #2869

Closed

4 tasks

IAlibay reviewed Mar 24, 2021

package/MDAnalysis/coordinates/H5MD.py (outdated)

changed create_dataset to require_dataset

df690a3

IAlibay added this to the 2.1.0 milestone Apr 22, 2021

orbeckst self-assigned this May 5, 2021

orbeckst added the NSF REU NSF Research Experience for Undergraduates project label May 5, 2021

edisj marked this pull request as draft May 7, 2021 23:28

edisj added 11 commits May 13, 2021 13:20

final modifications for now, TODO: tests

9002dc7

updated cobrotoxin.h5md to pass tests

edfa4c0

Merge branch 'develop' into issue2866-h5mdwriter

e308441

Merge branch 'develop' into issue2866-h5mdwriter

f830fc5

comitting for Agave testing

98b8d34

small mistake line 1090

40d7027

cleaned things up

d201c6d

Merge branch 'develop' of https://github.com/MDAnalysis/mdanalysis in…

a75c646

…to issue2866-h5mdwriter

changed how step and time dsets are written

c7f744b

typo

822b076

added unit new unit functionality and pass all tests

d54e5f5

edisj marked this pull request as ready for review June 6, 2021 07:51

IAlibay requested changes Jun 7, 2021

IAlibay requested changes Aug 13, 2021

testsuite/MDAnalysisTests/data/coordinates/create_h5md_data.py

package/MDAnalysis/coordinates/H5MD.py

package/MDAnalysis/coordinates/H5MD.py

IAlibay requested changes Aug 13, 2021

IAlibay requested changes Aug 14, 2021

package/MDAnalysis/coordinates/H5MD.py (outdated)

package/MDAnalysis/coordinates/H5MD.py

testsuite/MDAnalysisTests/coordinates/test_h5md.py (outdated)

orbeckst requested changes Aug 19, 2021

edisj added 2 commits August 20, 2021 13:22

addressing review comments

d389b13

adding test to hopefully cover setters

6cba0f0

orbeckst reviewed Aug 20, 2021

testsuite/MDAnalysisTests/coordinates/test_h5md.py (outdated)

orbeckst reviewed Aug 20, 2021

testsuite/MDAnalysisTests/coordinates/test_h5md.py (outdated)

removed setter

1ed2917

orbeckst approved these changes Aug 20, 2021

added match to all error tests and removed chunks from default settings

e00fc61

IAlibay reviewed Aug 21, 2021

package/MDAnalysis/coordinates/H5MD.py

IAlibay approved these changes Aug 21, 2021

package/MDAnalysis/coordinates/H5MD.py (outdated)

edisj added 2 commits August 20, 2021 17:42

double ticks and pep8 fixes

e0d621a

added H5MD duecredit test

5a66dc4

IAlibay mentioned this pull request Aug 21, 2021

pre-release changes for 2.0.0 #3396

Closed

2 tasks

edisj added 2 commits August 20, 2021 18:44

purging excessive double back ticks

4f2f55c

Merge branch 'develop' of https://github.com/MDAnalysis/mdanalysis in…

387d562

…to issue2866-h5mdwriter

richardjgowers reviewed Aug 21, 2021

Update package/MDAnalysis/coordinates/H5MD.py

2f4038c

IAlibay merged commit ddb1d95 into MDAnalysis:develop Aug 21, 2021

IAlibay added the enhancement label Sep 25, 2023

edisj deleted the issue2866-h5mdwriter branch April 7, 2024 07:28
