adding H5MDWriter to H5MD.py#3189
Conversation
|
Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-21 10:28:58 UTC |
|
It's not crucial for 2.0 and could go into 2.1 (at this point there will likely be a citation for a SciPy paper); if @edisj wants to work double-time to attempt to squeeze it into a 2.0 then I won't object, though. |
|
Hi @IAlibay , I'll push to have it done by 2.0 (May 10th, right?). I'm still testing some things with HPC benchmarks before updating this PR |
So unless I misunderstood our workshop targets, the 10th is really the day it needs to be on conda-forge. I think we'd have to put down the code freeze at the latest the Friday before, i.e. May 7th. |
…to issue2866-h5mdwriter
|
Hey @orbeckst , ready for review! :) (also @IAlibay if you wanted to take a look) I've made a lot of changes over the last few days and think it's complete other than maybe a couple exceptions to add and documentation to edit. I'll finish up the tests for codecov tomorrow There are a few parts of the code I'm not sure about, I'll open some discussions... |
Codecov Report
@@ Coverage Diff @@
## develop #3189 +/- ##
===========================================
+ Coverage 93.70% 93.79% +0.08%
===========================================
Files 177 177
Lines 22990 23206 +216
Branches 3247 3299 +52
===========================================
+ Hits 21542 21765 +223
+ Misses 1397 1390 -7
Partials 51 51
Continue to review full report at Codecov.
|
IAlibay
left a comment
There was a problem hiding this comment.
Sorry only had time to have a quick glance over, so it's mostly superficial comments.
I'll probably wait until you've added the extra tests before I re-review.
| import MDAnalysis as mda | ||
| import numpy as np | ||
| from . import base, core | ||
| from .base import Timestep |
There was a problem hiding this comment.
I don't think you're using Timestep anywhere right? (it's all just inherited from ReaderBase and WriterBase)
There was a problem hiding this comment.
@richardjgowers added that in #3132 but I think it can be removed, unless there was a specific reason for it
There was a problem hiding this comment.
Wow I didn't realize I had to hit "submit review" for my comments to show up, I thought I had already replied to your review comments sorry @IAlibay :(
There was a problem hiding this comment.
If it's not used explicitly, remove it.
| self.h5md_file = None | ||
|
|
||
| # check which datasets are to be written | ||
| self.has_positions = kwargs.get('positions', False) |
There was a problem hiding this comment.
I know it's something we do for the other writers (although I'm trying to change this for the NetCDF writer), so I'm kinda asking to get @orbeckst's opinion on this.
There's a lot of kwargs here, and it's not immediately clear to me that they have user-facing documentation. Would it be worth not making these kwargs but instead named optional arguments?
There was a problem hiding this comment.
Yes, that's a good idea. We still need the catch-all **kwargs for anything else that we do not care about but having explicit kwarg names in the signature (+ defaults) is good.
IAlibay
left a comment
There was a problem hiding this comment.
Having a deeper look but here are some initial comments from a quick scan.
IAlibay
left a comment
There was a problem hiding this comment.
--> review through to line 1240 of H5MD.py
Sorry I'm going to have to break up my reviews into chunks. Here is the first part of my review, I'll follow up with the rest probably tomorrow at this rate.
|
I can try to re-review a bit later today, although if you think all my comments were addressed please do go ahead with the merge @orbeckst |
|
Your point about "check exception messages" was a good one — I am currently watching @edisj find out which exceptions are actually triggered .... so my comments had been addressed, yours should be done in the next hour. |
|
Hi @IAlibay , I just pushed my final changes, all comments should be addressed. All error tests now have a If you have time to review, there are a couple of decisions made while talking to @orbeckst that might need a little justification:
Thanks again for your review :D, if there's anything else that remains let me know |
IAlibay
left a comment
There was a problem hiding this comment.
Just the one comment from a quick scan though. I'll directly approve now since it's just a documentation style thing.
richardjgowers
left a comment
There was a problem hiding this comment.
Looks good, just a couple doc quibbles that can slide
| W.write(u) | ||
|
|
||
| To write an H5MD file with contiguous datasets, you must specifly the | ||
| number of frames to be written and set ``chunks=False``: |
There was a problem hiding this comment.
Maybe a quick one line hint as to why someone would want to do this
There was a problem hiding this comment.
There's a note for it on line 948, although not sure if it fully covers what you're asking @richardjgowers ?
|
Seems to be everything, I'll merge once CI returns green - thanks for all the work @edisj 🎉 |
|
Congratulations @edisj — PR with >200 comments and what turned out to be one of the more complicated formats. Given that this could be considered, together with the SciPy paper, the capstone of your NSF REU experience, would you want to write a blog post for https://www.mdanalysis.org/blog/ where you summarize your experience and achievements? |
Fixes #2866
Changes made in this Pull Request:
class H5MDWritertoH5MD.pyThis is a continuation from #2787 and #2869. I closed #2869 because I tried rebasing to current develop and ended up making a mess of things, so it was easier to make a new pull request.
At the moment, the writer has all of the functionality it needs. It writes all of the data from the
timestep(positions, velocities, etc), it writes everything from thets.datadictionary that isn't'time', 'step', 'dt'into the 'observables' group. It can write datasets with chunked and/or compressed configurations, and files can be opened with MPI drivers.The current to-do list is:
Benchmark the writer on multiple nodes with different configurations (chunked, compression)Figure out what to do the shape argument forstepandtimedatasets when the number of frames written isn't known by the writer. This could be solved by chunking, but I'm not sure what the default chunk size should be for these datasets.PR Checklist