Make H5MD serialize with pickle #2894
Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-08-11 22:53:49 UTC
Hi @orbeckst, hopefully this fixes the pickling issues. I followed the code @yuxuanzhuang gave in #2890, which does allow the tests to pass. I modified it a bit to work with … It seems to work with some testing:
Codecov Report
@@ Coverage Diff @@
## develop #2894 +/- ##
===========================================
+ Coverage 87.20% 92.87% +5.67%
===========================================
Files 167 187 +20
Lines 21744 24585 +2841
Branches 3186 3192 +6
===========================================
+ Hits 18961 22834 +3873
+ Misses 2258 1705 -553
+ Partials 525 46 -479
The Travis error can be solved by adding:
try:
    import h5py
except ImportError:
    HAS_H5PY = False
    # Allow building documentation even if h5py is not installed
    import types  # types.ModuleType creates an empty module (imp is deprecated)
    class MockH5pyFile:
        pass
    h5py = types.ModuleType("h5py")
    h5py.File = MockH5pyFile
else:
    HAS_H5PY = True
yuxuanzhuang left a comment
Add the file link to the end of package/doc/sphinx/source/documentation_pages/coordinates/pickle_readers.rst
    driver = self._driver
    comm = self._comm
except AttributeError:  # is this error necessary?
    driver = None
Will that be a problem if self._driver exists but self._comm doesn't?
You're right, I think it would have been a problem. self._driver should have been self.driver as well, since it's pulling the driver from the file object, right? The pickled file wasn't returning the right driver with self._driver, but does with self.driver.
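The concern above can be reproduced without h5py. When one `try` block guards two attribute reads and, as in the excerpt, only `driver` is reset in the `except` branch, a missing `comm` both discards a valid driver and leaves `comm` unbound. A minimal sketch with a hypothetical `Handle` class, plus one possible alternative that reads each attribute independently with `getattr` and a default:

```python
class Handle:
    """Hypothetical object that has a driver but no comm attribute."""

    def __init__(self):
        self._driver = "sec2"


def state_with_try(obj):
    # Pattern from the excerpt: one try block guards two attribute reads
    try:
        driver = obj._driver
        comm = obj._comm  # AttributeError here; driver was already read
    except AttributeError:
        driver = None  # the valid driver is discarded; comm stays unbound
    return {"driver": driver, "comm": comm}  # raises UnboundLocalError


def state_with_getattr(obj):
    # Each attribute falls back to None independently of the other
    return {"driver": getattr(obj, "_driver", None),
            "comm": getattr(obj, "_comm", None)}


good = state_with_getattr(Handle())
print(good)  # {'driver': 'sec2', 'comm': None}

try:
    state_with_try(Handle())
    bad_error = None
except NameError as err:  # UnboundLocalError is a subclass of NameError
    bad_error = err
print(type(bad_error).__name__)  # UnboundLocalError
```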
@yuxuanzhuang, thank you for reviewing. @edisj, please ping me when all tests are passing and when you have addressed all of @yuxuanzhuang's comments. I can then do a final review + approval.
except AttributeError:
    driver = None
return {'name': self.filename,
I am confused: if comm is not included in __getstate__, then how can it be used in __setstate__?
I see that you had it included in b29e675 but then removed it again.
Did you test that an MPI.COMM_WORLD could be pickled on its own, i.e.,
from mpi4py import MPI
import pickle
comm = MPI.COMM_WORLD
comm_p = pickle.loads(pickle.dumps(comm))
assert comm == comm_p
or something similar?
I don't have a working mpi4py on my desktop, but it seems that it cannot be pickled natively.
This might help in the meantime: https://bitbucket.org/mpi4py/mpi4py/issues/104/pickling-of-mpi-comm
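Whether an object can be pickled on its own can be checked generically by attempting a round trip. A minimal sketch with a hypothetical helper `is_picklable`; since mpi4py is not assumed to be available here, a `threading.Lock` stands in as an example of an object the standard `pickle` module refuses to serialize:

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives a pickle.dumps/pickle.loads round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


print(is_picklable({"name": "file.h5md", "mode": "r"}))  # True
print(is_picklable(threading.Lock()))  # False: lock objects cannot be pickled
```

With a working mpi4py, the same helper could be applied to `MPI.COMM_WORLD` directly.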
self.__init__(name=state['name'],
              mode=state['mode'],
              driver=state['driver'],
              comm=state['comm'])
Where will state['comm'] come from?
I thought it was taking the state dict as an argument from mdanalysis/package/MDAnalysis/coordinates/H5MD.py, lines 744 to 747 in c7087b8.
Maybe I don't understand how pickling works, but I assumed that the goal is to make each object picklable by itself, i.e., H5PYPicklableFile by itself can be pickled. That's the first thing to check (and add a test for!!). Once this works, H5MDReader can just treat it as one item in its own class dict, as it would a numpy array or a basic Python object. It would be bad program design if H5PYPicklableFile needed the help of H5MDReader, because then you could never use it independently from H5MDReader.
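The design described above, where the file object knows how to pickle itself and the reader holds it like any other attribute, can be sketched without h5py. This is a minimal sketch under stated assumptions: a plain binary file replaces the HDF5 file, and the hypothetical `PicklableFile` wraps it, stores only its name and read position, and reopens the file on unpickling. The hypothetical `ToyReader` has no pickling code of its own.

```python
import os
import pickle
import tempfile


class PicklableFile:
    """Hypothetical read-only file wrapper that can be pickled on its own."""

    def __init__(self, name):
        self.name = name
        self._fh = open(name, "rb")

    def read(self, size=-1):
        return self._fh.read(size)

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # Only plain, picklable data: the file name and the read position
        return {"name": self.name, "pos": self._fh.tell()}

    def __setstate__(self, state):
        # Reopen the file by name, then restore the read position
        self.__init__(state["name"])
        self._fh.seek(state["pos"])


class ToyReader:
    """Hypothetical reader with no pickling code of its own."""

    def __init__(self, filename):
        self.filename = filename
        self._file = PicklableFile(filename)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")

reader = ToyReader(tmp.name)
reader._file.read(4)

# 1) The file object can be pickled by itself ...
file_copy = pickle.loads(pickle.dumps(reader._file))
# 2) ... so the reader pickles without any custom code
reader_copy = pickle.loads(pickle.dumps(reader))

from_file = file_copy.read()
from_reader = reader_copy._file.read()
print(from_file, from_reader)  # b'456789' b'456789'

for f in (reader._file, file_copy, reader_copy._file):
    f.close()
os.remove(tmp.name)
```

Step 1 is the standalone test suggested above; step 2 then works automatically because the reader's `__dict__` only contains picklable items.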
I see, that makes sense; thanks for the clarification. I'm doing some testing with comm on the parallel build I have on spudda.
I'm making a lot of commits at the moment because I'm pulling them to my repository on my workstation for testing with parallel h5py.
Don't worry, we can squash it all when merging.
@orbeckst, the latest commit successfully pickles
I am trying to build a parallel h5py system. You could add a few tests (specifically for H5PYPicklable) inside
I am having trouble installing everything needed to make h5py work. But I assume that if it is hard to find the "right" …
def __init__(self, ...comm):
    self._comm = comm
    super().__init__(...)
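The suggestion above keeps a reference to the communicator on the instance before delegating to the parent constructor, so that `__getstate__` can find it later. A minimal sketch, assuming a stand-in base class `BaseFile` in place of `h5py.File` (so it runs without h5py or mpi4py), a hypothetical subclass `CommAwareFile`, and a `comm=None` default:

```python
class BaseFile:
    """Stand-in for h5py.File: keeps name, mode and driver, drops other kwds."""

    def __init__(self, name, mode="r", driver=None, **kwds):
        self.filename = name
        self.mode = mode
        self.driver = driver


class CommAwareFile(BaseFile):
    """Hypothetical subclass that remembers the communicator it was given."""

    def __init__(self, name, mode="r", driver=None, comm=None, **kwds):
        # Keep our own reference, because the parent does not expose comm
        self._comm = comm
        super().__init__(name, mode=mode, driver=driver, comm=comm, **kwds)

    def __getstate__(self):
        return {"name": self.filename, "mode": self.mode,
                "driver": self.driver, "comm": self._comm}


f = CommAwareFile("data.h5md", driver="mpio", comm="fake-comm")
print(f.__getstate__())
# {'name': 'data.h5md', 'mode': 'r', 'driver': 'mpio', 'comm': 'fake-comm'}
```

Storing `comm` only makes it reachable from `__getstate__`; the returned state must still be picklable, which, per the discussion above, a real MPI communicator does not appear to be.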
The test failures are pretty annoying for all other PRs. Can we make H5PYPicklable work in serial right now? It would be ok to raise a
orbeckst left a comment
I suggest you make this PR work in serial so that develop builds cleanly again.
- Make H5PYPicklable fail to pickle for parallel (raise TypeError).
- Document the limitation with a note in the docs.
- Add tests:
  - test for serial H5PYPicklable (see @yuxuanzhuang's comment)
  - test that parallel pickle raises the TypeError (you don't need mpi4py for the tests, just mock it):
        import pickle
        from unittest.mock import MagicMock
        import pytest
        MPI_COMM_WORLD = MagicMock()
        f = H5PYPicklable(fname, driver="mpio", comm=MPI_COMM_WORLD)
        with pytest.raises(TypeError):
            _ = pickle.dumps(f)
- Raise a new issue for pickling parallel h5py files.
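The first and fourth items can be sketched together without h5py or mpi4py installed. This is a minimal sketch, assuming a stand-in base class (`FakeH5File`, hypothetical) instead of `h5py.File`: the hypothetical `SerialPicklableFile.__getstate__` refuses to pickle when the driver is `mpio` and a communicator is present, and a `MagicMock` plays the communicator, as suggested above.

```python
import pickle
from unittest.mock import MagicMock


class FakeH5File:
    """Stand-in for h5py.File that only records its arguments."""

    def __init__(self, name, mode="r", driver=None, comm=None):
        self.filename = name
        self.mode = mode
        self.driver = driver
        self._comm = comm


class SerialPicklableFile(FakeH5File):
    """Hypothetical file class that pickles in serial and refuses mpio."""

    def __getstate__(self):
        if self.driver == "mpio" and self._comm is not None:
            raise TypeError("parallel (mpio) files cannot be pickled")
        return {"name": self.filename, "mode": self.mode,
                "driver": self.driver}

    def __setstate__(self, state):
        # Re-run the constructor with the saved arguments
        self.__init__(name=state["name"], mode=state["mode"],
                      driver=state["driver"])


# Serial: the round trip works
serial = pickle.loads(pickle.dumps(SerialPicklableFile("data.h5md")))
print(serial.filename, serial.mode, serial.driver)  # data.h5md r None

# Parallel: a MagicMock plays the role of MPI.COMM_WORLD
parallel = SerialPicklableFile("data.h5md", driver="mpio", comm=MagicMock())
try:
    pickle.dumps(parallel)
    parallel_error = None
except TypeError as err:
    parallel_error = err
print(parallel_error)  # parallel (mpio) files cannot be pickled
```

In a pytest suite, the `try`/`except` would become `with pytest.raises(TypeError):`, as in the test item above.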
def __getstate__(self):
    driver = self.driver
    if driver == 'mpio':
        comm = self.id.get_access_plist().get_fapl_mpio()[0]
If MPI.COMM_WORLD cannot be natively pickled then I don't think that this will work.
def __getstate__(self):
    driver = self.driver
    if driver == 'mpio':
In order to move the PR along, make __getstate__ fail (with TypeError) when you have driver mpio and a comm object.
Add a test that specifically checks for the exception being raised. Then open a new issue to make pickling of parallel h5py files work.
The test fails with ModuleNotFoundError because h5py tries to import mpi4py when driver='mpio'. Is there any way around this?
Which test are we talking about?
Something like this:
import pickle
import pytest
# HAS_H5PY, H5PYPicklable and H5MD_xvf come from the MDAnalysis test module
@pytest.mark.skipif(not HAS_H5PY, reason="h5py not installed")
def test_H5MD_parallel_pickle():
    from unittest.mock import MagicMock
    MPI_COMM_WORLD = MagicMock()
    h5md_io = H5PYPicklable(H5MD_xvf, 'r',
                            driver="mpio",
                            comm=MPI_COMM_WORLD)
    h5md_io_pickled = pickle.loads(pickle.dumps(h5md_io))
This test fails before it even reaches the pickling step.
Try mocking the mpi4py module, too. The idea is to provide a thing (the mock) that just pretends to be there but doesn't really do anything.
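One common way to mock a whole module is to register a `MagicMock` in `sys.modules` for the duration of a test, so that `import mpi4py` succeeds whether or not the package is installed. A minimal sketch using `unittest.mock.patch.dict`; note that this alone may not be enough when the importing library also checks its own build configuration:

```python
import sys
from unittest.mock import MagicMock, patch

fake_mpi4py = MagicMock()

# Temporarily register the mock as the mpi4py package (and its MPI module)
with patch.dict(sys.modules, {"mpi4py": fake_mpi4py,
                              "mpi4py.MPI": fake_mpi4py.MPI}):
    from mpi4py import MPI  # resolved from sys.modules, not an installed package
    comm = MPI.COMM_WORLD
    print(isinstance(comm, MagicMock))  # True

# patch.dict restores sys.modules when the block exits
print(sys.modules.get("mpi4py") is not fake_mpi4py)  # True
```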
@richardjgowers, you have been dubbed "The King of Mocks" in another PR; any suggestions?
I assume it cannot be fixed, as it's checked inside the h5py code (https://github.com/h5py/h5py/blob/8cc83410995996330c85867d0e56cbe922b709f8/h5py/_hl/files.py#L43).
Maybe just exclude these lines from coverage for now?
There might be a way to trick h5py into believing that it was built with MPI support (one could patch the call to h5.get_config() https://github.com/h5py/h5py/blob/8cc83410995996330c85867d0e56cbe922b709f8/h5py/_hl/files.py#L24 or just the mpi variable), but that's not worth it right now, so I concur with @yuxuanzhuang to just skip the test (EDIT: i.e., not add a test and exclude the lines from coverage). Add a comment summarizing the current issues and move forward.
@orbeckst, I just pushed a commit with the lines excluded from coverage.
orbeckst left a comment
Looks good to me. We will continue the discussion about pickling MPI_COMMUNICATOR in #2865.
@yuxuanzhuang could you please also review this PR?
@yuxuanzhuang, could you please review this PR within the next day? Sorry for asking for a quick turnaround, but @edisj's project is coming to an end and we would like to have it at a good stopping point, just in case he has other, more urgent things to do. You should be able to leave an approve/request-changes review. I will hold off on merging until I hear from you.
Thank you @edisj and @yuxuanzhuang!
* fix MDAnalysis#2890
* added H5PYPicklable class (works in serial but not with driver="mpio" and MPI comm)
* mock h5py so that the docs build even in the absence of h5py
* tests (anything related to mpi is excluded from coverage)
* update CHANGELOG



Fixes #2890
Changes made in this Pull Request:
- Added H5PYPicklable class to H5MDReader
PR Checklist