TRZReader check for n_atoms #2820
update to HEAD
Codecov Report
@@ Coverage Diff @@
## develop #2820 +/- ##
===========================================
+ Coverage 92.19% 92.22% +0.02%
===========================================
Files 184 184
Lines 24072 24141 +69
Branches 3122 3123 +1
===========================================
+ Hits 22194 22263 +69
Misses 1813 1813
Partials 65 65
orbeckst
left a comment
Can we check this in a more explicit manner?
package/CHANGELOG
Outdated
| * TOPParser no longer guesses elements when missing atomic number records | ||
| (Issues #2449, #2651) | ||
| * Testsuite does not any more matplotlib.use('agg') (#2191) | ||
| * TRZReader now checks `n_atoms` provided right during initilization. |
| * TRZReader now checks `n_atoms` provided right during initilization. | |
| * TRZReader now checks `n_atoms` during initialization (PR #2820) |
| def test_get_wrong_n_atoms(self): | ||
| with pytest.raises(ValueError, match=r"Supplied n_atoms *"): | ||
| mda.Universe(TRZ, n_atoms = 8080) |
| mda.Universe(TRZ, n_atoms = 8080) | |
| mda.Universe(TRZ, n_atoms=8080) |
| try: | ||
| self._get_dt() | ||
| except OSError: | ||
| raise ValueError("Supplied n_atoms {} is incompatible " | ||
| "with provided trajectory file. " | ||
| "Maybe `topology` is wrong?".format(self.n_atoms)) | ||
It is not clear why _get_dt() will fail with the wrong n_atom. What's the stack trace and where does the ValueError come from? (I assume this relies on being able to do self.next(), which actually fills the ts.... but then why does self._read_next_timestep() on the preceding line work? 🤷 )
This might work and fix an error but I think we should rather check sooner and in a more transparent manner if at all possible.
There was a problem hiding this comment.
_read_next_timestep (used by __next__) only fills the ts by reading a chunk from the TRZ file using np.fromfile with the provided frame_contents (defined by n_atoms), regardless of whether n_atoms is right or wrong.
read_frame and other related functions, on the other hand, use self._seek (e.g. _get_dt advances to the next frame and then reads the previous frame back). With an abnormal trajectory.next().frame, this always ends in an error.
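To illustrate the point about np.fromfile, here is a minimal sketch of how a fixed-size record read succeeds even when the record layout was built from the wrong n_atoms. The frame layout, `frame_dtype`, `write_frames` and `read_first_frame` are hypothetical and heavily simplified; this is not the real TRZ layout or the reader's code.

```python
import os
import tempfile

import numpy as np


def frame_dtype(n_atoms):
    # Hypothetical, simplified frame layout (NOT the real TRZ format):
    # an int32 atom count followed by n_atoms float32 x-coordinates.
    return np.dtype([("natoms", "<i4"), ("x", "<f4", (n_atoms,))])


def write_frames(path, n_atoms, n_frames):
    # Write n_frames records that all store the true atom count.
    frames = np.zeros(n_frames, dtype=frame_dtype(n_atoms))
    frames["natoms"] = n_atoms
    frames["x"] = np.arange(n_frames * n_atoms, dtype="<f4").reshape(
        n_frames, n_atoms)
    frames.tofile(path)


def read_first_frame(path, n_atoms):
    # Read one record using a layout derived from the *supplied* n_atoms.
    with open(path, "rb") as fh:
        return np.fromfile(fh, dtype=frame_dtype(n_atoms), count=1)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bin")
    write_frames(path, n_atoms=3, n_frames=2)
    good = read_first_frame(path, n_atoms=3)
    # Wrong n_atoms: the read still returns one record without complaint.
    bad = read_first_frame(path, n_atoms=2)

# The count stored in the file (3) disagrees with the supplied n_atoms (2),
# but nothing in the raw read notices.
print(int(bad["natoms"][0]), bad["x"].shape)
```

In this toy layout the mismatch only shows up if the stored count is compared explicitly, or later, once frame offsets computed from the wrong record size stop lining up.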
Given my current knowledge on Besides, I checked there are still other magic Other people with more knowledge are welcome to step in. |
I'm also not super well read on the TRZ format, but just looking at the code, is there any chance we could evaluate the return value of data['natoms']?
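A rough sketch of what such an explicit check could look like, assuming `data` is a one-element structured array with a `natoms` field, as returned by an np.fromfile read. The helper name `check_n_atoms` and the toy record are hypothetical; the error message is the one used in the diff above.

```python
import numpy as np


def check_n_atoms(data, n_atoms):
    """Hypothetical helper: compare the stored atom count with n_atoms."""
    stored = int(data["natoms"][0])
    if stored != n_atoms:
        raise ValueError("Supplied n_atoms {} is incompatible "
                         "with provided trajectory file. "
                         "Maybe `topology` is wrong?".format(n_atoms))


# Toy record standing in for one frame header (assumed layout).
data = np.array([(3,)], dtype=[("natoms", "<i4")])

check_n_atoms(data, 3)       # matching count: passes silently

try:
    check_n_atoms(data, 8080)
except ValueError as err:
    error_message = str(err)
```

Checking the stored value directly would make the failure independent of any later seek, which is the more transparent behaviour asked for above.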
Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-07-07 14:10:06 UTC
Looks good? Also spot that |
richardjgowers
left a comment
Looks awesome, thanks @yuxuanzhuang
IAlibay
left a comment
@yuxuanzhuang could you just check that the docs build properly on this one? I think the versionchanged is going to look weird as-is.
| assert_equal(W.n_atoms, 100) | ||
| def test_get_wrong_n_atoms(self): | ||
| with pytest.raises(ValueError, match=r"Supplied n_atoms *"): |
Note: I don't think you need the "*" in match calls. The pattern is applied as a regex search, so it essentially does a string-in-string type of comparison.
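A small sketch of why the trailing "*" adds nothing. pytest.raises(match=...) applies re.search to the string form of the exception, so the check can be reproduced with the re module alone; the message below is the one from the diff, with 8080 filled in.

```python
import re

message = ("Supplied n_atoms 8080 is incompatible "
           "with provided trajectory file. "
           "Maybe `topology` is wrong?")

# " *" means "zero or more spaces", so it never makes the pattern stricter.
with_star = re.search(r"Supplied n_atoms *", message)
without_star = re.search(r"Supplied n_atoms", message)

# Both patterns are found at the start of the message.
print(with_star is not None, without_star is not None)
```

Because re.search finds the pattern anywhere in the string, the plain prefix is enough for the test.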
I just want to discuss the necessity of raising an error: the TRZ file is still valid if the natoms written in the file is wrong (e.g. 0), which is the case for files from the previous TRZWriter. Maybe a warning will suffice? Otherwise we still need to find something else for validation.
Let's raise an exception. Did you fix the writer? If so, add a note to Fixes in the CHANGELOG, please.
IAlibay
left a comment
Just a text clarification and an extra test and we're good I think.
package/CHANGELOG
Outdated
| * Testsuite does not any more matplotlib.use('agg') (#2191) | ||
| * In ChainReader, read_frame does not trigger change of iterating position. | ||
| (Issue #2723, PR #2815) | ||
| * TRZReader now checks `n_atoms` during initilization. (PR #2820) |
I might be missing something obvious, but the check is happening in _read_next_timestep right? So technically it's not limited to initialization but every step. It might be more accurate here to just say "checks n_atoms on reading"?
There was a problem hiding this comment.
Could you also add the issue number here?
* fix MDAnalysis#2817
* TRZReader checks that TRZ file contains the same number of atoms as the topology or n_atoms and fails otherwise
* TRZWriter correctly writes n_atoms to file (previously wrote 0!)
* add test
* add docs
* update CHANGELOG
Fixes #2817
Changes made in this Pull Request:
dt to raise ValueError if n_atoms is incompatible with the trajectory.
PR Checklist