Skip to content

TRZReader check for n_atoms#2820

Merged
orbeckst merged 18 commits intoMDAnalysis:developfrom
yuxuanzhuang:trzreader_check
Jul 7, 2020
Merged

TRZReader check for n_atoms#2820
orbeckst merged 18 commits intoMDAnalysis:developfrom
yuxuanzhuang:trzreader_check

Conversation

@yuxuanzhuang
Copy link
Contributor

Fixes #2817

Changes made in this Pull Request:

  • check for dt to raise ValueError if n_atoms is incompatible with the trajectory.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Jul 3, 2020

Codecov Report

Merging #2820 into develop will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2820      +/-   ##
===========================================
+ Coverage    92.19%   92.22%   +0.02%     
===========================================
  Files          184      184              
  Lines        24072    24141      +69     
  Branches      3122     3123       +1     
===========================================
+ Hits         22194    22263      +69     
  Misses        1813     1813              
  Partials        65       65              
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/TRZ.py 88.51% <100.00%> (+0.08%) ⬆️
util.py 88.10% <0.00%> (+0.08%) ⬆️
coordinates/base.py 94.82% <0.00%> (+0.28%) ⬆️
auxiliary/base.py 91.31% <0.00%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17969df...a8275ae. Read the comment docs.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check this in a more explicit manner?

* TOPParser no longer guesses elements when missing atomic number records
(Issues #2449, #2651)
* Testsuite does not any more matplotlib.use('agg') (#2191)
* TRZReader now checks `n_atoms` provided right during initilization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* TRZReader now checks `n_atoms` provided right during initilization.
* TRZReader now checks `n_atoms` during initialization (PR #2820)


def test_get_wrong_n_atoms(self):
with pytest.raises(ValueError, match=r"Supplied n_atoms *"):
mda.Universe(TRZ, n_atoms = 8080)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mda.Universe(TRZ, n_atoms = 8080)
mda.Universe(TRZ, n_atoms=8080)

Comment on lines +229 to +236
try:
self._get_dt()
except OSError:
raise ValueError("Supplied n_atoms {} is incompatible "
"with provided trajectory file. "
"Maybe `topology` is wrong?".format(self.n_atoms))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear why _get_dt() will fail with the wrong n_atom. What's the stack trace and where does the ValueError come from? (I assume this relies on being able to do self.next(), which actually fills the ts.... but then why does self._read_next_timestep() on the preceding line work? 🤷 )

This might work and fix an error but I think we should rather check sooner and in a more transparent manner if at all possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_read_next_timestep (used by __next__) only fills the ts by reading a chunk from the TRZ file using np.fromfile with the provided frame_contents (defined by n_atoms) no matter it is right or wrong.

read_frame, and other related functions (e.g. _get_dt will go next and read frame back), on the other hand, will use self._seek. With an abnormal trajectory.next().frame, This will always reach an error.

@yuxuanzhuang
Copy link
Contributor Author

Given my current knowledge on TRZ files (which is none), I cannot really come up with an ideal checking method. Maybe check if self.next() return a nonphysical frame? Or what may happen if TRZ only has one frame.

Besides, I checked there are still other magic n_atoms numbers that will pass the "dt" test, may leading to false positive cases.

Other people with more knowledge are welcome to step in.

@IAlibay
Copy link
Member

IAlibay commented Jul 4, 2020

Given my current knowledge on TRZ files (which is none), I cannot really come up with an ideal checking method. Maybe check if self.next() return a nonphysical frame? Or what may happen if TRZ only has one frame.

Besides, I checked there are still other magic n_atoms numbers that will pass the "dt" test, may leading to false positive cases.

Other people with more knowledge are welcome to step in.

I'm also not super well read on the TRZ format, but just looking at the code, is there any chance we could evaluate the return value of data['natoms']?

@pep8speaks
Copy link

pep8speaks commented Jul 6, 2020

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-07 14:10:06 UTC

@yuxuanzhuang
Copy link
Contributor Author

Looks good? Also spot that TRZWriter does not write n_atoms to the file. Maybe a bug or intentionally?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, thanks @yuxuanzhuang

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuxuanzhuang could you just check that the docs build properly on this one? I think the versionchanged is going to look weird as-is.

assert_equal(W.n_atoms, 100)

def test_get_wrong_n_atoms(self):
with pytest.raises(ValueError, match=r"Supplied n_atoms *"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note, I dont think you need to have the "*" in match calls, it essentially does a string in string type of comparison.

@yuxuanzhuang
Copy link
Contributor Author

yuxuanzhuang commented Jul 6, 2020

I just want to discuss the necessity of raising an error, i.e. the TRZ file is still valid if the natoms written in the file is wrong (e.g. 0), which is case for the previous TRZWriter. Maybe a warning will suffice? Or we still need to find something else for validation.

@orbeckst
Copy link
Member

orbeckst commented Jul 6, 2020

I just want to discuss the necessity of raising an error, i.e. the TRZ file is still valid if the natoms written in the file is wrong (e.g. 0), which is case for the previous TRZWriter. Maybe a warning will suffice? Or we still need to find something else for validation.

Let's raise an exception.

Did you fix the writer? If so, add a note to Fixes in the CHANGELOG, please.

@orbeckst orbeckst added this to the 1.0.x milestone Jul 6, 2020
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a text clarification and an extra test and we're good I think.

* Testsuite does not any more matplotlib.use('agg') (#2191)
* In ChainReader, read_frame does not trigger change of iterating position.
(Issue #2723, PR #2815)
* TRZReader now checks `n_atoms` during initilization. (PR #2820)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something obvious, but the check is happening in _read_next_timestep right? So technically it's not limited to initialization but every step. It might be more accurate here to just say "checks n_atoms on reading"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add the issue number here?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :)

@orbeckst orbeckst merged commit c8abbee into MDAnalysis:develop Jul 7, 2020
orbeckst pushed a commit that referenced this pull request Jul 7, 2020
* fix #2817
* TRZReader checks that TRZ file contains the same number of atoms as the topology or n_atoms and fails otherwise
* TRZWriter correctly writes n_atoms to file (previously wrote 0!)
* add test
* add docs
* update CHANGELOG

(cherry picked from commit c8abbee)
orbeckst pushed a commit that referenced this pull request Jul 12, 2020
* fix #2817
* TRZReader checks that TRZ file contains the same number of atoms as the topology or n_atoms and fails otherwise
* TRZWriter correctly writes n_atoms to file (previously wrote 0!)
* add test
* add docs
* update CHANGELOG

(cherry picked from commit c8abbee)
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* fix MDAnalysis#2817 
* TRZReader checks that TRZ file contains the same number of atoms as the topology or n_atoms and fails otherwise
* TRZWriter correctly writes n_atoms to file (previously wrote 0!)
* add test
* add docs
* update CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TRZReader doesn't complain when wrong topology is provided

6 participants