Skip to content

Fix py3 compatibility regarding the HydrogenBondAnalysis.save_table()#1744

Closed
xiki-tempula wants to merge 2 commits intoMDAnalysis:developfrom
xiki-tempula:py3_save
Closed

Fix py3 compatibility regarding the HydrogenBondAnalysis.save_table()#1744
xiki-tempula wants to merge 2 commits intoMDAnalysis:developfrom
xiki-tempula:py3_save

Conversation

@xiki-tempula
Copy link
Contributor

@xiki-tempula xiki-tempula commented Dec 18, 2017

Fixes #1743

Changes made in this Pull Request:

Change the HydrogenBondAnalysis.save_table() method so that it will work under py3.
I can add a test to prove that the current implementation works in both py2/3 but I'm not sure whether it is necessary as technically I only changed one character.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

self.generate_table()
with open(filename, 'w') as f:
with open(filename, 'wb') as f:
cPickle.dump(self.table, f, protocol=cPickle.HIGHEST_PROTOCOL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not good for python 2/3 compatibility. It means that a pickle written with python 3 won't be readable with python 2. What is the type of table? Is it a numpy array? Then we have more portable options.

Copy link
Contributor Author

@xiki-tempula xiki-tempula Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de Thank you for the comment. I have tested reading and writing in py3 but only tested checked in py2.
This is a numpy recarray. I guess np.save() might be a good option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use np.savetxt then a user can consume the table however he likes. But this is a deprecation. So if this is read from somewhere we should allow reading a pickle for some time. Maybe even writing a pickle is OK until a version 1.0 (If the user really wants to).

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs tests to identify the bad behaviour and make sure we never regress.

@orbeckst
Copy link
Member

orbeckst commented Dec 18, 2017

The alternative approach is to remove the save() method and leave it to the user to store it.

Having a convenience method seemed, well, convenient at the time but if this requires us to worry about data formats then that's not good. Maybe use datreant.data or some other tools.

It makes sense if you want to re-analyze your data with the same functions. But we don't have a load() method so we could get rid of save() and just add a paragraph on how one can save it with pickles.

@kain88-de
Copy link
Member

Maybe use datreant.data or some other tools.

This is not going to be supported longer for exactly the same reason you mention here. Storing binary data is surprisingly hard for arbitrary objects.

It makes sense if you want to re-analyze your data with the same functions. But we don't have a load() method so we could get rid of save() and just add a paragraph on how one can save it with pickles.

yeah maybe this doesn't make much sense then anyway. @xiki-tempula how to you use the save feature?

@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Dec 18, 2017

@kain88-de Thanks for your comment. To be honest, I don't use this feature. I will just pickle whatever I want which is sometimes the whole HydrogenBondAnalysis object or sometimes only the timeseries attribute.
When I was writing WaterBridgeAnalysis module, I thought it would be nice if this new module supports most of the methods supported by HydrogenBondAnalysis. Thus, I went on to check these methods one by one to if see these methods can actually work in the new module.
I noticed that the new WaterBridgeAnalysis.save_table() doesn't work in py3 and then I realise that it is the fault of the old HydrogenBondAnalysis.save_table().
I thought that it is not a good practice to fix this problem in the WaterBridgeAnalysis PR and thus, I opened a new issue and PR to address this problem.

I agree with @orbeckst. It is probably better to just get rid of this function.

@orbeckst
Copy link
Member

Then I suggest we purge save() methods from the analysis classes. I think we have to deprecate them, so we might just do the deprecation in 0.17.0 and then remove in 1.0.

git grep 'save\w*(self'

gives

MDAnalysis/analysis/align.py:    def save(self, rmsdfile):
MDAnalysis/analysis/contacts.py:    def save(self, outfile):
MDAnalysis/analysis/diffusionmap.py:    def save(self, filename):
MDAnalysis/analysis/encore/utils.py:    def savez(self, fname):
MDAnalysis/analysis/hbonds/hbond_analysis.py:    def save_table(self, filename="hbond_table.pickle"):
MDAnalysis/analysis/hbonds/hbond_autocorrel.py:    def save_results(self, filename='hbond_autocorrel'):
MDAnalysis/analysis/hole.py:    def save(self, filename="hole.pickle"):
MDAnalysis/analysis/legacy/x3dna.py:    def save(self, filename="x3dna.pickle"):
MDAnalysis/analysis/lineardensity.py:    def save(self, description='', form='txt'):
MDAnalysis/analysis/lineardensity.py:    def _savetxt(self, filename):
MDAnalysis/analysis/lineardensity.py:    def _savez(self, filename):
MDAnalysis/analysis/psa.py:    def save_result(self, filename=None):
MDAnalysis/analysis/psa.py:    def save_paths(self, filename=None):
MDAnalysis/analysis/rms.py:    def save(self, filename=None):

Let's open another issue and then we have to go through these methods and find out if they are absolutely necessary or can be removed. We can discuss more on the new issue.

Does this sound like a sensible way forward?

@xiki-tempula
Copy link
Contributor Author

@orbeckst I agree. Actually, there is another entity.

MDAnalysis/analysis/hbonds/wbridge_analysis.py: def save_table(self, filename="wbridge_table.pickle"):

@orbeckst
Copy link
Member

Just need to add a few incendiary remarks here:

This is not going to be supported longer for exactly the same reason you mention here. Storing binary data is surprisingly hard for arbitrary objects.

It seems to me that the Python ecosystem has problems with solving this issue, together with a certain disregard for long-term stability. It wouldn't so much be an issue if everyone could agree to keep formats and file access stable. In principle, libraries such as netcdf or hdf5 or XDR (as used in Gromacs formats) are portable means to store data.

@orbeckst
Copy link
Member

Actually, there is another entity.

Thanks, I had forgotten to pull the latest. I edited my comment.

@kain88-de
Copy link
Member

Just need to add a few incendiary remarks here

we can discuss this on the datreant repo. But we do plan to keep the data module seemy alive in mdsynthesis so old code can still work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants