I'm working together with @joaomcteixeira to implement a command line interface for MDAnalysis. Our implementation is mostly finished 🥳. Currently, we are preparing the repo for moving to the MDAnalysis organization and making it ready for CI, PyPI, Conda, etc.
However, one important part is still missing: for saving the data, a common structure for the results of each analysis class would be helpful. We already started a discussion, but since structural changes to the analysis framework, and most likely to all analysis classes, are necessary, I am moving the discussion here.
@orbeckst suggested three possibilities, and I will use the RDF class as an example for the API. The current implementation for running and accessing the results looks like

```python
rdf = InterRDF(ag1, ag2).run()
plt.plot(rdf.bins, rdf.rdf)
```

**1. Reserved attribute names**
For example, `self.results` could always be a data structure containing the computed data. This data structure could either be a dictionary

```python
rdf = InterRDF(ag1, ag2).run()
plt.plot(rdf.results["bins"], rdf.results["rdf"])
```

or attributes

```python
rdf = InterRDF(ag1, ag2).run()
plt.plot(rdf.results.bins, rdf.results.rdf)
```

**2. Trailing underscore attributes (sklearn-style)**
In scikit-learn, computed attributes get a trailing underscore appended (e.g. `.results_`).

```python
rdf = InterRDF(ag1, ag2).run()
plt.plot(rdf.bins_, rdf.rdf_)
```

**3. Flexible annotation**
A dictionary listing the attributes that hold computed data. This dictionary could look like

```python
self._annotation = {'bins': 'OUTPUT', 'rdf': 'OUTPUT'}
```

Here, `OUTPUT` could mark anything that is always computed after `run()` and should be saved. Other tags, such as `OPTIONAL`, could mark something that is only computed by an auxiliary method. With this approach, only a few changes to the classes themselves are necessary; their Python API stays untouched, as shown above.
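To illustrate how such an annotation dict could drive saving, here is a minimal sketch. The `AnalysisBase`/`outputs` names and the toy `InterRDF` body below are hypothetical stand-ins, not the actual MDAnalysis code:

```python
class AnalysisBase:
    """Hypothetical base class: collect computed attributes via an
    annotation dict. Names are illustrative only."""

    _annotation = {}

    def outputs(self):
        # Gather every attribute tagged 'OUTPUT' that has been computed.
        return {name: getattr(self, name)
                for name, tag in self._annotation.items()
                if tag == 'OUTPUT' and hasattr(self, name)}


class InterRDF(AnalysisBase):
    # Toy stand-in for the real InterRDF, with fake data.
    _annotation = {'bins': 'OUTPUT', 'rdf': 'OUTPUT'}

    def run(self):
        self.bins = [0.0, 0.5, 1.0]
        self.rdf = [0.0, 1.2, 0.9]
        return self


rdf = InterRDF().run()
print(rdf.outputs())  # -> {'bins': [0.0, 0.5, 1.0], 'rdf': [0.0, 1.2, 0.9]}
```

A saving routine would then only need to iterate over `outputs()` without knowing anything about the individual analysis class.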
My favorite is the first approach using attribute access. It combines all results into a structure common to all classes. As a user, I also find this handy: I can quickly access all results without always checking the documentation for the correct names.
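As a sketch of that first approach, the results container could be a small `dict` subclass that also supports attribute access, so both styles shown above would work. The `Results` class below is hypothetical, not an actual implementation:

```python
class Results(dict):
    """Hypothetical container exposing results both as dict keys and as
    attributes (a sketch, not the actual MDAnalysis class)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; fall back to keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(f"'Results' object has no attribute {name!r}")

    def __setattr__(self, name, value):
        # Redirect attribute assignment into the dict.
        self[name] = value


res = Results()
res.bins = [0.0, 0.5, 1.0]  # attribute-style write
print(res["bins"])           # dict-style read  -> [0.0, 0.5, 1.0]
print(res.bins)              # attribute-style read -> [0.0, 0.5, 1.0]
```

With this design the choice between a dictionary and attributes would not have to be made at all; both spellings reach the same data.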
Regardless of the method we choose, MDACLI will find the `results` attributes and save them: if a result is a NumPy array, we could write it to CSV with `np.savetxt()`; if it is a `pd.DataFrame`, we use `to_csv()`, and so on.
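That type-based dispatch could look roughly like the following. `save_result` and the file layout are made up for illustration and are not mdacli's actual API:

```python
import csv

import numpy as np


def save_result(name, value, outdir="."):
    """Hypothetical helper: pick a writer based on the result's type."""
    path = f"{outdir}/{name}.csv"
    if hasattr(value, "to_csv"):         # pandas DataFrame / Series
        value.to_csv(path)
    elif isinstance(value, np.ndarray):  # plain NumPy array
        np.savetxt(path, value, delimiter=",")
    else:                                # fall back to a one-row CSV
        with open(path, "w", newline="") as fh:
            csv.writer(fh).writerow(np.atleast_1d(value))
    return path
```

Duck-typing on `to_csv` avoids importing pandas just for the type check; anything else that knows how to serialize itself could be slotted into the same chain.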
If you have better ideas, we are of course open to discussion.