tl;dr: The MDAnalysis.analysis modules do not have a unified user interface, which is bad for users and bad for developers. We need to come up with a set of rules describing the analysis modules' user interface.
Divergent user interface in MDAnalysis.analysis
The MDAnalysis.analysis (and MDAnalysis.visualization) modules collect various kinds of "tools" to analyze simulations; in some sense, they are responsible for the "analysis" in MDAnalysis. However, while we have been pretty stringent about what our API inside the "core" should look like, we have been much less prescriptive with analysis. To a good degree this reflects the reality that code is mainly contributed by researchers who wrote something to get a particular job done and then realized that it might be usable for the rest of the community – of course, that's exactly what we want for a user-driven open source project! On the other hand, there seems to be a growing feeling among developers that we should have a more uniform interface to the analysis tools as well.
Ideally, all our analysis tools should have a common philosophy and share a common set of options. Being able to use different analysis tools "out of the box" once you have a basic understanding of how one of them works makes for a good overall user experience.
From the developer side, a common interface promotes code re-use and modularization with subsequent improvements in test coverage and code reliability.
Using AnalysisBase
@richardjgowers wrote a prototype MDAnalysis.analysis.base.AnalysisBase class, and in recent code reviews on contributions to analysis we have been pushing for basing analysis code on this class. But in discussions such as on PR #708 it is becoming clear that we should settle on what we expect the analysis code to do. Not least, developers spend a significant amount of time just cleaning up old mess when they implement code fixes and add new features, and they need to know where to set priorities and what is expected of them.
AnalysisBase outlines how to structure typical frame-based analysis but it does not really say (yet) what a user should be able to expect from analysis tools.
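To make the frame-based structure concrete, here is a minimal sketch of the pattern. It is a stand-in written for illustration, not the actual AnalysisBase code: the class names FrameAnalysisSketch and MeanX, the plain list of frames standing in for a trajectory, and the choice to have run() return the object itself are all assumptions.

```python
class FrameAnalysisSketch:
    """Illustrative stand-in for a frame-based analysis base class."""

    def __init__(self, frames):
        # `frames` is any iterable of per-frame data; here it is a
        # hypothetical substitute for a real trajectory.
        self._frames = frames

    def _prepare(self):
        """Set up storage before the frame loop (optional hook)."""

    def _single_frame(self, frame):
        """Do the work for one frame; subclasses must implement this."""
        raise NotImplementedError

    def _conclude(self):
        """Post-process after the frame loop (optional hook)."""

    def run(self):
        # The loop lives in one place, so every tool iterates the same way.
        self._prepare()
        for frame in self._frames:
            self._single_frame(frame)
        self._conclude()
        return self


class MeanX(FrameAnalysisSketch):
    """Hypothetical tool: mean x coordinate of each frame."""

    def _prepare(self):
        self._per_frame = []

    def _single_frame(self, frame):
        xs = [position[0] for position in frame]
        self._per_frame.append(sum(xs) / len(xs))

    def _conclude(self):
        # Storing results in a dict follows the habit described in this
        # issue; it is not something this skeleton enforces.
        self.results = {"mean_x": list(self._per_frame)}


frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
analysis = MeanX(frames).run()
```

The skeleton fixes how a tool iterates over frames, but nothing in it says whether results are returned, stored, plotted, or saved, which is exactly the open question.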
Different models for the user interface
Some of the current analysis tools come with additional methods to immediately plot data. Many are able to write intermediate and final data to a file for reuse (and perhaps are even able to re-read the file and perform plotting without needing to reanalyze a trajectory). Most of them store results as numpy arrays in an attribute results (often a dict for multiple results).
A more purist approach is to just return the final data structures, throw away intermediates, not even store the final results, and let the user do all downstream processing and plotting.
I can see five broadly defined models for how we could handle the user interface:
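As a rough illustration of the purist style, the sketch below is a plain function that returns its data and keeps nothing. The function name centroids and the use of lists of (x, y, z) tuples in place of a trajectory are assumptions made for this example only.

```python
import math


def centroids(frames):
    """Return one (x, y, z) centroid per frame; nothing is stored or written."""
    out = []
    for positions in frames:
        n = len(positions)
        out.append(tuple(sum(p[i] for p in positions) / n for i in range(3)))
    return out


# Downstream processing is left entirely to the caller.
frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 2.0, 0.0), (3.0, 2.0, 0.0)],
]
per_frame = centroids(frames)
drift = math.dist(per_frame[0], per_frame[-1])  # centroid displacement, first to last frame
```

Anything beyond the returned list, such as plotting or writing to disk, is the user's job in this style.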
1. Anarchy: Do not prescribe any user interface and let each analysis tool writer decide what's best and most appropriate.
2. Minimalist (or developer-friendly?):
   - class-based: prescribe use of AnalysisBase and stipulate that run() returns all computed data.
   - function-based: only provide a function that performs the data reduction and returns all computed data.
3. Baroque (or user-friendly?): prescribe AnalysisBase with additional features, for example (discussion needed!)
   - plot() for a simple visualization of the data (remember that sometimes data plotting is pretty involved, see for instance PSAnalysis.plot()!)
   - save() to store data as a file on disk
   - to_df() to return the data as a pandas.DataFrame

   For any of these features you need to store the data inside the class somewhere.
4. Eclecticism: Somewhere between Minimalist and Baroque with some features mandatory and others optional (but which ones?).
5. Bauhaus (the emerging consensus from the discussion below: a cohesive reduction to a common set of functional elements together with minimalist inspirations.)
   - Prescribe AnalysisBase with a common feature set (like Baroque) with the goal to have a unified and utilitarian interface.
   - Provide the core numerical analysis (especially for frame-based analysis) as a function in the same module. This function is used in the _single_frame() method.
Feel free to edit/add to the list.
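One way to picture the Bauhaus model is the sketch below: the core numerical analysis is a standalone function, and a class with a common interface calls it from _single_frame(). Everything here is hypothetical (the names end_to_end_distance and EndToEnd, the JSON format used by save(), and run() returning the object); the class does not derive from the real AnalysisBase so that the example stays self-contained.

```python
import json
import math


def end_to_end_distance(positions):
    """Core numerical analysis for a single frame (hypothetical example).

    `positions` is a sequence of (x, y, z) tuples; the result is the
    distance between the first and the last position.
    """
    return math.dist(positions[0], positions[-1])


class EndToEnd:
    """Class wrapper with a common interface; all names are assumptions."""

    def __init__(self, frames):
        self._frames = frames
        self.results = {}

    def _single_frame(self, positions):
        # The per-frame work is delegated to the standalone function.
        self._distances.append(end_to_end_distance(positions))

    def run(self):
        self._distances = []
        for positions in self._frames:
            self._single_frame(positions)
        # Results stay on the object so that save() (or a plot method)
        # can use them later.
        self.results["distance"] = self._distances
        return self

    def save(self, fileobj):
        # Write the stored results to an open text file or buffer.
        json.dump(self.results, fileobj)


frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
]
analysis = EndToEnd(frames).run()
```

Users who only want the number for one set of coordinates can call end_to_end_distance() directly, while the class provides the uniform run()/results/save() surface.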
What do we need?
I am asking @MDAnalysis/coredevs (and anyone else interested) to chime in with opinions on what to do. The final outcome of this issue should be a consensus on a set of rules (or a statement of the absence of rules for option 1) on how code in analysis ought to interface with the user. These rules will then become part of the Developer Guide.
History
- 2016-02-15: added the Bauhaus model (best-of-both-worlds) to the list of options, as emerging from the discussions below, and added a note to Minimalist along the lines of what @jandom originally proposed.
- 2016-02-22: consensus appears to be to go for the Bauhaus design model