As a user of the Python library in Jupyter, I'd like to be able to re-run arbitrary cells in my notebook in whatever order I see fit and still have correct run metadata generated, so that I can enjoy the interactivity and explorability of Jupyter notebooks along with the provenance tracking and reproducibility of Dotscience.
Currently, I can't, because of #1. Let's try using a different approach and see if it works.
ACs:
As this is a prototyping effort, all these ACs are to be considered "aspirational"; we'll see what we can achieve in practice, then decide at the end whether what we have is better than what we ALREADY have.
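These ACs should apply in all cases, including a ds.publish() that's in a loop (e.g., trying the same algorithm with a range of input parameters to see what's best).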
Implementation plan:
We have a CUNNING PLAN to break this impasse! It's Luke's suggestion:
- Don't store state in-memory in the Python library: the history of that in-memory state follows the dynamic flow of execution of Jupyter cells, which may have nothing to do with their order in the notebook, leading to the problems expounded above.
- Instead, every time you call a metadata-registration function like ds.input(), it should output a machine-readable tag at that very point.
- ds.publish() outputs an "end of this run" tag.
- The parser (be it notebook or command-output) reads the tags from top to bottom, building up in-memory state in notebook lexical order; at the "end of this run" tag it outputs a run and clears its in-memory state.
- Therefore, the assignment of actions to runs is based purely on the lexical structure, not the dynamic structure.
- For extra niceness, in Jupyter mode, we can output the markers inside "Jupyter widgets" that control their display (rather than plain text) so they're less obtrusive and prettier; but we need to transparently not do that when not in Jupyter.
- How does this work with "publish inside a loop"? Unless we come up with a clever trick, we'll only keep the results of the last iteration of the loop. But do users actually publish inside a loop to try the same algorithm with different input parameters, or do they copy and paste the cell and edit the parameters in each copy?
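To make the plan above concrete, here is a minimal sketch of the two halves: a helper that prints a machine-readable tag at the call site, and a parser that walks the stored cell outputs from top to bottom. Everything named here is an assumption for illustration only: the TAG_PREFIX marker, the JSON payload, the "end_of_run" kind, and the emit_tag and parse_runs functions are hypothetical, not the library's actual format or API.

```python
import json

# Hypothetical marker; the real tag format is not specified in this issue.
TAG_PREFIX = "[[DOTSCIENCE-TAG]]"


def emit_tag(kind, **fields):
    """Print a machine-readable tag at the point of the call and return it.

    Stand-in for what ds.input() or ds.publish() might do instead of
    mutating in-memory state in the library.
    """
    line = TAG_PREFIX + json.dumps({"kind": kind, **fields}, sort_keys=True)
    print(line)
    return line


def parse_runs(cell_outputs):
    """Group tags into runs by reading outputs in notebook (lexical) order.

    cell_outputs: list of strings, one per cell, holding that cell's stored
    text output, ordered as the cells appear in the notebook.
    Returns a list of runs; each run is the list of tag dicts seen since
    the previous "end_of_run" tag.
    """
    runs = []
    pending = []  # parser-side state, rebuilt from scratch on every parse
    for output in cell_outputs:
        for line in output.splitlines():
            if not line.startswith(TAG_PREFIX):
                continue  # ordinary user output, not a tag
            tag = json.loads(line[len(TAG_PREFIX):])
            if tag["kind"] == "end_of_run":
                runs.append(pending)  # output a run...
                pending = []          # ...and clear the in-memory state
            else:
                pending.append(tag)
    # Tags after the last "end_of_run" belong to no published run yet,
    # so they are left out.
    return runs
```

Under these assumptions, the order in which cells were executed never reaches the parser. Re-running a cell replaces that cell's stored output, tags included, so the next parse sees the updated tags in the same lexical position.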