Description
The current data flow from an action (in schedule A) to destination(s) in schedule B... is:
- If an action returns non-zero, its results are destroyed
- If an action returns zero and has destinations, its output is moved to the "base directory" of the destination schedule(s), regardless of whether that schedule is already running.
- There is no data life-cycle management implemented to deal with data consumed by an action.
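The current behaviour can be sketched in shell. All paths, file names and the `run_action` helper below are illustrative stand-ins, not the runner's actual layout:

```shell
#!/bin/sh
# Sketch of the CURRENT action -> destination flow (illustrative layout).
set -eu
WORK=$(mktemp -d)
mkdir -p "$WORK/schedA/out" "$WORK/schedB"   # schedB base dir = destination

run_action() {
    # Hypothetical action: writes a (data, meta) pair, returns 0 on success.
    printf 'measurement\n' > "$WORK/schedA/out/result.data"
    printf 'ts=now\n'      > "$WORK/schedA/out/result.meta"
}

if run_action; then
    # Zero status: output is moved to the destination schedule's base
    # directory, even if schedule B happens to be running right now.
    mv "$WORK/schedA/out/"* "$WORK/schedB/"
else
    # Non-zero status: the results are destroyed.
    rm -f "$WORK/schedA/out/"*
fi
# The (data, meta) pair is now in schedule B's base directory.
```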
This really gets in the way of implementing resilient reporting of results to a collector. The desired report flow is: render the report, optionally compress it, and if this fails (e.g. storage full), do not remove the source data. After the report is rendered successfully, remove only the source data that is present in that report, and queue the report for transmission. The report is removed only when the transmission succeeds. The render+transmit steps may be implemented atomically, in which case the rule becomes: "the source data present in the report must only be removed when the transmission succeeds".
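The desired report lifecycle can be sketched as follows. The directory layout, `render_report` and `transmit_report` are hypothetical placeholders (the real render and upload steps are the runner's actions), but the ordering of the removals is the point:

```shell
#!/bin/sh
# Sketch of the DESIRED report lifecycle (illustrative names).
set -eu
WORK=$(mktemp -d)
SRC="$WORK/processing"; QUEUE="$WORK/outbox"; COLLECTOR="$WORK/collector"
mkdir -p "$SRC" "$QUEUE" "$COLLECTOR"
printf 'm1\n' > "$SRC/r1.data"; printf 'ts=1\n' > "$SRC/r1.meta"

render_report() {   # render + compress; a full disk would make this fail
    gzip -c "$SRC"/*.data > "$QUEUE/report.gz"
}
transmit_report() { # stand-in for the actual upload to the collector
    cp "$QUEUE/report.gz" "$COLLECTOR/"
}

if render_report; then
    # Render succeeded: remove ONLY the source data present in the report.
    rm -f "$SRC"/*.data "$SRC"/*.meta
    if transmit_report; then
        rm -f "$QUEUE/report.gz"  # report removed only after transmit succeeds
    fi
fi
# On render failure the source pair stays put;
# on transmit failure the report stays queued for a later retry.
```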
Proposed action -> schedule data-flow:
- Action output is sent to an "incoming" queue on each destination schedule (it can continue to be the base directory of the schedule).
- Data can arrive at the "incoming" queue of a schedule at any time, including while that schedule is already running. Either advisory locking or "write then rename" strategies must be used to ensure that a partially-written ("live") pair of (data, meta) files is never visible. This is already true as implemented.
- When the runner starts processing a schedule, it first moves every pair of (meta, data) files to a "processing" queue of that schedule.
- Actions must consume data only from the "processing" queue of their schedule.
- If, and only if, every action of a schedule returns a zero status, the "processing" queue is emptied by the runner when the schedule finishes running. If any action returns a non-zero status, or the schedule has no actions, the "processing" queue is left alone (i.e. it accumulates data).
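The proposed flow above can be sketched end to end. The directory names and the stand-in actions are illustrative assumptions, not the runner's real configuration:

```shell
#!/bin/sh
# Sketch of the PROPOSED incoming -> processing flow (illustrative layout).
set -eu
SCHED=$(mktemp -d)
mkdir -p "$SCHED/incoming" "$SCHED/processing"

# Producer side: "write then rename", so a half-written pair is never
# visible in "incoming" (rename within a filesystem is atomic).
printf 'm1\n'   > "$SCHED/incoming/.r1.data.tmp"
printf 'ts=1\n' > "$SCHED/incoming/.r1.meta.tmp"
mv "$SCHED/incoming/.r1.data.tmp" "$SCHED/incoming/r1.data"
mv "$SCHED/incoming/.r1.meta.tmp" "$SCHED/incoming/r1.meta"

# Runner side: at schedule start, move every pair to "processing".
mv "$SCHED/incoming/"* "$SCHED/processing/"

# Actions consume only from "processing". Track failures; a schedule
# with no actions counts as "do not consume".
set -- true true   # stand-ins for the schedule's real actions
fail=0
[ $# -gt 0 ] || fail=1
for action in "$@"; do
    "$action" "$SCHED/processing" || fail=1
done

if [ "$fail" -eq 0 ]; then
    rm -f "$SCHED/processing/"*   # every action returned zero: data consumed
fi                                # otherwise the queue accumulates for retry
```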
The SIMET team will implement the data flow proposed above, and it will (as usual) be made available in our fork for merging upstream. If we find any problems with the proposed solution, we will edit this issue report accordingly.