This is a write-up of a technical specification for an idea that has, I am sure, been around for several years now, but was usually discussed only in private or orally.
`store`, more specifically the API call to `massStoreRun`, waits and blocks until the result of the store is returned to the client. As processing a store action takes a non-trivial amount of time on the server side (and this operation is also executed on only one thread!), returning from `massStoreRun` itself takes a non-trivial amount of time. The problem surfaces if the connection between the client and the server coughs, chokes, or otherwise misbehaves, because it is only the networking stack in the kernel that is keeping the door open for the reply to arrive. While a disappearing client is no problem from the server's side and data will not be lost, CI jobs can hang indefinitely, and scripts that expect the data to be available for a `cmd` query after `store` returns will break.
The proposal is to switch the blocking from relying on externalia like "the TCP stack" to a softer, but more local, blocking mechanism, while also turning the API itself asynchronous. This proposal is backwards compatible.
Database changes
We already have information in `RunLock` as to which runs are undergoing a store. However, this is not enough: we need to store some semi-temporary information about store "attempts" or "sessions". This could go into its own table, per product, as it needs to be kept for a time even after the run lock is released. This table would contain the run name, a unique session token/identifier, and some status flag. The identifier might be auto-incremented, or a hash of the time when the lock was initialised; it is not a "secret" resource.
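As a rough illustration of what such a per-product session table could look like, here is a minimal sketch using an in-memory SQLite database. All names (`store_sessions`, the column names, `new_session`) are hypothetical, not the actual schema, and the token is generated as a hash of the lock-initialisation time, as suggested above:

```python
import hashlib
import sqlite3
import time

# Hypothetical table for store "sessions"; kept even after the run
# lock is released, until garbage collection removes it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE store_sessions (
        token    TEXT PRIMARY KEY,  -- unique session identifier
        run_name TEXT NOT NULL,     -- the run undergoing the store
        status   TEXT NOT NULL      -- e.g. 'in-progress', 'done', 'failed'
    )
""")

def new_session(run_name: str) -> str:
    # The token is not a "secret" resource: a hash of the time when
    # the lock was initialised is enough to make it unique in practice.
    token = hashlib.sha256(f"{run_name}:{time.time()}".encode()).hexdigest()
    conn.execute("INSERT INTO store_sessions VALUES (?, ?, ?)",
                 (token, run_name, "in-progress"))
    return token

token = new_session("my-run")
status = conn.execute("SELECT status FROM store_sessions WHERE token = ?",
                      (token,)).fetchone()[0]
print(status)  # in-progress
```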
These identifiers should be garbage collected in the usual process.
CLI changes
There are no changes needed on the CLI. Optionally, the `store` command might be extended with a `--no-block` argument which makes it exit and return to the shell as soon as the server has started processing the data, in case the user does not care about when the operation finishes.
API changes
A new endpoint, hereafter referred to as `massStoreRunAsync`, shall be created. This function should return the aforementioned "store session token", or throw. The semantics should be that the function returns once the server can confirm that processing of the results can reasonably continue (cheap early checks, such as permissions and that the data is validly encoded, should be performed before unpacking it).
To query whether the store operation has succeeded or not, a new function should be added, which returns status information (from the database) about the store. The information needed here is malleable, but at least a boolean: "Is the operation still in progress?". (Consuming a successful result might want to remove the related information from the database, to ease garbage collection times at startup.)
Implementation changes
The `store` command should, once it has received the token from the server, close the connection and use the token to poll the server every once in a while for the status of the operation. Deciding a good interval here could be tough, but trivial choices like "every 10 sec" or "every 30 sec" should be fine for a prototype. As far as I can gather, we already perform a counting of reports during store (which is weird!), but if this information is available, the initial wait time and the re-query interval could be estimated from it.
In between queries, the `store` binary should sleep using the OS primitives for sleeping a process, without having to rely on the network stack. Every query is its own connection, like `cmd ...`.
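The client-side loop described above can be sketched as follows; `wait_for_store` and its parameters are hypothetical names, and the poller passed in would, in the real implementation, open a fresh connection per call:

```python
import time

def wait_for_store(poll_status, interval_sec=10.0, timeout_sec=None):
    """Block locally until the store finishes, sleeping between polls
    with the OS sleep primitive instead of holding a TCP connection open.

    poll_status() returns True while the operation is still in progress;
    each invocation stands for one short-lived connection to the server.
    """
    waited = 0.0
    while poll_status():
        if timeout_sec is not None and waited >= timeout_sec:
            raise TimeoutError("store did not finish in time")
        time.sleep(interval_sec)
        waited += interval_sec

# Usage with a stand-in poller that reports "done" after three polls:
remaining = {"polls": 3}
def fake_poll():
    remaining["polls"] -= 1
    return remaining["polls"] > 0

wait_for_store(fake_poll, interval_sec=0.0)
print(remaining["polls"])  # 0
```

With a report count available, the initial wait and `interval_sec` could be scaled to the expected processing time instead of a fixed constant.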
Obsoletes #4039.