From 0986e6f063722e2696ed4713422f6bc196081689 Mon Sep 17 00:00:00 2001
From: Thomas A Caswell
Date: Thu, 28 Sep 2017 19:54:26 -0700
Subject: [PATCH] DOC: update suitcase pre-processing docs

---
 design/suitcase-preprocess-data-binning.rst | 81 ++++++++++++++++-----
 1 file changed, 61 insertions(+), 20 deletions(-)

diff --git a/design/suitcase-preprocess-data-binning.rst b/design/suitcase-preprocess-data-binning.rst
index 60dcaf1..8258a74 100644
--- a/design/suitcase-preprocess-data-binning.rst
+++ b/design/suitcase-preprocess-data-binning.rst
@@ -1,16 +1,21 @@
-===============
-Suitcase Update
-===============
+=================
+ Suitcase Update
+=================
 
 Summary
 =======
-Suitcase is a simple utility for exporting data from the databroker into a stand-alone, portable file, such as h5 file.
+
+
+Suitcase is a simple utility for exporting data from the databroker
+into a stand-alone, portable file, such as an h5 file.
 
 Current Implementation
 ======================
-One of the functions currently used is called ``export``, which mainly takes inputs including header, filename and metadatastore, and
-outputs h5 file with the same structure of the data in databroker.
+
+One of the functions currently used is called ``export``, which takes
+a header, a filename, and a metadatastore as inputs and outputs an h5
+file with the same structure as the data in the databroker.
 
 .. code-block:: python
@@ -18,8 +23,9 @@ outputs h5 file with the same structure of the data in databroker.
 
     last_run = db[-1]  # get header from databroker
     hdf5.export(last_run, 'myfile.h5', db=db)
 
-The first argument may be a single Header or a list of Headers. You can also use keyword "fields"
-in the "export" function to define specifically which data sets you want to output.
+The first argument may be a single Header or a list of Headers. You
+can also use the ``fields`` keyword of ``export`` to specify exactly
+which data sets you want to output.
 
 .. code-block:: python
@@ -30,18 +36,53 @@ in the "export" function to define specifically which data sets you want to outp
 
     filename = 'scanID_123.h5'
     hdf5.export(hdr, filename, db=db, fields=fds)
 
-Here I assume A, B, C are keywords for some vector data, like images. You can define them as un_wanted_fields.
-If all vector data are blocked, saving data with only scaler data and header information should be very faster.
-Please also define filename clearly, so you know which data it comes from.
+Here I assume A, B, and C are keywords for some vector data, such as
+images. You can define them as un_wanted_fields. If all vector data
+are excluded, saving only the scalar data and header information
+should be much faster. Please also choose the filename carefully, so
+you know which data it comes from.
+
+Issues and Proposed Solutions
+=============================
+
+Easily support many formats
+---------------------------
+
+Currently each file format needs to implement ``export`` independently,
+which duplicates the logic for turning a header into documents and
+will require each file format to implement the in-line processing
+described below.
+
+At the top level we should have an export function with a signature like:
+
+.. code-block:: python
+
+    def export(headers: List[Header],
+               format: Union[str, Callable[[Generator[None, Tuple[str, dict], None]], None]],
+               format_kwargs: Optional[Dict[str, Any]] = None,
+               stream_name: Optional[Union[str, Iterable[str]]] = None,
+               fields: Optional[Iterable[str]] = None,
+               timestamps: bool = True,
+               filters: Optional[Generator[Tuple[str, dict], Tuple[str, dict], None]] = None):
+
+Issues:
+
+1. where to inject the file name?
+
+   - zipped list of names?
+   - single name?
+   - filename template?
+   - leave it up to the format / consumer?
+
+
+in-line processing
+------------------
 
-Issue and Proposed Solution
-===========================
-Users want to do binning on some of the datasets, i.e., changing the shape of a given data from (100,100) to (50,50).
-So we need to change both the data from events and the data shape information in the descriptor. Here are some
-of the solutions.
+Users want to do binning on some of the datasets, i.e., changing the
+shape of a given data set from (100, 100) to (50, 50). So we need to
+change both the data in the events and the shape information in the
+descriptor. Here are some of the solutions.
 
 solution 1: decorator
---------------------
+~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: python
@@ -60,7 +101,7 @@ solution 1: decorator
 
 
 solution 2: partial function
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: python
@@ -73,7 +114,7 @@ solution 2: partial function
 
 
 solution 3: use class
---------------------
+~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: python
@@ -91,7 +132,7 @@
 We can use base class from bluesky.
 
 solution 4: based on original export function
--------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: python
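As a note on the proposed API above: the draft ``export`` signature implies that (name, document) pairs flow through coroutine-style generator ``filters`` before reaching a format consumer. The following is a hypothetical, runnable sketch of that flow, not the suitcase implementation — ``Pair``, ``rebin_filter``, and the plain lists standing in for Header objects are all illustrative stand-ins; only the ``export``/``format``/``filters`` names come from the draft signature.

```python
# Sketch (assumed design): push each (name, doc) pair through generator
# filters, then hand it to a format consumer callable.
from typing import Callable, Generator, Iterable, List, Optional, Tuple

Pair = Tuple[str, dict]


def rebin_filter() -> Generator[Pair, Pair, None]:
    """Toy in-line processor: halve the 'shape' in descriptor documents."""
    name, doc = yield  # primed with next(); receives pairs via send()
    while True:
        if name == 'descriptor':
            doc = dict(doc, shape=tuple(s // 2 for s in doc['shape']))
        name, doc = yield (name, doc)


def export(headers: List[Iterable[Pair]],
           format: Callable[[Pair], None],
           filters: Optional[Iterable[Generator[Pair, Pair, None]]] = None) -> None:
    """Feed every (name, doc) pair through the filters, then to ``format``."""
    filters = list(filters or [])
    for f in filters:
        next(f)  # advance each generator to its first yield
    for header in headers:
        for pair in header:
            for f in filters:
                pair = f.send(pair)
            format(pair)


out: List[Pair] = []
docs = [('descriptor', {'shape': (100, 100)}), ('event', {'seq_num': 1})]
export([docs], out.append, filters=[rebin_filter()])
# out[0] == ('descriptor', {'shape': (50, 50)}); the event passes through unchanged
```

Note that each filter generator must be primed with ``next()`` before ``send()`` will work; a decorator that auto-primes (in the spirit of solution 1) could hide that footgun from filter authors.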