From 9e8f245d0fd4cc1d6c2567d17430283a28d5cf07 Mon Sep 17 00:00:00 2001
From: Alan Geller
Date: Fri, 3 Feb 2017 13:32:45 -0800
Subject: [PATCH 1/6] Initial draft of the DataSet spec

---
 specs/DataSet.rst | 253 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 specs/DataSet.rst

diff --git a/specs/DataSet.rst b/specs/DataSet.rst
new file mode 100644
index 000000000000..1d32279c9f64
--- /dev/null
+++ b/specs/DataSet.rst
@@ -0,0 +1,253 @@
+=====================
+DataSet Specification
+=====================
+
+Introduction
+============
+
+The DataSet class is used in QCoDeS to hold measurement results.
+It is the destination for measurement loops and the source for plotting and data analysis.
+As such, it is a central component of QCoDeS.
+
+The DataSet class should be usable on its own, without other QCoDeS components.
+In particular, the DataSet class should not require the use of Loop and parameters, although it should integrate with those components seamlessly.
+This will significantly improve the modularity of QCoDeS by allowing users to plug into and extend the package in many different ways.
+As long as a DataSet is used for data storage, users can freely select the QCoDeS components they want to use.
+
+Terminology
+================
+
+Parameter
+    A logically-single value input to or produced by a measurement.
+    A parameter need not be a scalar, but can be an array or a tuple or an array of tuples, etc.
+    A DataSet parameter corresponds conceptually to a QCoDeS parameter, but does not have to be defined by or associated with a QCoDeS Parameter .
+    Roughly, a parameter represents a column in a table of experimental data.
+
+Result
+    A result is the collection of parameter values associated to a single measurement in an experiment.
+    Roughly, a result corresponds to a row in a table of experimental data.
+
+Role
+    Parameters may play different roles in a measurement.
+    Specifically, they may be input to the measurement (set-points) or outputs of the measurement (measured or computed values).
+    This distinction is important for plotting and for replicating an experiment.
+
+DataSet
+    A DataSet is a QCoDeS object that stores the results of an experiment.
+    Roughly, a DataSet corresponds to a table of experimental data, along with metadata that describes the data .
+    Depending on the state of the experiment, a DataSet may be "in progress" or "completed".
+
+ExperimentContainer
+    An ExperimentContainer is a QCoDeS object that stores all information about an experiment.
+    This includes items such as the equipment on which the experiment was run, the configuration of the equipment, graphs and other analytical output, and arbitrary notes, as well as the DataSet that holds the results of the experiment.
+
+Requirements
+============
+
+The DataSet class should meet the following requirements:
+
+Basics
+---------
+
+#. A DataSet can store data of (reasonably) arbitrary types.
+#. A completed DataSet should be immutable; neither its metadata nor its results may be modified.
+
+Creation
+------------
+
+#. It should be possible to create a DataSet without knowing the final item count of the various values it stores.
+   In particular, the number of loop iterations for a sweep should not be required to create the DataSet.
+#. The list of parameters in each result to be stored in a DataSet should be specified at creation time.
+   This includes the name, role (set-point or output), and type of each parameter.
+   Parameters may be marked as optional, in which case they are not required for each result.
+#. It should be possible to define a result parameter that is independent of any QCoDeSParameter or Instrument.
+#. A QCoDeS Parameter should provide sufficient information to define a result parameter.
+#. A DataSet should allow storage of relatively arbitrary metadata describing the run that generated the results and the parameters included in the results.
+
+Writing
+----------
+
+#. It should be possible to add a single result or a sequence of results to an in-progress DataSet.
+#. A DataSet should maintain the order in which results were added.
+#. An in-progress DataSet may be marked as completed.
+
+Access
+---------
+
+#. Values in a DataSet should be easily accessible for plotting and analysis, even while the DataSet is in progress.
+   In particular, it should be possible to retrieve results in a NumPy-compatible form.
+#. It should be possible to define a cursor that specifies a location in a specific value set in a DataSet.
+   It should be possible to get a cursor that specifies the current end of the DataSet when the DataSet is "in progress".
+   It should be possible to read "new data" in a DataSet; that is, to read everything after a cursor.
+#. It should be possible to subscribe to change notifications from a DataSet.
+   It is acceptable if such subscriptions must be in-process until QCoDeS multiprocessing is redone.
+   Change notifications should include the results that were added to the DataSet that triggered the notification.
+
+Storage and Persistence
+-----------------------
+
+#. A DataSet object should allow writing to and reading from storage in a variety of formats.
+#. Users should be able to define new persistence formats.
+#. Users should be able to specify where a DataSet is written.
+
+Interface
+=========
+
+Creation
+--------
+
+ParamSpec
+~~~~~~~~~
+
+A ParamSpec object specifies a single parameter in a DataSet.
+
+ParamSpec(Parameter p, optional=)
+    Creates a parameter specification from a QCoDeS Parameter.
+    If optional is provided and is true, then the parameter is optional in each result.
+
+ParamSpec(name, role, type, desc=, optional=)
+    Creates a parameter specification with the given name, role (‘I’ or ‘O’), and type.
+    The type should be a NumPy dtype object.
+    If a description is provided, it is included in the metadata of the DataSet.
+    The description can be a simple string or a string-to-string dictionary.
+    If optional is provided and is true, then the parameter is optional in each result.
+
+DataSet
+~~~~~~~
+
+DataSet()
+    Creates a DataSet with no parameters.
+
+DataSet(specs)
+    Creates a DataSet for the provided list of parameter specifications.
+    Each item in the list should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
+    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
+
+DataSet.add_parameter(spec)
+    Adds a parameter to an existing DataSet.
+    The spec should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
+    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
+    It is an error to add a parameter to a non-empty DataSet.
+
+DataSet.add_parameters(specs)
+    Adds a list of parameters to an existing DataSet.
+    Each item in the list should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
+    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
+    It is an error to add a parameter to a non-empty DataSet.
+
+DataSet.add_metadata(tag=, info=)
+    Adds metadata to the current DataSet.
+    The metadata is stored under the provided tag.
+    It is an error to add metadata to a completed DataSet.
+
+Writing
+-------
+
+DataSet.add_result(**kwargs)
+    Adds a result to the DataSet.
+    Keyword parameters should have the name of a parameter as the keyword and the value to associate as the value.
+    If there is only one positional parameter and it is a dictionary, then it is interpreted as a map from parameter name to parameter value.
+    It is an error for a value for the same parameter to be specified both using a positional parameter or dictionary parameter and using a keyword,
+    It is an error to provide a value for a key or keyword that is not the name of a parameter in this DataSet.
+    It is an error to add a result to a completed DataSet.
+
+DataSet.add_results(args)
+    Adds a sequence of results to the DataSet.
+    The single argument should be a sequence of dictionaries, where each dictionary provides the values for all of the parameters in that result.
+    See the add_result method for a description of such a dictionary.
+    The order of dictionaries in the sequence will be the same as the order in which they are added to the DataSet.
+    It is an error to add results to a completed DataSet.
+
+DataSet.complete()
+    Marks the DataSet as completed.
+
+Access
+------
+
+DataSet.length
+    This attribute holds the current number of results in the DataSet.
+
+DataSet.is_empty
+    This attribute will be true if the DataSet is empty (has no results), or false if at least one result has been added to the DataSet.
+    It is equivalent to testing if the length is zero.
+
+DataSet.is_completed
+    This attribute will be true if the DataSet is completed or false if it is in progress.
+
+DataSet.get_data(*params, start=, end=)
+    Returns the values stored in the DataSet for the specified parameters.
+    The values are returned as a list of parallel NumPy arrays, one array per parameter.
+    The data type of each array is based on the data type provided when the DataSet was created.
+    If a parameter is optional and no value was provided for one or more results, the corresponding array entries will be the “null” value for the data type: zero for integers, NaN for floats, “” for strings, None for objects.
+    The parameter list may contain a mix of string parameter names, QCoDeS Parameter objects, and ParamSpec objects.
+    If provided, the start and end parameters select a range of results by result count (index).
+    Start defaults to 0, and end defaults to the current length.
+    If the range is empty -- that is, if the end is less than or equal to the start – then a list of empty arrays is returned.
+
+DataSet.get_parameters()
+    Returns a list of ParamSpec objects that describe the parameters stored in this DataSet.
+
+DataSet.get_metadata(tag=)
+    Returns metadata for this DataSet.
+    If a tag string is provided, only metadata stored under that tag is returned.
+    Otherwise, all metadata is returned.
+
+DataSet.subscribe(callback, state=)
+    Subscribes the provided callback function to result additions to the DataSet.
+    Every time one or more results are added to the DataSet, the callback is called.
+    It is passed the DataSet itself, the length of the DataSet before the triggering addition, the length after the addition, and the state object provided when subscribing.
+    If no state object was provided, then the callback gets passed None as the fourth parameter.
+    When the DataSet is completed, the callback gets called with the length of the DataSet as both the before and after lengths.
+    This method returns an opaque subscription identifier.
+
+DataSet.unsubscribe(subid)
+    Removes the indicated subscription.
+    The subid must be the same object that was returned from a DataSet.subscribe call.
+
+Storage
+-------
+
+DataSet.read_from(location, formatter=)
+    Reads a DataSet from persistent store.
+    Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified.
+    Formatter is a QCoDeS Formatter object that specifies how data is read and written.
+    If not provided, the default formatter is used.
+    The default formatter is currently GNUPlotFormat().
+    This is a static method in the DataSet class.
+    It returns a new DataSet object.
+
+DataSet.read_updates()
+    Updates the DataSet by reading any new results and metadata written since the last read.
+    This method returns a tuple of two Booleans indicating whether or not there were new results and whether or not there was new metadata.
+
+DataSet.write(location, formatter=, overwrite=)
+    Writes the DataSet to persistent store.
+    Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified.
+    Formatter is a QCoDeS Formatter object that specifies how data is read and written.
+    If not provided, the default formatter is used; currently the default is GNUPlotFormat().
+    Overwrite, if true, indicates that any old data found at the specified location should be deleted.
+    Otherwise, it is an error to specify a location that is already in use.
+    This method can be called even if the DataSet is empty, in order to specify the location and format
+
+DataSet.write_updates()
+    Writes new results in the DataSet to persistent store.
+    Depending on the formatter, this may append to an existing stored version or may overwrite the stored version.
+
+DataSet.write_copy(location, formatter=, overwrite=)
+    Writes a separate copy of the DataSet to persistent store.
+    Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified.
+    Formatter is a QCoDeS Formatter object that specifies how data is read and written.
+    If not provided, the formatter for the DataSet is used.
+    Overwrite, if true, indicates that any old data found at the specified location should be deleted.
+    Otherwise, it is an error to specify a location that is already in use.
+
+Open Issues
+===========
+
+#. Should DataSets automatically write to persistent store periodically, or should the user be required to call write() in order to flush changes ?
+
+At least for now, it seems useful to maintain the current behavior of the DataSet flushing to disk periodically.
+
+#. Should there be a DataSet method similar to add_result that automatically adds a new result by calling the get() method on all parameters that are defined by QCoDeS Parameters?
+
+It would be really easy to write a helper method that does this, so it doesn’t seem necessary to have it in the core API.

From be98a5300e1c4ffdd1cc266d8505a9ab3fd77d8d Mon Sep 17 00:00:00 2001
From: Alan Geller
Date: Tue, 7 Feb 2017 11:49:45 -0800
Subject: [PATCH 2/6] Updated based on feedback

---
 specs/DataSet.rst | 91 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 37 deletions(-)

diff --git a/specs/DataSet.rst b/specs/DataSet.rst
index 1d32279c9f64..4a84afe6dd30 100644
--- a/specs/DataSet.rst
+++ b/specs/DataSet.rst
@@ -27,14 +27,9 @@ Result
     A result is the collection of parameter values associated to a single measurement in an experiment.
     Roughly, a result corresponds to a row in a table of experimental data.
 
-Role
-    Parameters may play different roles in a measurement.
-    Specifically, they may be input to the measurement (set-points) or outputs of the measurement (measured or computed values).
-    This distinction is important for plotting and for replicating an experiment.
-
 DataSet
     A DataSet is a QCoDeS object that stores the results of an experiment.
-    Roughly, a DataSet corresponds to a table of experimental data, along with metadata that describes the data .
+    Roughly, a DataSet corresponds to a table of experimental data, along with metadata that describes the data.
     Depending on the state of the experiment, a DataSet may be "in progress" or "completed".
 
 ExperimentContainer
@@ -49,25 +44,30 @@ The DataSet class should meet the following requirements:
 Basics
 ---------
 
-#. A DataSet can store data of (reasonably) arbitrary types.
-#. A completed DataSet should be immutable; neither its metadata nor its results may be modified.
+#. A DataSet can store data of (reasonably) arbitrary types and shapes. basically, any type and shape that can fit in a NumPy array should be supported.
+#. The results stored in a completed DataSet should be immutable; no new results may be added to a completed DataSet.
 
 Creation
 ------------
 
 #. It should be possible to create a DataSet without knowing the final item count of the various values it stores.
    In particular, the number of loop iterations for a sweep should not be required to create the DataSet.
-#. The list of parameters in each result to be stored in a DataSet should be specified at creation time.
+#. The list of parameters in each result to be stored in a DataSet may be specified at creation time.
    This includes the name, role (set-point or output), and type of each parameter.
    Parameters may be marked as optional, in which case they are not required for each result.
+#. It should be possible to add a new parameter to an in-progress DataSet.
 #. It should be possible to define a result parameter that is independent of any QCoDeSParameter or Instrument.
 #. A QCoDeS Parameter should provide sufficient information to define a result parameter.
-#. A DataSet should allow storage of relatively arbitrary metadata describing the run that generated the results and the parameters included in the results.
+#. A DataSet should allow storage of relatively arbitrary metadata describing the run that
+   generated the results and the parameters included in the results.
+   Essentially, DataSet metadata should be a string-keyed dictionary at the top,
+   and should allow storage of any JSON-encodable data.
 
 Writing
 ----------
 
 #. It should be possible to add a single result or a sequence of results to an in-progress DataSet.
+#. It should be able to add an array of values for a new parameter to an in-progress DataSet.
 #. A DataSet should maintain the order in which results were added.
 #. An in-progress DataSet may be marked as completed.
 
@@ -75,13 +75,12 @@ Access
 ---------
 
 #. Values in a DataSet should be easily accessible for plotting and analysis, even while the DataSet is in progress.
-   In particular, it should be possible to retrieve results in a NumPy-compatible form.
+   In particular, it should be possible to retrieve full or partial results as a NumPy array.
 #. It should be possible to define a cursor that specifies a location in a specific value set in a DataSet.
    It should be possible to get a cursor that specifies the current end of the DataSet when the DataSet is "in progress".
    It should be possible to read "new data" in a DataSet; that is, to read everything after a cursor.
 #. It should be possible to subscribe to change notifications from a DataSet.
    It is acceptable if such subscriptions must be in-process until QCoDeS multiprocessing is redone.
-   Change notifications should include the results that were added to the DataSet that triggered the notification.
 
 Storage and Persistence
 -----------------------
@@ -101,16 +100,14 @@ ParamSpec
 
 A ParamSpec object specifies a single parameter in a DataSet.
 
-ParamSpec(Parameter p, optional=)
-    Creates a parameter specification from a QCoDeS Parameter.
-    If optional is provided and is true, then the parameter is optional in each result.
-
-ParamSpec(name, role, type, desc=, optional=)
-    Creates a parameter specification with the given name, role (‘I’ or ‘O’), and type.
+ParamSpec(name, type, metadata=)
+    Creates a parameter specification with the given name and type.
     The type should be a NumPy dtype object.
-    If a description is provided, it is included in the metadata of the DataSet.
-    The description can be a simple string or a string-to-string dictionary.
-    If optional is provided and is true, then the parameter is optional in each result.
+    If metadata is provided, it is included in the overall metadata of the DataSet, with the name of the parameter as the top-level tag.
+    The metadata can be any JSON-able object.
+
+Either the QCoDeS Parameter class should inherit from ParamSpec, or the Parameter class should provide
+a simple way to get a ParamSpec for the Parameter.
 
 DataSet
 ~~~~~~~
@@ -120,25 +117,29 @@ DataSet()
 
 DataSet(specs)
     Creates a DataSet for the provided list of parameter specifications.
-    Each item in the list should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
-    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
+    Each item in the list should be a ParamSpec object.
+
+DataSet(specs, values)
+    Creates a DataSet for the provided list of parameter specifications and values.
+    Each item in the specs list should be a ParamSpec object.
+    Each item in the values list should be a NumPy array or a Python list of values for the corresponding ParamSpec.
+    There should be exactly one item in the values list for every item in the specs list.
+    All of the arrays/lists in the values list should have the same length.
+    The values list my intermix NumPy arrays and Python lists.
 
 DataSet.add_parameter(spec)
-    Adds a parameter to an existing DataSet.
-    The spec should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
-    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
-    It is an error to add a parameter to a non-empty DataSet.
+    Adds a parameter to the DataSet.
+    The spec should be a ParamSpec object.
 
 DataSet.add_parameters(specs)
-    Adds a list of parameters to an existing DataSet.
-    Each item in the list should either be a QCoDeS Parameter, a tuple of a Parameter and a Boolean, or a ParamSpec object.
-    A Parameter or a Parameter tupled with a false value indicates a required parameter; a Parameter tupled with a true value indicates an optional parameter.
-    It is an error to add a parameter to a non-empty DataSet.
+    Adds a list of parameters to the DataSet.
+    Each item in the list should be a ParamSpec object.
 
-DataSet.add_metadata(tag=, info=)
-    Adds metadata to the current DataSet.
+DataSet.add_metadata(tag=, metadata=)
+    Adds metadata to the DataSet.
     The metadata is stored under the provided tag.
-    It is an error to add metadata to a completed DataSet.
+    If there is already metadata under the provided tag, the new metadata replaces the old metadata.
+    The metadata can be any JSON-able object.
 
 Writing
 -------
@@ -147,7 +148,6 @@ DataSet.add_result(**kwargs)
     Adds a result to the DataSet.
     Keyword parameters should have the name of a parameter as the keyword and the value to associate as the value.
     If there is only one positional parameter and it is a dictionary, then it is interpreted as a map from parameter name to parameter value.
-    It is an error for a value for the same parameter to be specified both using a positional parameter or dictionary parameter and using a keyword,
     It is an error to provide a value for a key or keyword that is not the name of a parameter in this DataSet.
     It is an error to add a result to a completed DataSet.
 
@@ -158,7 +158,12 @@ DataSet.add_results(args)
     The order of dictionaries in the sequence will be the same as the order in which they are added to the DataSet.
     It is an error to add results to a completed DataSet.
 
-DataSet.complete()
+DataSet.add_parameter_values(spec, values)
+    Adds a parameter to the DataSet and associates result values with the new parameter.
+    The values must be a NumPy array or a Python list, with each element holding a single result value that matches the parameter's data type.
+    If the DataSet is not empty, then the count of provided values must equal the current count of results in the DataSet, or an error will result.
+
+DataSet.mark_complete()
     Marks the DataSet as completed.
 
 Access
@@ -171,7 +176,7 @@ DataSet.is_empty
     This attribute will be true if the DataSet is empty (has no results), or false if at least one result has been added to the DataSet.
     It is equivalent to testing if the length is zero.
 
-DataSet.is_completed
+DataSet.is_marked_complete
     This attribute will be true if the DataSet is completed or false if it is in progress.
 
 DataSet.get_data(*params, start=, end=)
@@ -247,7 +252,19 @@ Open Issues
 #. Should DataSets automatically write to persistent store periodically, or should the user be required to call write() in order to flush changes ?
 
 At least for now, it seems useful to maintain the current behavior of the DataSet flushing to disk periodically.
+On the other hand, this really isn't core functionality.
+
+**Decision: No, we will leave persistence under control of higher-level code.**
 
 #. Should there be a DataSet method similar to add_result that automatically adds a new result by calling the get() method on all parameters that are defined by QCoDeS Parameters?
 
 It would be really easy to write a helper method that does this, so it doesn’t seem necessary to have it in the core API.
+
+**Decision: No, we will not add such a method.**
+
+#. Should the persistence methods be part of DataSet, or should they be methods on persistence-specific classes?
+
+One advantage of removing them from this class is that it makes DataSet completely stand-alone.
+The DataSet module would define two classes, ParamSpec and DataSet, and require only NumPy.
+This level of modularity is very desirable.
+

From f215e5b6741fa21fc95702c4aee2278b1606247d Mon Sep 17 00:00:00 2001
From: Alan Geller
Date: Wed, 8 Feb 2017 14:46:57 -0800
Subject: [PATCH 3/6] More feedback

Removed API reference to QCoDeS Parameters, allowed addition of parameters with value arrays, dropped "optional", added min_count and min_wait to subscriptions, and some RST cleanup.
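
A minimal Python sketch of how the min_wait / min_count throttling for DataSet.subscribe could be implemented. The Subscription class, its on_results_added / on_completed hooks and the injectable clock are hypothetical names, not part of the spec. Interpreting min_count as "results added since the previous notification" is also an assumption.

```python
import time
from typing import Any, Callable


class Subscription:
    """Hypothetical helper deciding when a DataSet subscriber is notified.

    min_wait is in milliseconds and min_count counts results added since the
    previous notification; both mirror the DataSet.subscribe() parameters.
    """

    def __init__(self, callback: Callable[[Any, int, Any], None],
                 min_wait: int = 100, min_count: int = 1, state: Any = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.callback = callback
        self.min_wait = min_wait / 1000.0  # milliseconds -> seconds
        self.min_count = min_count
        self.state = state
        self.clock = clock  # injectable so the throttling can be tested
        self.last_time = float("-inf")  # no notification sent yet
        self.last_length = 0  # DataSet length at the last notification

    def on_results_added(self, dataset: Any, length: int) -> bool:
        """Call after each addition; returns True if the callback fired."""
        now = self.clock()
        enough_results = length - self.last_length >= self.min_count
        waited_long_enough = now - self.last_time >= self.min_wait
        if enough_results and waited_long_enough:
            self._notify(dataset, length, now)
            return True
        return False

    def on_completed(self, dataset: Any, length: int) -> None:
        """Completion always notifies, regardless of min_wait and min_count."""
        self._notify(dataset, length, self.clock())

    def _notify(self, dataset: Any, length: int, now: float) -> None:
        self.last_time = now
        self.last_length = length
        # Arguments per the spec: the DataSet, its current length, the state.
        self.callback(dataset, length, self.state)
```

This sketch only checks the thresholds when results arrive. Results added within min_wait of the previous notification are therefore reported on a later addition or at completion. A real implementation may also need a timer to flush them.
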
---
 specs/DataSet.rst | 52 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/specs/DataSet.rst b/specs/DataSet.rst
index 4a84afe6dd30..7f683cb83a24 100644
--- a/specs/DataSet.rst
+++ b/specs/DataSet.rst
@@ -103,6 +103,7 @@ A ParamSpec object specifies a single parameter in a DataSet.
 ParamSpec(name, type, metadata=)
     Creates a parameter specification with the given name and type.
     The type should be a NumPy dtype object.
+
     If metadata is provided, it is included in the overall metadata of the DataSet, with the name of the parameter as the top-level tag.
     The metadata can be any JSON-able object.
 
@@ -125,15 +126,21 @@ DataSet(specs, values)
     Each item in the values list should be a NumPy array or a Python list of values for the corresponding ParamSpec.
     There should be exactly one item in the values list for every item in the specs list.
     All of the arrays/lists in the values list should have the same length.
-    The values list my intermix NumPy arrays and Python lists.
+    The values list may intermix NumPy arrays and Python lists.
 
 DataSet.add_parameter(spec)
     Adds a parameter to the DataSet.
     The spec should be a ParamSpec object.
+    If the DataSet is not empty, then existing results will have the type-appropriate null value for the new parameter.
+
+    It is an error to add parameters to a completed DataSet.
 
 DataSet.add_parameters(specs)
     Adds a list of parameters to the DataSet.
     Each item in the list should be a ParamSpec object.
+    If the DataSet is not empty, then existing results will have the type-appropriate null value for the new parameters.
+
+    It is an error to add parameters to a completed DataSet.
 
 DataSet.add_metadata(tag=, metadata=)
     Adds metadata to the DataSet.
@@ -148,7 +155,9 @@ DataSet.add_result(**kwargs)
     Adds a result to the DataSet.
     Keyword parameters should have the name of a parameter as the keyword and the value to associate as the value.
     If there is only one positional parameter and it is a dictionary, then it is interpreted as a map from parameter name to parameter value.
+
     It is an error to provide a value for a key or keyword that is not the name of a parameter in this DataSet.
+
     It is an error to add a result to a completed DataSet.
 
 DataSet.add_results(args)
@@ -156,12 +165,15 @@ DataSet.add_results(args)
     The single argument should be a sequence of dictionaries, where each dictionary provides the values for all of the parameters in that result.
     See the add_result method for a description of such a dictionary.
     The order of dictionaries in the sequence will be the same as the order in which they are added to the DataSet.
+
     It is an error to add results to a completed DataSet.
 
 DataSet.add_parameter_values(spec, values)
     Adds a parameter to the DataSet and associates result values with the new parameter.
     The values must be a NumPy array or a Python list, with each element holding a single result value that matches the parameter's data type.
     If the DataSet is not empty, then the count of provided values must equal the current count of results in the DataSet, or an error will result.
+
+    It is an error to add parameters to a completed DataSet.
 
 DataSet.mark_complete()
     Marks the DataSet as completed.
@@ -177,32 +189,45 @@ DataSet.is_empty
     It is equivalent to testing if the length is zero.
 
 DataSet.is_marked_complete
-    This attribute will be true if the DataSet is completed or false if it is in progress.
+    This attribute will be true if the DataSet has been marked as complete or false if it is in progress.
 
 DataSet.get_data(*params, start=, end=)
     Returns the values stored in the DataSet for the specified parameters.
     The values are returned as a list of parallel NumPy arrays, one array per parameter.
     The data type of each array is based on the data type provided when the DataSet was created.
-    If a parameter is optional and no value was provided for one or more results, the corresponding array entries will be the “null” value for the data type: zero for integers, NaN for floats, “” for strings, None for objects.
+
     The parameter list may contain a mix of string parameter names, QCoDeS Parameter objects, and ParamSpec objects.
+
     If provided, the start and end parameters select a range of results by result count (index).
     Start defaults to 0, and end defaults to the current length.
-    If the range is empty -- that is, if the end is less than or equal to the start – then a list of empty arrays is returned.
+
+    If the range is empty -- that is, if the end is less than or equal to the start, or if start is after the current end of the DataSet –
+    then a list of empty arrays is returned.
 
 DataSet.get_parameters()
     Returns a list of ParamSpec objects that describe the parameters stored in this DataSet.
 
 DataSet.get_metadata(tag=)
     Returns metadata for this DataSet.
+
     If a tag string is provided, only metadata stored under that tag is returned.
     Otherwise, all metadata is returned.
+
+Subscribing
+----------------
 
-DataSet.subscribe(callback, state=)
+DataSet.subscribe(callback, min_wait=, min_count=, state=)
     Subscribes the provided callback function to result additions to the DataSet.
-    Every time one or more results are added to the DataSet, the callback is called.
-    It is passed the DataSet itself, the length of the DataSet before the triggering addition, the length after the addition, and the state object provided when subscribing.
+    As results are added to the DataSet, the subscriber is notified by having the callback invoked.
+
+    - min_wait is the minimum amount of time between notifications for this subscription, in milliseconds. The default is 100.
+    - min_count is the minimum number of results for which a notification should be sent. The default is 1.
+
+    When the callback is invoked, it is passed the DataSet itself, the current length of the DataSet, and the state object provided when subscribing.
     If no state object was provided, then the callback gets passed None as the fourth parameter.
-    When the DataSet is completed, the callback gets called with the length of the DataSet as both the before and after lengths.
+
+    The callback is invoked when the DataSet is completed, regardless of the values of min_wait and min_count.
+
     This method returns an opaque subscription identifier.
 
 DataSet.unsubscribe(subid)
@@ -215,23 +240,30 @@ Storage
 DataSet.read_from(location, formatter=)
     Reads a DataSet from persistent store.
     Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified.
+
     Formatter is a QCoDeS Formatter object that specifies how data is read and written.
-    If not provided, the default formatter is used.
+    If not provided, the correct formatter is determined from the file extension and file format.
+    If the correct formatter cannot be determined, the default formatter is used.
     The default formatter is currently GNUPlotFormat().
+
     This is a static method in the DataSet class.
     It returns a new DataSet object.
 
 DataSet.read_updates()
     Updates the DataSet by reading any new results and metadata written since the last read.
+
     This method returns a tuple of two Booleans indicating whether or not there were new results and whether or not there was new metadata.
 
 DataSet.write(location, formatter=, overwrite=)
     Writes the DataSet to persistent store.
     Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified.
+
     Formatter is a QCoDeS Formatter object that specifies how data is read and written.
     If not provided, the default formatter is used; currently the default is GNUPlotFormat().
+
     Overwrite, if true, indicates that any old data found at the specified location should be deleted.
Otherwise, it is an error to specify a location that is already in use. + This method can be called even if the DataSet is empty, in order to specify the location and format DataSet.write_updates() @@ -241,8 +273,10 @@ DataSet.write_updates() DataSet.write_copy(location, formatter=, overwrite=) Writes a separate copy of the DataSet to persistent store. Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified. + Formatter is a QCoDeS Formatter object that specifies how data is read and written. If not provided, the formatter for the DataSet is used. + Overwrite, if true, indicates that any old data found at the specified location should be deleted. Otherwise, it is an error to specify a location that is already in use. From 5ccbe08c8c1dea9e37327ed12d9f939d39bed391 Mon Sep 17 00:00:00 2001 From: Alan Geller Date: Thu, 9 Feb 2017 16:33:46 -0800 Subject: [PATCH 4/6] Added unique identifier Added a name parameter to the constructor and an id attribute that returns a unique identifier for the DataSet, suitable for use as a reference. --- specs/DataSet.rst | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/specs/DataSet.rst b/specs/DataSet.rst index 7f683cb83a24..3b2eac87b271 100644 --- a/specs/DataSet.rst +++ b/specs/DataSet.rst @@ -46,6 +46,7 @@ Basics #. A DataSet can store data of (reasonably) arbitrary types and shapes. basically, any type and shape that can fit in a NumPy array should be supported. #. The results stored in a completed DataSet should be immutable; no new results may be added to a completed DataSet. +#. Each DataSet should have a unique identifying string that can be used to create references to DataSets. Creation ------------ @@ -113,15 +114,21 @@ a simple way to get a ParamSpec for the Parameter. DataSet ~~~~~~~ -DataSet() +Construction +------------ + +DataSet(name) Creates a DataSet with no parameters.
+ The name should be a short string that will be part of the DataSet's identifier. -DataSet(specs) +DataSet(name, specs) Creates a DataSet for the provided list of parameter specifications. + The name should be a short string that will be part of the DataSet's identifier. Each item in the list should be a ParamSpec object. -DataSet(specs, values) +DataSet(name, specs, values) Creates a DataSet for the provided list of parameter specifications and values. + The name should be a short string that will be part of the DataSet's identifier. Each item in the specs list should be a ParamSpec object. Each item in the values list should be a NumPy array or a Python list of values for the corresponding ParamSpec. There should be exactly one item in the values list for every item in the specs list. @@ -181,6 +188,11 @@ DataSet.mark_complete() Access ------ +DataSet.id + Returns the unique identifying string for this DataSet. + This string will include the date and time that the DataSet was created and the name supplied to the constructor, + as well as additional content to ensure uniqueness. + DataSet.length This attribute holds the current number of results in the DataSet. From 27cafd9211c69fdecb74c0078cc2a546e9495ba0 Mon Sep 17 00:00:00 2001 From: Alan Geller Date: Thu, 9 Feb 2017 16:36:18 -0800 Subject: [PATCH 5/6] Id field should be part of the metadata Specified that the identifier should be automatically stored in the DataSet's metadata. --- specs/DataSet.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/specs/DataSet.rst b/specs/DataSet.rst index 3b2eac87b271..90078d7ff090 100644 --- a/specs/DataSet.rst +++ b/specs/DataSet.rst @@ -63,7 +63,9 @@ Creation generated the results and the parameters included in the results. Essentially, DataSet metadata should be a string-keyed dictionary at the top, and should allow storage of any JSON-encodable data. - +#. 
The DataSet identifier should be automatically stored in the DataSet's metadata under the "id" tag. + + Writing ---------- From 3b417f72fb34c9203094d525231bfbebbcd75978 Mon Sep 17 00:00:00 2001 From: Alan Geller Date: Wed, 15 Feb 2017 16:45:03 -0800 Subject: [PATCH 6/6] Nearly-final draft Removed persistence, added more metadata details, added utility function section --- specs/DataSet.rst | 108 +++++++++++++++++++++++----------------------- 1 file changed, 53 insertions(+), 55 deletions(-) diff --git a/specs/DataSet.rst b/specs/DataSet.rst index 90078d7ff090..89eae2ddb3f3 100644 --- a/specs/DataSet.rst +++ b/specs/DataSet.rst @@ -17,6 +17,13 @@ As long as a DataSet is used for data storage, users can freely select the QCoDe Terminology ================ +Metadata + Many items in this spec have metadata associated with them. + In all cases, we expect metadata to be represented as a dictionary with string keys. + While the values are arbitrary and up to the user, in many cases we expect metadata to be nested, string-keyed dictionaries + with scalars (strings or numbers) as the final values. + In some cases, we specify particular keys or paths in the metadata that other QCoDeS components may rely on. + Parameter A logically-single value input to or produced by a measurement. A parameter need not be a scalar, but can be an array or a tuple or an array of tuples, etc. @@ -88,6 +95,10 @@ Access Storage and Persistence ----------------------- +#. Storage and persistence should be defined outside of the DataSet class. + +The following items are no longer applicable: + #. A DataSet object should allow writing to and reading from storage in a variety of formats. #. Users should be able to define new persistence formats. #. Users should be able to specify where a DataSet is written. @@ -107,9 +118,19 @@ ParamSpec(name, type, metadata=) Creates a parameter specification with the given name and type. The type should be a NumPy dtype object. 
- If metadata is provided, it is included in the overall metadata of the DataSet, with the name of the parameter as the top-level tag.
- If metadata is provided, it is included in the overall metadata of the DataSet, with the name of the parameter as the top-level tag. + If metadata is provided, it is included in the overall metadata of the DataSet. The metadata can be any JSON-able object. - + +ParamSpec.name + The name of this parameter. + +ParamSpec.type + The dtype of this parameter. + +ParamSpec.metadata + The metadata of this parameter. + This should be an empty dictionary as a default. + Either the QCoDeS Parameter class should inherit from ParamSpec, or the Parameter class should provide a simple way to get a ParamSpec for the Parameter. @@ -251,68 +272,45 @@ DataSet.unsubscribe(subid) Storage ------- -DataSet.read_from(location, formatter=) - Reads a DataSet from persistent store. - Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified. - - Formatter is a QCoDeS Formatter object that specifies how data is read and written. - If not provided, the correct formatter is determined from the file extension and file format. - If the correct formatter cannot be determined, the default formatter is used. - The default formatter is currently GNUPlotFormat(). - - This is a static method in the DataSet class. - It returns a new DataSet object. +DataSet persistence is handled externally to this class. -DataSet.read_updates() - Updates the DataSet by reading any new results and metadata written since the last read. - - This method returns a tuple of two Booleans indicating whether or not there were new results and whether or not there was new metadata. +The existing QCoDeS storage subsystem should be modified so that some object has two methods: -DataSet.write(location, formatter=, overwrite=) - Writes the DataSet to persistent store. - Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified. 
- - Formatter is a QCoDeS Formatter object that specifies how data is read and written. - If not provided, the default formatter is used; currently the default is GNUPlotFormat(). - - Overwrite, if true, indicates that any old data found at the specified location should be deleted. - Otherwise, it is an error to specify a location that is already in use. - - This method can be called even if the DataSet is empty, in order to specify the location and format +- A write_dataset method that takes a DataSet object and writes it to the appropriate storage location in an appropriate format. +- A read_dataset method that reads from the appropriate location, either with a specified format or inferring the format, and returns + a DataSet object. + +Metadata +======== -DataSet.write_updates() - Writes new results in the DataSet to persistent store. - Depending on the formatter, this may append to an existing stored version or may overwrite the stored version. +While in general the metadata associated with a DataSet is free-form, it is useful to specify a set of "well-known" tags and paths that components can rely on to contain specific information. +Other components are free to specify new well-known metadata tags and paths, as long as they don't conflict with the set defined here. -DataSet.write_copy(location, formatter=, overwrite=) - Writes a separate copy of the DataSet to persistent store. - Location may be a string file system path, a string URL, or some other string that is meaningful to the formatter specified. +parameters + This tag contains a dictionary from the string name of each parameter to information about that parameter. + Thus, if DataSet ds has a parameter named "foo", there will be a key "foo" in the dictionary returned from ds.get_metadata("parameters"). + The value associated with this key will be a string-keyed dictionary. - Formatter is a QCoDeS Formatter object that specifies how data is read and written. 
- If not provided, the formatter for the DataSet is used. +parameters/__param__/spec + This path contains a string-keyed dictionary with (at least) the following two keys: + The "type" key is associated with the NumPy dtype for the values of this parameter. + The "metadata" key is associated with the metadata that was passed to the ParamSpec constructor that defines this parameter, or an empty dictionary if no metadata was set. - Overwrite, if true, indicates that any old data found at the specified location should be deleted. - Otherwise, it is an error to specify a location that is already in use. - -Open Issues -=========== - -#. Should DataSets automatically write to persistent store periodically, or should the user be required to call write() in order to flush changes ? - -At least for now, it seems useful to maintain the current behavior of the DataSet flushing to disk periodically. -On the other hand, this really isn't core functionality. - -**Decision: No, we will leave persistence under control of higher-level code.** +Utilities +========= -#. Should there be a DataSet method similar to add_result that automatically adds a new result by calling the get() method on all parameters that are defined by QCoDeS Parameters? +Many useful utility routines can be defined outside of the DataSet class. +Several of them are collected here; note that these functions will not be part of the DataSet class, +and the DataSet class will not depend on them. -It would be really easy to write a helper method that does this, so it doesn’t seem necessary to have it in the core API. +dataframe_from_dataset(dataset) + Creates a Pandas DataFrame object from a DataSet that has been marked as completed. -**Decision: No, we will not add such a method.** +Open Issues +=========== -#. Should the persistence methods be part of DataSet, or should they be methods on persistence-specific classes? +#.
Should it be possible to "reopen" a DataSet that has been marked as completed? -One advantage of removing them from this class is that it makes DataSet completely stand-alone. -The DataSet module would define two classes, ParamSpec and DataSet, and require only NumPy. -This level of modularity is very desirable. +This would be convenient for adding data analysis results after the experiment has finished, but could lead to accidentally mixing data from different experimental runs. +It is already possible to modify metadata after the DataSet has been marked as completed, but that may not always be sufficient.
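The construction, access, and subscription semantics specified across these patches can be illustrated with a small in-memory sketch. This is not the QCoDeS implementation: ParamSpec and DataSet below are hypothetical stand-ins, and the keyword form of add_result, the completed attribute, the null_value helper, and the string-only parameter list accepted by get_data are assumptions made for brevity. Notifications are checked only when a result is added or the DataSet is completed; a real implementation would likely need a timer so that results held back by min_wait are eventually reported.

```python
import time
import uuid
from datetime import datetime

import numpy as np


class ParamSpec:
    """Hypothetical minimal ParamSpec: a name, a NumPy dtype, and metadata."""

    def __init__(self, name, type, metadata=None):
        self.name = name
        self.type = np.dtype(type)
        self.metadata = metadata if metadata is not None else {}  # default: {}


def null_value(dtype):
    """Hypothetical helper: the "null" entry for a missing optional value."""
    if np.issubdtype(dtype, np.integer):
        return 0
    if np.issubdtype(dtype, np.floating):
        return np.nan
    if np.issubdtype(dtype, np.str_):
        return ""
    return None


class DataSet:
    """Hypothetical in-memory DataSet; persistence is deliberately left out."""

    def __init__(self, name, specs=()):
        # Creation date/time, the supplied name, and a random suffix for uniqueness.
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        self.id = f"{stamp}-{name}-{uuid.uuid4().hex[:8]}"
        self._specs = {spec.name: spec for spec in specs}
        self._results = []  # one dict of parameter values per result
        self._subscriptions = {}
        self.completed = False
        # Well-known metadata: the "id" tag and parameters/<name>/spec.
        self._metadata = {
            "id": self.id,
            "parameters": {
                s.name: {"spec": {"type": s.type, "metadata": s.metadata}}
                for s in self._specs.values()
            },
        }

    @property
    def length(self):
        return len(self._results)

    def add_result(self, **values):
        if self.completed:
            raise RuntimeError("no new results may be added to a completed DataSet")
        self._results.append(values)
        self._notify(force=False)

    def mark_complete(self):
        self.completed = True
        self._notify(force=True)  # sent regardless of min_wait and min_count

    def get_data(self, *names, start=0, end=None):
        end = self.length if end is None else end
        # The slice is empty when end <= start or start is past the end,
        # so an empty range yields a list of empty arrays.
        rows = self._results[start:end]
        columns = []
        for name in names:
            spec = self._specs[name]
            fill = null_value(spec.type)
            columns.append(np.array([r.get(name, fill) for r in rows], dtype=spec.type))
        return columns

    def get_metadata(self, tag=None):
        return self._metadata if tag is None else self._metadata[tag]

    def subscribe(self, callback, min_wait=100, min_count=1, state=None):
        subid = uuid.uuid4().hex  # opaque subscription identifier
        self._subscriptions[subid] = {
            "callback": callback, "min_wait": min_wait, "min_count": min_count,
            "state": state, "last_length": self.length, "last_time": None,
        }
        return subid

    def unsubscribe(self, subid):
        del self._subscriptions[subid]

    def _notify(self, force):
        now = time.monotonic()
        for sub in list(self._subscriptions.values()):
            enough = self.length - sub["last_length"] >= sub["min_count"]
            waited = (sub["last_time"] is None
                      or (now - sub["last_time"]) * 1000 >= sub["min_wait"])
            if force or (enough and waited):
                sub["callback"](self, self.length, sub["state"])
                sub["last_length"], sub["last_time"] = self.length, now


ds = DataSet("sweep", [ParamSpec("v", float), ParamSpec("n", int)])
calls = []
ds.subscribe(lambda dset, length, state: calls.append((length, state)),
             min_wait=0, min_count=2, state="tag")
ds.add_result(v=0.1, n=1)
ds.add_result(v=0.2)  # "n" omitted, so it reads back as the integer null, 0
ds.add_result(v=0.3, n=3)
ds.mark_complete()
v, n = ds.get_data("v", "n")
```

Passing min_wait=0 makes notification depend only on min_count, so this subscriber is called once when the DataSet reaches two results and once more when it is marked complete, even though only one new result arrived in between.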