diff --git a/docs/userdocs/user_guide/_snippets.rst b/docs/userdocs/user_guide/_snippets.rst
new file mode 100644
index 0000000..f291d76
--- /dev/null
+++ b/docs/userdocs/user_guide/_snippets.rst
@@ -0,0 +1,89 @@
+Snippets
+========
+
+Notes and writeups of handy description areas that don't yet have a home.
+
+Data component (NameMap) dictionaries
+-------------------------------------
+For all of these properties, dictionary-style behaviour means that ``.keys()``
+is a sequence of the content names, and ``.values()`` is a sequence of the contained
+objects.
+
+
+NcData
+------
+The :class:`~ncdata.NcData` class represents either a dataset or a group :
+the structures of these are identical.
+
+NcAttributes
+------------
+Attributes are stored as NcAttribute objects, rather than simple name: value maps.
+Thus an 'attribute' of an NcVariable or NcData is an attribute object, not a value :
+
+    >>> variable.attributes['x']
+    NcAttribute('x', [1., 2., 7.])
+
+The attribute has a ``.value`` property, but it is most usefully accessed with the
+:meth:`~ncdata.NcAttribute.as_python_value()` method :
+
+    >>> attr = NcAttribute('b', [1.])
+    >>> attr.value
+    array([1.])
+    >>> attr.as_python_value()
+    array(1.)
+
+    >>> attr = NcAttribute('a', "this")
+    >>> attr.value
+    array('this', dtype='<U4')
+    >>> attr.as_python_value()
+    'this'
+
+From within a parent object's ``.attributes`` dictionary,
+
+
+Component Dictionaries
+----------------------
+Ordering notes :
+
+- insert, remove and rename effects
+- re-ordering
+
+
+As described above, sub-components are stored under their names
+in a dictionary container.
+
+Since all components have a name, and are stored by name in the parent property
+dictionary (e.g. ``variable.attributes`` or ``data.dimensions``), the component
+dictionaries have an :meth:`~ncdata.NameMap.add` method, which works with the component
+name.
+Supported operations
+^^^^^^^^^^^^^^^^^^^^
+Standard dict methods : del, getitem, setitem, clear, append, extend
+
+Extra methods : add, addall
+
+Ordering
+^^^^^^^^
+Since Python 3.7, the order of the entries is a significant and stable feature of
+Python dictionaries in general.
+Also, as for Python dictionaries generally, there is no particular assistance for
+managing or using the order. The following may give some indication :
+
+Extract the n'th item : ``data.variables[list(data.variables.keys())[n]]``
+
+Sort the contents by name :
+
+.. code-block:: python
+
+    # get all the contents, sorted by name
+    content = list(data.attributes.values())
+    content = sorted(content, key=lambda v: v.name)
+    # clear the container -- necessary to forget the old ordering
+    data.attributes.clear()
+    # add all back in, in the new order
+    data.attributes.addall(content)
+
+New entries are added last, and renamed entries retain their position.
+
+The :func:`~ncdata.utils.dataset_differences` function reports differences in the
+ordering of components (unless this is turned off).
+
+
diff --git a/docs/userdocs/user_guide/data_objects.rst b/docs/userdocs/user_guide/data_objects.rst
new file mode 100644
index 0000000..3c937c3
--- /dev/null
+++ b/docs/userdocs/user_guide/data_objects.rst
@@ -0,0 +1,274 @@
+Core Data Objects
+=================
+Ncdata uses Python objects to represent netCDF data, and allows the user to freely
+inspect and/or modify it, aiming to do this in the most natural and Pythonic way.
+
+.. _data-model:
+
+Data Classes
+------------
+The data model components are elements of the
+`NetCDF Classic Data Model`_ , plus **groups** (from the 'enhanced' netCDF model).
+
+That is, a Dataset (file) consists of just Dimensions, Variables, Attributes and
+Groups.
+
+.. note::
+   We are not, as yet, explicitly supporting the NetCDF4 extensions for variable-length
+   and user-defined types.
   See : :ref:`data-types`
+
+The core ncdata classes representing these Data Model components are
+:class:`~ncdata.NcData`, :class:`~ncdata.NcDimension`, :class:`~ncdata.NcVariable` and
+:class:`~ncdata.NcAttribute`.
+
+Notes :
+
+* There is no "NcGroup" class : :class:`~ncdata.NcData` is used for both the "group" and
+  "dataset" (aka file).
+
+* All data objects have a ``.name`` property, but this can be empty (``None``) when the
+  object is not contained in a parent object as a component.
+  See :ref:`components-and-containers`, below.
+
+
+:class:`~ncdata.NcData`
+^^^^^^^^^^^^^^^^^^^^^^^
+This represents a dataset, containing dimensions, variables, attributes and groups.
+It is also used to represent groups.
+
+:class:`~ncdata.NcDimension`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This represents a dimension, defined in terms of name, length, and whether "unlimited"
+(or not).
+
+:class:`~ncdata.NcVariable`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Represents a data variable, with dimensions and, optionally, data and attributes.
+
+Note that ``.dimensions`` is simply a list of names (strings) : they are not
+:class:`~ncdata.NcDimension` objects, and not linked to actual dimensions of the
+dataset, so *actual* dimensions are only identified dynamically, when they need to be.
+
+Variables can be created with either real (numpy) or lazy (dask) arrays, or no data at
+all.
+
+A variable has a ``.dtype``, which may be set if creating with no data.
+However, at present, after creation ``.data`` and ``.dtype`` can be reassigned and there
+is no further checking of any sort.
+
+.. _variable-dtypes:
+
+Variable Data Arrays
+""""""""""""""""""""
+When a variable does have a ``.data`` property, this will be an array, with at least
+the usual ``shape``, ``dtype`` and ``__getitem__`` properties. In practice we assume
+for now that we will always have real (numpy) or lazy (dask) arrays.
+
+When data is exchanged with an actual file, it is simply written if real, and streamed
+(via :func:`dask.array.store`) if lazy.
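The minimal array interface described above ("at least ``shape``, ``dtype`` and ``__getitem__``") can be sketched with a plain duck-typing check. This is purely illustrative: ``looks_like_variable_data`` is a hypothetical helper, not part of the ncdata API.

```python
import numpy as np

def looks_like_variable_data(obj) -> bool:
    """Duck-type check for the minimal array interface assumed for
    variable data: shape, dtype and indexing. (Hypothetical helper.)"""
    return (
        hasattr(obj, "shape")
        and hasattr(obj, "dtype")
        and hasattr(obj, "__getitem__")
    )

# A "real" numpy array satisfies the interface ...
real = np.arange(6.0).reshape(2, 3)
assert looks_like_variable_data(real)
assert real.shape == (2, 3)

# ... a plain scalar does not.
assert not looks_like_variable_data(42)

# A dask array (if installed) would satisfy the same interface,
# while deferring any actual computation:
#     import dask.array as da
#     lazy = da.arange(6.0, chunks=3).reshape(2, 3)
```

A dask array passes the same check, which is why real and lazy data can be handled interchangeably until the point of an actual read or write.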
+
+When data is exchanged with supported data analysis packages (i.e. Iris or Xarray, so
+far), these arrays are transferred directly, without copying or making duplicates (such
+as numpy views).
+This is a core principle (see :ref:`design-principles`), but may require special
+support in those packages.
+
+See also : :ref:`data-types`
+
+:class:`~ncdata.NcAttribute`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Represents an attribute, with name and value. The value is always either a scalar
+or a 1-D numpy array -- this is enforced as a computed property (read and write).
+
+.. _attribute-dtypes:
+
+Attribute Values
+""""""""""""""""
+In actual netCDF data, the value of an attribute is effectively limited to a
+one-dimensional array of certain valid netCDF types, and one-element arrays are
+exactly equivalent to scalar values.
+
+In ncdata, the ``.value`` of an :class:`ncdata.NcAttribute` must always be a numpy
+array, and when creating one the provided ``.value`` is cast with
+:func:`numpy.asanyarray`.
+
+However, you are not prevented from setting an attribute's ``.value`` to something
+other than an array, which may cause an error. So for now, if setting the value of an
+existing attribute, ensure you always write compatible numpy data, or use the
+:meth:`ncdata.NcData.set_attrval` / :meth:`ncdata.NcVariable.set_attrval` methods,
+which are safe.
+
+For *reading* attributes, it is best to use :meth:`ncdata.NcData.get_attrval` /
+:meth:`ncdata.NcVariable.get_attrval` or (equivalently)
+:meth:`ncdata.NcAttribute.as_python_value()` : these consistently return ``None``
+(if missing), a numpy scalar or array, or a Python string. Those results are
+intended to be equivalent to what you would get from storing in an actual file and
+reading back, including re-interpreting a length-one vector as a scalar value.
+
+.. attention::
+   The correct handling and (future) discrimination of string data as character arrays
+   ("char" in netCDF terms) and/or variable-length strings ("string" type) is still to
+   be determined.
+
+   For now, we are converting **all** string attributes to Python strings.
+
+   There is **also** a longstanding known problem with the low-level C (and FORTRAN)
+   interfaces, which forbid the creation of vector character attributes : these appear
+   instead as single concatenated strings. So for now, **all** string-type attributes
+   appear as single Python strings (you never get an array of strings, or a list of
+   strings).
+
+See also : :ref:`data-types`
+
+.. _correctness-checks:
+
+Correctness and Consistency
+---------------------------
+In practice, to support flexibility in construction and manipulation, it is
+not practical for ncdata structures to represent valid netCDF at
+all times, since this would make changing things awkward.
+For example, if a group refers to a dimension *outside* the group, you could not simply
+extract it from the dataset, because it is not valid in isolation.
+
+Thus, we do allow ncdata structures to represent *invalid* netCDF data :
+for example, circular references, missing dimensions or naming mismatches.
+Effectively there is a set of data validity rules, which are summarised by the
+:func:`ncdata.utils.save_errors` routine.
+
+In practice, there is a minimal set of runtime rules for creating ncdata objects, and
+additional requirements when ncdata is converted to actual netCDF. For example,
+variables can initially be created with no data, but if subsequently written to a file,
+data must be assigned first.
+
+.. Note::
+   These issues are not necessarily all fully resolved. Caution is required !
+
+.. _components-and-containers:
+
+Components, Containers and Names
+--------------------------------
+Each dimension, variable, attribute or group normally exists as a component in a
+parent dataset (or group), where it is stored in a "container" property of the parent,
+i.e. either its ``.dimensions``, ``.variables``, ``.attributes`` or ``.groups``.
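The component/container relationship can be pictured with a plain-Python stand-in. This is only a sketch of the idea: ``Component`` here is a toy class, not one of ncdata's actual classes.

```python
# A toy "component" with a .name, held in a dict keyed by that name --
# roughly how dimensions, variables, attributes and groups are stored
# in their parent's container properties.
class Component:
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

container = {}
for item in (Component("x", 3), Component("y", 4)):
    # the key is always taken from the component's own name
    container[item.name] = item

assert list(container.keys()) == ["x", "y"]
# key and .name stay in step -- the invariant the real containers maintain
assert all(key == comp.name for key, comp in container.items())
```

Keeping the dictionary key derived from the component's own ``.name`` is exactly the invariant that the convenience methods of the real containers are designed to preserve.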
+
+Each of the "container" properties is a :class:`~ncdata._core.NameMap` object, which
+is a dictionary type mapping a string (name) to a specific type of component.
+The dictionary's ``.keys()`` are a sequence of component names, and its ``.values()``
+are the corresponding contained components.
+
+Every component object also has a ``.name`` property. It is therefore possible for the
+name by which an object is indexed in its container to differ from its own ``.name``.
+This is to be avoided !
+
+The :class:`~ncdata.NameMap` container class is provided with convenience methods which
+aim to make this easier, such as :meth:`~ncdata.NameMap.add` and
+:meth:`~ncdata.NameMap.rename`.
+
+NcData and NcVariable ".attributes" components
+----------------------------------------------
+Note that the contents of an ``.attributes`` container are
+:class:`~ncdata.NcAttribute` objects, not attribute values.
+
+Thus, to fetch an attribute value, you might write one of these :
+
+.. code-block::
+
+    units1 = dataset.variables['var1'].get_attrval('units')
+    units1 = dataset.variables['var1'].attributes['units'].as_python_value()
+
+but **not** ``unit = dataset.variables['x'].attributes['attr1']``
+
+Or, likewise, to **set** values, one of :
+
+.. code-block::
+
+    dataset.variables['var1'].set_attrval('units', "K")
+    dataset.variables['var1'].attributes['units'] = NcAttribute("units", "K")
+
+but **not** ``dataset.variables['x'].attributes['units'].value = "K"``
+
+
+Container ordering
+------------------
+The order of elements in a container is technically significant, and does constitute a
+potential difference between datasets (or files).
+
+The :meth:`ncdata.NameMap.rename` method preserves the order of an element,
+while :meth:`ncdata.NameMap.add` adds new components at the end.
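Since NameMap is a dictionary type, it inherits Python's insertion-ordering behaviour, which is what makes "rename preserves order, add appends" the natural semantics. A plain-dict sketch of the underlying behaviour:

```python
d = {"x": 1, "y": 2, "z": 3}

# Replacing a value in place keeps the entry's position ...
d["y"] = 20
assert list(d) == ["x", "y", "z"]

# ... but removing and re-adding moves the entry to the end.
# This is why a rename done naively as "del + add" would lose the
# element's position, and why a dedicated rename method is useful.
del d["y"]
d["y"] = 20
assert list(d) == ["x", "z", "y"]
```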
+ +The :func:`ncdata.utils.dataset_differences` utility provides various keywords allowing +you to ignore ordering in comparisons, when required. + + +Container methods +----------------- +The :class:`~ncdata.NameMap` class also provides a variety of manipulation methods, +both normal dictionary operations and some extra ones. + +The most notable ones are : ``del``, ``pop``, ``add``, ``addall``, ``rename`` and of +course ``__setitem__`` . + +See :ref:`common_operations` section. + +.. _data-constructors: + +Core Object Constructors +------------------------ +The ``__init__`` methods of the core classes are designed to make in-line definition of +new objects in user code reasonably legible. So, when initialising one of the container +properties, the keyword/args defining component parts use the utility method +:meth:`ncdata.NameMap.from_items` so that you can specify a group of components in a variety of ways : +either a pre-created container or a similar dictionary-like object : + +.. code-block:: python + + >>> ds1 = NcData(groups={ + ... 'x':NcData('x'), + ... 'y':NcData('y') + ... }) + >>> print(ds1) + + groups: + + + > + +or **more usefully**, just a *list* of suitable data objects, like this... + +.. code-block:: python + + >>> ds2 = NcData( + ... variables=[ + ... NcVariable('v1', ('x',), data=[1,2]), + ... NcVariable('v2', ('x',), data=[2,3]) + ... ] + ... ) + >>> print(ds2) + + variables: + + + > + +Or, in the **special case of attributes**, a regular dictionary of ``name: value`` form +will be automatically converted to a NameMap of ``name: NcAttribute(name: value)`` : + +.. code-block:: python + + >>> var = NcVariable( + ... 'v3', + ... attributes={'x':'this', 'b':1.4, 'arr': [1, 2, 3]} + ... 
)
+    >>> print(var)
+    ): v3()
+        v3:x = 'this'
+        v3:b = 1.4,
+        v3:arr = array([1, 2, 3])
+    >
+
+
+Relationship to File Storage
+----------------------------
+Note that file-specific storage aspects, such as chunking, data-paths or compression
+strategies, are not recorded in the core objects. However, array representations in
+variable and attribute data (notably dask lazy arrays) may hold such information.
+The concept of "unlimited" dimensions is arguably an exception. However, this is a
+core provision of the NetCDF data model itself (see "Dimension" in the `NetCDF Classic Data Model`_).
+
+.. _NetCDF Classic Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#classic_model
diff --git a/docs/userdocs/user_guide/design_principles.rst b/docs/userdocs/user_guide/design_principles.rst
index 657483a..3ac2e19 100644
--- a/docs/userdocs/user_guide/design_principles.rst
+++ b/docs/userdocs/user_guide/design_principles.rst
@@ -11,6 +11,7 @@ Purpose
 * allow analysis packages (Iris, Xarray) to exchange data efficiently, including lazy
   data operations and streaming
 
+.. _design-principles:
 Design Principles
 -----------------
diff --git a/docs/userdocs/user_guide/general_topics.rst b/docs/userdocs/user_guide/general_topics.rst
new file mode 100644
index 0000000..661278a
--- /dev/null
+++ b/docs/userdocs/user_guide/general_topics.rst
@@ -0,0 +1,200 @@
+.. _common_operations:
+
+Common Operations
+=================
+A group of common operations is available on all the core component types,
+i.e. the operations of extract/remove/insert/rename/copy on the ``.groups``,
+``.dimensions``, ``.variables`` and ``.attributes`` properties of the core objects.
+
+Most of these are hopefully "obvious" Pythonic methods of the container objects.
+
+Extract and Remove
+------------------
+These are implemented as :meth:`~ncdata.NameMap.__delitem__` and :meth:`~ncdata.NameMap.pop`
+methods, which work in the usual way.
+
+Examples :
+
+* ``var_x = dataset.variables.pop("x")``
+* ``del data.variables["x"]``
+
+Insert / Add
+------------
+A new component can be added under its own name with the
+:meth:`~ncdata.NameMap.add` method.
+
+Example : ``dataset.variables.add(NcVariable("x", dimensions=["x"], data=my_data))``
+
+An :class:`~ncdata.NcAttribute` can also be added, or set (if already present), with
+the special :meth:`~ncdata.NcVariable.set_attrval` method.
+
+Example : ``dataset.variables["x"].set_attrval("units", "m s-1")``
+
+Rename
+------
+A component can be renamed with the :meth:`~ncdata.NameMap.rename` method. This changes
+both the name in the container **and** the component's own name -- it is never
+recommended to set ``component.name`` directly, as the two can obviously become
+inconsistent.
+
+Example : ``dataset.variables.rename("x", "y")``
+
+.. warning::
+   Renaming a dimension will not rename references to it (i.e. in variables), which
+   obviously may cause problems.
+   We may add a utility to do this safely in future.
+
+Copy
+----
+All core objects support a ``.copy()`` method, which does not, however, copy array
+content (e.g. variable data or attribute arrays). See for instance
+:meth:`ncdata.NcData.copy`.
+
+There is also a utility function :func:`ncdata.utils.ncdata_copy`, which is effectively
+the same as the NcData object copy.
+
+
+Creation
+--------
+The constructors should allow reasonably readable inline creation of data.
+See here : :ref:`data-constructors`
+
+Ncdata is deliberately not very fussy about 'correctness', since it is not tied to an
+actual dataset which must "make sense". See : :ref:`correctness-checks`.
+
+Hence, there is no great need to install things in the 'right' order (e.g. dimensions
+before variables which need them). You can create objects in one go, like this :
+
+.. code-block::
+
+    data = NcData(
+        dimensions=[
+            NcDimension("y", 2),
+            NcDimension("x", 3),
+        ],
+        variables=[
+            NcVariable("y", dimensions=["y"], data=[10, 11]),
+            NcVariable("x", dimensions=["x"], data=[20, 21, 22]),
+            NcVariable("dd", dimensions=["y", "x"], data=[[0, 1, 2], [3, 4, 5]])
+        ]
+    )
+
+
+or iteratively, like this :
+
+.. code-block::
+
+    data = NcData()
+    dims = [("y", 2), ("x", 3)]
+    data.variables.addall([
+        NcVariable(nn, dimensions=[nn], data=np.arange(ll))
+        for nn, ll in dims
+    ])
+    data.variables.add(
+        NcVariable("dd", dimensions=["y", "x"],
+                   data=np.arange(6).reshape(2, 3))
+    )
+    data.dimensions.addall([NcDimension(nn, ll) for nn, ll in dims])
+
+Note : here, the variables were created before the dimensions.
+
+
+Equality Checks
+---------------
+We provide a simple ``==`` check for all the core objects, but this can be very costly,
+at least for variables, because it will check all the data, even in lazy arrays (!).
+
+You can use :func:`ncdata.utils.dataset_differences` for much more nuanced and
+controllable checking.
+
+
+Validity Checking
+-----------------
+See : :ref:`correctness-checks`
+
+General Topics
+==============
+Odd discussion topics.
+
+.. _data-types:
+
+Data Types (dtypes)
+-------------------
+:ref:`Variable data <variable-dtypes>` and :ref:`attribute values <attribute-dtypes>`
+all use a subset of numpy **dtypes**, compatible with netCDF datatypes.
+These are effectively those defined by `netcdf4-python `_, and this
+therefore also effectively determines what we see in `dask arrays `_ .
+
+However, at present ncdata directly supports only the `NetCDF Classic Data Model`_
+(plus groups, see : :ref:`data-model`).
+So, this does **not** include the user-defined, enumerated or variable-length
+datatypes.
+
+.. attention::
+
+   In practice, we have found that at least variables of the variable-length "string"
+   datatype do seem to function correctly at present, but this is not officially
+   supported, and not currently tested.
   We hope to extend support to the more general `NetCDF Enhanced Data Model`_ in
   future.
+
+.. _NetCDF Classic Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#classic_model
+
+.. _NetCDF Enhanced Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#enhanced_model
+
+
+.. _character-data:
+
+Character Data
+--------------
+NetCDF can contain string and character data in at least 3 different contexts :
+
+1. in variable data arrays
+2. in attribute values
+3. in names of components (i.e. dimensions / variables / attributes / groups )
+
+The last case (3.) is, effectively, quite separate.
+Since NetCDF version 4, the names of items within files are fully unicode compliant
+and can use virtually **any** characters, with the exception of the forward slash "/"
+( since in some technical cases a component name needs to be specified as a
+"path-like" compound ).
+
+.. _thread-safety:
+
+Thread Safety
+-------------
+In short, it turns out that thread safety can be an issue whenever "lazy" data is
+being read, which occurs whenever data is being plotted, calculated or written to a
+new output file.
+
+Whenever data that was loaded using more than one of the Iris, Xarray and
+ncdata.netcdf4 packages is being "computed" (in Dask terms : see `Dask compute `_),
+then :mod:`ncdata.threadlock_sharing` must be used to avoid possible errors.
+
+A Fuller Explanation..
+^^^^^^^^^^^^^^^^^^^^^^
+In practice, Iris, Xarray and Ncdata are all capable of scanning netCDF files and
+interpreting their metadata, while **not** reading all the core variable data
+contained in them.
+
+The file load generates `Dask.array `_ objects representing sections of
+variable data for calculation on later request, with certain key benefits :
+
+1. no data loading or calculation happens until needed
+2. the work is divided into sectional 'tasks', of which only some may ultimately be
+   needed
+3.
 it may be possible to perform multiple sections of calculation (including data
   fetch) in parallel
+4. it may be possible to localise operations (fetch or calculate) near to data
+   distributed across a cluster
+
+Usually, the most efficient parallelisation of array operations is by multi-threading,
+since that can use sharing of large data arrays in memory. However, the Python netCDF4
+library is **not threadsafe**, therefore the "netcdf fetch" call in each input
+operation must be guarded by a mutex.
+
+So Xarray, Iris and ncdata all create data objects with Dask arrays, whose input-data
+chunks fetch sections of the input files. Each of those packages uses a mutex to stop
+itself accessing the netCDF4 interface at the same time as any of the others.
+
+This works beautifully, **until** ncdata connects lazy data loaded with Iris (say) to
+lazy data loaded from Xarray : these unfortunately use their own *separate* mutexes to
+protect the *same* netCDF library. Then, when we attempt to calculate or save the
+result, we may get sporadic and unpredictable system-level errors, even a core-dump.
+
+So, the function of :mod:`ncdata.threadlock_sharing` is to **connect** the
+thread-locking schemes of the separate libraries, so that each cannot accidentally
+overlap an access call from the other package in a different thread, just as they
+already cannot overlap one of their own.
+
+.. _dask-array: https://docs.dask.org/en/stable/array.html
+.. _dask-compute: https://docs.dask.org/en/latest/generated/dask.array.Array.compute.html
\ No newline at end of file
diff --git a/docs/userdocs/user_guide/howtos.rst b/docs/userdocs/user_guide/howtos.rst
new file mode 100644
index 0000000..f58a700
--- /dev/null
+++ b/docs/userdocs/user_guide/howtos.rst
@@ -0,0 +1,525 @@
+How-To Questions
+================
+Short goal-focussed descriptions of how to achieve specific things.
+These are mostly presented as example code snippets, but also link to other
+documentation to describe concepts and technical details.
+
+**"Why Not Just..."** sections highlight warnings for what *not* to do,
+i.e. wrong turns and gotchas, with brief descriptions of why.
+
+
+.. _howto_access:
+
+Access a data object
+--------------------
+Index by component names to get the object which represents a particular element.
+
+.. code-block:: python
+
+    >>> dataset.attributes["experiment"]
+    NcAttribute('experiment', 'A301.7')
+    >>> dataset.dimensions["x"]
+    NcDimension('x', 3)
+    >>> dataset.variables['vx'].attributes['units']
+    NcAttribute('units', 'm s-1')
+
+Variables, attributes, dimensions and sub-groups are all stored by name like this,
+in a parent property which is a "component container" dictionary.
+
+.. Warning::
+
+   The :attr:`~ncdata.NcVariable.dimensions` property of a :class:`~ncdata.NcVariable`
+   is different : it is *not* a dictionary of :class:`~ncdata.NcDimension` objects,
+   but just a *list of dimension names*.
+
+
+.. _howto_add_something:
+
+Add a data object
+-----------------
+Use the :meth:`~ncdata.NameMap.add` method of a component-container property to insert
+a new item.
+
+    >>> data.dimensions.add(NcDimension("y", 4))
+    >>> data.dimensions
+    {'x': NcDimension('x', 3), 'y': NcDimension('y', 4)}
+
+The item must be of the correct type, in this case a :class:`~ncdata.NcDimension`.
+If not, an error will be raised.
+
+.. Warning::
+
+   **Why Not Just...** ``data.dimensions["y"] = NcDimension("y", 4)`` ?
+
+   This does actually work, but the user must ensure that the dictionary key always
+   matches the name of the component added. Using :meth:`~ncdata.NameMap.add` is thus
+   safer, and actually *simpler*, since all components have a definite name anyway.
+
+
+.. _howto_remove_something:
+
+Remove a data object
+--------------------
+The standard Python ``del`` operator can be applied to a component property to remove
+something by its name.
+
+    >>> data.dimensions
+    {'x': NcDimension('x', 3), 'y': NcDimension('y', 3)}
+    >>> del data.dimensions['x']
+    >>> data.dimensions
+    {'y': NcDimension('y', 3)}
+
+
+.. _howto_rename_something:
+
+Rename a data object
+--------------------
+Use the :meth:`~ncdata.NameMap.rename` method to rename a component.
+
+.. code-block::
+
+    >>> data.dimensions
+    {'x': NcDimension('x', 3), 'y': NcDimension('y', 3)}
+    >>> data.dimensions.rename('x', 'q')
+    >>> data.dimensions
+    {'q': NcDimension('q', 3), 'y': NcDimension('y', 3)}
+
+Note that this affects both the element's container key *and* its ``.name``.
+
+
+.. Warning::
+
+   Renaming a **dimension** can cause problems, so must be done with care.
+   See :ref:`howto_rename_dimension`.
+
+.. Warning::
+
+   **Why Not Just...** ``dim = data.dimensions['x']; dim.name = "q"`` ?
+
+   This would break the expected ``key == elements[key].name`` rule.
+   We don't prevent this, but it is usually a mistake.
+   :func:`~ncdata.utils.save_errors` detects this type of problem.
+
+
+.. _howto_rename_dimension:
+
+Rename a dimension
+------------------
+Simply using ``ncdata.dimensions.rename()`` can cause problems, because you must then
+**also** replace the name where it occurs in the dimensions of any variables.
+
+.. Note::
+
+   **To-Do** : there should be a utility for this, but as yet it does not exist.
+   See `Issue#87 `_.
+
+
+.. _howto_read_attr:
+
+Read an attribute value
+-----------------------
+To get an attribute of a dataset, group or variable, use the
+:meth:`ncdata.NcData.get_attrval` or :meth:`ncdata.NcVariable.get_attrval`
+method, which returns either a single (scalar) number, a numeric array, or a string.
+
+.. code-block:: python
+
+    >>> variable.get_attrval("x")
+    3.0
+    >>> dataset.get_attrval("context")
+    "Results from experiment A301.7"
+    >>> dataset.variables["q"].get_attrval("level_settings")
+    [1.0, 2.5, 3.7]
+
+**Given an isolated** :class:`ncdata.NcAttribute` **instance** :
+
+Its value is best read with the :meth:`ncdata.NcAttribute.as_python_value` method,
+which produces the same results as the above.
+
+    >>> variable.attributes[myname].as_python_value()
+    3.0
+
+.. Note::
+
+   **Why Not Just...** use ``NcAttribute.value`` ?
+
+   For example
+
+   .. code-block:: python
+
+      >>> data.variables["x"].attributes["q"].value
+      [1]
+
+   The ``.value`` is always stored as a :class:`~numpy.ndarray` array, but this is not
+   how it is stored in netCDF. The ``as_python_value()`` method returns the attribute
+   as a straightforward value, compatible with what is seen in ``ncdump`` output, and
+   with results from the ``netCDF4`` module.
+
+
+.. _howto_write_attr:
+
+Change an attribute value
+-------------------------
+To set an attribute of a dataset, group or variable, use the
+:meth:`ncdata.NcData.set_attrval` or :meth:`ncdata.NcVariable.set_attrval` method.
+
+All attributes are writeable, and the type can be freely changed.
+
+.. code-block:: python
+
+    >>> variable.set_attrval("x", 3.)
+    >>> variable.get_attrval("x")
+    3.0
+    >>> variable.set_attrval("x", "string-value")
+    >>> variable.get_attrval("x")
+    "string-value"
+
+.. Note::
+
+   **Why Not Just...** set ``NcAttribute.value`` directly ?
+
+   For example
+
+   .. code-block:: python
+
+      >>> data.variables["x"].attributes["q"].value = 4.2
+
+   This is generally unwise, because the ``.value`` should always be a numpy
+   :class:`~numpy.ndarray` array, with a suitable ``dtype``, but the
+   :class:`~ncdata.NcAttribute` type does not currently enforce this.
+   The ``set_attrval`` method both converts for convenience, and ensures that the
+   value is stored in a valid form.
+
+
+.. _howto_create_attr:
+
+Create an attribute
+-------------------
+To create an attribute on a dataset, group or variable, just set its value with the
+:meth:`ncdata.NcData.set_attrval` or :meth:`ncdata.NcVariable.set_attrval` method.
+This works just like :ref:`howto_write_attr` : i.e. it makes no difference whether the
+attribute already exists or not.
+
+.. code-block:: python
+
+    >>> variable.set_attrval("x", 3.)
+
+.. Note::
+
+   Assigning attributes when *creating* a dataset, variable or group is somewhat
+   simpler, discussed :ref:`here `.
+
+
+.. _howto_create_variable:
+
+Create a variable
+-----------------
+Use the :meth:`NcVariable() ` constructor to create a new
+variable with a name, dimensions, and optional data and attributes.
+
+A minimal example:
+
+.. code-block:: python
+
+    >>> var = NcVariable("data", ("x_axis",))
+    >>> print(var)
+    ): data(x_axis)>
+    >>> print(var.data)
+    None
+    >>>
+
+A more rounded example, including a data array:
+
+.. code-block:: python
+
+    >>> var = NcVariable("vyx", ("y", "x"),
+    ...     data=[[1, 2, 3], [0, 1, 1]],
+    ...     attributes=[NcAttribute('a', 1), NcAttribute('b', 'setting=off')]
+    ... )
+    >>> print(var)
+
+    >>> print(var.data)
+    [[1 2 3]
+     [0 1 1]]
+    >>>
+
+
+
+.. _howto_access_vardata:
+
+Read or write variable data
+---------------------------
+The :attr:`~ncdata.NcVariable.data` property of a :class:`~ncdata.NcVariable` usually
+holds a data array.
+
+.. code-block:: python
+
+    >>> var.data = np.array([1, 2])
+    >>> print(var.data)
+    [1 2]
+
+This may be either a :class:`numpy.ndarray` (real) or a :class:`dask.array.Array`
+(lazy) array. If the data is converted from another source (file, iris or xarray),
+it is usually lazy.
+
+It can be freely overwritten by the user.
+
+.. Warning::
+
+   If not ``None``, the ``.data`` should **always** be an array of the correct shape.
+
+   The :func:`~ncdata.utils.save_errors` function checks that all variables have
+   valid dimensions, and that ``.data`` arrays match those dimensions.
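The kind of shape-vs-dimensions consistency described in the warning above can be sketched with plain structures. ``check_shape`` is a hypothetical helper for illustration only, not the actual `save_errors` implementation.

```python
import numpy as np

def check_shape(dimensions, dim_lengths, data):
    """Return a list of error strings if `data` does not match the named
    dimensions. (Hypothetical sketch of the kind of check performed.)"""
    errors = []
    for name in dimensions:
        if name not in dim_lengths:
            errors.append(f"unknown dimension: {name!r}")
    expected = tuple(dim_lengths[n] for n in dimensions if n in dim_lengths)
    if not errors and data is not None and data.shape != expected:
        errors.append(f"data shape {data.shape} != dimensions {expected}")
    return errors

dims = {"y": 2, "x": 3}
# a correctly-shaped array passes ...
assert check_shape(("y", "x"), dims, np.zeros((2, 3))) == []
# ... a mis-shaped array, or an unknown dimension name, is reported
assert check_shape(("y", "x"), dims, np.zeros((3, 2))) != []
assert check_shape(("t",), dims, None) == ["unknown dimension: 't'"]
```

Note that, in line with the data model, a variable with ``data=None`` only fails on its dimension references, not on shape.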
+
+
+
+Save data to a new file
+-----------------------
+Use the :func:`ncdata.netcdf4.to_nc4` function to write data to a file:
+
+.. code-block:: python
+
+    >>> from ncdata.netcdf4 import to_nc4
+    >>> to_nc4(data, filepath)
+
+
+Read from or write to Iris cubes
+--------------------------------
+Use :func:`ncdata.iris.to_iris` and :func:`ncdata.iris.from_iris`.
+
+.. code-block:: python
+
+    >>> import iris
+    >>> from ncdata.iris import from_iris, to_iris
+    >>> cubes = iris.load(filepath)
+    >>> ncdata = from_iris(cubes)
+    >>>
+    >>> cubes2 = to_iris(ncdata)
+
+Note that:
+
+* :func:`ncdata.iris.to_iris` calls :func:`iris.load`
+* :func:`ncdata.iris.from_iris` calls :func:`iris.save`
+
+Extra kwargs are passed on to the iris load/save routine.
+
+Since an :class:`~ncdata.NcData` represents a complete file or dataset, it is written
+to or read from multiple cubes, held in a :class:`~iris.cube.CubeList`.
+
+
+Read from or write to Xarray datasets
+-------------------------------------
+Use :func:`ncdata.xarray.to_xarray` and :func:`ncdata.xarray.from_xarray`.
+
+.. code-block:: python
+
+    >>> import xarray
+    >>> from ncdata.xarray import from_xarray, to_xarray
+    >>> dataset = xarray.open_dataset(filepath)
+    >>> ncdata = from_xarray(dataset)
+    >>>
+    >>> ds2 = to_xarray(ncdata)
+
+Note that:
+
+* :func:`ncdata.xarray.to_xarray` calls :meth:`xarray.Dataset.load_store`
+* :func:`ncdata.xarray.from_xarray` calls :meth:`xarray.Dataset.dump_to_store`
+
+Any additional kwargs are passed on to the xarray load/save routine.
+
+A whole :class:`~ncdata.NcData` writes to, or reads from, a single
+:class:`xarray.Dataset`.
+
+
+Convert data directly from Iris to Xarray, or vice versa
+--------------------------------------------------------
+Use :func:`ncdata.iris_xarray.cubes_to_xarray` and
+:func:`ncdata.iris_xarray.cubes_from_xarray`.
+
+.. code-block:: python
+
+    >>> import iris
+    >>> from ncdata.iris_xarray import cubes_from_xarray, cubes_to_xarray
+    >>> cubes = iris.load(filepath)
+    >>> dataset = cubes_to_xarray(cubes)
+    >>>
+    >>> cubes2 = cubes_from_xarray(dataset)
+
+These functions are simply a convenient shorthand for combined use of
+:func:`ncdata.xarray.from_xarray` then :func:`ncdata.iris.to_iris`,
+or :func:`ncdata.iris.from_iris` then :func:`ncdata.xarray.to_xarray`.
+
+Extra keyword controls for the relevant iris and xarray load and save routines can be
+passed using specific dictionary keywords, e.g.
+
+.. code-block:: python
+
+    >>> cubes = cubes_from_xarray(
+    ...     dataset,
+    ...     iris_load_kwargs={'constraints': 'air_temperature'},
+    ...     xr_save_kwargs={'unlimited_dims': ('time',)},
+    ... )
+
+
+Combine data from different input files into one output
+-------------------------------------------------------
+This can be done by reading each input with :func:`ncdata.netcdf4.from_nc4`, adding
+the required dimensions and variables from each into a single new
+:class:`~ncdata.NcData`, and saving that with :func:`ncdata.netcdf4.to_nc4` : that is,
+just as shown below for saving selected variables to a new file, but taking content
+from more than one input dataset.  Note that any dimensions common to several inputs
+must agree in length.
+
+
+Create a brand-new dataset
+--------------------------
+Use the :meth:`NcData() <ncdata.NcData.__init__>` constructor to create a new dataset.
+
+Contents and components can be attached on creation ...
+
+.. code-block:: python
+
+    >>> data = NcData(
+    ...     dimensions=[NcDimension("y", 2), NcDimension("x", 3)],
+    ...     variables=[
+    ...         NcVariable("y", ("y",), data=[0, 1]),
+    ...         NcVariable("x", ("x",), data=[0, 1, 2]),
+    ...         NcVariable(
+    ...             "vyx", ("y", "x"),
+    ...             data=np.zeros((2, 3)),
+    ...             attributes=[
+    ...                 NcAttribute("long_name", "rate"),
+    ...                 NcAttribute("units", "m s-1")
+    ...             ]
+    ...         )],
+    ...     attributes=[NcAttribute("history", "imaginary")])
+    >>> print(data)
+    <NcData: <'no-name'>
+        dimensions:
+            y = 2
+            x = 3
+
+        variables:
+            <NcVariable(int64): y(y)>
+            <NcVariable(int64): x(x)>
+            <NcVariable(float64): vyx(y, x)>
+        ...
+
+... or added iteratively ...
+
+.. code-block:: python
+
+    >>> data = NcData()
+    >>> ny, nx = 2, 3
+    >>> data.dimensions.add(NcDimension("y", ny))
+    >>> data.dimensions.add(NcDimension("x", nx))
+    >>> data.variables.add(NcVariable("y", ("y",)))
+    >>> data.variables.add(NcVariable("x", ("x",)))
+    >>> data.variables.add(NcVariable("vyx", ("y", "x")))
+    >>> vx, vy, vyx = [data.variables[k] for k in ("x", "y", "vyx")]
+    >>> vx.data = np.arange(nx)
+    >>> vy.data = np.arange(ny)
+    >>> vyx.data = np.zeros((ny, nx))
+    >>> vyx.set_attrval("long_name", "rate")
+    >>> vyx.set_attrval("units", "m s-1")
+    >>> data.set_attrval("history", "imaginary")
+
+
+Remove or rewrite specific attributes
+-------------------------------------
+Attributes are held in the ``.attributes`` dictionary of a dataset, group or variable.
+Use ``del`` to remove one (e.g. ``del var.attributes["history"]``), the dictionary's
+``rename`` method to rename one, and :meth:`~ncdata.NcVariable.set_attrval` to
+overwrite a value.
+
+
+Save selected variables to a new file
+-------------------------------------
+Load input with :func:`ncdata.netcdf4.from_nc4`; use :meth:`ncdata.NameMap.add` to add
+selected elements into a new :class:`ncdata.NcData`, and then save it
+with :func:`ncdata.netcdf4.to_nc4`.
+
+For a simple case with no groups, it could look something like this:
+
+.. code-block:: python
+
+    >>> input = from_nc4(input_filepath)
+    >>> output = NcData()
+    >>> for varname in ('data1', 'data2', 'dimx', 'dimy'):
+    ...     var = input.variables[varname]
+    ...     output.variables.add(var)
+    ...     for name in var.dimensions:
+    ...         if name not in output.dimensions:
+    ...             output.dimensions.add(input.dimensions[name])
+    ...
+    >>> to_nc4(output, output_filepath)
+
+Sometimes it's simpler to load the input, delete content **not** wanted, then re-save.
+It's perfectly safe to do that, since the original file will be unaffected.
+
+.. code-block:: python
+
+    >>> data = from_nc4(input_filepath)
+    >>> for name in ('extra1', 'extra2', 'unwanted'):
+    ...     del data.variables[name]
+    ...
+
+    >>> del data.dimensions['pressure']
+    >>> to_nc4(data, output_filepath)
+
+
+Adjust file content before loading into Iris/Xarray
+---------------------------------------------------
+Use :func:`~ncdata.netcdf4.from_nc4`, and then :func:`~ncdata.iris.to_iris` or
+:func:`~ncdata.xarray.to_xarray`.  You can thus adjust file content at the file level,
+to avoid loading problems.
+
+For example, to replace an invalid coordinate name in iris input :
+
+.. code-block:: python
+
+    >>> from ncdata.netcdf4 import from_nc4
+    >>> from ncdata.iris import to_iris
+    >>> ncdata = from_nc4(input_filepath)
+    >>> for var in ncdata.variables.values():
+    ...     if 'coordinates' in var.attributes:
+    ...         coords = var.attributes['coordinates'].as_python_value()
+    ...         if "old_varname" in coords:
+    ...             coords = coords.replace("old_varname", "new_varname")
+    ...             var.set_attrval("coordinates", coords)
+    ...
+    >>> cubes = to_iris(ncdata)
+
+or, to replace a mis-used special attribute in xarray input :
+
+.. code-block:: python
+
+    >>> from ncdata.netcdf4 import from_nc4
+    >>> from ncdata.xarray import to_xarray
+    >>> ncdata = from_nc4(input_filepath)
+    >>> for var in ncdata.variables.values():
+    ...     if "_fillvalue" in var.attributes:
+    ...         var.attributes.rename("_fillvalue", "_FillValue")
+    ...
+    >>> dataset = to_xarray(ncdata)
+
+
+Adjust Iris/Xarray save output before writing to a file
+-------------------------------------------------------
+Use :func:`~ncdata.iris.from_iris` or :func:`~ncdata.xarray.from_xarray`, and then
+:func:`~ncdata.netcdf4.to_nc4`.  You can thus make changes to the saved output which
+would be difficult to apply once it had been written to an actual file.
+
+For example, to force an additional unlimited dimension in iris output :
+
+.. code-block:: python
+
+    >>> from ncdata.iris import from_iris
+    >>> from ncdata.netcdf4 import to_nc4
+    >>> ncdata = from_iris(cubes)
+    >>> ncdata.dimensions['timestep'].unlimited = True
+    >>> to_nc4(ncdata, "output.nc")
+
+or, to convert an xarray data variable to integers with a missing-data fill value :
+
+.. code-block:: python
+
+    >>> import numpy as np
+    >>> from ncdata.xarray import from_xarray
+    >>> from ncdata.netcdf4 import to_nc4
+    >>> ncdata = from_xarray(dataset)
+    >>> var = ncdata.variables['experiment']
+    >>> mask = np.isnan(var.data)
+    >>> data = var.data.astype(np.int16)
+    >>> data[mask] = -9999
+    >>> var.data = data
+    >>> var.set_attrval("_FillValue", -9999)
+    >>> to_nc4(ncdata, "output.nc")
+
diff --git a/docs/userdocs/user_guide/known_issues.rst b/docs/userdocs/user_guide/known_issues.rst
index 12c8b1e..8430e61 100644
--- a/docs/userdocs/user_guide/known_issues.rst
+++ b/docs/userdocs/user_guide/known_issues.rst
@@ -22,6 +22,13 @@ To be fixed
 * `issue#66 `_
 
+.. _todo:
+
+Incomplete Documentation
+^^^^^^^^^^^^^^^^^^^^^^^^
+(PLACEHOLDER: documentation is incomplete, please fix me !)
+
+
 Identified Design Limitations
 -----------------------------
 
diff --git a/docs/userdocs/user_guide/user_guide.rst b/docs/userdocs/user_guide/user_guide.rst
index 9b0b938..65500da 100644
--- a/docs/userdocs/user_guide/user_guide.rst
+++ b/docs/userdocs/user_guide/user_guide.rst
@@ -18,8 +18,8 @@ For the present, please see the following :
    :maxdepth: 2
 
     design_principles
-    (TODO : empty) Data object descriptions
-    (TODO : empty) General topics
-    (TODO : empty) How-tos
+    data_objects
+    general_topics
+    howtos
     known_issues
     ../../change_log