diff --git a/docs/change_log.rst b/docs/change_log.rst
index 2080abf..611ae32 100644
--- a/docs/change_log.rst
+++ b/docs/change_log.rst
@@ -1,22 +1,27 @@
+.. _change_log:
+
Versions and Change Notes
=========================
-Project Status
---------------
+.. _development_status:
+
+Project Development Status
+--------------------------
We intend to follow `PEP 440 <https://peps.python.org/pep-0440/>`_,
or (older) `SemVer <https://semver.org/>`_ versioning principles.
This means the version string has the basic form **"major.minor.bugfix[special-types]"**.
-Current release version is at **"v0.1"**.
+The current release version is **"v0.2"**.
-This is a first complete implementation,
-with functional operational of all public APIs.
+This is a complete implementation, with all public APIs functionally operational.
The code is however still experimental, and APIs are not stable
(hence no major version yet).
+.. _change_notes:
+
Change Notes
------------
+Summary of key features, by release number.
+
Unreleased
^^^^^^^^^^
diff --git a/docs/details/character_handling.rst b/docs/details/character_handling.rst
new file mode 100644
index 0000000..fcd29bf
--- /dev/null
+++ b/docs/details/character_handling.rst
@@ -0,0 +1,61 @@
+.. _string-and-character-data:
+
+Character and String Data Handling
+----------------------------------
+NetCDF can contain string and character data in at least 3 different contexts :
+
+Characters in Data Component Names
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+That is, names of groups, variables, attributes or dimensions.
+Component names in the API are just native Python strings.
+
+Since NetCDF version 4, the names of components within files are fully unicode
+compliant, using UTF-8.
+
+These names can use virtually **any** characters, with the exception of the forward
+slash "/", since in some technical cases a component name needs to be specified as a
+"path-like" compound.
+
+
+Characters in Attribute Values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Character data in string *attribute* values can likewise be read and written simply as
+Python strings.
+
+However, they are actually *stored* in an :class:`~ncdata.NcAttribute`'s
+``.value`` as a character array of a numpy string dtype.
+
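The underlying numpy behaviour can be seen directly. This is a minimal sketch using plain numpy only, not ncdata's own code :

```python
import numpy as np

# A Python string given as an attribute value is held as a numpy array of
# unicode-string dtype (kind "U"), with zero dimensions.
value = np.asarray("this")

print(value.dtype.kind)  # U
print(value.ndim)        # 0
print(str(value))        # this
```
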
+.. warning::
+
+ The netCDF4 package will perform automatic character encoding/decoding of a
+ character variable if it has a special ``_Encoding`` attribute. Ncdata does not
+ currently allow for this. See : :ref:`known-issues`
+
diff --git a/docs/details/details_index.rst b/docs/details/details_index.rst
index c3864b3..3e77dc0 100644
--- a/docs/details/details_index.rst
+++ b/docs/details/details_index.rst
@@ -1,9 +1,14 @@
Detail Topics
=============
+Detail reference topics
+
.. toctree::
:maxdepth: 2
+ ../change_log
+ ./known_issues
./interface_support
+ ./character_handling
./threadlock_sharing
./developer_notes
diff --git a/docs/details/developer_notes.rst b/docs/details/developer_notes.rst
index 23c0708..7271394 100644
--- a/docs/details/developer_notes.rst
+++ b/docs/details/developer_notes.rst
@@ -28,6 +28,12 @@ Documentation build
Release actions
---------------
+#. Update the :ref:`change_log` page in the details section
+
+ #. ensure all major changes + PRs are referenced in the :ref:`change_notes` section
+
+ #. update the "latest version" stated in the :ref:`development_status` section
+
#. Cut a release on GitHub : this triggers a new docs version on `ReadTheDocs <https://readthedocs.org/projects/ncdata/>`_
#. Build the distribution
diff --git a/docs/details/interface_support.rst b/docs/details/interface_support.rst
index f2fedcc..091582b 100644
--- a/docs/details/interface_support.rst
+++ b/docs/details/interface_support.rst
@@ -14,43 +14,59 @@ Datatypes
^^^^^^^^^
Ncdata supports all the regular datatypes of netcdf, but *not* the
variable-length and user-defined datatypes.
+Please see : :ref:`data-types`.
-This means, notably, that all string variables will have the basic numpy type
-'S1', equivalent to netcdf 'NC_CHAR'. Thus, multi-character string variables
-must always have a definite "string-length" dimension.
-Attribute values, by contrast, are treated as Python strings with the normal
-variable length support. Their basic dtype can be any numpy string dtype,
-but will be converted when required.
-
-The NetCDF C library and netCDF4-python do not support arrays of strings in
-attributes, so neither does NcData.
-
-
-Data Scaling, Masking and Compression
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Ncdata does not implement scaling and offset within data arrays : The ".data"
+Data Scaling and Masking
+^^^^^^^^^^^^^^^^^^^^^^^^
+Ncdata does not implement scaling and offset within variable data arrays : The ".data"
array has the actual variable dtype, and the "scale_factor" and
"add_offset" attributes are treated like any other attribute.
-The existence of a "_FillValue" attribute controls how.. TODO
+Likewise, Ncdata does not apply masking within its variable data arrays : these
+contain "raw" data, including any "fill" values -- i.e. at a missing data point you
+will see the "fill" value rather than a masked point.
+
+The "scale_factor", "add_offset" and "_FillValue" attributes are standard conventions,
+described in the NetCDF documentation itself and implemented by NetCDF library
+software, including the Python netCDF4 library. To ignore these default
+interpretations, ncdata has to actually turn such features "off". The rationale is
+that the low-level unprocessed data content, equivalent to the actual file storage,
+is more likely to form a stable common basis of equivalence, particularly between
+different system architectures.
+
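
For illustration, the default interpretation which ncdata deliberately does *not* apply can be sketched in plain numpy. The values here are hypothetical, and this is not ncdata code :

```python
import numpy as np

# "Raw" values, as an ncdata variable's .data would present them.
raw = np.array([100, 200, -999], dtype=np.int16)

# Attribute values which, by convention, describe how to unpack the data.
scale_factor, add_offset, fill_value = 0.5, 5.0, -999

# The standard netCDF4-style interpretation : mask fill points, then scale
# and offset the remainder.  Ncdata leaves the raw values untouched instead.
unpacked = np.ma.masked_equal(raw, fill_value) * scale_factor + add_offset

print(unpacked[0])  # 55.0
print(unpacked[1])  # 105.0
```
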
+.. _file-storage:
+
File storage control
^^^^^^^^^^^^^^^^^^^^
The :func:`ncdata.netcdf4.to_nc4` cannot control compression or storage options
provided by :meth:`netCDF4.Dataset.createVariable`, which means you can't
control the data compression and translation facilities of the NetCDF file
library.
-If required, you should use :mod:`iris` or :mod:`xarray` for this.
+If required, you should use :mod:`iris` or :mod:`xarray` for this, i.e. use
+:meth:`xarray.Dataset.to_netcdf` or :func:`iris.save` instead of
+:func:`ncdata.netcdf4.to_nc4`, as these provide more specialised options for controlling
+netcdf file creation.
+
+File-specific storage aspects, such as chunking, data-paths or compression
+strategies, are not recorded in the core objects. However, array representations in
+variable and attribute data (notably dask lazy arrays) may hold such information.
+
+You might expect the concept of "unlimited" dimensions, too, to fall outside the
+abstract model of NetCDF data, and so be of no concern to Ncdata. In fact, however,
+this concept is a core property of dimensions in the classic NetCDF data model (see
+"Dimension" in the `NetCDF Classic Data Model`_), which is why it **is** an essential
+property of an NcDimension also.
+
Dask chunking control
^^^^^^^^^^^^^^^^^^^^^
Loading from netcdf files generates variables whose data arrays are all Dask
lazy arrays. These are created with the "chunks='auto'" setting.
-There is currently no control for this : If required, load via Iris or Xarray
-instead.
+
+However, there is a simple per-dimension chunking control available on loading.
+See :func:`ncdata.netcdf4.from_nc4`.
Xarray Compatibility
@@ -94,3 +110,4 @@ see : `support added in v3.7.0 `_
+* in conversion to/from netCDF4 files
+
+ * netCDF4 performs automatic encoding/decoding of byte data to characters, triggered
+ by the existence of an ``_Encoding`` attribute on a character type variable.
+ Ncdata does not currently account for this, and may fail to read/write correctly.
+
+
+.. _todo:
+
+Incomplete Documentation
+^^^^^^^^^^^^^^^^^^^^^^^^
+(PLACEHOLDER: documentation is incomplete, please fix me !)
+
Identified Design Limitations
-----------------------------
@@ -36,7 +51,7 @@ There are no current plans to address these, but could be considered in future
* notably, includes compound and variable-length types
* ..and especially **variable-length strings in variables**.
- see : :ref:`string_and_character_data`
+ see : :ref:`string-and-character-data`, :ref:`data-types`
Features planned
diff --git a/docs/details/threadlock_sharing.rst b/docs/details/threadlock_sharing.rst
index 9fae3cc..f07e026 100644
--- a/docs/details/threadlock_sharing.rst
+++ b/docs/details/threadlock_sharing.rst
@@ -1,30 +1,23 @@
+.. _thread-safety:
+
NetCDF Thread Locking
=====================
-Ncdata includes support for "unifying" the thread-safety mechanisms between
-ncdata and the format packages it supports (Iris and Ncdata).
+Ncdata provides the :mod:`ncdata.threadlock_sharing` module, which can ensure that
+multiple relevant data-format packages use a "unified" thread-safety mechanism,
+preventing them from disturbing each other.
This concerns the safe use of the common NetCDF library by multiple threads.
Such multi-threaded access usually occurs when your code has Dask arrays
created from netcdf file data, which it is either computing or storing to an
output netcdf file.
-The netCDF4 package (and the underlying C library) does not implement any
-threadlock, neither is it thread-safe (re-entrant) by design.
-Thus contention is possible unless controlled by the calling packages.
-*Each* of the data-format packages (Ncdata, Iris and Xarray) defines its own
-locking mechanism to prevent overlapping calls into the netcdf library.
-
-All 3 data-format packages can map variable data into Dask lazy arrays. Iris and
-Xarray can also create delayed write operations (but ncdata currently does not).
-
-However, those mechanisms cannot protect an operation of that package from
-overlapping with one in *another* package.
+In short, this is not needed when all your data is loaded with only **one** of the data
+packages (Iris, Xarray or ncdata). The problem only occurs when you try to
+realise/calculate/save results which combine data loaded from a mixture of sources.
-The :mod:`ncdata.threadlock_sharing` module can ensure that all of the relevant
-packages use the *same* thread lock,
-so that they can safely co-operate in parallel operations.
+sample code:
-sample code::
+
+.. code-block:: python
from ncdata.threadlock_sharing import enable_lockshare, disable_lockshare
from ncdata.xarray import from_xarray
@@ -40,7 +33,9 @@ sample code::
disable_lockshare()
-or::
+... *or* ...
+
+.. code-block:: python
with lockshare_context(iris=True):
ncdata = NcData(source_filepath)
@@ -48,3 +43,39 @@ or::
cubes = ncdata.iris.to_iris(ncdata)
iris.save(cubes, output_filepath)
+
+Background
+^^^^^^^^^^
+In practice, Iris, Xarray and Ncdata are all capable of scanning netCDF files and interpreting their metadata, without
+reading all of the core variable data contained in them.
+
+This generates objects containing Dask :class:`~dask.array.Array`\s, which provide
+deferred access to bulk data in files, with certain key benefits :
+
+* no data loading or calculation happens until needed
+* the work is divided into sectional "tasks", of which only some may ultimately be needed
+* it may be possible to perform multiple sections of calculation (including data fetch) in parallel
+* it may be possible to localise operations (fetch or calculate) near to data distributed across a cluster
+
+Usually, the most efficient parallelisation of array operations is by multi-threading, since that allows the sharing
+of large data arrays in memory.
+
+However, the Python netCDF4 library (and the underlying C library) is not threadsafe
+(re-entrant) by design, nor does it implement any thread locking itself, therefore
+the "netcdf fetch" call in each input operation must be guarded by a mutex.
+Thus, contention is possible unless controlled by the calling packages.
+
+Each of Xarray, Iris and ncdata creates input data tasks to fetch sections of data from
+the input files. Each uses a mutex lock around netcdf accesses in those tasks, to stop
+them accessing the netCDF4 interface at the same time as any of the others.
+
+This works beautifully until ncdata connects (for example) lazy data loaded *with Iris*
+to lazy data loaded *from Xarray*. These would then unfortunately each be using their
+own *separate* mutexes to protect the same netcdf library. So, if we then attempt to
+calculate or save the result, which combines data from both sources, we could get
+sporadic and unpredictable system-level errors, or even a core-dump type failure.
+
+So, the function of :mod:`ncdata.threadlock_sharing` is to connect the thread-locking
+schemes of the separate libraries, so that they cannot accidentally overlap an access
+call in a different thread *from the other package*, just as they already cannot
+overlap *one of their own*.
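
The principle can be sketched with stdlib threading. Here ``SHARED_LOCK`` and ``fetch`` are hypothetical stand-ins for illustration only, not ncdata's actual internals :

```python
import threading

# One lock shared by *every* package that calls into the netCDF library :
# no two threads can then be inside the (non-reentrant) library at once.
SHARED_LOCK = threading.Lock()

def fetch(read_one_chunk):
    """Run a single "netcdf fetch" task under the shared lock."""
    with SHARED_LOCK:
        return read_one_chunk()

# Simulate fetch tasks from two different packages, on separate threads.
results = []
threads = [
    threading.Thread(target=lambda i=i: results.append(fetch(lambda: i)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 2, 3]
```

If the two "packages" each used their own lock instead, nothing would stop their fetches overlapping inside the library.
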
diff --git a/docs/index.rst b/docs/index.rst
index 09246a4..c161708 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -38,8 +38,9 @@ User Documentation
User Guide <./userdocs/user_guide/user_guide>
-Reference
----------
+Reference Documentation
+-----------------------
+
.. toctree::
:maxdepth: 2
diff --git a/docs/userdocs/getting_started/installation.rst b/docs/userdocs/getting_started/installation.rst
index 2be9443..1f5df84 100644
--- a/docs/userdocs/getting_started/installation.rst
+++ b/docs/userdocs/getting_started/installation.rst
@@ -4,13 +4,28 @@ Ncdata is available on PyPI and conda-forge
Install from conda-forge with conda
-----------------------------------
-Like this::
- conda install -c conda-forge ncdata
+Like this:
+
+.. code-block:: bash
+
+ $ conda install -c conda-forge ncdata
Install from PyPI with pip
--------------------------
-Like this::
+Like this:
+
+.. code-block:: bash
+
pip install ncdata
+Check install
+^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+ $ python -c "from ncdata import NcData; print(NcData())"
+
+
diff --git a/docs/userdocs/getting_started/introduction.rst b/docs/userdocs/getting_started/introduction.rst
index fec4b00..d89f743 100644
--- a/docs/userdocs/getting_started/introduction.rst
+++ b/docs/userdocs/getting_started/introduction.rst
@@ -21,7 +21,7 @@ The following code snippets demonstrate the absolute basics.
Likewise, internal consistency is not checked, so it is possible to create
data that cannot be stored in an actual file.
- See :func:`ncdata.utils.save_errors`.
+ See :ref:`correctness-checks`.
We may revisit this in later releases to make data manipulation "safer".
@@ -31,7 +31,9 @@ Simple data creation
The :class:`ncdata.NcData` object is the basic container, representing
a dataset or group. It contains :attr:`~ncdata.NcData.dimensions`,
:attr:`~ncdata.NcData.variables`, :attr:`~ncdata.NcData.groups`,
-and :attr:`~ncdata.NcData.attributes`::
+and :attr:`~ncdata.NcData.attributes`:
+
+.. code-block:: python
>>> from ncdata import NcData, NcDimension, NcVariable
>>> data = NcData("myname")
@@ -58,7 +60,9 @@ Getting data to+from files
The :mod:`ncdata.netcdf4` module provides simple means of reading and writing
NetCDF files via the `netcdf4-python package `_.
-Simple example::
+Simple example:
+
+.. code-block:: python
>>> from ncdata.netcdf4 import to_nc4, from_nc4
@@ -85,7 +89,9 @@ Please see `Converting between data formats`_ for more details.
Variables
^^^^^^^^^
Variables live in a :attr:`ncdata.NcData.variables` attribute,
-which behaves like a dictionary::
+which behaves like a dictionary:
+
+.. code-block:: python
>>> var = NcVariable("vx", dimensions=["x"], dtype=float)
>>> data.variables.add(var)
@@ -109,7 +115,9 @@ which behaves like a dictionary::
Attributes
^^^^^^^^^^
-Variables live in the ``attributes`` property of a :class:`~ncdata.NcData`
-or :class:`~ncdata.Variable`::
+Attributes live in the ``attributes`` property of a :class:`~ncdata.NcData`
+or :class:`~ncdata.NcVariable`:
+
+.. code-block:: python
>>> var.set_attrval('a', 1)
NcAttribute('a', 1)
@@ -150,7 +158,9 @@ and :meth:`~ncdata.NcVariable.get_attrval` of NcData/NcVariable.
Deletion and Renaming
^^^^^^^^^^^^^^^^^^^^^
-Use python 'del' operation to remove::
+Use python 'del' operation to remove:
+
+.. code-block:: python
>>> del var.attributes['a']
>>> print(var)
@@ -158,7 +168,9 @@ Use python 'del' operation to remove::
vx:b = 'this'
>
-There is also a 'rename' method of variables/attributes/groups::
+There is also a 'rename' method of variables/attributes/groups:
+
+.. code-block:: python
>>> var.attributes.rename("b", "qq")
>>> print(var)
@@ -177,13 +189,12 @@ There is also a 'rename' method of variables/attributes/groups::
>
>
-.. _renaming_dimensions:
.. warning::
Renaming a :class:`~ncdata.NcDimension` within a :class:`~ncdata.NcData`
- does *not* adjust the variables which reference it, since a variables'
+ does *not* adjust the variables which reference it, since a variable's
:attr:`~ncdata.NcVariable.dimensions` is a simple list of names.
- See : `renaming_dimensions`_ , also :func:`ncdata.utils.save_errors`.
+ See : :ref:`howto_rename_dimension` , also :func:`ncdata.utils.save_errors`.
Converting between data formats
@@ -217,21 +228,31 @@ at :ref:`interface_support`.
Example code snippets :
+
+.. code-block:: python
+
>>> from ncdata.threadlock_sharing import enable_lockshare
>>> enable_lockshare(iris=True, xarray=True)
+
+.. code-block:: python
+
>>> from ncdata.netcdf4 import from_nc4
>>> ncdata = from_nc4("datapath.nc")
+
+.. code-block:: python
+
>>> from ncdata.iris import to_iris, from_iris
>>> xx, yy = to_iris(ncdata, ['x_wind', 'y_wind'])
>>> vv = (xx * xx + yy * yy) ** 0.5
>>> vv.units = xx.units
+
+.. code-block:: python
+
>>> from ncdata.xarray import to_xarray
>>> xrds = to_xarray(from_iris(vv))
>>> xrds.to_zarr(out_path)
+
+.. code-block:: python
+
>>> from ncdata.iris_xarray import cubes_from_xarray
>>> vv2 = cubes_from_xarray(xrds)
>>> assert vv2 == vv
@@ -246,10 +267,12 @@ Thread safety
prevent possible errors when computing or saving lazy data.
For example:
+
+ .. code-block:: python
+
>>> from ncdata.threadlock_sharing import enable_lockshare
>>> enable_lockshare(iris=True, xarray=True)
- See details at :mod:`ncdata.threadlock_sharing`
+ See details at :ref:`thread-safety`.
Working with NetCDF files
diff --git a/docs/userdocs/user_guide/common_operations.rst b/docs/userdocs/user_guide/common_operations.rst
new file mode 100644
index 0000000..9aa4cc8
--- /dev/null
+++ b/docs/userdocs/user_guide/common_operations.rst
@@ -0,0 +1,147 @@
+.. _common_operations:
+
+Common Operations
+=================
+A group of common operations is available on all the core component types :
+i.e. extract/remove/insert/rename/copy operations on the ``.dimensions``,
+``.variables``, ``.attributes`` and ``.groups`` properties of core objects.
+
+Most of these are hopefully "obvious" Pythonic methods of the container objects.
+
+Extract and Remove
+------------------
+These are implemented as :meth:`~ncdata.NameMap.__delitem__` and :meth:`~ncdata.NameMap.pop`
+methods, which work in the usual way.
+
+Examples :
+
+* ``var_x = dataset.variables.pop("x")``
+* ``del dataset.variables["x"]``
+
+Insert / Add
+------------
+A new component can be added, under its own name, with the
+:meth:`~ncdata.NameMap.add` method.
+
+Example : ``dataset.variables.add(NcVariable("x", dimensions=["x"], data=my_data))``
+
+An :class:`~ncdata.NcAttribute` can also be added, or set (if already present), with
+the special :meth:`~ncdata.NcVariable.set_attrval` method.
+
+Example : ``dataset.variables["x"].set_attrval("units", "m s-1")``
+
+Rename
+------
+A component can be renamed with the :meth:`~ncdata.NameMap.rename` method. This changes
+both the name in the container **and** the component's own name -- it is not recommended
+to set ``component.name`` directly, since the two names can then become inconsistent.
+
+Example : ``dataset.variables.rename("x", "y")``
+
+.. warning::
+ Renaming a dimension will not rename references to it (i.e. in variables), which
+ obviously may cause problems.
+ We may add a utility to do this safely in future.
+
+Copying
+-------
+All core objects support a ``.copy()`` method. See for instance
+:meth:`ncdata.NcData.copy`.
+
+These however do *not* copy variable data arrays (either real or lazy), but produce new
+(copied) variables referencing the same arrays. So, for example:
+
+.. code-block:: python
+
+ >>> # Construct a simple test dataset
+ >>> import numpy as np
+ >>> from ncdata import NcData, NcDimension, NcVariable
+ >>> ds = NcData(
+ ... dimensions=[NcDimension('x', 12)],
+ ... variables=[NcVariable('vx', ['x'], np.ones(12))]
+ ... )
+
+ >>> # Make a copy
+ >>> ds_copy = ds.copy()
+
+ >>> # The new dataset has a new matching variable with a matching data array
+ >>> # The variables are different ..
+ >>> ds_copy.variables['vx'] is ds.variables['vx']
+ False
+ >>> # ... but the arrays are THE SAME ARRAY
+ >>> ds_copy.variables['vx'].data is ds.variables['vx'].data
+ True
+
+ >>> # So changing one actually CHANGES THE OTHER ...
+ >>> ds.variables['vx'].data[6:] = 777
+ >>> ds_copy.variables['vx'].data
+ array([1., 1., 1., 1., 1., 1., 777., 777., 777., 777., 777., 777.])
+
+If needed you can of course replace variable data with copies yourself, since you can
+freely assign to ``.data``.
+For real data, this is just ``var.data = var.data.copy()``.
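
The difference between the shared and the copied array can be shown with plain numpy arrays, as stand-ins for ``.data`` (this is not ncdata-specific code) :

```python
import numpy as np

shared = np.ones(4)          # stand-in for the original variable's data
alias = shared               # what a dataset .copy() effectively shares
independent = shared.copy()  # what "var.data = var.data.copy()" achieves

shared[0] = 777.0

print(alias[0])        # 777.0 : the aliased array sees the change
print(independent[0])  # 1.0 : the copied array does not
```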
+
+There is also a utility function :func:`ncdata.utils.ncdata_copy` : this is
+effectively the same as the NcData object :meth:`~ncdata.NcData.copy` method.
+
+
+Equality Checking
+-----------------
+We provide a simple, comprehensive ``==`` check for :class:`~ncdata.NcDimension` and
+:class:`~ncdata.NcAttribute` objects, but not at present for :class:`~ncdata.NcVariable`
+or :class:`~ncdata.NcData`.
+
+So, using ``==`` on :class:`~ncdata.NcVariable` or :class:`~ncdata.NcData` objects
+will only do an identity check -- that is, it tests ``id(A) == id(B)``, or ``A is B``.
+
+However, these objects **can** be properly compared with the dataset comparison
+utilities, :func:`ncdata.utils.dataset_differences` and
+:func:`ncdata.utils.variable_differences`. By default these checks are fully
+comprehensive, and so may be very costly -- for instance, when comparing large data
+arrays. They also allow more nuanced and controllable checking, e.g. to skip data
+array comparisons or to ignore variable ordering.
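
The identity-only behaviour of ``==`` is just Python's default for classes that define no ``__eq__``, as this stand-in sketch shows (``Thing`` is a hypothetical class, not part of ncdata) :

```python
class Thing:
    """A stand-in for a class (like NcVariable) with no custom __eq__."""

a = Thing()
b = Thing()

print(a == b)                # False : different objects
print(a == a)                # True : the same object
print((a == b) == (a is b))  # True : "==" is just an identity test here
```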
+
+
+Object Creation
+---------------
+The constructors should allow reasonably readable inline creation of data.
+See here : :ref:`data-constructors`
+
+Ncdata is deliberately not very fussy about 'correctness', since it is not tied to an
+actual dataset which must "make sense". See : :ref:`correctness-checks`.
+
+Hence, there is no great need to insert things in the 'right' order (e.g. dimensions
+before variables which need them). You can create objects in one go, like this :
+
+.. code-block:: python
+
+ data = NcData(
+ dimensions=[
+ NcDimension("y", 2),
+ NcDimension("x", 3),
+ ],
+ variables=[
+ NcVariable("y", dimensions=["y"], data=[10, 11]),
+ NcVariable("x", dimensions=["x"], data=[20, 21, 22]),
+ NcVariable("dd", dimensions=["y", "x"], data=[[0, 1, 2], [3, 4, 5]])
+ ]
+ )
+
+
+or iteratively, like this :
+
+.. code-block:: python
+
+ data = NcData()
+ dims = [("y", 2), ("x", 3)]
+ data.variables.addall([
+ NcVariable(nn, dimensions=[nn], data=np.arange(ll))
+ for nn, ll in dims
+ ])
+ data.variables.add(
+ NcVariable("dd", dimensions=["y", "x"],
+ data=np.arange(6).reshape(2,3))
+ )
+ data.dimensions.addall([NcDimension(nn, ll) for nn, ll in dims])
+
+Note : here, the variables were created before the dimensions.
+
+
diff --git a/docs/userdocs/user_guide/data_objects.rst b/docs/userdocs/user_guide/data_objects.rst
new file mode 100644
index 0000000..dc81ab4
--- /dev/null
+++ b/docs/userdocs/user_guide/data_objects.rst
@@ -0,0 +1,269 @@
+Core Data Objects
+=================
+Ncdata uses Python objects to represent netCDF data, and allows the user to freely
+inspect and/or modify it, aiming to do this in the most natural and pythonic way.
+
+.. _data-model:
+
+Data Classes
+------------
+The data model components are elements of the
+`NetCDF Classic Data Model`_, plus **groups** (from the
+`"enhanced" netCDF data model`_).
+
+That is, a Dataset(File) consists of just Dimensions, Variables, Attributes and
+Groups.
+
+.. note::
+ We are not, as yet, explicitly supporting the NetCDF4 extensions for variable-length
+ and user-defined types. See : :ref:`data-types`
+
+The core ncdata classes representing these Data Model components are
+:class:`~ncdata.NcData`, :class:`~ncdata.NcDimension`, :class:`~ncdata.NcVariable` and
+:class:`~ncdata.NcAttribute`.
+
+Notes :
+
+* There is no "NcGroup" class : :class:`~ncdata.NcData` is used for both the "group" and
+ "dataset" (aka file).
+
+* All data objects have a ``.name`` property, but this can be empty (``None``) when it is not
+ contained in a parent object as a component. See :ref:`components-and-containers`,
+ below.
+
+
+:class:`~ncdata.NcData`
+^^^^^^^^^^^^^^^^^^^^^^^
+This represents a dataset containing variables, dimensions, attributes and groups.
+It is also used to represent groups.
+
+:class:`~ncdata.NcDimension`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This represents a dimension, defined in terms of name, length, and whether "unlimited"
+(or not).
+
+:class:`~ncdata.NcVariable`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Represents a data variable, with dimensions and, optionally, data and attributes.
+
+Note that ``.dimensions`` is simply a list of names (strings) : they are not
+:class:`~ncdata.NcDimension` objects, and not linked to actual dimensions of the
+dataset, so *actual* dimensions are only identified dynamically, when they need to be.
+
+Variables can be created with either real (numpy) or lazy (dask) arrays, or no data at
+all.
+
+A variable has a ``.dtype``, which may be set if creating with no data.
+However, at present, after creation ``.data`` and ``.dtype`` can be reassigned and there
+is no further checking of any sort.
+
+.. _variable-dtypes:
+
+Variable Data Arrays
+""""""""""""""""""""
+When a variable does have a ``.data`` property, this will be an array, with at least
+the usual ``shape``, ``dtype`` and ``__getitem__`` properties. In practice we assume
+for now that we will always have real (numpy) or lazy (dask) arrays.
+
+When data is exchanged with an actual file, it is simply written if real, and streamed
+(via :func:`dask.array.store`) if lazy.
+
+When data is exchanged with supported data analysis packages (i.e. Iris or Xarray, so
+far), these arrays are transferred directly without copying or making duplicates (such
+as numpy views).
+This is a core principle (see :ref:`design-principles`), but may require special support in
+those packages.
+
+See also : :ref:`data-types`
+
+:class:`~ncdata.NcAttribute`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Represents an attribute, with name and value. The value is always either a scalar
+or a 1-D numpy array -- this is enforced as a computed property (read and write).
+
+.. _attribute-dtypes:
+
+Attribute Values
+""""""""""""""""
+In actual netCDF data, the value of an attribute is effectively limited to a one-dimensional
+array of certain valid netCDF types, and one-element arrays are exactly equivalent to scalar values.
+
+The ``.value`` of an :class:`ncdata.NcAttribute` must always be a numpy scalar or 1-dimensional array.
+
+When assigning a ``.value``, or creating a new :class:`ncdata.NcAttribute`, the value
+is cast with :func:`numpy.asanyarray`, and if this fails, or yields a multidimensional array,
+then an error is raised.
+
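The casting rule itself can be illustrated with plain numpy (a sketch of the principle only, not ncdata's actual code) :

```python
import numpy as np

# Scalars become zero-dimensional arrays; sequences become 1-D arrays.
scalar = np.asanyarray(3.5)
vector = np.asanyarray([1, 2, 3])

print(scalar.ndim)   # 0
print(vector.shape)  # (3,)

# A nested sequence yields a 2-D array -- which an NcAttribute would reject.
bad = np.asanyarray([[1, 2], [3, 4]])
print(bad.ndim)      # 2
```
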
+When *reading* attributes, for consistent results it is best to use the
+:meth:`ncdata.NcVariable.get_attrval` method or (equivalently) :meth:`ncdata.NcAttribute.as_python_value` :
+These return either ``None`` (if missing), a numpy scalar or array, or a Python string.
+These are intended to be equivalent to what you would get from storing in an actual file and reading back,
+including re-interpreting a length-one vector as a scalar value.
+
+.. attention::
+ The correct handling and (future) discrimination of attribute values which are character arrays
+ ("char" in netCDF terms) and/or variable-length strings ("string" type) is still to be determined.
+ ( We do not yet properly support any variable-length types. )
+
+ For now, we are simply converting **all** string-like attributes by
+ :meth:`ncdata.NcAttribute.as_python_value` to python strings.
+
+See also : :ref:`data-types`
+
+.. _correctness-checks:
+
+Correctness and Consistency
+---------------------------
+In order to allow flexibility in construction and manipulation, it is not practical
+for ncdata structures to represent valid netCDF at all times, since this would make
+changing things awkward.
+For example, if a group refers to a dimension *outside* the group, strict correctness
+would not allow you to simply extract it from the dataset, because it is not valid in isolation.
+Thus, we do allow ncdata structures to represent *invalid* netCDF data.
+For example, circular references, missing dimensions or naming mismatches.
+
+In practice, there is a minimal set of rules which apply when initially creating
+ncdata objects, and additional requirements which apply when creating actual netCDF files.
+For example, a variable can be initially created with no data. But if subsequently written
+to a file, some data must be defined.
+
+The full set of data validity rules is summarised in the
+:func:`ncdata.utils.save_errors` routine.
+
+.. Note::
+ These issues are not necessarily all fully resolved. Caution required !
+
+.. _components-and-containers:
+
+Components, Containers and Names
+--------------------------------
+Each dimension, variable, attribute or group normally exists as a component in a
+parent dataset (or group), where it is stored in a "container" property of the parent,
+i.e. either its ``.dimensions``, ``.variables``, ``.attributes`` or ``.groups``.
+
+Each of the "container" properties is a :class:`~ncdata.NameMap` object, which
+is a dictionary type mapping strings (names) to components of a specific type.
+The dictionary ``.keys()`` are a sequence of component names, and its ``.values()`` are
+the corresponding contained components.
+
+Every component object also has a ``.name`` property. This implies that the name by
+which an object is indexed in its container **could** differ from its own ``.name``.
+This is to be avoided !
+
+The :class:`~ncdata.NameMap` container class is provided with convenience methods which
+aim to make this easier, such as :meth:`~ncdata.NameMap.add` and
+:meth:`~ncdata.NameMap.rename`.
+
+NcData and NcVariable ".attributes" components
+----------------------------------------------
+Note that the contents of an ``.attributes`` container are :class:`~ncdata.NcAttribute`
+objects, not attribute values.
+
+Thus, to fetch an attribute value you might write, for example, one of these :
+
+.. code-block::
+
+ units1 = dataset.variables['var1'].get_attrval('units')
+ units1 = dataset.variables['var1'].attributes['units'].as_python_value()
+
+but **not** ``units1 = dataset.variables['var1'].attributes['units']``
+
+Or, likewise, to **set** values, one of
+
+.. code-block::
+
+ dataset.variables['var1'].set_attrval('units', "K")
+ dataset.variables['var1'].attributes['units'] = NcAttribute("units", "K")
+
+but **not** ``dataset.variables['var1'].attributes['units'].value = "K"``
+
+
+.. _container-ordering:
+
+Container ordering
+------------------
+The order of elements of a container is technically significant, and does constitute a
+potential difference between datasets (or files).
+
+The :meth:`ncdata.NameMap.rename` method preserves the order of an element,
+while :meth:`ncdata.NameMap.add` adds new components at the end.
+
+The :func:`ncdata.utils.dataset_differences` utility provides various keywords allowing
+you to ignore ordering in comparisons, when required.
+
+
+Container methods
+-----------------
+The :class:`~ncdata.NameMap` class also provides a variety of manipulation methods,
+both normal dictionary operations and some extra ones.
+
+The most notable ones are : ``del``, ``pop``, ``add``, ``addall``, ``rename`` and of
+course ``__setitem__``.
+
+See the :ref:`common_operations` section.
+
+.. _data-constructors:
+
+Core Object Constructors
+------------------------
+The ``__init__`` methods of the core classes are designed to make in-line definition of
+new objects in user code reasonably legible. So, when initialising one of the container
+properties, the keywords/args defining the component parts are passed to the utility
+method :meth:`ncdata.NameMap.from_items`, so that you can specify a group of components
+in a variety of ways : either as a pre-created container, or a similar dictionary-like
+object :
+
+.. code-block:: python
+
+ >>> ds1 = NcData(groups={
+ ... 'x':NcData('x'),
+ ... 'y':NcData('y')
+ ... })
+ >>> print(ds1)
+
+ groups:
+
+
+ >
+
+or **more usefully**, just a *list* of suitable data objects, like this...
+
+.. code-block:: python
+
+ >>> ds2 = NcData(
+ ... variables=[
+ ... NcVariable('v1', ('x',), data=[1,2]),
+ ... NcVariable('v2', ('x',), data=[2,3])
+ ... ]
+ ... )
+ >>> print(ds2)
+
+ variables:
+
+
+ >
+
+Or, in the **special case of attributes**, a regular dictionary of ``name: value`` form
+will be automatically converted to a NameMap of ``name: NcAttribute(name, value)`` :
+
+.. code-block:: python
+
+ >>> var = NcVariable(
+ ... 'v3',
+ ... attributes={'x':'this', 'b':1.4, 'arr': [1, 2, 3]}
+ ... )
+ >>> print(var)
+ ): v3()
+ v3:x = 'this'
+ v3:b = 1.4
+ v3:arr = array([1, 2, 3])
+ >
+
+
+Relationship to File Storage
+----------------------------
+See :ref:`file-storage`
+
+.. _NetCDF Classic Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#classic_model
+.. _"enhanced" netCDF data model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#enhanced_model
\ No newline at end of file
diff --git a/docs/userdocs/user_guide/design_principles.rst b/docs/userdocs/user_guide/design_principles.rst
index 657483a..3ac2e19 100644
--- a/docs/userdocs/user_guide/design_principles.rst
+++ b/docs/userdocs/user_guide/design_principles.rst
@@ -11,6 +11,7 @@ Purpose
* allow analysis packages (Iris, Xarray) to exchange data efficiently,
including lazy data operations and streaming
+.. _design-principles:
Design Principles
-----------------
diff --git a/docs/userdocs/user_guide/general_topics.rst b/docs/userdocs/user_guide/general_topics.rst
new file mode 100644
index 0000000..ea890aa
--- /dev/null
+++ b/docs/userdocs/user_guide/general_topics.rst
@@ -0,0 +1,86 @@
+.. _general_topics:
+
+General Topics
+==============
+Assorted discussion topics relating to the core ncdata classes and data management.
+
+Validity Checking
+-----------------
+See : :ref:`correctness-checks`
+
+
+.. _data-types:
+
+Data Types (dtypes)
+-------------------
+:ref:`Variable data ` and :ref:`attribute values `
+all use a subset of numpy **dtypes**, compatible with netcdf datatypes.
+These are effectively those defined by `netcdf4-python `_, and this
+therefore also effectively determines what we see in `dask arrays `_ .
+
+However, at present ncdata directly supports only the so-called "Primitive Types" of the NetCDF "Enhanced Data Model".
+So, it does **not** include the user-defined, enumerated or variable-length datatypes.
+
+.. attention::
+
+ In practice, we have found that at least variables of the variable-length "string" datatype **do** seem to function
+ correctly at present, but this is not officially supported, and not currently tested.
+
+ See also : :ref:`howto_load_variablewidth_strings`
+
+ We hope to extend support to the more general `NetCDF Enhanced Data Model`_ in future.
+
+
+For reference, the currently supported + tested datatypes are :
+
+* unsigned byte = numpy "u1"
+* unsigned short = numpy "u2"
+* unsigned int = numpy "u4"
+* unsigned int64 = numpy "u8"
+* byte = numpy "i1"
+* short = numpy "i2"
+* int = numpy "i4"
+* int64 = numpy "i8"
+* float = numpy "f4"
+* double = numpy "f8"
+* char = numpy "S1"
+
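The table above can be expressed as a simple mapping onto numpy dtype codes. The following check of names and item sizes is illustrative only (note that the 64-bit types map to the 8-byte codes ``"i8"``/``"u8"``):

```python
import numpy as np

# netCDF primitive type -> numpy dtype code, per the table above
NC_TO_NUMPY = {
    "unsigned byte": "u1", "unsigned short": "u2",
    "unsigned int": "u4", "unsigned int64": "u8",
    "byte": "i1", "short": "i2", "int": "i4", "int64": "i8",
    "float": "f4", "double": "f8", "char": "S1",
}

for nc_name, code in NC_TO_NUMPY.items():
    dt = np.dtype(code)
    print(f"{nc_name:15} -> {dt.str:5} ({dt.itemsize} byte(s))")
```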
+
+Character and String Data
+-------------------------
+String and character data occurs in at least 3 different places :
+
+1. in names of components (e.g. variables)
+2. in string attributes
+3. in character-array data variables
+
+Very briefly :
+
+* types (1) and (2) are equivalent to Python strings and may include unicode
+* type (3) is equivalent to character (byte) arrays, and normally represents only
+ fixed-length strings, with the length given by a file dimension.
+
+NetCDF4 does also have provision for variable-length strings as an elemental type,
+which you can have arrays of, but ncdata does not yet properly support this.
+
+For more details, please see : :ref:`string-and-character-data`
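As a sketch of type (3), numpy can represent such data as a fixed-width byte-character array, with the string length as a trailing dimension, much as a netCDF character variable stores it. This is illustrative only, not ncdata code:

```python
import numpy as np

# Two fixed-width (5-byte) strings, as a netCDF char variable would store
# them: shape (2, 5), with the string length as the trailing dimension.
fixed = np.array([b"hello", b"hi"], dtype="S5")
chars = fixed.view("S1").reshape(2, 5)
print(chars.shape, chars.dtype)

# Joining along the last axis recovers Python strings (padding NULs drop out).
strings = [b"".join(row).decode() for row in chars]
print(strings)
```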
+
+
+.. _thread_safety:
+
+Thread Safety
+-------------
+Whenever you combine variable data loaded using more than **one** data-format package
+(i.e. at present, Iris, Xarray and ncdata itself), you can potentially get
+multi-threading contention errors in netCDF4 library access. This may result in
+problems ranging from sporadic value changes to segmentation faults or other system
+errors.
+
+In these cases you should always use the :mod:`ncdata.threadlock_sharing` module to
+avoid such problems. See :ref:`thread-safety`.
+
+
+.. _NetCDF Classic Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#classic_model
+
+.. _NetCDF Enhanced Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#enhanced_model
+
diff --git a/docs/userdocs/user_guide/howtos.rst b/docs/userdocs/user_guide/howtos.rst
new file mode 100644
index 0000000..74caa5f
--- /dev/null
+++ b/docs/userdocs/user_guide/howtos.rst
@@ -0,0 +1,643 @@
+How-To Questions
+================
+Short goal-focussed descriptions of how to achieve specific things.
+These are mostly presented as example code snippets, but also link to other
+documentation to describe concepts and technical details.
+
+**"Why Not Just..."** sections highlight warnings for what *not* to do,
+i.e. wrong turns and gotchas, with brief descriptions of why.
+
+
+.. _howto_access:
+
+Access a variable, dimension, attribute or group
+------------------------------------------------
+Index by component names to get the object which represents a particular element.
+
+.. code-block:: python
+
+ >>> dataset.attributes["experiment"]
+ NcAttribute('experiment', 'A301.7')
+ >>> dataset.dimensions["x"]
+ NcDimension('x', 3)
+ >>> dataset.variables['vx'].attributes['units']
+ NcAttribute('units', 'm s-1')
+
+Variables, attributes, dimensions and sub-groups are all stored by name like this,
+in a parent property which is a "component container" dictionary.
+
+.. Warning::
+
+ The :attr:`~ncdata.NcVariable.dimensions` property of a :class:`~ncdata.NcVariable`
+ is different : it is *not* a dictionary of :class:`~ncdata.NcDimension` objects,
+ but just a *list of dimension names*.
+
+
+.. _howto_add_something:
+
+Add a variable, dimension, attribute or group
+---------------------------------------------
+Use the :meth:`~ncdata.NameMap.add` method of a component-container property to insert
+a new item.
+
+ >>> data.dimensions.add(NcDimension("y", 4))
+ >>> data.dimensions
+ {'x': NcDimension('x', 3), 'y': NcDimension('y', 4)}
+
+The item must be of the correct type, in this case a :class:`~ncdata.NcDimension`.
+If not, an error will be raised.
+
+.. Warning::
+
+ **Why Not Just...** ``data.dimensions["y"] = NcDimension("y", 4)`` ?
+
+ This does actually work, but the user must ensure that the dictionary key always
+ matches the name of the component added. Using :meth:`~ncdata.NameMap.add` is thus
+ safe, and actually *simpler*, since all components have a definite name anyway.
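The safety argument can be sketched with a minimal name-keyed container. These classes are stand-ins for illustration, not ncdata's actual ``NameMap``:

```python
class NamedItem:
    """A stand-in for any component which carries its own name."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

class NameKeyedDict(dict):
    """A dict whose add() derives the key from the item itself."""
    def add(self, item):
        # the key always matches item.name, so it can never be wrong
        self[item.name] = item

dims = NameKeyedDict()
dims.add(NamedItem("y", 4))
print(dims["y"].name)       # 'y' -- key and name agree by construction

# With plain __setitem__, the key and the name can silently disagree:
dims["z"] = NamedItem("oops", 1)
print(dims["z"].name)       # 'oops' -- a key/name mismatch
```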
+
+
+.. _howto_remove_something:
+
+Remove a variable, dimension, attribute or group
+------------------------------------------------
+The standard Python ``del`` operator can be applied to a component property to remove
+something by its name.
+
+ >>> data.dimensions
+ {'x': NcDimension('x', 3), 'y': NcDimension('y', 3)}
+ >>> del data.dimensions['x']
+ >>> data.dimensions
+ {'y': NcDimension('y', 3)}
+
+
+.. _howto_rename_something:
+
+Rename a variable, attribute or group
+-------------------------------------
+Use the :meth:`~ncdata.NameMap.rename` method to rename a component.
+
+.. code-block:: python
+
+ >>> data.dimensions
+ {'x': NcDimension('x', 3), 'y': NcDimension('y', 3)}
+ >>> data.dimensions.rename('x', 'q')
+ >>> data.dimensions
+ {'q': NcDimension('q', 3), 'y': NcDimension('y', 3)}
+
+Note that this affects both the element's container key *and* its ``.name``.
+
+
+.. Warning::
+
+ Renaming a **dimension** can cause problems, so must be done with care.
+ See :ref:`howto_rename_dimension`.
+
+.. Warning::
+
+ **Why Not Just...** ``dim = data.dimensions['x']; dim.name = "q"`` ?
+
+ This would break the expected ``key == elements[key].name`` rule.
+ We don't prevent this, but it is usually a mistake.
+ :func:`~ncdata.utils.save_errors` detects this type of problem.
+
+
+.. _howto_rename_dimension:
+
+Rename a dimension
+------------------
+Simply using ``ncdata.dimensions.rename()`` can cause problems, because you must then
+**also** replace the name where it occurs in the dimensions of any variables.
+
+.. Note::
+
+ **To-Do** : there should be a utility for this, but as yet it does not exist.
+ See `Issue#87 `_.
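A hypothetical helper, sketched here against simple stand-in objects rather than ncdata's API, shows what such a utility must do: rename the dimension *and* patch every variable that references it:

```python
from types import SimpleNamespace

def rename_dimension(dataset, old, new):
    """Rename a dimension AND update all variables referencing it (sketch)."""
    # rebuild the dimensions mapping, preserving element order
    dataset.dimensions = {
        (new if name == old else name): dim
        for name, dim in dataset.dimensions.items()
    }
    dataset.dimensions[new].name = new
    # patch the dimension-name tuples of every variable
    for var in dataset.variables.values():
        var.dimensions = tuple(new if d == old else d for d in var.dimensions)

# Stand-in "dataset" built from SimpleNamespace objects, for illustration.
ds = SimpleNamespace(
    dimensions={"x": SimpleNamespace(name="x", size=3)},
    variables={"v": SimpleNamespace(name="v", dimensions=("x",))},
)
rename_dimension(ds, "x", "q")
print(list(ds.dimensions), ds.variables["v"].dimensions)  # ['q'] ('q',)
```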
+
+
+.. _howto_read_attr:
+
+Read an attribute value
+-----------------------
+To get an attribute of a dataset, group or variable, use the
+:meth:`ncdata.NcData.get_attrval` or :meth:`ncdata.NcVariable.get_attrval`
+method, which returns either a single (scalar) number, a numeric array, or a string.
+
+.. code-block:: python
+
+ >>> variable.get_attrval("x")
+ 3.0
+ >>> dataset.get_attrval("context")
+ "Results from experiment A301.7"
+ >>> dataset.variables["q"].get_attrval("level_settings")
+ [1.0, 2.5, 3.7]
+
+**Given an isolated** :class:`ncdata.NcAttribute` **instance** :
+
+Its value is best read with the :meth:`ncdata.NcAttribute.get_python_value` method,
+which produces the same results as the above.
+
+ >>> variable.attributes["x"].get_python_value()
+ 3.0
+
+.. Warning::
+
+ **Why Not Just...** use ``NcAttribute.value`` ?
+
+ For example
+
+ .. code-block:: python
+
+ >>> data.variables["x"].attributes["q"].value
+ [1]
+
+ The ``.value`` is always stored as a :class:`~numpy.ndarray` array, but this is not
+ how it appears in netCDF. ``get_python_value()`` returns the attribute
+ as a straightforward value, compatible with what is seen in ``ncdump`` output
+ and with results from the ``netCDF4`` module.
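The conversion rule can be illustrated with a small stand-in function. The real method lives on :class:`~ncdata.NcAttribute`; this is only a sketch of the principle that length-1 arrays come back as plain scalars, string arrays as Python strings, and anything else as an array:

```python
import numpy as np

def python_value(value):
    """Sketch of an ndarray -> 'plain' attribute-value conversion."""
    arr = np.asarray(value)
    if arr.dtype.kind in ("U", "S"):
        # string attributes read back as plain Python strings
        items = [s.decode() if isinstance(s, bytes) else str(s)
                 for s in arr.ravel()]
        return items[0] if arr.size == 1 else items
    if arr.ndim == 0 or arr.size == 1:
        # scalar (or length-1) attributes read back as plain numbers
        return arr.ravel()[0].item()
    return arr

print(python_value(np.array(3.0)))            # 3.0
print(python_value(np.array("m s-1")))        # m s-1
print(python_value(np.array([1.0, 2.5])))     # an array is returned unchanged
```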
+
+
+.. _howto_write_attr:
+
+Change an attribute value
+-------------------------
+To set an attribute of a dataset, group or variable, use the
+:meth:`ncdata.NcData.set_attrval` or :meth:`ncdata.NcVariable.set_attrval` method.
+
+All attributes are writeable, and the type can be freely changed.
+
+.. code-block:: python
+
+ >>> variable.set_attrval("x", 3.)
+ >>> variable.get_attrval("x")
+ 3.0
+ >>> variable.set_attrval("x", "string-value")
+ >>> variable.get_attrval("x")
+ "string-value"
+
+**Or** if you already have an attribute object in hand, you can simply set
+``attribute.value`` directly : this is a property with controlled access, so the
+assigned value is cast with :func:`numpy.asarray`.
+
+For example
+
+.. code-block:: python
+
+ >>> attr = data.variables["x"].attributes["q"]
+ >>> attr.value = 4.2
+ >>> attr.value
+ array(4.2)
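The "controlled access" described above can be sketched as an ordinary Python property that funnels assignments through :func:`numpy.asarray`. This is a minimal stand-in, not ncdata's actual implementation:

```python
import numpy as np

class Attr:
    """Minimal sketch of an attribute whose value is always an ndarray."""
    def __init__(self, name, value):
        self.name = name
        self.value = value     # routed through the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, value):
        # any assigned value is normalised to a numpy array
        self._value = np.asarray(value)

attr = Attr("q", 4.2)
print(repr(attr.value))        # array(4.2)
attr.value = [1, 2, 3]
print(attr.value.shape)        # (3,)
```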
+
+
+.. _howto_create_attr:
+
+Create an attribute
+-------------------
+To create an attribute on a dataset, group or variable, just set its value with the
+:meth:`ncdata.NcData.set_attrval` or :meth:`ncdata.NcVariable.set_attrval` method.
+This works just like :ref:`howto_write_attr` : i.e. it makes no difference whether the
+attribute already exists or not.
+
+.. code-block:: python
+
+ >>> variable.set_attrval("x", 3.)
+
+.. Note::
+
+ Assigning attributes when *creating* a dataset, variable or group is somewhat
+ simpler, discussed :ref:`here `.
+
+
+.. _howto_create_variable:
+
+Create a variable
+-----------------
+Use the :meth:`NcVariable() ` constructor to create a new
+variable with a name, dimensions, and optional data and attributes.
+
+A minimal example:
+
+.. code-block:: python
+
+ >>> var = NcVariable("data", ("x_axis",))
+ >>> print(var)
+ ): data(x_axis)>
+ >>> print(var.data)
+ None
+ >>>
+
+A more rounded example, including a data array:
+
+.. code-block:: python
+
+ >>> var = NcVariable("vyx", ("y", "x"),
+ ... data=[[1, 2, 3], [0, 1, 1]],
+ ... attributes=[NcAttribute('a', 1), NcAttribute('b', 'setting=off')]
+ ... )
+ >>> print(var)
+
+ >>> print(var.data)
+ [[1 2 3]
+ [0 1 1]]
+ >>>
+
+
+
+.. _howto_access_vardata:
+
+Read or write variable data
+---------------------------
+The :attr:`~ncdata.NcVariable.data` property of a :class:`~ncdata.NcVariable` usually
+holds a data array.
+
+.. code-block:: python
+
+ >>> var.data = np.array([1, 2])
+ >>> print(var.data)
+ array([1, 2])
+
+This may be either a :class:`numpy.ndarray` (real) or a :class:`dask.array.Array`
+(lazy) array. If the data is converted from another source (file, iris or xarray),
+it is usually lazy.
+
+It can be freely overwritten by the user.
+
+.. Warning::
+
+ If not ``None``, the ``.data`` should **always** be an array of the correct shape.
+
+ The :func:`~ncdata.utils.save_errors` function checks that all variables have
+ valid dimensions, and that ``.data`` arrays match the dimensions.
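A ``save_errors``-style shape check can be sketched like this, using plain dicts and tuples in place of ncdata objects (illustrative only):

```python
import numpy as np

def shape_errors(dim_sizes, var_dims, data):
    """Return error strings if data mismatches the declared dims (sketch)."""
    errors = []
    unknown = [d for d in var_dims if d not in dim_sizes]
    if unknown:
        errors.append(f"unknown dimensions: {unknown}")
    elif data is not None:
        expected = tuple(dim_sizes[d] for d in var_dims)
        if np.shape(data) != expected:
            errors.append(f"shape {np.shape(data)} != expected {expected}")
    return errors

dims = {"y": 2, "x": 3}
print(shape_errors(dims, ("y", "x"), np.zeros((2, 3))))   # []
print(shape_errors(dims, ("y", "x"), np.zeros((3, 2))))   # one error
```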
+
+
+Read data from a NetCDF file
+----------------------------
+Use the :func:`ncdata.netcdf4.from_nc4` function to load a dataset from a netCDF file.
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4
+ >>> ds = from_nc4(filepath)
+ >>> print(ds)
+
+
+
+Control chunking in a netCDF read
+---------------------------------
+Use the ``dim_chunks`` argument in the :func:`ncdata.netcdf4.from_nc4` function
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4
+ >>> ds = from_nc4(filepath, dim_chunks={"time": 3})
+ >>> print(ds.variables["x"].data.chunksize)
+ (3,)
+
+
+Save data to a new file
+-----------------------
+Use the :func:`ncdata.netcdf4.to_nc4` function to write data to a file:
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import to_nc4
+ >>> to_nc4(data, filepath)
+
+
+Read from or write to Iris cubes
+--------------------------------
+Use :func:`ncdata.iris.to_iris` and :func:`ncdata.iris.from_iris`.
+
+.. code-block:: python
+
+ >>> import iris
+ >>> from ncdata.iris import from_iris, to_iris
+ >>> cubes = iris.load(filepath)
+ >>> ncdata = from_iris(cubes)
+ >>>
+ >>> cubes2 = to_iris(ncdata)
+
+Note that:
+
+* :func:`ncdata.iris.to_iris` calls :func:`iris.load`
+* :func:`ncdata.iris.from_iris` calls :func:`iris.save`
+
+Extra kwargs are passed on to the iris load/save routine.
+
+Since an :class:`~ncdata.NcData` corresponds to a complete file (or dataset), it is
+converted to or from *multiple* cubes, i.e. a :class:`~iris.cube.CubeList`.
+
+
+Read from or write to Xarray datasets
+-------------------------------------
+Use :func:`ncdata.xarray.to_xarray` and :func:`ncdata.xarray.from_xarray`.
+
+.. code-block:: python
+
+ >>> import xarray
+ >>> from ncdata.xarray import from_xarray, to_xarray
+ >>> dataset = xarray.open_dataset(filepath)
+ >>> ncdata = from_xarray(dataset)
+ >>>
+ >>> ds2 = to_xarray(ncdata)
+
+Note that:
+
+* :func:`ncdata.xarray.to_xarray` calls :func:`xarray.Dataset.load_store`.
+
+* :func:`ncdata.xarray.from_xarray` calls :func:`xarray.Dataset.dump_to_store`.
+
+Any additional kwargs are passed on to the xarray load/save routine.
+
+An :class:`~ncdata.NcData` is written to, or read from, a single
+:class:`xarray.Dataset`.
+
+
+
+Convert data directly from Iris to Xarray, or vice versa
+--------------------------------------------------------
+Use :func:`ncdata.iris_xarray.cubes_to_xarray` and
+:func:`ncdata.iris_xarray.cubes_from_xarray`.
+
+.. code-block:: python
+
+ >>> import iris
+ >>> from ncdata.iris_xarray import cubes_from_xarray, cubes_to_xarray
+ >>> cubes = iris.load(filepath)
+ >>> dataset = cubes_to_xarray(cubes)
+ >>>
+ >>> cubes2 = cubes_from_xarray(dataset)
+
+These functions are simply a convenient shorthand for combined use of
+:func:`ncdata.xarray.from_xarray` then :func:`ncdata.iris.to_iris`,
+or :func:`ncdata.iris.from_iris` then :func:`ncdata.xarray.to_xarray`.
+
+Extra keyword controls for the relevant iris and xarray load and save routines can be
+passed using specific dictionary keywords, e.g.
+
+.. code-block:: python
+
+ >>> cubes = cubes_from_xarray(
+ ... dataset,
+ ... iris_load_kwargs={'constraints': 'air_temperature'},
+ ... xr_save_kwargs={'unlimited_dims': ('time',)},
+ ... )
+
+Combine data from different input files into one output
+-------------------------------------------------------
+This can be easily done by pasting elements from two sources into one output dataset.
+
+You can freely modify a loaded dataset, since it is no longer connected to the input
+file.
+
+Just be careful that any shared dimensions match.
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4, to_nc4
+ >>> data = from_nc4('input1.nc')
+ >>> data2 = from_nc4('input2.nc')
+ >>> # Add some known variables from file2 into file1
+ >>> wanted = ('x1', 'x2', 'x3')
+ >>> for name in wanted:
+ ... data.variables.add(data2.variables[name])
+ ...
+ >>> to_nc4(data, 'output.nc')
+
+
+Create a brand-new dataset
+--------------------------
+Use the :meth:`NcData() <ncdata.NcData.__init__>` constructor to create a new dataset.
+
+Contents and components can be attached on creation ...
+
+.. code-block:: python
+
+ >>> data = NcData(
+ ... dimensions=[NcDimension("y", 2), NcDimension("x", 3)],
+ ... variables=[
+ ... NcVariable("y", ("y",), data=[0, 1]),
+ ... NcVariable("x", ("x",), data=[0, 1, 2]),
+ ... NcVariable(
+ ... "vyx", ("y", "x"),
+ ... data=np.zeros((2, 3)),
+ ... attributes=[
+ ... NcAttribute("long_name", "rate"),
+ ... NcAttribute("units", "m s-1")
+ ... ]
+ ... )],
+ ... attributes=[NcAttribute("history", "imaginary")]
+ ... )
+ >>> print(data)
+
+ dimensions:
+ y = 2
+ x = 3
+
+ variables:
+
+
+
+
+ global attributes:
+ :history = 'imaginary'
+ >
+ >>>
+
+... or added iteratively ...
+
+.. code-block:: python
+
+ >>> data = NcData()
+ >>> ny, nx = 2, 3
+ >>> data.dimensions.add(NcDimension("y", ny))
+ >>> data.dimensions.add(NcDimension("x", nx))
+ >>> data.variables.add(NcVariable("y", ("y",)))
+ >>> data.variables.add(NcVariable("x", ("x",)))
+ >>> data.variables.add(NcVariable("vyx", ("y", "x")))
+ >>> vx, vy, vyx = [data.variables[k] for k in ("x", "y", "vyx")]
+ >>> vx.data = np.arange(nx)
+ >>> vy.data = np.arange(ny)
+ >>> vyx.data = np.zeros((ny, nx))
+ >>> vyx.set_attrval("long_name", "rate")
+ >>> vyx.set_attrval("units", "m s-1")
+ >>> data.set_attrval("history", "imaginary")
+
+
+Remove or rewrite specific attributes
+-------------------------------------
+Load an input dataset with :func:`ncdata.netcdf4.from_nc4`.
+
+Then you can modify, add or remove global and variable attributes at will.
+
+For example :
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4, to_nc4
+ >>> ds = from_nc4('input.nc4')
+ >>> history = ds.get_attrval("history") if "history" in ds.attributes else ""
+ >>> ds.set_attrval("history", history + ": modified to SPEC-FIX.A")
+ >>> removes = ("grid_x", "review")
+ >>> for name in removes:
+ ... if name in ds.attributes:
+ ... del ds.attributes[name]
+ ...
+ >>> for var in ds.variables.values():
+ ... if "coords" in var.attributes:
+ ... var.attributes.rename("coords", "coordinates") # common non-CF problem
+ ... units = var.get_attrval("units")
+ ... if units and units == "ppm":
+ ... var.set_attrval("units", "1.e-6") # another common non-CF problem
+ ...
+ >>> to_nc4(ds, "output_fixed.nc")
+
+
+Save selected variables to a new file
+-------------------------------------
+Load an input dataset with :func:`ncdata.netcdf4.from_nc4`; make a new empty dataset
+with :class:`~ncdata.NcData`\ (); use ``dataset.dimensions.add()``,
+``dataset.variables.add()`` and similar to add/copy selected elements into it; then
+save it with :func:`ncdata.netcdf4.to_nc4`.
+
+For a simple case with no groups, it could look something like this:
+
+.. code-block:: python
+
+ >>> from ncdata import NcData
+ >>> from ncdata.netcdf4 import from_nc4, to_nc4
+ >>> ds_in = from_nc4(input_filepath)
+ >>> ds_out = NcData()
+ >>> for varname in ('data1', 'data2', 'dimx', 'dimy'):
+ ...     var = ds_in.variables[varname]
+ ...     ds_out.variables.add(var)
+ ...     for dimname in var.dimensions:
+ ...         if dimname not in ds_out.dimensions:
+ ...             ds_out.dimensions.add(ds_in.dimensions[dimname])
+ ...
+ >>> to_nc4(ds_out, output_filepath)
+
+Sometimes it's simpler to load the input, delete content **not** wanted, then re-save.
+It's perfectly safe to do that, since the original file will be unaffected.
+
+.. code-block:: python
+
+ >>> data = from_nc4(input_filepath)
+ >>> for name in ('extra1', 'extra2', 'unwanted'):
+ ...     del data.variables[name]
+ ...
+ >>> del data.dimensions['pressure']
+ >>> to_nc4(data, output_filepath)
+
+
+Adjust file content before loading into Iris/Xarray
+---------------------------------------------------
+Use :func:`~ncdata.netcdf4.from_nc4`, and then :func:`~ncdata.iris.to_iris` or
+:func:`~ncdata.xarray.to_xarray`. You can thus adjust file content at the file level,
+to avoid loading problems.
+
+For example, to replace an invalid coordinate name in iris input :
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4
+ >>> from ncdata.iris import to_iris
+ >>> ncdata = from_nc4(input_filepath)
+ >>> for var in ncdata.variables.values():
+ ...     coords = var.get_attrval('coordinates') or ""
+ ...     if "old_varname" in coords:
+ ...         coords = coords.replace("old_varname", "new_varname")
+ ...         var.set_attrval("coordinates", coords)
+ ...
+ >>> cubes = to_iris(ncdata)
+
+or, to replace a mis-used special attribute in xarray input :
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4
+ >>> from ncdata.xarray import to_xarray
+ >>> ncdata = from_nc4(input_filepath)
+ >>> for var in ncdata.variables.values():
+ ...     if "_fillvalue" in var.attributes:
+ ...         var.attributes.rename("_fillvalue", "_FillValue")
+ ...
+ >>> dataset = to_xarray(ncdata)
+
+
+Adjust Iris/Xarray save output before writing to a file
+-------------------------------------------------------
+Use :func:`~ncdata.iris.from_iris` or :func:`~ncdata.xarray.from_xarray`, and then
+:func:`~ncdata.netcdf4.to_nc4`. You can thus make changes to the saved output which
+would be difficult to overcome if first written to an actual file.
+
+For example, to force an additional unlimited dimension in iris output :
+
+.. code-block:: python
+
+ >>> from ncdata.iris import from_iris
+ >>> from ncdata.netcdf4 import to_nc4
+ >>> ncdata = from_iris(cubes)
+ >>> ncdata.dimensions['timestep'].unlimited = True
+ >>> to_nc4(ncdata, "output.nc")
+
+or, to convert xarray data variable output to masked integers :
+
+.. code-block:: python
+
+ >>> import numpy as np
+ >>> from ncdata.xarray import from_xarray
+ >>> from ncdata.netcdf4 import to_nc4
+ >>> ncdata = from_xarray(dataset)
+ >>> var = ncdata.variables['experiment']
+ >>> mask = np.isnan(var.data)
+ >>> data = var.data.astype(np.int16)
+ >>> data[mask] = -9999
+ >>> var.data = data
+ >>> var.set_attrval("_FillValue", -9999)
+ >>> to_nc4(ncdata, "output.nc")
+
+
+.. _howto_load_variablewidth_strings:
+
+Load a file containing variable-width string variables
+------------------------------------------------------
+You must supply a ``dim_chunks`` keyword to the :func:`ncdata.netcdf4.from_nc4`
+function, specifying how to chunk all dimension(s) which the "string" type variable
+uses.
+
+.. code-block:: python
+
+ >>> from ncdata.netcdf4 import from_nc4
+ >>> # This file has a netcdf "string" type variable, with dimensions ('date',).
+ >>> # : don't chunk that dimension.
+ >>> dataset = from_nc4(filepath, dim_chunks={"date": -1})
+
+This is needed to avoid a Dask error like
+``"auto-chunking with dtype.itemsize == 0 is not supported, please pass in `chunks`
+explicitly."``
+
+When you do this, Dask returns the variable data as a numpy *object* array, containing
+Python strings. You will probably also want to (manually) convert that to something
+more tractable, to work with it effectively.
+
+For example, something like this :
+
+.. code-block:: python
+
+ >>> var = dataset.variables['name']
+ >>> strings = var.data.compute()
+ >>> maxlen = max(len(s) for s in strings)
+
+ >>> # convert to a fixed-width character array
+ >>> data = np.array(
+ ...     [list(s.ljust(maxlen, "\0")) for s in strings], dtype="S1"
+ ... )
+ >>> print(data.shape, data.dtype)
+ (1010, 12) |S1
+ >>> dataset.dimensions.add(NcDimension('name_strlen', maxlen))
+ >>> var.dimensions = var.dimensions + ("name_strlen",)
+ >>> var.data = data
diff --git a/docs/userdocs/user_guide/user_guide.rst b/docs/userdocs/user_guide/user_guide.rst
index 9b0b938..12b6cbe 100644
--- a/docs/userdocs/user_guide/user_guide.rst
+++ b/docs/userdocs/user_guide/user_guide.rst
@@ -1,25 +1,13 @@
User Documentation
==================
-Beyond the basic introduction
+Detailed explanations, beyond the basic tutorial-style introductions
(for which, see :ref:`getting_started`)
-.. warning::
- The User Guide is still very incomplete.
-
-The User Guide is still mostly work-in-progress.
-For the present, please see the following :
-
- * :ref:`Introduction `
- * tested `example scripts in the project repo `_
- * example code snippets in the `project README `_
-
-
.. toctree::
:maxdepth: 2
design_principles
- (TODO : empty) Data object descriptions
- (TODO : empty) General topics
- (TODO : empty) How-tos
- known_issues
- ../../change_log
+ data_objects
+ common_operations
+ general_topics
+ howtos
diff --git a/lib/ncdata/_core.py b/lib/ncdata/_core.py
index 71bc06d..dd61e5d 100644
--- a/lib/ncdata/_core.py
+++ b/lib/ncdata/_core.py
@@ -497,12 +497,10 @@ def copy(self):
class NcAttribute:
"""
- An object representing a netcdf variable or dataset attribute.
+ An object representing a netcdf variable, group or dataset attribute.
- Associates a name to a value which is a numpy scalar or 1-D array.
-
- We expect the value to be 0- or 1-dimensional, and an allowed dtype.
- However none of this is checked.
+ Associates a name to a value which is always a numpy scalar or 1-D array, of an
+ allowed dtype. See :ref:`attribute-dtypes`.
In an actual netcdf dataset, a "scalar" is actually just an array of length 1.
"""
@@ -511,7 +509,7 @@ def __init__(self, name: str, value): # noqa: D107
#: attribute name
self.name: str = name
# Attribute values are arraylike, have dtype
- #: attribute value
+ #: attribute value, constrained to a suitable numpy array object.
self.value: np.ndarray = value
@property
diff --git a/lib/ncdata/dataset_like.py b/lib/ncdata/dataset_like.py
index 04ca62a..55af241 100644
--- a/lib/ncdata/dataset_like.py
+++ b/lib/ncdata/dataset_like.py
@@ -1,29 +1,32 @@
r"""
-An adaptor layer making a NcData appear like a :class:`netCDF4.Dataset`.
+An adaptor layer for :mod:`ncdata` to emulate :mod:`netCDF4`.
-Allows an :class:`~ncdata.NcData` to masquerade as a
+Primarily, allows an :class:`ncdata.NcData` to masquerade as a
:class:`netCDF4.Dataset` object.
Note:
This is a low-level interface, exposed publicly for extended experimental uses.
- If you only want to convert **Iris** data to+from :class:`~ncdata.NcData`,
+ If you only want to convert **Iris** data to + from :class:`~ncdata.NcData`,
please use the functions in :mod:`ncdata.iris` instead.
----
-These classes contain :class:`~ncdata.NcData` and :class:`~ncdata.NcVariable`\\s, but
-emulate the access APIs of a :class:`netCDF4.Dataset` / :class:`netCDF4.Variable`.
+These classes contain :class:`~ncdata.NcData`, :class:`~ncdata.NcDimension`, and
+:class:`~ncdata.NcVariable` objects, but emulate the access APIs of
+:class:`netCDF4.Dataset`, :class:`netCDF4.Dimension` and :class:`netCDF4.Variable`.
This is provided primarily to support a re-use of the :mod:`iris.fileformats.netcdf`
-file format load + save, to convert cubes to+from ncdata objects (and hence, especially,
-convert Iris :class:`~iris.cube.Cube`\\s to+from an Xarray :class:`~xarray.Dataset`).
+file format load + save, to convert cubes to + from ncdata objects (and hence,
+especially, to convert Iris :class:`~iris.cube.Cube`\s to + from an Xarray
+:class:`~xarray.Dataset`).
Notes
-----
Currently only supports what is required for Iris load/save capability.
-It *should* be possible to use these objects with other packages expecting a
-:class:`netCDF4.Dataset` object, however the API simulation is far from complete, so
-this may need to be extended, in future, to support other such uses.
+In principle, it *should* be possible to use these objects with other packages
+expecting a :class:`netCDF4.Dataset` object. However the API simulation is far from
+complete, so this module may need to be extended, in future, to support other such uses.
"""
from typing import Any, Dict, List
@@ -85,7 +88,7 @@ class Nc4DatasetLike(_Nc4DatalikeWithNcattrs):
It can be both read and written (modified) via its emulated
:class:`netCDF4.Dataset`-like API.
- The core NcData content, 'self._ncdata', is a :class:`ncdata.NcData`.
+ The core, contained content object, ``self._ncdata``, is a :class:`ncdata.NcData`.
This completely defines the parent object state.
If not provided on init, a new, empty dataset is created.
@@ -97,7 +100,7 @@ class Nc4DatasetLike(_Nc4DatalikeWithNcattrs):
file_format = "NETCDF4"
def __init__(self, ncdata: NcData = None):
- """Create an Nc4DatasetLike, wrapping an NcData."""
+ """Create an Nc4DatasetLike, wrapping an :class:`~ncdata.NcData`."""
if ncdata is None:
ncdata = NcData() # an empty dataset
#: the contained dataset. If not provided, a new, empty dataset is created.
@@ -195,18 +198,24 @@ class Nc4VariableLike(_Nc4DatalikeWithNcattrs):
"""
An object which contains a :class:`ncdata.NcVariable` and emulates a :class:`netCDF4.Variable`.
- The core NcData content, 'self._ncdata', is a :class:`NcVariable`.
+ The core, contained content object, ``self._ncdata``, is a :class:`~ncdata.NcVariable`.
This completely defines the parent object state.
- The property "_data_array" is detected by Iris to do direct data transfer
+ The property ``._data_array`` is detected by Iris to do direct data transfer
(copy-free and lazy-preserving).
+
At present, this object emulates only the *default* read/write behaviour of a
- netCDF4 Variable, i.e. the underlying NcVariable contains a 'raw' data array, and
- the _data_array property interface applies/removes any scaling and masking as it is
- "seen" from the outside.
- That suits how *Iris* reads netCFD4 data, but it won't work if the user wants to
+ :class:`netCDF4.Variable`, i.e. :
+
+ * the underlying NcVariable contains a 'raw' data array, which may be real
+ (i.e. numpy) or lazy (i.e. dask).
+ * The ``._data_array`` property read/write interface then applies/removes any
+ scaling and masking as it is to be "seen" from the outside.
+
+ That suits how *Iris* reads netCDF4 data, but it won't work if the user wants to
control the masking/saving behaviour, as you can do in netCDF4.
- Thus, at present, we do *not* provide any of set_auto_mask/scale/maskandscale.
+ Thus, at present, we do *not* provide any of the
+ ``set_auto_mask/scale/maskandscale()`` methods.
"""
@@ -447,7 +456,7 @@ class Nc4DimensionLike:
"""
An object which emulates a :class:`netCDF4.Dimension` object.
- The core NcData content, 'self._ncdata', is a :class:`ncdata.NcDimension`.
+ The core, contained content object, ``self._ncdata``, is a :class:`ncdata.NcDimension`.
This completely defines the parent object state.
"""
diff --git a/lib/ncdata/iris.py b/lib/ncdata/iris.py
index 0db0bae..cb94a67 100644
--- a/lib/ncdata/iris.py
+++ b/lib/ncdata/iris.py
@@ -1,13 +1,18 @@
r"""
Interface routines for converting data between ncdata and Iris.
-Convert :class:`~ncdata.NcData` to and from Iris :class:`~iris.cube.Cube`\\s.
-
-This uses the :class:`ncdata.dataset_like` interface ability to mimic netCDF4.Dataset
-objects, which are used like files to load and save Iris data.
-This means that all we need to know of Iris is its netcdf load+save interfaces.
+Convert :class:`~ncdata.NcData`\s to and from Iris :class:`~iris.cube.Cube`\s.
"""
+#
+# NOTE: This uses the :mod:`ncdata.dataset_like` interface ability to mimic a
+# :class:`netCDF4.Dataset` object, which can then be loaded like a file into Iris.
+# The Iris netcdf loader now has specific support for loading an open dataset object,
+# see : https://github.com/SciTools/iris/pull/5214.
+# This means that, hopefully, all we need to know of Iris itself is its load and
+# save interfaces, though we do specifically target the netcdf file format.
+#
+
from typing import Any, AnyStr, Dict, Iterable, Union
import iris
@@ -19,10 +24,6 @@
__all__ = ["from_iris", "to_iris"]
-#
-# The primary conversion interfaces
-#
-
def to_iris(ncdata: NcData, **iris_load_kwargs: Dict[AnyStr, Any]) -> CubeList:
"""
@@ -40,7 +41,7 @@ def to_iris(ncdata: NcData, **iris_load_kwargs: Dict[AnyStr, Any]) -> CubeList:
Returns
-------
- cubes : CubeList
+ cubes : iris.cube.CubeList
loaded results
"""
dslike = Nc4DatasetLike(ncdata)
@@ -61,7 +62,7 @@ def from_iris(
cubes : :class:`iris.cube.Cube`, or iterable of Cubes
cube or cubes to "save" to an NcData object.
iris_save_kwargs : dict
- additional keys passed to :func:`iris.save` operation.
+ additional keywords passed to the :func:`iris.fileformats.netcdf.save` operation.
Returns
-------
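The "dataset-like mimicry" that the NOTE above relies on is ordinary duck typing: a loader written against the :class:`netCDF4.Dataset` interface will accept any object exposing the same attributes. A minimal sketch, with hypothetical class and function names (not the real `ncdata.dataset_like` API):

```python
# Duck-typing sketch: an in-memory object exposing netCDF4.Dataset-like
# attributes can be "loaded" by code that only knows the file interface.

class FakeVariable:
    def __init__(self, dimensions, data):
        self.dimensions = dimensions
        self.data = data

class FakeDataset:
    """Mimics just enough of netCDF4.Dataset for our toy loader."""
    def __init__(self, variables):
        self.variables = variables

def toy_loader(dataset):
    """A 'loader' written only against the netCDF4-style interface."""
    return {name: var.data for name, var in dataset.variables.items()}

ds = FakeDataset({"x": FakeVariable(("t",), [1, 2, 3])})
loaded = toy_loader(ds)   # works exactly as it would on an open file
```

In the real code, ``Nc4DatasetLike`` plays the role of ``FakeDataset``, and the Iris netcdf loader plays the role of ``toy_loader``.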
diff --git a/lib/ncdata/iris_xarray.py b/lib/ncdata/iris_xarray.py
index 9a6a288..0a6891e 100644
--- a/lib/ncdata/iris_xarray.py
+++ b/lib/ncdata/iris_xarray.py
@@ -1,11 +1,12 @@
r"""
Interface routines for converting data between Xarray and Iris.
-Convert :class:`~xarray.Dataset` to and from Iris :class:`~iris.cube.Cube`\\s.
+Convert :class:`~xarray.Dataset`\s to and from Iris :class:`~iris.cube.Cube`\s.
By design, these transformations should be equivalent to saving data from one package
-to a netcdf file, and re-loading into the other package. There is also support for
-passing additional keywords to the appropriate load/save routines.
+to a netcdf file and re-loading it into the other package, but without actually
+saving or loading any data. There is also support for passing additional keywords
+to the relevant load/save routines.
"""
import xarray
@@ -57,7 +58,7 @@ def cubes_to_xarray(
cubes, iris_save_kwargs=None, xr_load_kwargs=None
) -> xarray.Dataset:
r"""
- Convert Iris :class:`iris.cube.Cube`\\s to an xarray :class:`xarray.Dataset`.
+ Convert Iris :class:`iris.cube.Cube`\s to an xarray :class:`xarray.Dataset`.
Equivalent to saving the dataset to a netcdf file, and loading that with Xarray.
@@ -70,8 +71,7 @@ def cubes_to_xarray(
source data
iris_save_kwargs : dict
- additional keywords passed to :func:`iris.save`, and to
- :func:`iris.fileformats.netcdf.saver.save`
+ additional keywords passed to :func:`iris.fileformats.netcdf.save`.
xr_load_kwargs : dict
additional keywords passed to :meth:`xarray.Dataset.load_store`
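The "equivalent to saving and reloading, but without a file" design can be pictured as the composition of two conversions through a common in-memory form. A sketch with stub converters (hypothetical names; the real converters live in :mod:`ncdata.iris` and :mod:`ncdata.xarray`):

```python
# Composition sketch: the cubes -> xarray bridge is two conversions chained
# through a neutral in-memory representation, replacing an actual
# save-to-disk followed by a reload.

def from_iris_stub(cubes):
    """Stub: convert cubes to a neutral in-memory form (stands in for from_iris)."""
    return {"variables": cubes}

def to_xarray_stub(ncdata):
    """Stub: convert the neutral form onward (stands in for to_xarray)."""
    return ncdata["variables"]

def cubes_to_xarray_sketch(cubes):
    # conceptually: iris.save(cubes, "tmp.nc"); xarray.open_dataset("tmp.nc")
    return to_xarray_stub(from_iris_stub(cubes))

result = cubes_to_xarray_sketch(["cube_a", "cube_b"])
```

The same shape applies in the other direction (``cubes_from_xarray``), with the two stub roles swapped.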
diff --git a/lib/ncdata/netcdf4.py b/lib/ncdata/netcdf4.py
index 8c77ff2..5ddef22 100644
--- a/lib/ncdata/netcdf4.py
+++ b/lib/ncdata/netcdf4.py
@@ -318,6 +318,11 @@ def from_nc4(
(160, 15)
>>>
+ See also : :ref:`howto_load_variablewidth_strings` : This illustrates a particular
+ case which **does** encounter an error with dask "auto" chunking, and therefore
+ also fails with a plain ``from_nc4`` call. The ``dim_chunks`` keyword enables you to
+ work around the problem.
+
"""
if dim_chunks is None:
dim_chunks = {}
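The effect of per-dimension chunk control can be sketched in plain Python. The mapping semantics assumed here (dimension name to chunk size, with unlisted dimensions left to "auto") are an interpretation of the docstring, not a statement of the exact ``dim_chunks`` contract:

```python
# Sketch of the chunking arithmetic behind a per-dimension chunk-size
# control such as the ``dim_chunks`` keyword (assumed semantics:
# {dimension_name: chunk_size}).

def chunks_for_dim(dim_length, chunk_size):
    """Split a dimension of `dim_length` points into chunks of `chunk_size`."""
    n_full, remainder = divmod(dim_length, chunk_size)
    return (chunk_size,) * n_full + ((remainder,) if remainder else ())

# e.g. a hypothetical call like from_nc4(path, dim_chunks={"time": 50})
# would chunk a 160-point "time" dimension as:
time_chunks = chunks_for_dim(160, 50)   # -> (50, 50, 50, 10)
```

Forcing an explicit chunk size this way is what lets a caller sidestep cases where dask's "auto" chunking fails, as in the variable-width strings how-to referenced above.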
diff --git a/lib/ncdata/threadlock_sharing.py b/lib/ncdata/threadlock_sharing.py
index c29dbe5..51ef237 100644
--- a/lib/ncdata/threadlock_sharing.py
+++ b/lib/ncdata/threadlock_sharing.py
@@ -8,8 +8,9 @@
Most commonly, this occurs when netcdf file data is read to
compute a Dask array, or written in a Dask delayed write operation.
-All 3 data-format packages can map variable data into Dask lazy arrays. Iris and
-Xarray can also create delayed write operations (but ncdata currently does not).
+All 3 data-format packages (ncdata, Iris and xarray) can map variable data into Dask
+lazy arrays on file load. Iris and Xarray can also create delayed write operations
+(but ncdata currently does not).
However, those mechanisms cannot protect an operation of that package from
overlapping with one in *another* package.
@@ -17,12 +18,12 @@
This module can ensure that all of the enabled packages use the *same* thread lock,
so that any and all of them can safely co-operate in parallel operations.
-sample code::
+sample usages::
from ncdata.threadlock_sharing import enable_lockshare, disable_lockshare
from ncdata.xarray import from_xarray
- from ncdata.iris import from_iris
- from ncdata.netcdf4 import to_nc4
+ from ncdata.iris import from_iris, to_iris
+ from ncdata.netcdf4 import to_nc4, from_nc4
enable_lockshare(iris=True, xarray=True)
@@ -36,10 +37,16 @@
or::
with lockshare_context(iris=True):
- ncdata = NcData(source_filepath)
- ncdata.variables['x'].attributes['units'] = 'K'
- cubes = ncdata.iris.to_iris(ncdata)
- iris.save(cubes, output_filepath)
+ ncdata = from_nc4(source_filepath)
+ my_adjust_process(ncdata)
+ data_cube = to_iris(ncdata).extract_cube("main_var")
+ grid_cube = iris.load_cube(grid_path, "grid_cube")
+ result_cube = data_cube.regrid(grid_cube)
+ iris.save(result_cube, output_filepath)
+
+.. WARNING::
+ The solution in this module is at present still experimental, and not itself
+ thread-safe, so it should probably only be applied at the outer level of an
+ operation.
"""
from contextlib import contextmanager
@@ -69,7 +76,7 @@ def enable_lockshare(iris: bool = False, xarray: bool = False):
Notes
-----
- If an 'enable_lockshare' call was already established, the function does nothing,
+ If an ``enable_lockshare`` call was already established, the function does nothing,
i.e. it is not possible to modify an existing share. Instead, you must call
:func:`disable_lockshare` to cancel the current sharing, before you can establish
a new one.
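The core idea of lock sharing can be shown with the standard library alone. This is a minimal sketch, not the module's implementation: two independent "packages" are pointed at the *same* :class:`threading.Lock`, so their (simulated) netcdf accesses can never overlap, even when driven from different threads.

```python
# Minimal lock-sharing sketch: both "packages" guard their file access
# with one shared lock, so their critical sections cannot interleave.
import threading

shared_lock = threading.Lock()   # the single lock everybody uses
log = []

def package_a_read():
    with shared_lock:            # e.g. what iris would hold around netcdf reads
        log.append("a-start")
        log.append("a-end")

def package_b_write():
    with shared_lock:            # e.g. what xarray would hold around netcdf writes
        log.append("b-start")
        log.append("b-end")

threads = [threading.Thread(target=package_a_read),
           threading.Thread(target=package_b_write)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Whichever thread runs first, the two operations are fully serialized:
assert log in (["a-start", "a-end", "b-start", "b-end"],
               ["b-start", "b-end", "a-start", "a-end"])
```

If the two packages each used a *private* lock instead, both `with` blocks could run concurrently, which is exactly the hazard this module exists to prevent.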
diff --git a/lib/ncdata/utils/_compare_nc_datasets.py b/lib/ncdata/utils/_compare_nc_datasets.py
index 655babf..28f5d28 100644
--- a/lib/ncdata/utils/_compare_nc_datasets.py
+++ b/lib/ncdata/utils/_compare_nc_datasets.py
@@ -32,30 +32,61 @@ def dataset_differences(
suppress_warnings: bool = False,
) -> List[str]:
r"""
- Compare netcdf data objects.
+ Compare two netcdf datasets.
- Accepts paths, pathstrings, open :class:`netCDF4.Dataset`\\s or :class:`NcData` objects.
+ Accepts paths, pathstrings, open :class:`netCDF4.Dataset`\s or
+ :class:`~ncdata.NcData` objects.
+ File paths are opened with the :mod:`netCDF4` module.
Parameters
----------
- dataset_or_path_1, dataset_or_path_2 : str or Path or netCDF4.Dataset or NcData
- two datasets to compare, either NcData or netCDF4
- check_dims_order, check_vars_order, check_attrs_order, check_groups_order : bool, default True
- If False, no error results from the same contents in a different order,
- however unless `suppress_warnings` is True, the error string is issued as a warning.
- check_names: bool, default False
+ dataset_or_path_1 : str or Path or netCDF4.Dataset or NcData
+ First dataset to compare : either an open :class:`netCDF4.Dataset`, a path to
+ open one, or an :class:`~ncdata.NcData` object.
+
+ dataset_or_path_2 : str or Path or netCDF4.Dataset or NcData
+ Second dataset to compare : either an open :class:`netCDF4.Dataset`, a path to
+ open one, or an :class:`~ncdata.NcData` object.
+
+ check_dims_order : bool, default True
+ If False, no error results from the same dimensions appearing in a different
+ order. However, unless `suppress_warnings` is True, the error string is issued
+ as a warning.
+
+ check_vars_order : bool, default True
+ If False, no error results from the same variables appearing in a different
+ order. However, unless `suppress_warnings` is True, the error string is issued
+ as a warning.
+
+ check_attrs_order : bool, default True
+ If False, no error results from the same attributes appearing in a different
+ order. However, unless `suppress_warnings` is True, the error string is issued
+ as a warning.
+
+ check_groups_order : bool, default True
+ If False, no error results from the same groups appearing in a different order.
+ However, unless `suppress_warnings` is True, the error string is issued as a
+ warning.
+
+ check_names : bool, default False
Whether to warn if the names of the top-level datasets are different
- check_dims_unlimited: bool, default True
+
+ check_dims_unlimited : bool, default True
Whether to compare the 'unlimited' status of dimensions
+
check_var_data : bool, default True
If True, all variable data is also checked for equality.
If False, only dtype and shape are compared.
- NOTE: comparison of large arrays is done in-memory, so may be highly inefficient.
- show_n_first_different: int, default 2
+ NOTE: comparison of arrays is done in-memory, so can be highly inefficient
+ for large variable data.
+
+ show_n_first_different : int, default 2
Number of value differences to display.
+
suppress_warnings : bool, default False
When False (the default), report changes in content order as Warnings.
When True, ignore changes in ordering.
+ See also : :ref:`container-ordering`.
Returns
-------
@@ -68,6 +99,7 @@ def dataset_differences(
ds2_was_path = not hasattr(dataset_or_path_2, "variables")
ds1, ds2 = None, None
try:
+ # convert path-likes to netCDF4.Dataset
if ds1_was_path:
ds1 = nc.Dataset(dataset_or_path_1)
else:
@@ -78,6 +110,9 @@ def dataset_differences(
else:
ds2 = dataset_or_path_2
+ # NOTE: Both ds1 and ds2 are now *either* NcData *or* netCDF4.Dataset
+ # _isncdata() will be used to distinguish.
+
errs = _group_differences(
ds1,
ds2,
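The shape of the result (a list of human-readable difference strings, empty when the datasets match) can be sketched with a toy comparison over dict-based "datasets". The function name and message wording here are illustrative, not the real ``dataset_differences`` output:

```python
# Toy sketch of difference reporting in the style of dataset_differences():
# return a list of difference strings; an empty list means "no differences".

def toy_differences(ds1, ds2):
    errs = []
    for name in sorted(set(ds1) | set(ds2)):
        if name not in ds1:
            errs.append(f"Dataset 1 missing variable: {name!r}")
        elif name not in ds2:
            errs.append(f"Dataset 2 missing variable: {name!r}")
        elif ds1[name] != ds2[name]:
            errs.append(f"Variable {name!r} data differs.")
    return errs

a = {"x": [1, 2], "y": [3]}
b = {"x": [1, 2], "y": [4]}
assert toy_differences(a, a) == []
assert toy_differences(a, b) == ["Variable 'y' data differs."]
```

Returning strings rather than raising makes the check composable: callers can assert the list is empty in tests, or print it for diagnosis.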
diff --git a/lib/ncdata/xarray.py b/lib/ncdata/xarray.py
index bc8e876..ecdd9d5 100644
--- a/lib/ncdata/xarray.py
+++ b/lib/ncdata/xarray.py
@@ -1,12 +1,15 @@
-"""
+r"""
Interface routines for converting data between ncdata and xarray.
-Converts :class:`~ncdata.NcData` to and from Xarray :class:`~xarray.Dataset` objects.
-
-This embeds a certain amount of Xarray knowledge (and dependency), hopefully a minimal
-amount. The structure of an NcData object makes it fairly painless.
+Converts :class:`ncdata.NcData`\s to and from :class:`xarray.Dataset` objects.
"""
+
+# NOTE: This embeds a certain amount of Xarray knowledge (and dependency).
+# Hopefully a minimal amount.
+# The structure of an NcData object makes it fairly painless.
+#
+
from pathlib import Path
from typing import AnyStr, Union
@@ -159,12 +162,14 @@ def to_xarray(ncdata: NcData, **xarray_load_kwargs) -> xr.Dataset:
"""
Convert :class:`~ncdata.NcData` to an xarray :class:`~xarray.Dataset`.
+ Behaves (ideally, somewhat) like an :func:`xarray.load_dataset` call.
+
Parameters
----------
ncdata : NcData
source data
- kwargs : dict
+ xarray_load_kwargs : dict
additional xarray "load keywords", passed to :meth:`xarray.Dataset.load_store`
Returns
@@ -182,12 +187,14 @@ def from_xarray(
"""
Convert an xarray :class:`xarray.Dataset` to a :class:`NcData`.
+ Behaves (ideally, somewhat) like an :meth:`xarray.Dataset.to_netcdf` call.
+
Parameters
----------
xrds : :class:`xarray.Dataset`
source data
- kwargs : dict
+ xarray_save_kwargs : dict
additional xarray "save keywords", passed to
:meth:`xarray.Dataset.dump_to_store`