
General data wrangling #57

@pp-mo

Although strictly excluded as a goal for the initial release,
I still think the 'secondary' usage of ncdata will be useful :

  • for modifying data before loading, or after saving, with an analysis package
  • or just to adjust data and save to another file

For this there is real scope for some convenience and sugar.
Some ideas :

  • ds.is_valid(error_when_not=False) : checking the consistencies not ensured by the free-and-easy design
    • ideas
      • all elements are filed under their own name (e.g. ds.variables['x'].name == 'x')
      • dims used by variables all exist
      • variables all have data
      • variable data shapes all match the dims
    • ( delivered : Save errors util #64 )
  • make it easy to add items by name : el.variables[var.name] = var --> el.variables.add(var)
  • make it easy to rename content, e.g. ds.variables.rename('x', 'y')
  • make it easy to construct containers (variables, attributes) from lists of element specifications
    e.g. NcData(dimensions=nc_dims(x=3, y=5, t=(2, True)), variables=nc_vars(x=(['x'], int), y=(['y'], int), data=(['t', 'y', 'x'], float)))
    (or something !)
  • special convenience handling for attrs, e.g.
    el.ncd_setatt(name, value) ~= el.attributes[name] = NcAttribute(name, value)
    el.ncd_getatt(name) ~= el.attributes.get(name, NcAttribute('', None)).as_python_value()
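
To make the consistency checks listed above more concrete, here is a minimal sketch of what an `is_valid`-style check might do. It deliberately uses small stand-in classes (`Dim`, `Var`, `Dataset`) rather than the real ncdata classes, and the function names and error wording are assumptions for illustration, not the utility delivered in #64.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Stand-in classes for illustration only: NOT the real ncdata classes.
@dataclass
class Dim:
    name: str
    size: int


@dataclass
class Var:
    name: str
    dimensions: Tuple[str, ...] = ()
    shape: Optional[Tuple[int, ...]] = None  # None means "no data attached"


@dataclass
class Dataset:
    dimensions: Dict[str, Dim] = field(default_factory=dict)
    variables: Dict[str, Var] = field(default_factory=dict)


def consistency_errors(ds: Dataset) -> List[str]:
    """Return a list of problem descriptions; an empty list means 'valid'."""
    errors = []
    # Check 1: every element is filed under its own name.
    for kind, container in (("dimension", ds.dimensions), ("variable", ds.variables)):
        for key, element in container.items():
            if element.name != key:
                errors.append(f"{kind} {element.name!r} is filed under key {key!r}")
    for key, var in ds.variables.items():
        # Check 2: dims used by variables all exist.
        missing = [d for d in var.dimensions if d not in ds.dimensions]
        for d in missing:
            errors.append(f"variable {key!r} uses unknown dimension {d!r}")
        # Check 3: variables all have data.
        if var.shape is None:
            errors.append(f"variable {key!r} has no data")
        # Check 4: data shape matches the dims (only testable if all dims exist).
        elif not missing:
            expected = tuple(ds.dimensions[d].size for d in var.dimensions)
            if var.shape != expected:
                errors.append(
                    f"variable {key!r} has data shape {var.shape}, but its dims give {expected}"
                )
    return errors


def is_valid(ds: Dataset, error_when_not: bool = False) -> bool:
    """Hypothetical equivalent of the proposed ds.is_valid(error_when_not=False)."""
    errors = consistency_errors(ds)
    if errors and error_when_not:
        raise ValueError("; ".join(errors))
    return not errors
```

Returning the list of problems, rather than only a bool, lets a caller report everything that is wrong in one pass; `is_valid` is then a thin wrapper over it.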

Update:

v0.1.1 delivered most of this :


For instance, here are the steps I needed to adjust a file written by xarray so that Iris can correctly interpret the coordinate system:

>>> ds = ncdata.netcdf4.from_nc4(filepath)
>>> ds.variables['x'].attributes['standard_name'] = NcAttribute('standard_name', 'projection_x_coordinate')
>>> ds.variables['y'].attributes['standard_name'] = NcAttribute('standard_name', 'projection_y_coordinate')
>>> ds.variables['x'].attributes['units'] = NcAttribute('units', 'm')
>>> ds.variables['y'].attributes['units'] = NcAttribute('units', 'm')
>>> del ds.variables['spatial_ref'].attributes['spatial_ref']
>>> del ds.variables['spatial_ref'].attributes['crs_wkt']
>>> del ds.variables['spatial_ref'].attributes['horizontal_datum_name'] 
>>> cube, = to_iris(ds)
>>> print(cube.coord_system)
<bound method Cube.coord_system of <iris 'Cube' of band_data / (unknown) (band: 5; projection_y_coordinate: 6400; projection_x_coordinate: 7600)>>
>>> print(cube.coord_system())
TransverseMercator(latitude_of_projection_origin=53.5, longitude_of_central_meridian=-8.0, false_easting=200000.0, false_northing=250000.0, scale_factor_at_central_meridian=1.000035, ellipsoid=GeogCS(semi_major_axis=6377340.189, semi_minor_axis=6356034.447938534))
>>> 

So, how about

ds.variables['x'].attributes.update(NameMap(
    NcAttribute,  # type of contents
    ('standard_name', 'projection_x_coordinate'),  # *args are init arglists
    ('units', 'm')
))
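
One way the `NameMap` idea above could work is sketched below. This is only an illustration under stated assumptions: `Attr` is a stand-in for `NcAttribute`, and the `NameMap` class, its constructor signature and its `add` / `rename` methods are hypothetical, modelled on the suggestions in this issue rather than on any released ncdata API.

```python
from dataclasses import dataclass
from typing import Any


# Stand-in for illustration only: NOT the real ncdata NcAttribute class.
@dataclass
class Attr:
    name: str
    value: Any = None


class NameMap(dict):
    """A dict of named items, where every item is filed under its own name."""

    def __init__(self, item_type, *arglists):
        super().__init__()
        self.item_type = item_type
        for args in arglists:
            # Each arglist is unpacked into the item constructor.
            self.add(item_type(*args))

    def add(self, item):
        """File an item under its own name."""
        self[item.name] = item

    def rename(self, old_name, new_name):
        """Rename an item, keeping its key and its own name in step.

        Note: the renamed item moves to the end of the ordering.
        """
        item = self.pop(old_name)
        item.name = new_name
        self[new_name] = item


# Usage: because NameMap is a dict, it can be passed to another dict's update().
attrs = {}
attrs.update(NameMap(
    Attr,  # type of contents
    ("standard_name", "projection_x_coordinate"),  # *args are init arglists
    ("units", "m"),
))
```

Since each item carries its own name, the name never has to be typed twice, which is the main saving over the `attributes[name] = NcAttribute(name, value)` form.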
