
Extract metadata from NetCDF and HDF5 files as XML in NcML format #9153

@pdurbin


Assuming PR #9152 is merged, we'll have a library in place to start extracting XML from NetCDF and HDF5 files.

The supported XML format is called NcML and is described here: https://docs.unidata.ucar.edu/netcdf-java/current/userguide/ncml_overview.html

Yesterday there was general agreement among devs that it would be fine to save the XML as a derivative or aux file.

To start, this will open the door to previewing the file as raw XML.

Additionally, we could work on creating a dedicated previewer that shows the data in a nicer way than raw XML.

The code we write will look something like this:

String ncml = netcdfFile.toNcml(file.getName());

Here's the output for an HDF5 file at src/test/resources/hdf/hdf5/vlen_string_dset (from the PR above):

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="file:vlen_string_dset">
  <variable name="DS1" shape="4" type="String" />
</netcdf>

Here's part of the output for a NetCDF file at src/test/resources/netcdf/madis-raob.nc (also from the PR above):

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="file:madis-raob.nc">
  <dimension name="recNum" length="1" isUnlimited="true" />
  <dimension name="manLevel" length="22" />
  <dimension name="sigTLevel" length="150" />
  <dimension name="sigWLevel" length="76" />
  <dimension name="mWndNum" length="4" />
  <dimension name="mTropNum" length="4" />
  <dimension name="staNameLen" length="50" />
  <dimension name="QCcheckNum" length="10" />
  <dimension name="QCcheckNameLen" length="60" />
  <dimension name="maxStaticIds" length="1000" />
  <dimension name="totalIdLen" length="50" />
  <dimension name="nInventoryBins" length="32" />
  <variable name="nStaticIds" shape="" type="int">
    <attribute name="_FillValue" type="int" value="0" />
  </variable>
  <variable name="staticIds" shape="maxStaticIds totalIdLen" type="char">
    <attribute name="_FillValue" value="" />
  </variable>
  <variable name="lastRecord" shape="maxStaticIds" type="int">
    <attribute name="_FillValue" type="int" value="-1" />
  </variable>
  <variable name="invTime" shape="recNum" type="int">
    <attribute name="_FillValue" type="int" value="0" />
  </variable>
...

Here's the full XML/NcML output: madis-ncml.xml.txt
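The two outputs above write `shape` in different ways. The HDF5 variable DS1 carries a literal length (shape="4"), the NetCDF variables refer to named `<dimension>` elements (shape="maxStaticIds totalIdLen"), and a scalar such as nStaticIds has an empty shape. A previewer that wants to show concrete sizes would have to handle all three cases. Below is a minimal sketch that uses only the JDK's DOM parser. It assumes each space-separated shape token is either a dimension name or an integer. It also assumes a flat file with no `<group>` elements, because group-scoped dimensions are not handled. The class `NcmlShapes` and the method `resolveShapes` are hypothetical names for illustration and are not part of netcdf-java or Dataverse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NcmlShapes {
    // NcML 2.2 namespace, as declared on the root <netcdf> element in the outputs above.
    static final String NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2";

    /**
     * Hypothetical helper: maps every <variable> (in document order) to its concrete lengths.
     * Assumes a flat file; dimensions declared inside <group> elements are not scoped.
     */
    public static Map<String, List<Integer>> resolveShapes(String ncml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(ncml.getBytes(StandardCharsets.UTF_8)));

        // Collect named dimensions: name -> length.
        Map<String, Integer> dims = new HashMap<>();
        NodeList dimNodes = doc.getElementsByTagNameNS(NCML_NS, "dimension");
        for (int i = 0; i < dimNodes.getLength(); i++) {
            Element d = (Element) dimNodes.item(i);
            dims.put(d.getAttribute("name"), Integer.parseInt(d.getAttribute("length")));
        }

        Map<String, List<Integer>> shapes = new LinkedHashMap<>();
        NodeList varNodes = doc.getElementsByTagNameNS(NCML_NS, "variable");
        for (int i = 0; i < varNodes.getLength(); i++) {
            Element v = (Element) varNodes.item(i);
            List<Integer> lengths = new ArrayList<>();
            String shape = v.getAttribute("shape").trim();
            if (!shape.isEmpty()) { // an empty shape means a scalar variable
                for (String token : shape.split("\\s+")) {
                    // A token is either a literal length (e.g. "4") or a dimension name.
                    if (token.chars().allMatch(Character::isDigit)) {
                        lengths.add(Integer.parseInt(token));
                    } else {
                        Integer len = dims.get(token);
                        if (len == null) {
                            throw new IllegalArgumentException("Unknown dimension: " + token);
                        }
                        lengths.add(len);
                    }
                }
            }
            shapes.put(v.getAttribute("name"), lengths);
        }
        return shapes;
    }

    public static void main(String[] args) throws Exception {
        // Hand-made excerpt mixing the NetCDF and HDF5 examples above.
        String ncml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<netcdf xmlns=\"" + NCML_NS + "\">\n"
                + "  <dimension name=\"maxStaticIds\" length=\"1000\" />\n"
                + "  <dimension name=\"totalIdLen\" length=\"50\" />\n"
                + "  <variable name=\"nStaticIds\" shape=\"\" type=\"int\" />\n"
                + "  <variable name=\"staticIds\" shape=\"maxStaticIds totalIdLen\" type=\"char\" />\n"
                + "  <variable name=\"DS1\" shape=\"4\" type=\"String\" />\n"
                + "</netcdf>\n";
        System.out.println(resolveShapes(ncml));
    }
}
```

Run as-is, `main` prints `{nStaticIds=[], staticIds=[1000, 50], DS1=[4]}`. A real previewer would also need to respect group scoping if NcML from NetCDF-4 files with groups is in scope.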


2.5 years ago, @qqmyers made some suggestions for previewing XML files at IQSS/dataverse.harvard.edu#70 (comment). Here's his comment:

"FWIW: Something like https://www.jqueryscript.net/other/tree-xml-viewer-formatter.html adapted with the wiki instructions at https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers/wiki/How-to-create-a-previewer might be a quick win. (I didn't search too hard for an XML viewer - there could be better libraries out there to start from.)"
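The viewer linked in that comment is a JavaScript/jQuery library. The Java sketch below only illustrates the "tree" idea it refers to and is not that library. It walks the NcML DOM and prints one element per line, indented by depth, with the element's attributes inline and namespace declarations hidden. The class name `NcmlTree` is hypothetical. Note that the JDK parser does not promise to keep attributes in document order, so attributes within a line may come out reordered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NcmlTree {
    /** Hypothetical helper: renders XML as an indented outline, one element per line. */
    public static String render(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    private static void walk(Element el, int depth, StringBuilder out) {
        // Two spaces of indentation per nesting level (String.repeat needs Java 11+).
        out.append("  ".repeat(depth)).append(el.getLocalName());
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String name = a.getNodeName();
            // Hide namespace declarations such as xmlns="...ncml-2.2".
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                continue;
            }
            out.append(' ').append(name).append("=\"").append(a.getNodeValue()).append('"');
        }
        out.append('\n');
        // Recurse into child elements only; whitespace text nodes are skipped.
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The HDF5 example output from this issue.
        String ncml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<netcdf xmlns=\"http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2\""
                + " location=\"file:vlen_string_dset\">\n"
                + "  <variable name=\"DS1\" shape=\"4\" type=\"String\" />\n"
                + "</netcdf>\n";
        System.out.print(render(ncml));
    }
}
```

For the HDF5 example, this prints `netcdf location="file:vlen_string_dset"` followed by an indented `variable name="DS1" shape="4" type="String"`. An in-browser previewer built from the wiki instructions would present the same structure as collapsible nodes instead of plain text.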

Metadata

Labels: FY26 Sprint 4 (2025-08-13 - 2025-08-27); Size: 80 (a percentage of a sprint; 56 hours); pm.netcdf-hdf5.d (all 3 aims are currently under this deliverable)
