Improve file type detection of NetCDF and HDF5 #9117

@pdurbin

Description

According to Wikipedia, NetCDF and HDF5 have magic numbers, which should let us detect these file types more easily and reliably than guessing from file extensions.

NetCDF magic number

CDF\001 (classic format)
\211HDF\r\n\032\n (netCDF-4, which is stored as HDF5)

HDF5 magic number

\211HDF\r\n\032\n
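
As a rough illustration of what signature-based detection could look like, here is a minimal sketch in Python. The names `sniff_signature` and `sniff_file` and the returned labels are hypothetical, not part of Dataverse or JHOVE. It only checks the start of the file and only the two signatures listed above, so a netCDF-4 file is reported as HDF5 because it carries the HDF5 signature.

```python
from typing import Optional

# The signatures quoted above, written with hex escapes:
# \211 (octal) == \x89, \032 == \x1a, \001 == \x01.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC_SIGNATURE = b"CDF\x01"


def sniff_signature(header: bytes) -> Optional[str]:
    """Return a hypothetical type label based on the leading bytes, or None."""
    if header.startswith(HDF5_SIGNATURE):
        # Also matches netCDF-4 files, which use the HDF5 container.
        return "hdf5"
    if header.startswith(NETCDF_CLASSIC_SIGNATURE):
        return "netcdf-classic"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first 8 bytes of a local file and sniff them."""
    with open(path, "rb") as f:
        return sniff_signature(f.read(len(HDF5_SIGNATURE)))
```

Reading just 8 bytes keeps the check cheap even for very large files. A file that is too short or has a different signature simply yields `None`, so the extension-based guess could still be used as a fallback.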

I brought this up at standup today and here are some notes from the discussion:

  • We should see if JHOVE can detect them.
  • Normally, we only seek into a file's contents to detect its type when checking for tabular files specifically.
  • Given that NetCDF files can be big, this might be a case where switching to a ranged request, so that only the bytes holding the signature are read, is important.
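
To make the ranged-request idea concrete, here is a hedged sketch using only Python's standard library. The function names and the URL in the usage note are placeholders, and whether the storage backend honors HTTP `Range` headers is an assumption that would need checking.

```python
import urllib.request

SIGNATURE_LENGTH = 8  # length of the longest signature listed above


def signature_request(url: str, length: int = SIGNATURE_LENGTH) -> urllib.request.Request:
    """Build a GET request asking only for the first `length` bytes."""
    request = urllib.request.Request(url)
    # Byte ranges are inclusive, so bytes=0-7 covers exactly 8 bytes.
    request.add_header("Range", f"bytes=0-{length - 1}")
    return request


def fetch_signature(url: str, length: int = SIGNATURE_LENGTH) -> bytes:
    """Fetch the leading bytes of a remote file.

    If the server ignores the Range header and answers with the whole
    file, read(length) still stops after `length` bytes.
    """
    with urllib.request.urlopen(signature_request(url, length)) as response:
        return response.read(length)
```

For example, `fetch_signature("https://example.org/data/file.nc")` (a placeholder URL) would return at most 8 bytes, which could then be compared against the magic numbers above without downloading the whole file.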

We should add some NetCDF and HDF5 files to https://github.com/IQSS/dataverse-sample-data to test with, at some point.

Related:

Labels: pm.netcdf-hdf5.d (All 3 aims are currently under this deliverable)
