Skip to content

open_geotiff drops user-defined CRS WKT on read (crs/crs_wkt not set, only crs_name) #1632

@brendancol

Description

@brendancol

What happened

open_geotiff reads a GeoTIFF whose CRS is user-defined (WKT stored in the GeoTIFF citation, GeoKey GEOKEY_PROJECTED_CS_TYPE == 32767) and emits the WKT under attrs['crs_name'] only. Neither attrs['crs'] nor attrs['crs_wkt'] is populated. to_geotiff ignores attrs['crs_name'], so a read -> write round-trip silently drops the CRS.

Users who only read files see the WKT survive (as crs_name). Pipelines that read, process, and write a user-defined CRS GeoTIFF lose the projection on the output. Files with a recognized EPSG (e.g. EPSG:4326) are unaffected because the EPSG resolves to WKT via pyproj and the read path emits both attrs['crs'] and attrs['crs_wkt'].

Repro

import numpy as np
import xarray as xr
import tifffile
from xrspatial.geotiff import open_geotiff, to_geotiff

WKT = (
    'PROJCS["User defined LCC",GEOGCS["NAD83",DATUM["North American Datum 1983",'
    'SPHEROID["GRS 1980",6378137,298.257222101]],PRIMEM["Greenwich",0],'
    'UNIT["degree",0.0174532925199433]],PROJECTION["Lambert Conformal Conic 2SP"],'
    'PARAMETER["central_meridian",-100],PARAMETER["latitude_of_origin",40],'
    'PARAMETER["standard_parallel_1",30],PARAMETER["standard_parallel_2",50],'
    'UNIT["metre",1]]'
)

arr = np.ones((4, 4), dtype=np.float32)
da = xr.DataArray(arr, dims=['y', 'x'],
                  coords={'y': np.linspace(50, 47, 4),
                          'x': np.linspace(10, 13, 4)},
                  attrs={'crs_wkt': WKT})
to_geotiff(da, "src.tif", compression="none")

rd = open_geotiff("src.tif")
print("crs:",      rd.attrs.get("crs"))      # None
print("crs_wkt:",  rd.attrs.get("crs_wkt"))  # None
print("crs_name:", rd.attrs.get("crs_name"))  # WKT survives here

to_geotiff(rd, "dst.tif", compression="none")
with tifffile.TiffFile("dst.tif") as tif:
    print(tif.pages[0].tags.get(34735).value)  # GeoKeyDirectory: no CRS keys
    print(tif.pages[0].tags.get(34737))         # GeoAsciiParams: None

What should happen

When geo_info.crs_epsg is None but the citation looks like WKT (which is how GDAL stores user-defined CRSes), the read path should expose that WKT as attrs['crs_wkt'] and keep attrs['crs_name'] for back-compat. to_geotiff already accepts attrs['crs_wkt'] and re-emits it via GEOKEY_*_CS_TYPE = 32767 plus the citation.

Severity

HIGH. Read -> write pipelines silently lose the projection on user-defined CRS files. All four read backends (numpy, dask, cupy, dask+cupy) drop the same attr.

Suggested fix

In xrspatial/geotiff/_geotags.py::extract_geo_info, when no EPSG is resolved but the citation parses as WKT (starts with PROJCS[, GEOGCS[, PROJCRS[, GEOGCRS[, BOUNDCRS[, COMPOUNDCRS[, VERTCRS[, or LOCAL_CS[), set geo_info.crs_wkt = crs_name. _populate_attrs_from_geo_info then propagates it correctly to every backend.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions