Turn TIFF and COG archives into Zarr stores without copying any data.
Virtual TIFF emits a VirtualiZarr-compatible Zarr v3 store backed by byte-range references into the original TIFFs. Persist it with Icechunk and you've published a coherent datacube — readable in any language with a Zarr+Icechunk client — without copying any pixels.
What this lets you do:
- Curate what's exposed. Pick which bands, overviews, and AOIs land in the published store; consumers see one datacube, not hundreds of files.
- Detect source drift. Icechunk records ETags, so analyses can verify the source TIFFs haven't changed since the manifest was built.
- Open non-COG TIFFs without rewriting them. Internally tiled TIFFs that aren't quite COG-compliant still get fast cloud-native access through the virtual store.
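Source-drift detection comes down to comparing the ETag recorded for each source TIFF when the manifest was built with the ETag the object store reports now. The stdlib sketch below shows only that comparison. `SourceRef` and `find_drifted` are hypothetical names for illustration, not part of Virtual TIFF's or Icechunk's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical record of one source TIFF as seen at manifest-build time."""

    url: str
    etag: str


def find_drifted(recorded: list[SourceRef], current_etags: dict[str, str]) -> list[str]:
    """Return URLs whose current ETag is missing or differs from the recorded one."""
    drifted = []
    for ref in recorded:
        # A missing entry (e.g. the object was deleted) counts as drift, too.
        if current_etags.get(ref.url) != ref.etag:
            drifted.append(ref.url)
    return drifted


if __name__ == "__main__":
    recorded = [
        SourceRef("s3://bucket/a.tif", '"abc"'),
        SourceRef("s3://bucket/b.tif", '"def"'),
    ]
    current = {"s3://bucket/a.tif": '"abc"', "s3://bucket/b.tif": '"xyz"'}
    print(find_drifted(recorded, current))  # ['s3://bucket/b.tif']
```

In a real check, the current ETags would come from HEAD requests against the object store.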
Virtual TIFF is a good fit when:
- You're building a datacube product over a TIFF/COG archive that should outlive any single Python session.
- You need non-Python clients (zarrs, zarrita.js, zarr-layer) to read the archive without knowing it's TIFF underneath.
- You want Icechunk-versioned access to the archive: snapshots, transactions, time-travel as new acquisitions land.
- The archive is queried many times, and amortizing per-file IFD discovery across all those queries actually matters.
- You want to expose overviews as a native Zarr multiscale group, so downstream tools (visualization, fast analytics) can use them directly.
Virtual TIFF stitches; it doesn't mosaic. Combining files into a single array requires a structured grid: matching CRS and resolution, or resolution that varies systematically along an axis (e.g. via rectilinear chunking). Heterogeneous TIFFs can still coexist as separate arrays in a DataTree, but you lose the unified-cube benefit. Pixel-level mosaicking and reprojection happen downstream in numpy, dask, or rioxarray; Virtual TIFF doesn't do math.
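The "structured grid" requirement can be made concrete with a small check. This is a minimal sketch under simplifying assumptions: each file is summarized by a hypothetical `TileInfo` record (CRS string, resolution, upper-left origin, pixel dimensions), and every file is assumed to have identical pixel dimensions, which is the regular-chunk case that rectilinear chunking would relax. `can_stitch` is illustrative only, not a Virtual TIFF function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileInfo:
    """Hypothetical georeferencing summary for one TIFF."""

    crs: str         # e.g. "EPSG:32610"
    res_x: float     # pixel width in CRS units
    res_y: float     # pixel height in CRS units (positive)
    origin_x: float  # x of the upper-left corner
    origin_y: float  # y of the upper-left corner
    width: int       # columns
    height: int      # rows


def can_stitch(tiles: list[TileInfo], tol: float = 1e-9) -> bool:
    """True if all tiles share CRS, resolution and shape and sit on one regular grid."""
    if not tiles:
        return False
    ref = tiles[0]
    # Ground footprint of one tile; neighbours must be offset by whole multiples of it.
    step_x = ref.res_x * ref.width
    step_y = ref.res_y * ref.height
    for t in tiles:
        if t.crs != ref.crs or (t.width, t.height) != (ref.width, ref.height):
            return False
        if abs(t.res_x - ref.res_x) > tol or abs(t.res_y - ref.res_y) > tol:
            return False
        # In a north-up raster, y decreases going down the rows, so measure
        # row offsets downward from the reference tile's top edge.
        cols = (t.origin_x - ref.origin_x) / step_x
        rows = (ref.origin_y - t.origin_y) / step_y
        if abs(cols - round(cols)) > tol or abs(rows - round(rows)) > tol:
            return False
    return True
```

Files that fail a check like this can still be exposed side by side as separate arrays in a DataTree.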
If your workflow is "open a STAC search, get an xarray DataArray, do analysis," you probably don't need a virtual store. Reach for one of:
- lazycogs — STAC + async-geotiff with on-the-fly reprojection, for dynamic queries and heterogeneous-CRS data.
- stackstac / odc-stac — established STAC-to-DataArray loaders for analyst workflows.
- async-tiff / async-geotiff directly — when you just want a fast async TIFF reader and don't need a Zarr surface at all.
Virtual TIFF shares the same async-tiff I/O layer as lazycogs and async-geotiff; stackstac and odc-stac sit on rasterio/GDAL instead. The bigger split is what gets produced: a runtime DataArray versus a publishable virtual Zarr store. Pick the one that matches your output.
The point of Virtual TIFF is that it's not in the read path. It runs once, when the manifest is built. After that, every consumer goes straight from their Zarr client to the manifest to the TIFF byte ranges.
Build-time (once, by the data publisher):

```text
TIFFs / COGs in S3, GCS, Azure, …
        │
        │  byte-range GETs for IFD metadata
        ▼
async-tiff + obstore
        │
        ▼
Virtual TIFF ── VirtualiZarr parser, run once
        │
        ▼
manifest committed to an Icechunk repo
```
Read-time (every time, in any session):

```text
Zarr v3 client + Icechunk store driver
(e.g. zarr-python + icechunk-python,
      zarrs + icechunk-rs, …)
        │
        │  Zarr reads issued through the Icechunk Store
        ▼
Icechunk repo ── snapshot + manifest
        │
        │  Icechunk resolves chunk keys to
        │  (file_url, offset, length) per chunk
        ▼
TIFFs / COGs in S3, GCS, Azure, …
        │
        │  parallel byte-range GETs
        ▼
decoded chunks via the Zarr codec pipeline
```
Note the absence of virtual-tiff and async-tiff from the read-time path. They're build-time tools; once the manifest exists, consumers reach the source bytes through Icechunk alone.
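The read-time lookup can be pictured as a dictionary from chunk key to `(file_url, offset, length)`, followed by a byte-range read. The sketch below is a stand-in, not Icechunk's implementation: the `Manifest` alias, `read_chunk`, and `range_header` are hypothetical helpers, a local file replaces the object store, and the keys use Zarr v3's default `c/0/0`-style chunk-key encoding purely for illustration.

```python
import os
import tempfile

# Hypothetical manifest: chunk key -> (file path or URL, byte offset, byte length).
Manifest = dict[str, tuple[str, int, int]]


def range_header(offset: int, length: int) -> str:
    """HTTP Range header value for one chunk (the end byte is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def read_chunk(manifest: Manifest, key: str) -> bytes:
    """Resolve a chunk key and read exactly its bytes (local-file stand-in for a range GET)."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise OSError(f"short read for {key!r}: wanted {length} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    # A fake "TIFF": 6 header bytes followed by two 9-byte encoded chunks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"HEADER" + b"chunk-0-0" + b"chunk-0-1")
    manifest = {"c/0/0": (tmp.name, 6, 9), "c/0/1": (tmp.name, 15, 9)}
    try:
        print(read_chunk(manifest, "c/0/1"))         # b'chunk-0-1'
        print(range_header(*manifest["c/0/1"][1:]))  # bytes=15-23
    finally:
        os.unlink(tmp.name)
```

Against S3 or GCS, the same lookup becomes a GET carrying the `Range` value that `range_header` formats; the returned bytes are still encoded and go through the Zarr codec pipeline afterwards.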
Install:

```shell
python -m pip install virtual-tiff
```

Open a single Sentinel-2 band from a public S3 bucket as a virtual Zarr store:

```python
import obstore
import xarray as xr
from obspec_utils.registry import ObjectStoreRegistry
from virtual_tiff import VirtualTIFF

bucket_url = "s3://e84-earth-search-sentinel-data/"
file_url = f"{bucket_url}sentinel-2-c1-l2a/10/T/FR/2023/12/S2B_T10TFR_20231223T190950_L2A/B04.tif"

# Anonymous (unsigned) access to the public bucket.
s3_store = obstore.store.from_url(bucket_url, region="us-west-2", skip_signature=True)
# Map URL prefixes to the object store that serves them.
registry = ObjectStoreRegistry({bucket_url: s3_store})

# Parse the first IFD (the full-resolution image in a COG) into a manifest store.
parser = VirtualTIFF(ifd=0)
manifest_store = parser(url=file_url, registry=registry)

ds = xr.open_zarr(manifest_store, zarr_format=3, consolidated=False)
```

This works equally for GCS, Azure, or any obstore-supported backend: swap the store factory.
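The same parser also plugs into VirtualiZarr's `open_virtual_dataset`:

```python
from virtualizarr import open_virtual_dataset
from virtual_tiff import VirtualTIFF

# Reuses file_url and registry from the previous example.
ds = open_virtual_dataset(
    url=file_url,
    registry=registry,
    parser=VirtualTIFF(ifd=0),
)
```

Supported TIFF features:

| TIFF feature | Supported | Notes |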
|---|---|---|
| Strips | ✅ | Image height must be evenly divisible by rows-per-strip |
| Tiles | ✅ | |
| Multiple IFDs | ✅ | |
| Nested pages / IFDs | ❌ | |
| Compressions: Uncompressed, PackBits, Zlib, LZMA, Lerc, PNG, Deflate, LZW, JPEGXL, JPEG8, WebP | ✅ | |
| JPEG | ❌ | Quantization tables (the JPEGTables tag) are not yet supported, which excludes nearly all JPEG-encoded TIFFs in practice. |
| CMYK | ✅ | |
| YCbCr / CIE L*a*b* / Palette-color | ❌ | |
| Grayscale, RGB | ✅ | |
| PlanarConfiguration (chunky and planar) | ✅ | |
| Both byte orders (II & MM) | ✅ | |
| BigTIFF (64-bit offsets) | ✅ | |
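Some rows in the table above can be checked from raw bytes with only the standard library. `parse_tiff_header` reads the byte-order mark (`II` little-endian, `MM` big-endian), the magic number (42 for classic TIFF, 43 for BigTIFF), and the first-IFD offset (4 bytes in classic TIFF, 8 in BigTIFF); `strips_map_to_chunks` transcribes the table's strip rule literally. Both are hypothetical helpers for illustration; Virtual TIFF itself reads this metadata through async-tiff.

```python
import struct


def parse_tiff_header(buf: bytes) -> dict:
    """Read byte order, classic-vs-BigTIFF, and the first-IFD offset from a TIFF header."""
    order = buf[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    (magic,) = struct.unpack(endian + "H", buf[2:4])
    if magic == 42:
        # Classic TIFF: 4-byte offset to the first IFD in bytes 4..7.
        (first_ifd,) = struct.unpack(endian + "I", buf[4:8])
        return {"byte_order": order.decode(), "bigtiff": False, "first_ifd": first_ifd}
    if magic == 43:
        # BigTIFF: offset byte size (always 8), a zero pad, then an 8-byte IFD offset.
        offset_size, pad = struct.unpack(endian + "HH", buf[4:8])
        if offset_size != 8 or pad != 0:
            raise ValueError("malformed BigTIFF header")
        (first_ifd,) = struct.unpack(endian + "Q", buf[8:16])
        return {"byte_order": order.decode(), "bigtiff": True, "first_ifd": first_ifd}
    raise ValueError(f"not a TIFF: magic number {magic}")


def strips_map_to_chunks(image_height: int, rows_per_strip: int) -> bool:
    """The table's strip rule: image height must divide evenly by RowsPerStrip."""
    return rows_per_strip > 0 and image_height % rows_per_strip == 0
```

The IFD found at `first_ifd` then lists where each tile or strip lives (offsets and byte counts), which is the information a virtual manifest records per chunk.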
Development:

- `git clone https://github.com/virtual-zarr/virtual-tiff.git`
- `pixi run -e test download-test-images` (downloads ~1.4 GB of test TIFFs)
- `pixi run -e test run-tests` (note: some tests are expected to fail while the implementation is in progress)
- `pixi run -e test zsh` for a dev shell.
Test data is populated from three upstream sources via sync scripts:
- `uv run scripts/sync_gdal_tiffs.py`: GDAL autotest TIFFs
- `uv run scripts/sync_external_tiffs.py`: external TIFFs from various URLs
- `uv run scripts/sync_geotiff_test_data.py`: fixtures from geotiff-test-data
virtual-tiff is distributed under the terms of the MIT license.