Skip to content

Reject VRT SourceFilename paths outside the VRT directory (#1671)#1679

Merged
brendancol merged 2 commits into
mainfrom
issue-1671
May 12, 2026
Merged

Reject VRT SourceFilename paths outside the VRT directory (#1671)#1679
brendancol merged 2 commits into
mainfrom
issue-1671

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Fixes #1671.

Problem

The previous path-traversal fix (#1185) called os.path.realpath on every SourceFilename and stopped there. realpath normalises .. segments and resolves symlinks; it does not enforce that the resulting path lives under any specific root. A VRT that declared

<SourceFilename relativeToVRT="1">../../../../etc/passwd</SourceFilename>

or a symlink inside vrt_dir pointing outside it would happily resolve to the target and then get handed to read_to_array. For trusted VRTs this is benign. For VRTs accepted from user input (uploads, untrusted mounts) it is a path-traversal primitive.

Fix

parse_vrt still canonicalises with realpath, then enforces containment:

  • relativeToVRT='1' sources must resolve under realpath(vrt_dir). Rejected otherwise, even if the resolved path happens to land under an allowlisted root - a relative source that escapes the VRT directory is always an attack primitive.
  • relativeToVRT='0' sources are accepted when they resolve under realpath(vrt_dir) (matches the writer's relative=False round trip) or under any entry in XRSPATIAL_VRT_ALLOWED_ROOTS.

XRSPATIAL_VRT_ALLOWED_ROOTS is a colon-separated (os.pathsep) list of directory paths, modelled on the existing XRSPATIAL_GEOTIFF_STRICT env var. Empty entries are ignored so a stray separator does not silently grant access to /.

Containment uses os.path.commonpath rather than startswith so a /foo allowlist entry does not accidentally cover /foobar.

Rejected paths raise ValueError with the resolved path, the VRT root, and the allowlist entries spelled out so operators can diagnose without re-reading the source XML.

The read_vrt docstring documents the new behaviour.

Tests

xrspatial/geotiff/tests/test_vrt_path_containment_1671.py adds 10 tests:

  • ../ traversal via relativeToVRT='1'
  • symlink-based traversal under vrt_dir
  • absolute paths outside the VRT directory (default reject)
  • absolute paths inside the VRT directory (round trip)
  • XRSPATIAL_VRT_ALLOWED_ROOTS single root / multiple roots / empty entries
  • allowlist does not cover a relative-source escape
  • happy-path regression for legitimate relative sources
  • error message names the rejected path and the trusted root

Two existing tests in test_security.py (test_relative_path_canonicalized, test_absolute_path_also_canonicalized) asserted the old silent-canonicalisation behaviour and were updated to assert rejection. One field-extraction test in test_features.py was using /data/tile.tif as a placeholder source path and was updated to use tmp_path/tile.tif so the containment check accepts it.

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_path_containment_1671.py -v (10 passed)
  • pytest xrspatial/geotiff/tests/test_security.py -v (passes)
  • pytest xrspatial/geotiff/tests/ -k vrt (126 passed)
  • Full geotiff suite passes except for 3 pre-existing matplotlib/Python 3.14 deepcopy recursion failures in TestPalette that reproduce on main and are unrelated to this change.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 15:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the GeoTIFF VRT reader against path traversal by enforcing that resolved SourceFilename paths remain within the VRT directory unless explicitly allowlisted, addressing security issue #1671.

Changes:

  • Enforce VRT source-path containment in parse_vrt, with an opt-in allowlist via XRSPATIAL_VRT_ALLOWED_ROOTS.
  • Expand/adjust tests to cover traversal via .., symlink escapes, absolute-path rejection, and allowlist behavior.
  • Document the new containment policy in the public read_vrt docstring.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
xrspatial/geotiff/_vrt.py Adds allowlist parsing + robust containment check using commonpath, raising ValueError on escapes.
xrspatial/geotiff/__init__.py Documents the new VRT SourceFilename containment/allowlist policy in read_vrt.
xrspatial/geotiff/tests/test_vrt_path_containment_1671.py New regression tests covering traversal, symlink escape, absolute-path policy, and allowlist behavior.
xrspatial/geotiff/tests/test_security.py Updates existing security tests to reflect the new “reject on escape” behavior.
xrspatial/geotiff/tests/test_features.py Adjusts a VRT parsing test to use tmp_path so the new containment policy permits the placeholder source path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +235 to +248
def test_allowlist_empty_entries_ignored(tmp_path, monkeypatch):
"""Empty entries in the allowlist (from stray colons) are skipped."""
vrt_dir = _unique_dir(tmp_path, "empty_entry_vrt")
outside_dir = _unique_dir(tmp_path, "empty_entry_data")
outside_tif = os.path.join(outside_dir, 'data.tif')
_write_minimal_tif(outside_tif)

vrt_path = os.path.join(vrt_dir, 'mosaic.vrt')
_build_vrt(vrt_path, outside_tif, relative='0')

# Leading/trailing/embedded empty entries should not crash the parser
# or accidentally grant access to ``/``.
value = f":{outside_dir}::"
monkeypatch.setenv('XRSPATIAL_VRT_ALLOWED_ROOTS', value)
Comment on lines +192 to +194
def test_allowlist_supports_multiple_roots(tmp_path, monkeypatch):
"""A colon-separated list permits sources under any listed root."""
vrt_dir = _unique_dir(tmp_path, "multi_vrt")
Comment thread xrspatial/geotiff/_vrt.py Outdated


# Environment variable name used to opt in to absolute source paths
# outside the VRT directory. Colon-separated (``os.pathsep``) list of

# Plant a symlink inside vrt_dir that points to the outside file.
sym = os.path.join(vrt_dir, 'inside.tif')
os.symlink(outside_target, sym)
brendancol added a commit that referenced this pull request May 12, 2026
Reword "colon-separated" to "os.pathsep-separated" in the VRT module
comment and test docstrings so Windows users (where the separator is
';') are not misled. Build the empty-entries env var with os.pathsep
so the test stays cross-platform on the Windows CI matrix. Guard
os.symlink with try/except OSError + pytest.skip since Windows
requires Developer Mode or admin privileges for symlinks and some
filesystems do not support them at all.

No behaviour change on POSIX.
brendancol added a commit that referenced this pull request May 12, 2026
Reword "colon-separated" to "os.pathsep-separated" in the VRT module
comment and test docstrings so Windows users (where the separator is
';') are not misled. Build the empty-entries env var with os.pathsep
so the test stays cross-platform on the Windows CI matrix. Guard
os.symlink with try/except OSError + pytest.skip since Windows
requires Developer Mode or admin privileges for symlinks and some
filesystems do not support them at all.

No behaviour change on POSIX.
os.path.realpath only normalises ../ and symlinks; it does not enforce
containment. A crafted VRT could ship SourceFilename values like
../../../../etc/passwd and read_to_array would open them. For trusted
VRTs this is benign; for VRTs accepted from user input (uploads,
untrusted mounts) it is a path-traversal primitive.

parse_vrt now canonicalises every SourceFilename with os.path.realpath
and then enforces:

* relativeToVRT='1' sources must resolve under the VRT directory.
* relativeToVRT='0' sources must resolve under the VRT directory or
  under a directory listed in XRSPATIAL_VRT_ALLOWED_ROOTS (colon-
  separated, modelled on XRSPATIAL_GEOTIFF_STRICT). Empty entries
  are ignored.

Anything else raises ValueError naming the rejected path and the
trusted roots. The relative case always rejects escapes even when the
allowlist would otherwise cover the resolved path, so an attacker
cannot chain allowlist entries into a relative-source traversal.

The read_vrt docstring documents the new behaviour. Existing tests
that asserted the old silent-canonicalisation behaviour are updated.
test_vrt_path_containment_1671.py covers ../ traversal, symlink
traversal, absolute rejection by default, allowlist opt-in (single
root, multiple roots, empty entries), relative-source escape vs
allowlist interaction, and a happy-path regression for legitimate
relative sources.
Reword "colon-separated" to "os.pathsep-separated" in the VRT module
comment and test docstrings so Windows users (where the separator is
';') are not misled. Build the empty-entries env var with os.pathsep
so the test stays cross-platform on the Windows CI matrix. Guard
os.symlink with try/except OSError + pytest.skip since Windows
requires Developer Mode or admin privileges for symlinks and some
filesystems do not support them at all.

No behaviour change on POSIX.
@brendancol brendancol merged commit ea1383c into main May 12, 2026
4 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_vrt.py realpath() canonicalises but does not block path traversal

2 participants