Summary
_vrt.read_vrt reads the VRT XML file with an unbounded f.read() (xrspatial/geotiff/_vrt.py:640-641):
    with open(vrt_path, 'r') as f:
        xml_str = f.read()
safe_fromstring (the parser used downstream) blocks external entity expansion, but it cannot protect against a VRT XML file that is simply large on disk. A multi-gigabyte VRT file is read into memory in full before parsing even starts.
Why this matters
VRTs are pure XML metadata; pixel data lives in the source TIFFs. A 50k-source VRT runs around 25 MB. There is no realistic scenario where a VRT XML file is hundreds of megabytes, let alone gigabytes. Reading without a cap turns an untrusted (or malformed) VRT path into a memory-exhaustion vector.
This matches the bomb-cap style fixes already applied elsewhere in the geotiff reader (JPEG predecode #1792, DstRect resample cap #1737, max_pixels for VRT source reads #1803).
Proposed fix
Add a configurable size cap on the VRT XML read. Stream the file in bounded chunks; if the total exceeds the cap, raise ValueError with the cap value and the env-var name.
- Default cap: 64 MiB. A 50k-source VRT (~25 MB) fits comfortably with margin.
- Env var: XRSPATIAL_VRT_MAX_XML_BYTES for operators who legitimately need a larger cap.
- Match the existing style in _vrt.py for env-driven limits.
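A minimal sketch of the bounded read described above, not the actual patch. The helper names (read_vrt_xml_bounded, _max_xml_bytes), the 1 MiB chunk size, the binary-mode read, and the UTF-8 decode are all assumptions made for illustration. Only the env-var name, the 64 MiB default, and the ValueError come from this issue.

```python
import os

# Env-var name and default come from the proposal; everything else is hypothetical.
_ENV_VAR = "XRSPATIAL_VRT_MAX_XML_BYTES"
_DEFAULT_MAX_XML_BYTES = 64 * 1024 * 1024  # 64 MiB
_CHUNK_BYTES = 1024 * 1024  # assumed chunk size


def _max_xml_bytes():
    """Return the cap in bytes, honouring the env var when it is set."""
    raw = os.environ.get(_ENV_VAR)
    if raw is None:
        return _DEFAULT_MAX_XML_BYTES
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{_ENV_VAR} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{_ENV_VAR} must be positive, got {value}")
    return value


def read_vrt_xml_bounded(vrt_path, max_bytes=None, chunk_bytes=_CHUNK_BYTES):
    """Read the VRT XML in chunks, raising once the running total exceeds the cap."""
    cap = _max_xml_bytes() if max_bytes is None else max_bytes
    chunks = []
    total = 0
    # Binary mode so the cap counts bytes on disk, not decoded characters.
    with open(vrt_path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
            if total > cap:
                raise ValueError(
                    f"VRT XML file {vrt_path!r} exceeds the {cap}-byte cap; "
                    f"set {_ENV_VAR} to raise the limit."
                )
            chunks.append(chunk)
    # Assumes UTF-8; the current code relies on the platform default encoding.
    return b"".join(chunks).decode("utf-8")
```

Because the check runs after each chunk, memory held at the point of failure is bounded by roughly the cap plus one chunk, regardless of how large the file is.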
Tests
- A small VRT under the cap parses normally.
- A synthetic VRT padded with comment whitespace past the cap raises ValueError mentioning the cap and the env-var name.
- With XRSPATIAL_VRT_MAX_XML_BYTES set below the padded VRT's size, the read raises; the same file parses only once the env var is raised above its size.
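One possible way to build the padded fixture for these tests is sketched below. The helper name write_padded_vrt and the minimal VRTDataset skeleton are hypothetical, and stdlib ElementTree may stand in for safe_fromstring when checking that the padded file is still well-formed XML.

```python
import os


def write_padded_vrt(path, min_bytes):
    """Write a tiny VRT whose XML comment is padded so the file exceeds min_bytes."""
    header = b'<VRTDataset rasterXSize="1" rasterYSize="1">\n<!--'
    footer = b'-->\n</VRTDataset>\n'
    # Pad to exactly min_bytes + 1 bytes total (or leave unpadded if already larger).
    remaining = max(0, min_bytes - len(header) - len(footer) + 1)
    block = b" " * 65536
    with open(path, "wb") as f:
        f.write(header)
        # Write the padding in blocks so the fixture itself never needs
        # a single allocation as large as the cap.
        while remaining > 0:
            n = min(remaining, len(block))
            f.write(block[:n])
            remaining -= n
        f.write(footer)
    return os.path.getsize(path)
```

In a test, pointing XRSPATIAL_VRT_MAX_XML_BYTES at a small value (a few KiB, say) keeps the fixture small, so nothing near 64 MiB has to be written to disk.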