geotiff: require urllib3, drop stdlib HTTP fallback (#2050)#2055
Merged
Conversation
_HTTPSource had two transport paths: a urllib3 path that pins the TCP connection to the IP returned by _validate_http_url, and a stdlib urllib.request fallback that re-resolves the hostname at request time. urllib3 was not in install_requires, so a real install could land on the fallback and silently lose the DNS-rebinding IP pin from #1846. This change makes urllib3 a hard install dependency, removes the stdlib fallback in read_range and read_all, and deletes _get_stdlib_opener and _ValidatingRedirectHandler since nothing calls them anymore. The matching test_stdlib_* cases in test_ssrf_hardening_1664.py are removed; redirect re-validation is still covered by the test_urllib3_* cases.
Address review feedback on #2055: - _build_pinned_connection_classes: update stale docstring that called urllib3 an optional import. The lazy import is still fine, but the reason is module-load cost, not optionality. - test_dns_rebinding_pin_issue_1846: drop five pytest.importorskip ("urllib3") calls. urllib3 is required now; the skip can never trigger.
brendancol
added a commit
that referenced
this pull request
May 18, 2026
#2050 / #2055 dropped the stdlib ``urllib.request`` HTTP fallback and made urllib3 a hard dependency. The three ``test_read_all_stdlib_fallback_*`` cells that exercised the fallback's byte-budget enforcement no longer have a code path to hit; the urllib3-only equivalents above keep the contract covered.
brendancol
added a commit
that referenced
this pull request
May 18, 2026
* geotiff: bound HTTP full-image strip reads to declared byte size (#2051) `_HTTPSource.read_all()` was unbounded: no Content-Length validation, no streaming cap. The full-image stripped HTTP path in `_fetch_decode_cog_http_strips` only ran `_check_dimensions` against the TIFF header's declared pixel count, then handed the full body to `_read_strips`. A tiny declared raster (100x100, well within `_check_dimensions`) could still be served as a multi-gigabyte body and the whole body was allocated before TIFF parsing rejected anything. Fix (option b from the security report): - `_HTTPSource.read_all()` gains a `max_bytes: int | None = None` parameter. When set, it rejects an oversized `Content-Length` up front via `OSError`, then streams the body (urllib3 `preload_content=False` / stdlib `resp.read` in 64 KiB chunks) and aborts past `max_bytes + 1`. The `+ 1` is the over-shoot probe: a body that exactly matches the cap passes, but a server that omits or lies about Content-Length is caught the moment the first extra byte arrives. - `_request()` gains `preload_content=False` plumbing and releases the connection on the redirect path so streaming responses do not leak pool slots. - `_fetch_decode_cog_http_strips` computes the budget from the strip table (`max(offset + byte_count) + 4 MiB slack` for TIFF header / trailing metadata) and passes it through. - New `_compute_full_image_byte_budget` helper handles the degenerate cases (missing or all-sparse strip table) by falling back to the per-strip safety cap. `max_bytes=None` preserves the legacy unbounded behaviour for the non-HTTP callers (cloud reads are already gated by `max_cloud_bytes`; file reads are bounded by file size). Tests cover: - Content-Length advertised over budget -> OSError before the read. - Server omits Content-Length entirely (chunked) -> streaming cap catches the over-shoot. - Server lies low (advertises small, sends big) -> urllib3 truncates at the advertised length, locked in as the expected behaviour. - Legitimate full-image COG reads still work end-to-end. - Attacker pads a legitimate COG body with 64 MiB of zeros -> read rejected by either the Content-Length pre-flight or the streaming cap. - Budget helper returns the strip-table maximum plus slack, and falls back to the per-strip cap for empty / degenerate tables. Closes #2051. * geotiff: address review feedback on read_all byte-budget (#2051) - Stdlib fallback: add three tests that force ``self._pool = None`` so the urllib.request branch in ``read_all`` is actually exercised. One for the Content-Length pre-flight check, one for the streaming cap on chunked bodies, and one for the legitimate-body pass-through. - Redirect on the streaming path: drain the redirect response body before ``release_conn`` so urllib3 can return the connection to the pool instead of closing it. 3xx bodies are tiny so the drain is cheap. - Doc tweak in ``_check_content_length``: spell out that returning silently on missing / unparseable headers is intentional because the streaming cap is the real defence. - Test docstring fix: the padded-body test rejects before the body is fully buffered, not before it is allocated at all. * geotiff: drop stdlib-fallback tests now that urllib3 is required (#2051) #2050 / #2055 dropped the stdlib ``urllib.request`` HTTP fallback and made urllib3 a hard dependency. The three ``test_read_all_stdlib_fallback_*`` cells that exercised the fallback's byte-budget enforcement no longer have a code path to hit; the urllib3-only equivalents above keep the contract covered.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2050
_HTTPSourcehad two HTTP transports. The urllib3 path pins the TCP connection to the IP returned by_validate_http_url, which closes the DNS-rebinding TOCTOU window from #1846. The stdliburllib.requestfallback does not pin: it re-resolves the hostname at request time, so a hostile resolver can return a public IP at validation time and a private IP at connect time.urllib3was not ininstall_requires, so a real install could end up on the unpinned path and silently lose the #1846 mitigation.install_requires change
setup.cfg: addedurllib3under[options] install_requiresRemoved
_HTTPSource.read_range(formerly_reader.py:1048)_HTTPSource.read_all(formerly_reader.py:1222)_get_stdlib_openerand the_stdlib_openermodule-level cache_ValidatingRedirectHandler(the urllib redirect handler)try / except ImportErrorin_get_http_pooltest_stdlib_*cases intest_ssrf_hardening_1664.py; thetest_urllib3_*cases still cover redirect re-validation for the path that remainsTest plan
xrspatial/geotiff/tests/test_http_no_stdlib_fallback_2050.py(12 tests) locks in: urllib3 is importable,_get_http_poolreturns aPoolManager, the removed symbols are gone, the reader source no longer importsurllib.request,read_rangeandread_allstill round-trip a mocked urllib3 pool, andsetup.cfglists urllib3pytest xrspatial/geotiff/tests/ -k "http or ssrf or dns_rebinding": 161 passedtest_predictor2_big_endian_gpu_1517.py,test_size_param_validation_gpu_vrt_1776.py) confirmed present onmain, unrelated to this PR