fix: validate redirects in favicon fetching to prevent SSRF #8858
sriramveeraghanta merged 2 commits into `preview`
Conversation
The previous SSRF fix (GHSA-jcc6-f9v6-f7jw) only validated redirects for the main page URL but not for the favicon fetch path. An attacker could craft an HTML page with a favicon link that redirects to a private IP, bypassing the IP validation and leaking internal network data as base64. Extract a reusable `safe_get()` function that validates every redirect hop against private/internal IPs and use it for both page and favicon fetches.

Resolves: GHSA-9fr2-pprw-pp9j

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
Pull request overview
Improves SSRF protections in work-item link crawling by validating every redirect hop during HTML and favicon fetches.
Changes:
- Introduces a reusable `safe_get()` wrapper that manually follows redirects with per-hop private/internal IP validation.
- Updates HTML fetching in `crawl_work_item_link_title_and_favicon()` to use `safe_get()` and log validation failures.
- Updates favicon fetching to use `safe_get()` instead of a redirect-following `requests.get()`.
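Based on the change description, the reusable wrapper could be sketched roughly as follows. This is a sketch, not the merged code: the value of `MAX_REDIRECTS` and the exact address checks inside `validate_url_ip()` are assumptions, and the real module lives in `work_item_link_task.py`.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlparse

import requests

MAX_REDIRECTS = 5  # assumed value; the real constant lives in work_item_link_task.py


def validate_url_ip(url):
    """Resolve the URL's hostname and reject private/internal addresses."""
    hostname = urlparse(url).hostname
    if not hostname:
        raise ValueError("URL has no hostname")
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            raise ValueError(f"Blocked non-public address: {addr}")


def safe_get(url, headers=None, timeout=10):
    """GET a URL without auto-redirects, validating every hop's IP first."""
    validate_url_ip(url)
    current_url = url
    response = requests.get(
        current_url, headers=headers, timeout=timeout, allow_redirects=False
    )
    redirect_count = 0
    while response.is_redirect and redirect_count < MAX_REDIRECTS:
        location = response.headers.get("Location")
        if not location:
            break
        current_url = urljoin(current_url, location)  # handles relative Location
        validate_url_ip(current_url)  # re-check before following the hop
        redirect_count += 1
        response = requests.get(
            current_url, headers=headers, timeout=timeout, allow_redirects=False
        )
    if response.is_redirect:
        raise RuntimeError("Too many redirects")
    # Return the final URL too, so callers can resolve relative favicon hrefs
    # against the post-redirect origin.
    return response, current_url
```

Because redirects are followed manually with `allow_redirects=False`, every `Location` header passes through `validate_url_ip()` before the next request is issued, which is the property the original fix lacked on the favicon path.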
- Fix off-by-one in redirect limit: only raise `RuntimeError` when the response is still a redirect after `MAX_REDIRECTS` hops, not when the final response is a successful 200
- Return final URL from `safe_get()` so favicon href resolution uses the correct origin after redirects instead of the original URL
- Add unit tests for `validate_url_ip` and `safe_get` covering private IP blocking, redirect-following, and redirect limit enforcement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/plane/bgtasks/work_item_link_task.py`:
- Around line 98-111: The current redirect handling has an off-by-one: after
following redirects the code unconditionally raises when redirect_count >=
MAX_REDIRECTS even if the final response is non-redirect; update the check so we
only raise when we hit the limit and the response is still a redirect.
Concretely, modify the post-loop condition that raises to only raise when both
redirect_count >= MAX_REDIRECTS and response.is_redirect (referencing
redirect_count, MAX_REDIRECTS, and response in work_item_link_task.py), so valid
chains that terminate exactly on the MAX_REDIRECTS-th hop are accepted.
- Around line 136-148: The favicon resolution currently uses the pre-redirect
URL; update the call to fetch_and_encode_favicon so it uses the
final/fully-resolved page URL (response.url) as the base for urljoin instead of
the original url; i.e., after safe_get returns, pass response.url to
fetch_and_encode_favicon (and if response is None due to a caught exception,
fall back to the original url) so that link rel="icon" and the /favicon.ico
fallback are resolved against the actual final host/path.
- Around line 91-107: The current validate_url_ip() check is vulnerable to DNS
rebinding TOCTOU because requests/resolver is called again when opening sockets;
fix by making the HTTP connection use the pre-resolved IP instead of letting
requests re-resolve hostnames: resolve the hostname once (reuse
validate_url_ip() to return the vetted IP for each URL/hop), and then perform
requests by connecting to that IP while setting the original hostname in the
Host header (or using a custom TransportAdapter/ConnectionPool that calls
create_connection with the resolved IP) so the socket connect cannot re-resolve;
apply this for the initial fetch and for each redirect hop in
work_item_link_task.py (use the resolved_ip returned by validate_url_ip() when
issuing requests.get and when following redirects), or alternatively enforce
egress firewall rules outside the app if binding to IP is not possible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e4d179a1-3884-4770-b97b-e11dfe3eed42
📒 Files selected for processing (1)
apps/api/plane/bgtasks/work_item_link_task.py
```python
validate_url_ip(url)

current_url = url
response = requests.get(
    current_url, headers=headers, timeout=timeout, allow_redirects=False
)

redirect_count = 0
while response.is_redirect and redirect_count < MAX_REDIRECTS:
    redirect_url = response.headers.get("Location")
    if not redirect_url:
        break
    current_url = urljoin(current_url, redirect_url)
    validate_url_ip(current_url)
    redirect_count += 1
    response = requests.get(
        current_url, headers=headers, timeout=timeout, allow_redirects=False
    )
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's examine the file to understand the implementation
cat -n apps/api/plane/bgtasks/work_item_link_task.py | head -150
```

Repository: makeplane/plane
Length of output: 6259

🏁 Script executed:

```shell
# Let's also look for the validate_url_ip function definition
rg -n "def validate_url_ip" apps/api/plane/bgtasks/work_item_link_task.py
```

Repository: makeplane/plane
Length of output: 100

🏁 Script executed:

```shell
# Search for any imports related to requests or socket
rg -n "^import|^from" apps/api/plane/bgtasks/work_item_link_task.py | head -30
```

Repository: makeplane/plane
Length of output: 429

🏁 Script executed:

```shell
# Search for any connection pooling, session management, or DNS caching configuration
rg -n "Session|Pool|Cache|resolve|getaddrinfo|urllib3" apps/api/plane/bgtasks/work_item_link_task.py
```

Repository: makeplane/plane
Length of output: 250

🏁 Script executed:

```shell
# Check if requests library is configured with any session or custom resolver
rg -n "requests.Session|HTTPAdapter|resolver" apps/api/plane/
```

Repository: makeplane/plane
Length of output: 281

🏁 Script executed:

```shell
# Check environment setup or global request configuration
fd -type f -name "*.py" apps/api/ | xargs rg -l "requests.packages|urllib3" | head -10
```

Repository: makeplane/plane
Length of output: 229

🏁 Script executed:

```shell
# Check the requests library version and any DNS caching behavior in requirements or setup files
find . -name "requirements*.txt" -o -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" | xargs grep -l "requests" 2>/dev/null | head -5
```

Repository: makeplane/plane
Length of output: 41

🏁 Script executed:

```shell
# Find dependency specifications
fd -type f "requirements|setup|pyproject" | grep -E "(txt|py|toml|cfg)$"
```

Repository: makeplane/plane
Length of output: 229
🌐 Web query:

> urllib3 requests library DNS resolution caching behavior

💡 Result:

- `urllib3` (and therefore `requests`) does not implement its own DNS cache. When it needs a new TCP connection, it resolves hostnames via the OS resolver by calling `socket.getaddrinfo()` (i.e., whatever your system/NSS/DNS configuration does). [1]
- Any "DNS caching" you observe typically comes from connection reuse, not cached DNS answers. Both `urllib3` and `requests.Session()` keep connection pools keyed by host/port and will reuse an existing keep-alive connection when possible, so no new DNS lookup happens if no new connection is created. [2]
- When a new connection is created, resolution happens again via the OS resolver path (so whether results are cached, and for how long, is OS-dependent). [1]
- Implication for DNS changes / load balancers: if DNS for a hostname changes, an already-open pooled connection will keep talking to the old IP until that connection is dropped/closed and a new one is created. This is a consequence of pooling/keep-alive behavior. [2]
- To "refresh" which IP you hit: force new connections (e.g., close the `requests.Session()` / adapter, or disable reuse), or implement DNS caching/overrides outside of `requests`/`urllib3` (OS-level caching or custom resolver approaches). It is commonly noted that requests/urllib3 themselves don't cache DNS. [3][4]
Sources:
[1] urllib3.util.connection.create_connection uses socket.getaddrinfo() (no library DNS cache) — https://dokk.org/documentation/urllib3/1.6/_modules/urllib3/util/connection/ (dokk.org)
[2] Requests docs: keep-alive is automatic within a Session (connection pooling/reuse) — https://docs.python-requests.org/_/downloads/en/v2.7.0/pdf/ (docs.python-requests.org)
[3] AWS re:Invent slide explicitly stating requests/urllib3 don’t cache DNS — https://reinvent.awsevents.com/content/dam/reinvent/2024/slides/svs/SVS218-NEW_Accelerate-Python-and-NET-Lambda-functions-with-SnapStart.pdf (reinvent.awsevents.com)
[4] Discussion noting requests (via urllib3) calls getaddrinfo for hostname resolution and suggests OS/app-level DNS caching if needed — https://stackoverflow.com/questions/45479393/connection-error-while-using-requests-to-get-response-from-google-distance-matri (stackoverflow.com)
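The resolver call described in [1] can be observed directly with the standard library. This example is offline-safe, since `localhost` resolves without external DNS:

```python
import socket

# The call urllib3 ultimately makes for every *new* connection: the OS
# resolver.  There is no library-level cache, so consecutive calls can
# return different answers if the upstream DNS record changes in between,
# which is exactly the window a DNS-rebinding attacker exploits.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
for family, _type, _proto, _canon, sockaddr in infos:
    print(family.name, sockaddr[0])
```

On a typical machine this prints `AF_INET 127.0.0.1` and/or `AF_INET6 ::1`; nothing in `requests` or `urllib3` remembers the answer between connections.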
The SSRF guard is bypassable via DNS rebinding: a real TOCTOU window exists in the validation approach.
`validate_url_ip()` resolves the hostname via `socket.getaddrinfo()` to check the IP, but `requests.get()` must resolve the same hostname again when opening the socket. Because urllib3 (underlying requests) performs no DNS caching and each socket creation triggers an independent resolver call, an attacker-controlled DNS server can return a public IP during validation, then serve a private IP on the next lookup, allowing the actual request to hit internal addresses. The same TOCTOU gap applies to each redirect hop (line 106 validates, lines 108-110 issue the request). A proper fix requires either binding the socket connect to the pre-validated IP address or enforcing egress controls outside the application layer.
🧹 Nitpick comments (1)
apps/api/plane/tests/unit/bg_tasks/test_work_item_link_task.py (1)
Line 126: Minor: Add trailing newline. The file should end with a newline character for POSIX compliance and to avoid diff noise in future commits.
📝 Suggested fix

```diff
     assert response is final_resp
     assert not response.is_redirect
+
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/plane/tests/unit/bg_tasks/test_work_item_link_task.py` at line 126, add a POSIX trailing newline: ensure the file ends with a single '\n' character so editors and git show no spurious diffs; simply append the newline after the final line and save.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 582b6412-227a-4906-92ea-6d6e6eda63bf
📒 Files selected for processing (2)
apps/api/plane/bgtasks/work_item_link_task.py
apps/api/plane/tests/unit/bg_tasks/test_work_item_link_task.py
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/api/plane/bgtasks/work_item_link_task.py
Summary
- New `safe_get()` function that validates every redirect hop against private/internal IPs before following it
- Replaces the `requests.get()` call in `fetch_and_encode_favicon()`, which followed redirects blindly, allowing attackers to redirect favicon URLs to private IPs and exfiltrate internal data as base64

Test plan
- A `<link>` tag pointing to a URL that redirects to a private IP is blocked