fix(install): honour REQUESTS_CA_BUNDLE in package validator#911

Merged
danielmeppiel merged 6 commits into microsoft:main from abi-jey:main
Apr 26, 2026

@abi-jey
Contributor

@abi-jey abi-jey commented Apr 24, 2026

Description

Behind corporate TLS-intercepting proxies, users often set REQUESTS_CA_BUNDLE (or SSL_CERT_FILE) to point at their org's CA. The rest of APM uses requests and honours REQUESTS_CA_BUNDLE correctly -- but the package validator was the one outlier: it used stdlib urllib, which ignores it. Result: validation failed with a misleading "package not accessible" error, sending users down the PAT troubleshooting path instead of fixing CA trust.

Fix: replace urllib with requests at this single point so the whole CLI has unified HTTP usage and REQUESTS_CA_BUNDLE is honoured everywhere. On SSLError, log a CA-trust hint under --verbose and skip the auth-error context render.

Fixes # (issue)

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass
  • Added tests for new functionality (if applicable)

The validator probed the GitHub API via stdlib urllib, which only honours SSL_CERT_FILE. Behind a TLS-intercepting proxy this surfaced as a misleading "package not accessible" error, sending users down the PAT troubleshooting path instead of the CA-trust one.

Switch both _check_repo helpers to requests.get, catch SSLError explicitly, log a CA-trust hint to --verbose, and skip the auth-error context render on TLS failures.
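
As a rough illustration of why the stdlib path missed the variable, the sketch below asks Python's ssl module which environment variables its default trust store consults. It assumes urllib's default HTTPS context is built from these ssl defaults; it is not code from the PR.

```python
import ssl

# Ask the stdlib which environment variables its default trust store reads.
paths = ssl.get_default_verify_paths()
stdlib_env_vars = (paths.openssl_cafile_env, paths.openssl_capath_env)

# REQUESTS_CA_BUNDLE is a convention of the requests library, so it should
# not appear among the names the stdlib consults.
stdlib_reads_requests_var = "REQUESTS_CA_BUNDLE" in stdlib_env_vars

print("stdlib consults:", stdlib_env_vars)
print("stdlib reads REQUESTS_CA_BUNDLE:", stdlib_reads_requests_var)
```

On a typical OpenSSL-backed build the names reported are SSL_CERT_FILE and SSL_CERT_DIR, which matches the behaviour described above.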
Copilot AI review requested due to automatic review settings April 24, 2026 17:45
@abi-jey
Contributor Author

abi-jey commented Apr 24, 2026

@abi-jey please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@abi-jey abi-jey requested a review from Copilot April 24, 2026 18:23
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread src/apm_cli/install/validation.py Outdated
Comment on lines +52 to +55
# requests.exceptions.SSLError is the canonical type once we stop
# using urllib; check by name to avoid importing requests at module
# import time.
if type(cur).__name__ == "SSLError":

Copilot AI Apr 24, 2026

The TLS classification checks type(cur).__name__ == "SSLError", which can misclassify unrelated exception types that happen to share that class name (leading to skipping auth error context incorrectly). Since this module already imports requests, use a precise check (e.g., isinstance(cur, requests.exceptions.SSLError)) and update/remove the comment about avoiding a requests import at module import time (it’s no longer accurate here).

Suggested change
- # requests.exceptions.SSLError is the canonical type once we stop
- # using urllib; check by name to avoid importing requests at module
- # import time.
- if type(cur).__name__ == "SSLError":
+ # requests.exceptions.SSLError is the canonical TLS verification error
+ # type for requests-based probes.
+ if isinstance(cur, requests.exceptions.SSLError):
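
A minimal sketch of the difference between the two checks. It uses the stdlib ssl.SSLError as a stand-in for requests.exceptions.SSLError so that it runs without third-party packages; the lookalike class and both helper names are hypothetical.

```python
import ssl


class SSLError(Exception):
    """Hypothetical unrelated exception that merely shares the class name."""


def is_tls_by_name(exc: BaseException) -> bool:
    # Name-based check: matches any class called "SSLError", related or not.
    return type(exc).__name__ == "SSLError"


def is_tls_by_type(exc: BaseException) -> bool:
    # Type-based check: matches the real TLS error type and its subclasses.
    return isinstance(exc, ssl.SSLError)


lookalike = SSLError("unrelated failure")
subclass = ssl.SSLCertVerificationError("CERTIFICATE_VERIFY_FAILED")

print(is_tls_by_name(lookalike), is_tls_by_type(lookalike))
print(is_tls_by_name(subclass), is_tls_by_type(subclass))
```

The name check accepts the lookalike and rejects the genuine subclass, while the isinstance check does the opposite in both cases.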

Comment on lines +81 to +84
with patch(
    "requests.get",
    side_effect=requests.exceptions.SSLError("CERTIFICATE_VERIFY_FAILED"),
):

Copilot AI Apr 24, 2026

Patching "requests.get" works but is less robust than patching the call site used by the code under test. Prefer patching apm_cli.install.validation.requests.get so the test remains correct even if validation.py changes how it imports/aliases requests.
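
The general principle can be sketched with two throwaway modules. Both fake_http and fake_probe are hypothetical stand-ins, not APM or requests code. Note that when the code under test does `import requests` and calls `requests.get`, both patch targets resolve to the same attribute; the difference only shows up once the function is imported by name or aliased, which is the case this sketch exercises.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical library module standing in for an HTTP client.
fake_http = types.ModuleType("fake_http")
fake_http.get = lambda url: "real response"
sys.modules["fake_http"] = fake_http

# Hypothetical code under test that binds the function by name at import time.
fake_probe = types.ModuleType("fake_probe")
exec(
    "from fake_http import get\n"
    "def check(url):\n"
    "    return get(url)\n",
    fake_probe.__dict__,
)
sys.modules["fake_probe"] = fake_probe

# Patching the library attribute misses the name already bound in fake_probe.
with patch("fake_http.get", return_value="mocked"):
    via_library_patch = fake_probe.check("https://example.invalid")

# Patching the call site replaces the name the code under test actually uses.
with patch("fake_probe.get", return_value="mocked"):
    via_call_site_patch = fake_probe.check("https://example.invalid")

print(via_library_patch, "|", via_call_site_patch)
```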

Comment on lines +159 to +169
def test_validation_module_does_not_import_urllib_request_urlopen(self):
    from pathlib import Path

    src = Path(validation.__file__).read_text(encoding="utf-8")
    # Forbid the specific call form; importing urllib for other reasons
    # remains acceptable.
    assert "urllib.request.urlopen" not in src, (
        "install/validation.py must use 'requests' for HTTP probes so it "
        "honours REQUESTS_CA_BUNDLE the same way the rest of the codebase "
        "does. Replace urllib.request.urlopen with requests.get."
    )

Copilot AI Apr 24, 2026

This regression guard is brittle because it relies on substring matching in source text (it can miss equivalent forms like from urllib.request import urlopen, and can false-trigger if mentioned in a comment or docstring). Consider replacing it with an AST-based check for urllib.request.urlopen usage, or a runtime behavior assertion (e.g., patch urllib.request.urlopen to fail and exercise _validate_package_exists to ensure it’s never called).

Suggested change
- def test_validation_module_does_not_import_urllib_request_urlopen(self):
-     from pathlib import Path
-     src = Path(validation.__file__).read_text(encoding="utf-8")
-     # Forbid the specific call form; importing urllib for other reasons
-     # remains acceptable.
-     assert "urllib.request.urlopen" not in src, (
-         "install/validation.py must use 'requests' for HTTP probes so it "
-         "honours REQUESTS_CA_BUNDLE the same way the rest of the codebase "
-         "does. Replace urllib.request.urlopen with requests.get."
-     )
+ def test_validation_module_does_not_call_urllib_request_urlopen(self):
+     resolver = MagicMock()
+     fake_resp = MagicMock(ok=True, status_code=200, reason="OK")
+     with patch(
+         "urllib.request.urlopen",
+         side_effect=AssertionError(
+             "install/validation.py must use 'requests' for HTTP probes so it "
+             "honours REQUESTS_CA_BUNDLE the same way the rest of the codebase "
+             "does. Replace urllib.request.urlopen with requests.get."
+         ),
+     ) as urlopen_mock, patch("requests.get", return_value=fake_resp):
+         result = validation._validate_package_exists(
+             "octocat/hello-world", verbose=False, auth_resolver=resolver
+         )
+     assert result is True
+     urlopen_mock.assert_not_called()
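
The AST-based alternative mentioned in this comment could look roughly like the sketch below. The helper name uses_urlopen is hypothetical, and the attribute rule is deliberately conservative: it flags any attribute access named urlopen, not only urllib's.

```python
import ast


def uses_urlopen(source: str) -> bool:
    """Return True if the source imports or references urlopen in code."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Catches: from urllib.request import urlopen (or a star import).
        if isinstance(node, ast.ImportFrom) and node.module == "urllib.request":
            if any(alias.name in ("urlopen", "*") for alias in node.names):
                return True
        # Catches: urllib.request.urlopen(...) and request.urlopen(...).
        if isinstance(node, ast.Attribute) and node.attr == "urlopen":
            return True
    return False


only_mentioned = '"""Do not use urllib.request.urlopen."""\n# urllib.request.urlopen\nx = 1\n'
from_import = "from urllib.request import urlopen\nurlopen('https://example.invalid')\n"
dotted_call = "import urllib.request\nurllib.request.urlopen('https://example.invalid')\n"

print(uses_urlopen(only_mentioned), uses_urlopen(from_import), uses_urlopen(dotted_call))
```

Unlike the substring match, this ignores comments and docstrings and still catches the from-import form.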

Comment thread src/apm_cli/install/validation.py Outdated
verbose_log(
" hint: the CA bundle does not trust the certificate presented "
"(common behind a corporate TLS-intercepting proxy). "
"Set REQUESTS_CA_BUNDLE to your organisation's CA bundle and retry."

Copilot AI Apr 24, 2026

The verbose hint only mentions REQUESTS_CA_BUNDLE, but the PR description also calls out SSL_CERT_FILE as a common alternative. Consider mentioning both environment variables in the hint to reduce user troubleshooting time in environments that standardize on SSL_CERT_FILE.

Suggested change
- "Set REQUESTS_CA_BUNDLE to your organisation's CA bundle and retry."
+ "Set REQUESTS_CA_BUNDLE or SSL_CERT_FILE to your organisation's "
+ "CA bundle and retry."

@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label Apr 25, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with two minor pre-merge fixes)


Per-persona findings

Python Architect: This is a routine bug-fix PR (no class hierarchy changes). The module is purely procedural -- no OO structure is added or changed; the PR adds two new module-level helpers (_is_tls_failure, _log_tls_failure) and updates two inner closures (_check_repo, _check_repo_fallback).

1. OO / class diagram

The PR is purely procedural -- no class changes anywhere in scope. Diagram shows module boundaries and function entry points, annotated with patterns where they apply.

classDiagram
    direction LR
    class validation_module {
        <<IOBoundary>>
        +_validate_package_exists(package, verbose, auth_resolver, logger) bool
        -_check_repo(token, git_env) bool
        -_check_repo_fallback(token, git_env) bool
    }
    class tls_helpers {
        <<Pure>>
        +_is_tls_failure(exc) bool
        +_log_tls_failure(host_display, exc, verbose_log) None
    }
    class AuthResolver {
        <<Strategy>>
        +classify_host(host) HostInfo
        +try_with_fallback(host, op, ...) any
        +build_error_context(host, ...) str
    }
    class HostInfo {
        <<ValueObject>>
        +api_base str
        +display_name str
    }
    validation_module ..> tls_helpers : uses
    validation_module ..> AuthResolver : delegates auth
    AuthResolver ..> HostInfo : returns
    note for tls_helpers "New in this PR: classifies SSLError vs auth\nfailures so users get the right hint"

tls_helpers is the touched surface (new functions). validation_module inner closures are updated but the module boundary is unchanged.

2. Execution flow diagram

flowchart TD
    A["_validate_package_exists(package, ...)"] --> B["[NET] auth_resolver.classify_host(host)"]
    B --> C["[NET] auth_resolver.try_with_fallback(host, _check_repo, unauth_first=True)"]
    C --> D["[NET] requests.get(api_url, headers, timeout=15)"]
    D --> E{Response?}
    E -->|SSLError| F["_log_tls_failure(host_display, exc, verbose_log)\nVERBOSE ONLY"]
    F --> G["raise RuntimeError('TLS verification failed ...')"]
    G --> H["outer: except Exception as exc:"]
    H --> I{"_is_tls_failure(exc)?"}
    I -->|True| J["return False -- skips build_error_context"]
    I -->|False| K["auth_resolver.build_error_context(...)\nlog auth hint"]
    K --> L["return False"]
    E -->|resp.ok == True| M["return True"]
    E -->|404 + token| N["raise RuntimeError('API returned 404')"]
    N --> H
    E -->|other 4xx/5xx| O["raise RuntimeError('API returned N: reason')"]
    O --> H

3. Design patterns


  • Used in this PR: none -- straight-line procedural code with exception-type discrimination via a predicate function, appropriate for the scope.
  • Pragmatic suggestion: none -- the current shape is the simplest correct design at this scope.

One required fix (Architect): _is_tls_failure contains a stale comment at validation.py (the type(cur).__name__ == "SSLError" branch): "check by name to avoid importing requests at module import time". Since import requests is now a top-level import, the justification is gone and the comment is misleading. Either replace the type-name check with isinstance(cur, requests.exceptions.SSLError) or drop the comment.


CLI Logging Expert: Output paths are correct. _log_tls_failure gates on if not verbose_log: return -- consistent with the existing verbose-callback pattern in validation.py. The CA-trust hint "Set REQUESTS_CA_BUNDLE to your organisation's CA bundle and retry" is actionable. Naming REQUESTS_CA_BUNDLE explicitly is acceptable here -- it is a standard Python/OS env var, not an APM-specific token env var (which would require build_error_context). No direct _rich_* usage; no CommandLogger anti-patterns.


DevX UX Expert: Before: proxy users got "package not accessible" -> PAT troubleshooting rabbit hole. After: with --verbose, they get a precise CA-trust hint and the right next action. This is a meaningful improvement. However, the hint is gated behind --verbose. At default verbosity, a user hitting a TLS failure still only sees "package not accessible" with no actionable next step. Since TLS failures are rare and always require one specific action (REQUESTS_CA_BUNDLE), showing the hint unconditionally would further reduce friction for exactly the users who most need it. This is flagged as a follow-up (see Optional follow-ups below) -- the PR's current approach is a net improvement over the status quo, and scope-creep risk argues against changing it now.

No CLI command surface changes; no cli-commands.md update needed.


Supply Chain Security Expert: Net positive. requests honours REQUESTS_CA_BUNDLE (falling back to the certifi bundle when it is unset); urllib.request.urlopen does not. This removes a class of forced-insecure workarounds in corporate environments. Token flow (Authorization: Bearer {token}) is unchanged. TLS failure returns False (fail-closed for package validation -- correct behavior). The seen < 8 bound in _is_tls_failure prevents loop DoS on self-referential exception chains (test test_is_tls_failure_bounded_chain_walk confirms this). Exception chain walk via cur.__cause__ or cur.__context__ is the standard Python idiom. No new attack surfaces.
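
The bounded chain walk described here can be sketched as follows. This is an illustration under assumptions rather than the PR's implementation: the stdlib ssl.SSLError stands in for requests.exceptions.SSLError, and the helper names and parameters are hypothetical.

```python
import ssl


def is_tls_failure(exc, tls_types=(ssl.SSLError,), max_depth=8):
    """Walk the exception chain, giving up after max_depth links."""
    cur = exc
    seen = 0
    while cur is not None and seen < max_depth:
        if isinstance(cur, tls_types):
            return True
        # Prefer the explicit cause, fall back to the implicit context.
        cur = cur.__cause__ or cur.__context__
        seen += 1
    return False


def wrap(inner, depth):
    """Hypothetical helper: bury `inner` under `depth` RuntimeError layers."""
    exc = inner
    for i in range(depth):
        outer = RuntimeError(f"layer {i}")
        outer.__cause__ = exc
        exc = outer
    return exc


shallow = wrap(ssl.SSLError("CERTIFICATE_VERIFY_FAILED"), 3)
too_deep = wrap(ssl.SSLError("CERTIFICATE_VERIFY_FAILED"), 8)

# A two-node cycle: without the bound, the walk would never terminate.
a, b = RuntimeError("a"), RuntimeError("b")
a.__cause__, b.__cause__ = b, a

print(is_tls_failure(shallow), is_tls_failure(too_deep), is_tls_failure(a))
```

The bound trades completeness for termination: a TLS error buried eight or more links deep is not detected in this sketch.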


Auth Expert: Activated -- src/apm_cli/install/validation.py is a fast-path trigger file.

Auth flow is fully preserved. auth_resolver.try_with_fallback() call signature unchanged (unauth_first=True, same verbose_callback). Authorization: Bearer {token} header construction unchanged. auth_resolver.classify_host(host) unchanged. host_info.api_base / host_info.display_name usage unchanged. TLS RuntimeError propagates through try_with_fallback and is caught by the outer handler -- same propagation path as before for non-HTTP exceptions. build_error_context is correctly skipped on TLS failures (TLS is not an auth problem; PAT guidance would misdirect users). No regression to AuthResolver token-precedence chain. No credential leakage in verbose output (exception message only, never token value).

Minor observation (not a blocker): if try_with_fallback retries on generic RuntimeError, _log_tls_failure could emit duplicate verbose lines. The tests mock try_with_fallback with a single-call shim so this path is not exercised. Worth a quick inspection of try_with_fallback's retry logic against RuntimeError.


OSS Growth Hacker: Corporate proxy compatibility is a silent enterprise adoption gate -- teams behind TLS-intercepting proxies hitting "package not accessible" errors churn silently without filing issues. This fix unblocks a real enterprise segment. Story angle: "APM now works out-of-the-box in corporate environments with TLS inspection." Missing: no CHANGELOG.md entry in the diff. The CHANGELOG is a key conversion surface -- existing users who hit this bug and moved on won't return unless they see it in release notes.

Side-channel to CEO: A "Corporate proxy / REQUESTS_CA_BUNDLE" troubleshooting doc entry would capture enterprise search traffic and convert frustrated evaluators. Low effort, high leverage for the enterprise adoption story.


CEO arbitration

The change is well-scoped, well-tested, and materially improves APM's story for enterprise users. All specialists agree on correctness and security posture. Two items are non-negotiable before merge: the missing CHANGELOG entry (repo convention, every code-change PR must have one) and the dead comment (misleads future contributors about why the type-name check exists). The DevX UX gap -- CA hint only under --verbose -- is real but the right call is to ship the current improvement and address it as a follow-up; adding unconditional output risks confusing non-proxy users and is out of scope for a bug fix. The growth hacker's docs suggestion (corporate proxy troubleshooting page) is the right complement to this PR and worth tracking as a follow-up issue.

Disposition ratified: APPROVE with two minor pre-merge fixes.


Required actions before merge

  1. Add a CHANGELOG.md entry under ## [Unreleased] > Fixed. Suggested line: "Fixed TLS validation failure behind corporate TLS-intercepting proxies: install/validation.py now uses requests (honouring REQUESTS_CA_BUNDLE) instead of stdlib urllib, and logs a CA-trust hint under --verbose instead of a misleading auth error (fix(install): honour REQUESTS_CA_BUNDLE in package validator #911)."
  2. Fix stale comment in _is_tls_failure (src/apm_cli/install/validation.py, the type(cur).__name__ == "SSLError" branch): remove or update the "check by name to avoid importing requests at module import time" comment -- requests is now a top-level import, so this justification is no longer accurate. Replacing the name check with isinstance(cur, requests.exceptions.SSLError) is the cleanest fix.

Optional follow-ups

  • Consider emitting the CA-trust hint at default verbosity (not just --verbose) when a TLS failure is detected -- TLS failures are rare, always actionable, and the one-liner "Set REQUESTS_CA_BUNDLE ..." would save the most common class of confused enterprise users from having to re-run with --verbose.
  • Add a troubleshooting docs entry ("APM behind a corporate proxy / TLS-intercepting proxy") in docs/src/content/docs/ covering REQUESTS_CA_BUNDLE setup. Low-effort, high-leverage for enterprise adoption and search discoverability.
  • Audit auth_resolver.try_with_fallback retry logic: confirm whether a RuntimeError from _check_repo triggers a retry attempt (which would cause duplicate _log_tls_failure verbose output).

Generated by PR Review Panel for issue #911 · ● 369.4K ·

Collaborator

@danielmeppiel danielmeppiel left a comment

Required actions before merge

  • Add a CHANGELOG.md entry under ## [Unreleased] > Fixed. Suggested line: "Fixed TLS validation failure behind corporate TLS-intercepting proxies: install/validation.py now uses requests (honouring REQUESTS_CA_BUNDLE) instead of stdlib urllib, and logs a CA-trust hint under --verbose instead of a misleading auth error (fix(install): honour REQUESTS_CA_BUNDLE in package validator #911)."
  • Fix stale comment in _is_tls_failure (src/apm_cli/install/validation.py, the type(cur).__name__ == "SSLError" branch): remove or update the "check by name to avoid importing requests at module import time" comment -- requests is now a top-level import, so this justification is no longer accurate. Replacing the name check with isinstance(cur, requests.exceptions.SSLError) is the cleanest fix.
  • Consider emitting the CA-trust hint at default verbosity (not just --verbose) when a TLS failure is detected -- TLS failures are rare, always actionable, and the one-liner "Set REQUESTS_CA_BUNDLE ..." would save the most common class of confused enterprise users from having to re-run with --verbose.
  • Add a troubleshooting docs entry ("APM behind a corporate proxy / TLS-intercepting proxy") in docs/src/content/docs/ covering REQUESTS_CA_BUNDLE setup. Low-effort, high-leverage for enterprise adoption and search discoverability.
  • Audit auth_resolver.try_with_fallback retry logic: confirm whether a RuntimeError from _check_repo triggers a retry attempt (which would cause duplicate _log_tls_failure verbose output).

abi-jey and others added 3 commits April 26, 2026 16:22
isinstance(SSLError) check, dedup TLS hint across retry chain, surface
CA-trust hint at default verbosity, drop dead fallback, tighten verbose
output, add troubleshooting docs page and CHANGELOG entry.

Assisted-by: Claude:claude-sonnet-4-6
@abi-jey
Contributor Author

abi-jey commented Apr 26, 2026

@danielmeppiel I have updated the changelog and fixed the stale comment. On TLS failure there are now two hints: a one-liner at default verbosity and a more detailed one under --verbose. I confirmed that retries still happen, but the hints appear only once. I also added a troubleshooting docs section and moved the TLS validation error handling.

@abi-jey abi-jey requested a review from danielmeppiel April 26, 2026 18:27
@danielmeppiel danielmeppiel added panel-review Trigger the apm-review-panel gh-aw workflow and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 26, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (one optional follow-up noted below)


Per-persona findings

Python Architect:

This is a routine bug-fix PR. No classes are added or modified anywhere in scope -- validation.py is a pure-functions module. The problem-space diagram below shows module boundaries and function entry points, annotated with their roles.

1. OO / class diagram

classDiagram
    direction LR
    class validation {
        <<Module>>
        <<IOBoundary>>
        +_validate_package_exists(package, verbose, auth_resolver, logger) bool
        +_is_tls_failure(exc) bool
        +_log_tls_failure(host_display, exc, verbose_log, logger) None
    }
    class AuthResolver {
        <<Strategy>>
        +try_with_fallback(host, op, ...) Any
        +classify_host(host) HostInfo
        +build_error_context(...) str
    }
    class HostInfo {
        <<ValueObject>>
        +api_base str
        +display_name str
        +kind str
    }
    class requests_lib {
        <<ExternalLibrary>>
        +get(url, headers, timeout) Response
        exceptions_SSLError
        exceptions_RequestException
    }
    validation ..> AuthResolver : inject
    validation ..> requests_lib : uses
    AuthResolver ..> HostInfo : returns
    note for validation "touched: _is_tls_failure + _log_tls_failure added\n_check_repo + _check_repo_fallback migrated to requests"
    class validation:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution flow diagram

flowchart TD
    A["apm install\n_validate_package_exists()"] --> B{"URL parse\nsucceeds?"}
    B -- yes --> C["auth_resolver.try_with_fallback\n(host, _check_repo, unauth_first=True)"]
    B -- no --> D["auth_resolver.try_with_fallback\n(host, _check_repo_fallback, unauth_first=True)"]
    C --> E["[NET] requests.get(api_url, headers, timeout=15)"]
    D --> F["[NET] requests.get(api_url, headers, timeout=15)"]
    E -- SSLError --> G["raise RuntimeError\n('TLS verification failed for host')"]
    F -- SSLError --> G
    E -- ok 200 --> H["return True"]
    F -- ok 200 --> H
    E -- 404+token --> I["raise RuntimeError(status_code)"]
    F -- 4xx --> I
    G --> J{"outer except\n_is_tls_failure(exc)?"}
    I --> J
    J -- YES --> K["_log_tls_failure()\nlogger.warning: CA-trust hint (always-on)\nverbose_log: host + exc text (if verbose)"]
    K --> L["return False"]
    J -- NO --> M["build_error_context / verbose log\nreturn False"]

Design patterns

  • Used in this PR: none -- straight-line procedural code, appropriate for the scope. _is_tls_failure / _log_tls_failure are a clean detection/logging separation but do not rise to a named pattern at this size.
  • Pragmatic suggestion: none -- the current shape is the simplest correct design at this scope. Extracting a TlsErrorClassifier class would be over-engineering for two functions.

Architecture assessment: clean. The bounded chain walk (max 8) and self-referential guard in _is_tls_failure are correct. Failure path exits cleanly without leaking exception internals to the user.


CLI Logging Expert: Correct use of the output architecture throughout.

  • logger.warning(...) for the CA-trust hint: always-on, correct per the traffic-light rule (yellow = should know, actionable).
  • verbose_log(f"underlying error from {host_display}: {exc}") behind the if verbose_log: guard: correct -- the host/exception detail is agent-level signal, not user-level.
  • The verbose_log callable is None when verbose=False, and the guard handles that without branching at the call site.
  • Hardcoding REQUESTS_CA_BUNDLE in the warning message is the right call here -- AuthResolver.build_error_context exists for auth-token errors; it has no concept of CA bundles and would route the user down the wrong troubleshooting path.
  • Message passes the "So What?" test: user reads it, knows the exact variable to set, and has a docs URL for OS-specific steps.
  • ASCII-safe throughout -- no emojis, no Unicode.

One nuance worth noting: build_error_context is deliberately skipped when _is_tls_failure returns True. This is correct; calling it would surface PAT/SSO advice for what is purely a CA-trust configuration problem.
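
A minimal sketch of the two-channel logging shape described above, assuming a stdlib logging.Logger in place of APM's CommandLogger. The function name, message wording, and capture handler are illustrative only.

```python
import logging
from typing import Callable, Optional


def log_tls_failure(
    host_display: str,
    exc: BaseException,
    verbose_log: Optional[Callable[[str], None]],
    logger: logging.Logger,
) -> None:
    # Always-on: the actionable hint is shown at default verbosity.
    logger.warning(
        "TLS verification failed for %s. Set REQUESTS_CA_BUNDLE to your "
        "organisation's CA bundle and retry.",
        host_display,
    )
    # Verbose-only: verbose_log is None unless verbose output was requested.
    if verbose_log:
        verbose_log(f"underlying error from {host_display}: {exc}")


class ListHandler(logging.Handler):
    """Collects formatted messages so the sketch can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("tls_hint_sketch")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

verbose_lines = []
error = RuntimeError("CERTIFICATE_VERIFY_FAILED")
log_tls_failure("github.com", error, None, logger)                  # default run
log_tls_failure("github.com", error, verbose_lines.append, logger)  # verbose run
```

Both runs produce the warning, and only the verbose run records the underlying error text.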


DevX UX Expert: Significant UX improvement for enterprise users.

The previous behavior pushed users into the auth troubleshooting funnel for a problem that had nothing to do with credentials -- a classic "misleading failure mode" anti-pattern. The fix addresses this correctly:

  1. Recovery: The warning is visible at default verbosity (no --verbose needed). The message names the env var, the fix, and the docs URL in one line. That is one copy-paste away.
  2. Docs page (troubleshooting/ssl-issues.md): clean, OS-specific code blocks for Linux/macOS and Windows/PowerShell, with the --verbose fallback for non-proxy cases.
  3. No new CLI surface: no flags or commands changed, so docs/reference/cli-commands.md does not need updating.
  4. Familiarity: REQUESTS_CA_BUNDLE is the standard env var known to pip users. Pointing to it matches developer expectations perfectly.

The "Not behind a proxy or firewall?" section in the docs page correctly redirects users to --verbose rather than dead-ending them.


Supply Chain Security Expert: No security regressions; net improvement.

  1. TLS not downgraded: requests.get(api_url, headers=headers, timeout=15) uses verify=True by default. No verify=False anywhere in the diff.
  2. Fail-closed: SSLError -> RuntimeError -> _is_tls_failure -> return False. Package validation fails; install does not proceed.
  3. Credential handling: Authorization: Bearer {token} header path is unchanged from the urllib version. requests does not log auth headers.
  4. _TLS_ERROR_PREFIX substring match: An adversary who controls the package repo URL cannot manipulate this to force a True return -- the function always returns False on TLS match. At worst they suppress the auth context hint; they cannot make validation pass.
  5. Chain walk bound: seen < 8 prevents infinite loops on self-referential exception chains. The self-referential test in the test suite confirms this.
  6. Dependency surface: requests is already a declared dependency of APM. No new transitive surface introduced.

Auth Expert: AuthResolver flow is fully preserved.

  1. try_with_fallback(host, _check_repo, ..., unauth_first=True) is unchanged in both code paths. Token precedence (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill) is not touched.
  2. build_error_context bypass on TLS: correct. Showing PAT/SSO troubleshooting for a CA-trust failure would be actively harmful. The bypass is intentional and well-commented.
  3. Single-warning guarantee: _log_tls_failure is called only in the outer except block, after try_with_fallback has exhausted its retry chain. Even if try_with_fallback retries _check_repo three times (unauth + PAT + credential-fill), each raising the same SSLError, the outer except fires once. The regression test test_ssl_error_logs_hint_exactly_once_when_token_present verifies this explicitly.
  4. Scope note (not a bug): the fallback path at line 449 passes raw host to _log_tls_failure while the main path at line 392 passes host_info.display_name. This is correct -- host_info is scoped inside the _check_repo_fallback closure, not available in the outer except. Both values are the same hostname string for the common case.
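
The single-warning guarantee in point 3 can be illustrated with a small sketch. Everything here is a hypothetical stand-in: try_with_fallback is reduced to a loop over credentials, the marker text is assumed, and ssl.SSLError replaces requests' SSLError so the example runs with the stdlib alone.

```python
import ssl

TLS_ERROR_PREFIX = "TLS verification failed"  # assumed marker text


def try_with_fallback(op, tokens):
    """Hypothetical stand-in: try each credential, re-raise the last error."""
    last_exc = None
    for token in tokens:
        try:
            return op(token)
        except RuntimeError as exc:
            last_exc = exc
    raise last_exc


def validate(op, tokens, hints):
    try:
        return try_with_fallback(op, tokens)
    except Exception as exc:
        if TLS_ERROR_PREFIX in str(exc):
            # Logged here, after the retry chain is exhausted, so it fires once.
            hints.append("CA-trust hint")
            return False  # fail closed; auth context is skipped
        hints.append("auth hint")
        return False


attempts = []


def failing_probe(token):
    attempts.append(token)
    try:
        raise ssl.SSLError("CERTIFICATE_VERIFY_FAILED")
    except ssl.SSLError as exc:
        raise RuntimeError(f"{TLS_ERROR_PREFIX} for example.invalid") from exc


hints = []
result = validate(failing_probe, [None, "pat", "credential-fill"], hints)
print(result, len(attempts), hints)
```

The probe runs once per credential, yet the hint list ends up with a single CA-trust entry and no auth entry.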

OSS Growth Hacker: High-signal enterprise adoption unlock.

Corporate-proxy TLS failures are the #1 silent killer for "APM doesn't work at my company" moments. Zscaler, Netskope, and Palo Alto collectively cover the majority of enterprise network environments. This fix removes a first-day blocker with no user-visible config changes required (beyond setting one env var they likely already know from pip).

Specific conversion wins:

  • The new troubleshooting/ssl-issues.md page is a searchable SEO target for "apm install TLS error corporate proxy" -- a query that generates zero results today.
  • The docs URL embedded in the warning message creates a direct error-to-fix feedback loop.
  • CHANGELOG entry is story-shaped and reusable in a release note.

Side-channel to CEO: this is worth a dedicated callout in the release note for the next minor -- "APM now works out of the box behind corporate TLS-inspecting proxies (Zscaler, Netskope, Palo Alto). Set REQUESTS_CA_BUNDLE and you're done." TLS fixes travel fast in corporate IT Slack channels.


CEO arbitration

All five specialists plus the Auth Expert reviewed this PR and reached the same conclusion independently: the change is correct, well-scoped, and a net improvement on every axis. There are no disagreements to arbitrate. The Growth Hacker's side-channel is noted -- the enterprise adoption narrative is real and the next release note should lead with it. The CHANGELOG entry already has the right framing; no change needed before merge. This is a textbook targeted bug fix: narrow diff, comprehensive tests (7 test cases covering detection, default verbosity, verbose detail, single-emission guarantee, auth-context bypass, 404 regression, and 200 happy path), matching docs, and a clean CHANGELOG entry. APPROVE.


Required actions before merge

  1. None. The PR is ready to merge as-is.

Optional follow-ups

  • _log_tls_failure's verbose_log and logger parameters lack type hints (Optional[Callable[[str], None]] and CommandLogger respectively). The repo convention requires type hints on all public APIs; adding them in a follow-up would bring the new helpers into full compliance with **/*.py instructions.
  • The comment at line 448 (# See note above: logged once here, skip auth context render.) references a "note above" that does not exist as an explicit comment at line 392. A brief symmetric comment at the main-path _log_tls_failure call would make the intent self-documenting for future readers.
  • Consider adding the new troubleshooting/ssl-issues.md page URL to the project's sitemap or 404 redirect config if the docs site has one, so users who land on expired links from search engines find it.

Generated by PR Review Panel for issue #911 · ● 867.9K ·

@danielmeppiel danielmeppiel enabled auto-merge April 26, 2026 19:19
@danielmeppiel danielmeppiel added this pull request to the merge queue Apr 26, 2026
Merged via the queue into microsoft:main with commit 1418b06 Apr 26, 2026
9 checks passed
@danielmeppiel danielmeppiel mentioned this pull request Apr 27, 2026
3 tasks
danielmeppiel added a commit that referenced this pull request Apr 27, 2026
* chore(release): cut 0.9.4

CHANGELOG entry for 0.9.4 covers all 8 PRs merged since v0.9.3:

- #974 SKILL_BUNDLE day-0 install parity (Added)
- #954 automate apm-triage-panel workflow (Added)
- #970 python-architect mermaid classDiagram trap (Changed)
- #911 REQUESTS_CA_BUNDLE TLS validation (Fixed)
- #971 triage-panel project-sync dispatch (Fixed)
- #910 CLI consistency cleanup (Fixed)
- #958 issue templates label taxonomy (Fixed)
- #953 docs auto-deploy after bot-cut releases (Fixed)

Open milestone 0.9.4 issues (41) reassigned to 0.9.5.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): tighten 0.9.4 entries (so-what for developers)

Refactor per Keep-a-Changelog spirit: lead with developer impact,
trim agent-internals prose, group maintainer-only changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): add #660 install.sh air-gapped entry to 0.9.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>