
Tests: scrub uv lock-contaminating env so the runner doesn't mutate template uv.lock #206

Open
dhruv0811 wants to merge 5 commits into main from ci/uv-sync-frozen

dhruv0811 commented Apr 30, 2026

CI: 8/8 green (run 25148355169).
Local: green — tests still run and pass!

Summary

Fixes a nightly CI regression after databricks/app-templates#201 switched all templates to checked-in uv.lock files (so the Apps runtime now runs uv sync --locked during the build phase).

Two pieces of runner-workflow uv config were leaking into every uv run / uv sync the test framework spawned against a template subdir, rewriting the template's uv.lock with resolution-context metadata the Apps build environment doesn't share:

  1. UV_EXCLUDE_NEWER (workflow-wide env var): bakes a global cutoff into the lock; deploy fails with "Ignoring existing lockfile due to removal of global exclude newer".
  2. The runner repo's uv.toml (auto-discovered by uv walking up to the workspace root): its [exclude-newer-package] entries bake per-package cutoffs in; deploy fails with "Ignoring existing lockfile due to removal of exclude newer for package 'databricks-openai'".

End users running bundle deploy from their own shell don't have either configured, so the test environment was diverging from the user-facing path.

Fix

Strip both at module-import time in helpers.py:

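    import os

    # Runs at module import, so every subprocess spawned later inherits the cleaned env.
    os.environ.pop("UV_EXCLUDE_NEWER", None)  # drop the workflow-wide global cutoff
    os.environ.pop("UV_CONFIG_FILE", None)  # drop any explicit config-file override
    os.environ["UV_NO_CONFIG"] = "1"  # disable walk-up discovery of the runner's uv.toml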

UV_NO_CONFIG=1 is needed alongside the pop because uv also discovers uv.toml by walking up from the current working directory, not only through UV_CONFIG_FILE. Scrubbing at import time is safe: pytest imports helpers.py after the workflow's Sync test dependencies step has already run, so the runner's own deps are already pinned with the full config in scope. From that point on, these settings can only act as accidental contamination of template-side uv calls.
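If the scrub needs a unit test of its own, the same three operations can be wrapped in a function that accepts any mutable mapping. This is only a sketch: `scrub_uv_env` is a hypothetical name, and the PR itself applies the statements directly at the top of helpers.py.

```python
import os
from collections.abc import MutableMapping


def scrub_uv_env(environ: MutableMapping[str, str] = os.environ) -> None:
    """Remove uv settings that would leak the runner's resolution context.

    Mutates `environ` in place; defaults to the real process environment.
    """
    environ.pop("UV_EXCLUDE_NEWER", None)  # global cutoff from the workflow
    environ.pop("UV_CONFIG_FILE", None)  # explicit config-file override
    environ["UV_NO_CONFIG"] = "1"  # stop uv from discovering uv.toml in parent dirs
```

Passing a plain `dict` keeps the test hermetic, and calling the function twice leaves the mapping unchanged, so it is safe to import helpers.py more than once.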

Test plan

  • CI 8/8 green in run 25148355169 on the same workflow that was failing nightly.
  • Failure reproduced before fix on identical workflow (run 25147991251).
  • Local testing done as well

#201 switched templates from `requirements.txt` to checked-in `uv.lock`
files (and removed `uv.lock` from .gitignore). The Databricks Apps
runtime now runs `uv sync --locked` against that lockfile during the
build step.

The integration-test runner sets `UV_EXCLUDE_NEWER` to pin third-party
churn in the runner's own deps. That env var also applied to the
runner's `setup:uv-sync` step on the template subdir — `uv sync`
re-resolved under the cutoff and rewrote the template's `uv.lock`
with `exclude-newer = 2026-03-19T...` baked into its metadata.
Bundle deploy then uploaded the contaminated lock, and the Apps
runtime — which doesn't have UV_EXCLUDE_NEWER set — detected the
cutoff was removed, ignored the lock, re-resolved to a different
package set, and failed with:

    Ignoring existing lockfile due to removal of global exclude newer
    The lockfile at `uv.lock` needs to be updated, but `--locked`
    was provided.

Repro'd in nightly run 25145086773
(agent-non-conversational, 2026-04-30).

Switching to `uv sync --frozen` installs straight from the lock
without re-resolving, so the on-disk lockfile stays byte-identical
to what's checked in. Deploy then uploads the same lock that every
end user gets via `bundle deploy`, and the Apps runtime build
succeeds against an unmodified lock — matching the post-#201 flow
exactly.
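The "byte-identical" property can be checked directly by hashing the lockfile around any step that might touch it. A minimal sketch, assuming hypothetical helper names (`lock_digest`, `assert_lock_untouched`) that do not exist in the test framework:

```python
import hashlib
from pathlib import Path
from typing import Callable


def lock_digest(lock_path: Path) -> str:
    """SHA-256 of the lockfile's raw bytes."""
    return hashlib.sha256(lock_path.read_bytes()).hexdigest()


def assert_lock_untouched(lock_path: Path, action: Callable[[], object]) -> None:
    """Run `action` and fail if it changed the lockfile on disk."""
    before = lock_digest(lock_path)
    action()
    after = lock_digest(lock_path)
    if before != after:
        raise AssertionError(f"{lock_path} was rewritten by the wrapped step")
```

In a runner, `action` would be whatever callable performs the sync or run step against the template subdir; the check itself never invokes uv.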

Co-authored-by: Isaac

Iterating on the previous --frozen-only fix. `uv sync --frozen` keeps
the lockfile pristine, but every other uv invocation — `uv run
quickstart`, `uv run start-server`, `uv run agent-evaluate` — does
its own auto-sync by default and STILL re-resolves under
UV_EXCLUDE_NEWER, rewriting the template's uv.lock with
`exclude-newer` baked in.

CI run 25147080149 confirmed: setup:uv-sync ran cleanly with --frozen
(10.6s), then setup:quickstart's `uv run quickstart` (13.6s) re-synced
and contaminated the lock again. Bundle deploy uploaded the
contaminated lockfile. Apps runtime's `uv sync --locked` rejected it
identically to the original failure.

Fix: scrub UV_EXCLUDE_NEWER from the environment of every `uv`
subprocess call in `_run_cmd`. The runner's own deps still get the
cutoff applied via the workflow's `Sync test dependencies` shell step
(which is the only place the cutoff is meant to live). Template-side
uv calls now run with whatever cutoff end users have — none — matching
the end-user `bundle deploy` flow exactly.
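A per-call scrub of this kind can be sketched as below. The real `_run_cmd` is not shown in this PR, so `run_without_cutoff` and its signature are assumptions made for illustration:

```python
import os
import subprocess


def run_without_cutoff(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with UV_EXCLUDE_NEWER removed from the child's environment.

    The parent's os.environ is left untouched; only the copy handed to the
    child is filtered.
    """
    env = {k: v for k, v in os.environ.items() if k != "UV_EXCLUDE_NEWER"}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```

The limitation is visible in the shape of the function: only callers that go through the wrapper get the filtered environment, so any direct `subprocess` call elsewhere still inherits the cutoff.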

Keep `--frozen` on uv_sync as belt-and-suspenders: even if a future
change forgets to scrub, --frozen makes uv error loudly instead of
silently rewriting the lock.

Co-authored-by: Isaac

Iterating again. CI run 25147595689 confirmed the previous fix was
incomplete: scrubbing UV_EXCLUDE_NEWER in `_run_cmd` only covered uv
calls that went through that helper. `_start_server_once` uses
`subprocess.Popen` directly with `["uv", "run", "start-server", ...]`
and bypasses _run_cmd, so the local-server thread inherited the
workflow's env unchanged and re-synced the lockfile in parallel with
the deploy thread. Deploy then uploaded the freshly-contaminated lock
and Apps runtime rejected it identically.

Move the scrub to module-import time: `os.environ.pop("UV_EXCLUDE_NEWER",
None)` at the top of helpers.py. Every subprocess pytest spawns from
that point on inherits the cleaned env, regardless of whether it goes
through _run_cmd, subprocess.Popen, or anywhere else.
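The inheritance behaviour this relies on can be observed with two small probes, one per spawn path. This is an illustrative sketch using a Python child process in place of uv; the function names are hypothetical:

```python
import subprocess
import sys

# Child process that reports whether it can see the cutoff variable.
PROBE = [sys.executable, "-c", "import os; print('UV_EXCLUDE_NEWER' in os.environ)"]


def child_sees_cutoff_via_run() -> bool:
    """Spawn through subprocess.run without env=, so the child inherits os.environ."""
    out = subprocess.run(PROBE, capture_output=True, text=True, check=True).stdout
    return out.strip() == "True"


def child_sees_cutoff_via_popen() -> bool:
    """Spawn through subprocess.Popen directly, as a helper bypassing any wrapper would."""
    with subprocess.Popen(PROBE, stdout=subprocess.PIPE, text=True) as proc:
        out, _ = proc.communicate()
    return out.strip() == "True"
```

Both probes report the variable while it is set in the parent and stop reporting it once the parent pops it, which is why a single scrub of `os.environ` covers every spawn path.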

Safe to do at import time: pytest imports helpers AFTER the workflow's
"Sync test dependencies" step has already run, so the runner's own
deps were pinned with UV_EXCLUDE_NEWER applied. From this point on the
env var only mattered as accidental contamination of template-side
calls — which is exactly what we don't want.

Also drop the --frozen flag from uv_sync. With the env var scrubbed,
uv has no cutoff to bake in, so the lockfile stays clean naturally.
The --frozen / scrub-in-_run_cmd combination was inconsistent
(--frozen only on uv_sync, not on the dozen `uv run` calls
elsewhere); the import-time scrub covers them all uniformly.

Co-authored-by: Isaac

The previous scrub of UV_EXCLUDE_NEWER alone wasn't sufficient.
CI run 25147991251 deploy logs show two distinct contamination
messages, not one:

  [BUILD] Ignoring existing lockfile due to removal of global exclude newer
  [BUILD] Ignoring existing lockfile due to removal of exclude newer
          for package `databricks-openai`

The first is UV_EXCLUDE_NEWER (already scrubbed). The second comes
from the runner repo's `uv.toml` `[exclude-newer-package]` overrides
for databricks-openai / databricks-ai-bridge / databricks-langchain /
databricks-agents. uv discovers that config file by walking up from
the template subdir to the workspace root — so popping UV_CONFIG_FILE
alone doesn't help.
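The walk-up behaviour can be mimicked in a few lines to show why the environment variable is irrelevant to it. This is a simplified illustration, not uv's actual discovery code (which considers more than a bare `uv.toml`); `find_config_by_walking_up` is a hypothetical name:

```python
from pathlib import Path
from typing import Optional


def find_config_by_walking_up(start: Path, name: str = "uv.toml") -> Optional[Path]:
    """Return the nearest `name` in `start` or any of its ancestors, else None.

    Note that no environment variable is consulted: the result depends only
    on the working directory's position in the file tree.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Called from a template subdir nested inside the runner checkout, a search like this lands on the runner's own config at the workspace root, regardless of whether `UV_CONFIG_FILE` is set.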

Fix: pop UV_CONFIG_FILE *and* set UV_NO_CONFIG=1 to disable walk-up
discovery entirely. Now both contamination sources are blocked at the
process level for any subprocess helpers.py spawns, and the
template's checked-in lockfile reaches the Apps runtime byte-identical
to what Bryan committed in #201.

Co-authored-by: Isaac

dhruv0811 changed the title from "Tests: uv_sync --frozen so the runner doesn't mutate the checked-in lock" to "Tests: scrub uv lock-contaminating env so the runner doesn't mutate template uv.lock" on Apr 30, 2026