fix: prevent starting container if one already exists for check #189

mithro · 2025-12-07T01:52:22Z

Summary

- Add does_check_container_exist() function to check for any container (running or stopped) with the check's label
- Call this check at the start of _start_container() to abort if a duplicate would be created
- Raises ValueError which is caught by existing exception handling

Problem

Race conditions can occur where check_process_job runs multiple times for the same check (e.g., Celery retry), potentially creating duplicate Docker containers.

Solution

Before creating a new container, query Docker for any existing container with the wafer.space.check_id label matching this check. If found, raise an error to abort the duplicate.

Test plan

- Verify normal check flow still works
- Manually test duplicate prevention by calling check_process_job twice for same check

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Enhanced validation to prevent duplicate containers from being created for the same check. Attempting to start a container that already exists now raises a clear error, improving system reliability and preventing conflicts.


Add does_check_container_exist() to verification.py that checks for any container (running or stopped) with the check's label. Called at the start of _start_container() to abort if a duplicate would be created.

This prevents race conditions where check_process_job runs multiple times for the same check (e.g., Celery retry).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2025-12-07T01:52:30Z

Walkthrough

These changes add a duplicate container detection mechanism by introducing a new function that checks for existing Docker containers (running or stopped) for a given check, and integrating it as a pre-check in the container startup process to prevent duplicate container creation.

Changes

Cohort / File(s)	Summary
Docker container existence check `wafer_space/projects/verification.py`	Added new exported function `does_check_container_exist(check)` that queries Docker for any container (running or stopped) with label `wafer.space.check_id={check.id}`, returning `True` if found, `False` if none exist, and `None` if Docker is unavailable. Includes logging and proper Docker client cleanup.
Duplicate container prevention `wafer_space/projects/tasks.py`	Imported `does_check_container_exist` and added a pre-check in `_start_container` that raises `ValueError` if a container already exists for the given check. Updated docstring with Raises section and added clarifying code comments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

- Docker container existence check: Verify that the label query format (wafer.space.check_id) is consistent with existing labeling conventions and that error handling for Docker unavailability is correct.
- ValueError behavior: Confirm the ValueError is raised at the appropriate point and that the error message is clear for debugging.

Poem

🐰 A container rises, but wait—
We check if one's already there!
No duplicates to frustrate,
Just clean checks with proper care. ✓

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title directly and clearly summarizes the main change: preventing duplicate container creation by checking if a container already exists before starting a new one.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


Copilot

Pull request overview

This PR adds duplicate container detection to prevent race conditions where multiple Celery task retries could create multiple Docker containers for the same manufacturability check.

- Introduces does_check_container_exist() function to query for any existing containers (running or stopped) with the check's label
- Adds a safety check in _start_container() to abort container creation if a duplicate would be created
- Leverages existing exception handling by raising ValueError when duplicates are detected

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
wafer_space/projects/verification.py	Adds new `does_check_container_exist()` function to check for any existing containers (running or stopped) with the check's label, preventing duplicate container creation
wafer_space/projects/tasks.py	Imports and calls the new duplicate check function at the start of `_start_container()`, raising `ValueError` if a container already exists for the check


Copilot · 2025-12-07T01:54:22Z

wafer_space/projects/tasks.py

+    if does_check_container_exist(context.check) is True:
+        msg = f"Container already exists for check {context.check.id}"


The `is True` check lets the function proceed when Docker is unavailable (a None return). This means duplicate containers could still be created if the Docker connection temporarily fails during the check. Consider either:

- Treating None as a failure case: `if does_check_container_exist(context.check) is not False:`
- Raising an error when Docker is unavailable, to prevent any container creation without verification

This ensures containers are only created when we can definitively verify that no duplicates exist.

Suggested change

-    if does_check_container_exist(context.check) is True:
-        msg = f"Container already exists for check {context.check.id}"
+    exists = does_check_container_exist(context.check)
+    if exists is not False:
+        msg = (
+            f"Cannot verify container does not already exist for check {context.check.id} (result: {exists})"
+        )
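To make the difference between the two guards concrete, here is a minimal sketch that evaluates both conditions against the three documented return values (True, False, None). The helper names `guard_is_true` and `guard_is_not_false` are hypothetical and exist only for this illustration.

```python
from __future__ import annotations


def guard_is_true(exists: bool | None) -> bool:
    """Original guard: block only when a container is confirmed to exist."""
    return exists is True


def guard_is_not_false(exists: bool | None) -> bool:
    """Suggested guard: block unless absence of a container is confirmed."""
    return exists is not False


# Print (result, blocked by original guard, blocked by suggested guard).
for result in (True, False, None):
    print(result, guard_is_true(result), guard_is_not_false(result))
```

The two guards agree for True and False and differ only for None: the original lets container creation continue when Docker could not be queried, while the suggested form aborts.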

Copilot · 2025-12-07T01:54:22Z

wafer_space/projects/tasks.py

+    # Safety check: prevent duplicate containers
+    if does_check_container_exist(context.check) is True:
+        msg = f"Container already exists for check {context.check.id}"
+        raise ValueError(msg)


While this check prevents some duplicates, a race condition still exists between checking for containers (line 543) and creating one (later in the function). Two simultaneous calls could both pass the check and create duplicate containers. Consider using Docker labels with a unique constraint or implementing distributed locking (e.g., Redis lock with the check_id as the key) to ensure atomicity between check and create operations.
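As a rough illustration of the locking idea, the sketch below runs the existence check and the create step under a single per-check lock. It makes several assumptions: `start_container_once` and `_locks` are hypothetical names, and an in-process `threading.Lock` stands in for a distributed lock. A lock like this only serialises threads inside one process, so real Celery workers would need a shared lock (for example one held in Redis and keyed by check_id).

```python
from __future__ import annotations

import threading
from collections import defaultdict
from typing import Callable

# Hypothetical stand-in for a distributed lock keyed by check_id.
# An in-process lock does NOT protect against other worker processes.
_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def start_container_once(
    check_id: str,
    exists: Callable[[str], bool],
    create: Callable[[str], None],
) -> bool:
    """Run the existence check and the create step under one lock.

    Returns True if this caller created the container, False otherwise.
    """
    with _locks_guard:  # protect the lock registry itself
        lock = _locks[check_id]
    with lock:  # check and create are now atomic per check_id
        if exists(check_id):
            return False
        create(check_id)
        return True


# Eight concurrent attempts for the same check; a list plays the role of Docker.
created: list[str] = []
workers = [
    threading.Thread(
        target=start_container_once,
        args=("check-1", lambda cid: cid in created, created.append),
    )
    for _ in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(created)
```

Because the check and the create happen while the lock is held, only the first caller creates anything and the remaining callers see the existing entry.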

Copilot · 2025-12-07T01:54:22Z

wafer_space/projects/verification.py

+def does_check_container_exist(check: ManufacturabilityCheck) -> bool | None:
+    """Check if any Docker container exists for this check (running or stopped).
+
+    This prevents creating duplicate containers for the same check.
+    Unlike is_check_container_running(), this checks ALL containers.
+
+    Args:
+        check: ManufacturabilityCheck to verify
+
+    Returns:
+        True if any container found (running or stopped)
+        False if no container found
+        None if Docker unavailable
+    """
+    try:
+        client = docker.from_env()
+    except docker.errors.DockerException:
+        logger.warning(
+            "Failed to connect to Docker - cannot check container for %s",
+            check.id,
+        )
+        return None
+
+    try:
+        containers = client.containers.list(
+            all=True,  # Include stopped containers
+            filters={"label": f"wafer.space.check_id={check.id}"},
+        )
+    except docker.errors.DockerException:
+        logger.exception("Error querying Docker for check %s", check.id)
+        return None
+    finally:
+        if hasattr(client, "close"):
+            client.close()
+
+    if containers:
+        container_info = [(c.short_id, c.status) for c in containers]
+        logger.warning(
+            "Check %s already has container(s): %s",
+            check.id,
+            container_info,
+        )
+        return True
+
+    return False


The new does_check_container_exist() function follows the same pattern as is_check_container_running() but lacks test coverage. Since comprehensive tests exist for the similar function in test_verification.py (lines 670-796), tests should be added to cover:

- Container found (returns True)
- No container found (returns False)
- Docker connection failure (returns None)
- Docker query error (returns None)
- Multiple containers found (returns True)
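The five cases listed above could be covered along the following lines. This is only a sketch under stated assumptions: `container_exists` is a simplified stand-in that mirrors the tri-state contract with an injected client factory, and `FakeDockerError` stands in for `docker.errors.DockerException`. Tests against the real function would presumably patch `docker.from_env` instead, following whatever pattern test_verification.py already uses.

```python
from __future__ import annotations

from typing import Any, Callable
from unittest.mock import MagicMock


class FakeDockerError(Exception):
    """Stand-in for docker.errors.DockerException (assumption)."""


def container_exists(
    client_factory: Callable[[], Any],
    check_id: object,
) -> bool | None:
    """Simplified stand-in mirroring the True/False/None contract."""
    try:
        client = client_factory()
    except FakeDockerError:
        return None  # could not connect
    try:
        containers = client.containers.list(
            all=True,
            filters={"label": f"wafer.space.check_id={check_id}"},
        )
    except FakeDockerError:
        return None  # query failed
    finally:
        client.close()
    return bool(containers)


def make_client(containers: list[Any] | None = None, error=None) -> MagicMock:
    """Build a mock client whose containers.list() returns or raises."""
    client = MagicMock()
    if error is not None:
        client.containers.list.side_effect = error
    else:
        client.containers.list.return_value = containers or []
    return client


def test_container_found() -> None:
    client = make_client([MagicMock()])
    assert container_exists(lambda: client, 42) is True
    client.close.assert_called_once()


def test_no_container_found() -> None:
    client = make_client([])
    assert container_exists(lambda: client, 42) is False


def test_connection_failure() -> None:
    factory = MagicMock(side_effect=FakeDockerError("down"))
    assert container_exists(factory, 42) is None


def test_query_error() -> None:
    client = make_client(error=FakeDockerError("boom"))
    assert container_exists(lambda: client, 42) is None
    client.close.assert_called_once()  # cleanup still runs on error


def test_multiple_containers_found() -> None:
    client = make_client([MagicMock(), MagicMock()])
    assert container_exists(lambda: client, 42) is True
```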

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

wafer_space/projects/tasks.py (1)

528-546: Duplicate-container guard is a good safety net; note remaining race semantics

The pre-check using does_check_container_exist(context.check) and documenting the ValueError in the docstring nicely prevents starting a new container when Docker already reports one (running or stopped) for this check, while preserving behavior when Docker is unavailable (only True short-circuits).

Two behavioral nuances to be aware of:

There is still a narrow race where two workers can both enter _start_container concurrently, each seeing no existing containers and both calling containers.run(). Preventing that scenario entirely would require some DB-side coordination (e.g., row-level locking or an invariant on “at most one active RUNNING/DISPATCHED check per id”) rather than relying solely on Docker state.

Raising ValueError here means a duplicate attempt will currently mark the check ERROR via the generic ValueError handler, even though another worker may be legitimately processing it. If you prefer duplicates to be treated as “already running elsewhere” rather than an error, you may want a more specific exception or a dedicated early-return path.

If these semantics are acceptable for now, the added guard is still a meaningful improvement; otherwise, consider tightening with a small state-machine/locking adjustment.
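One way the "more specific exception" idea might look is sketched below. The names `ContainerAlreadyExistsError` and `handle_start` are hypothetical, and the returned strings are placeholders for whatever status handling the task layer actually performs. Subclassing ValueError is a deliberate choice here so that any existing `except ValueError` handler would still catch the new exception.

```python
from __future__ import annotations

from typing import Callable


class ContainerAlreadyExistsError(ValueError):
    """Hypothetical: a container already exists for this check."""

    def __init__(self, check_id: object) -> None:
        self.check_id = check_id
        msg = f"Container already exists for check {check_id}"
        super().__init__(msg)


def handle_start(start: Callable[[object], None], check_id: object) -> str:
    """Return the outcome a caller might record for this attempt."""
    try:
        start(check_id)
    except ContainerAlreadyExistsError:
        # Another worker may own the check, so leave its status alone.
        return "skipped"
    except ValueError:
        # Other validation failures still take the generic error path.
        return "error"
    return "started"
```

The more specific `except` clause must come first; otherwise the generic ValueError handler would swallow the duplicate case.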

Also applies to: 915-916

wafer_space/projects/verification.py (1)

83-127: New does_check_container_exist helper matches intended semantics; minor DRY opportunity

The tri-state contract (True/False/None), Docker error handling, and client cleanup all align with the docstring and with the patterns already used in is_check_container_running, making this safe to call from the task layer.

If you find yourself adding more container-verification helpers, consider extracting a small internal helper to encapsulate the shared “create client → list containers by label → close client” pattern used here and in is_check_container_running, to keep the Docker access logic in one place. Not urgent, but would reduce duplication.
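A shared helper for the "create client → list containers by label → close client" pattern might look roughly like this. It is a sketch only: `list_check_containers` and its `client_factory` parameter are hypothetical, the factory would presumably be `docker.from_env` in the real module, and translating Docker errors into the None result is left to each caller.

```python
from __future__ import annotations

from typing import Any, Callable


def list_check_containers(
    client_factory: Callable[[], Any],
    check_id: object,
    *,
    include_stopped: bool,
) -> list[Any]:
    """Create a client, list containers by check label, always close it."""
    client = client_factory()
    try:
        return client.containers.list(
            all=include_stopped,  # True also returns stopped containers
            filters={"label": f"wafer.space.check_id={check_id}"},
        )
    finally:
        client.close()
```

With a helper like this, the existing-container check would pass `include_stopped=True` and a running-only check would pass `include_stopped=False`, so the label format lives in one place.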

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8753237 and be24577.

📒 Files selected for processing (2)

wafer_space/projects/tasks.py (3 hunks)
wafer_space/projects/verification.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.{py,pyi}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint errors must be fixed, never suppressed. Never add # noqa, # type: ignore, or similar without explicit user permission. Always run make lint-fix && make lint && make type-check before committing.

Files:

wafer_space/projects/tasks.py
wafer_space/projects/verification.py

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Never create circular imports. Maintain layer separation: Models never import tasks/views/services; Services can import models and tasks; Views can import models and services; Tasks can import models only. Never use local imports as a workaround for circular dependencies.
Avoid print statements. Use logging.getLogger(__name__) instead (Ruff rule T201).
Lines must not exceed 88 characters (Ruff rule E501). Break long lines into multiple lines.
Never use f-strings directly in exception messages (Ruff rule EM102). Assign the message to a variable first before using it in the exception.
Avoid boolean positional arguments (Ruff rule FBT002/3). Use keyword-only arguments instead: def fn(*, flag=True).
Never hardcode passwords directly in code (Ruff rule S105). Use constants like TEST_PASSWORD = '...' instead.
All public functions must have type hints. Use from __future__ import annotations for forward references. Fix mypy errors, don't ignore them.
Use specific exception handling with exception chaining. Catch specific exception types and use raise NewException(msg) from exc to chain exceptions. Never use bare except: or overly broad except Exception:.
Celery task names in CELERY_BEAT_SCHEDULE must reference the actual decorated function name (e.g., 'app.tasks.process_check_queue'), not a variable assignment. Task registration comes from the @shared_task decorator, not Python variable assignments.

Files:

wafer_space/projects/tasks.py
wafer_space/projects/verification.py

🧬 Code graph analysis (2)

wafer_space/projects/tasks.py (1)

wafer_space/projects/verification.py (1)

does_check_container_exist (83-127)

wafer_space/projects/verification.py (1)

wafer_space/projects/models.py (1)

ManufacturabilityCheck (1114-1822)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Upload results
GitHub Check: pytest

🔇 Additional comments (1)

wafer_space/projects/tasks.py (1)

57-61: Import of duplicate-container helper is appropriate

Bringing does_check_container_exist in alongside the other verification helpers keeps related logic centralized and does not introduce a runtime circular import (verification only imports models under TYPE_CHECKING). No issues here.

Copilot AI review requested due to automatic review settings December 7, 2025 01:52

Copilot started reviewing on behalf of mithro December 7, 2025 01:52 View session

Copilot AI reviewed Dec 7, 2025


coderabbitai bot reviewed Dec 7, 2025

