Skip to content

Conversation

@mithro
Copy link
Contributor

@mithro mithro commented Dec 7, 2025

Summary

  • Add does_check_container_exist() function to check for any container (running or stopped) with the check's label
  • Call this check at the start of _start_container() to abort if a duplicate would be created
  • Raises ValueError which is caught by existing exception handling

Problem

Race conditions can occur where check_process_job runs multiple times for the same check (e.g., Celery retry), potentially creating duplicate Docker containers.

Solution

Before creating a new container, query Docker for any existing container with the wafer.space.check_id label matching this check. If found, raise an error to abort the duplicate.

Test plan

  • Verify normal check flow still works
  • Manually test duplicate prevention by calling check_process_job twice for same check

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced validation to prevent duplicate containers from being created for the same check. Attempting to start a container that already exists now raises a clear error, improving system reliability and preventing conflicts.

✏️ Tip: You can customize this high-level summary in your review settings.

Add does_check_container_exist() to verification.py that checks for any
container (running or stopped) with the check's label. Called at the
start of _start_container() to abort if a duplicate would be created.

This prevents race conditions where check_process_job runs multiple
times for the same check (e.g., Celery retry).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 7, 2025 01:52
@coderabbitai
Copy link

coderabbitai bot commented Dec 7, 2025

Walkthrough

These changes add a duplicate container detection mechanism by introducing a new function that checks for existing Docker containers (running or stopped) for a given check, and integrating it as a pre-check in the container startup process to prevent duplicate container creation.

Changes

Cohort / File(s) Summary
Docker container existence check
wafer_space/projects/verification.py
Added new exported function does_check_container_exist(check) that queries Docker for any container (running or stopped) with label wafer.space.check_id={check.id}, returning True if found, False if none exist, and None if Docker is unavailable. Includes logging and proper Docker client cleanup.
Duplicate container prevention
wafer_space/projects/tasks.py
Imported does_check_container_exist and added a pre-check in _start_container that raises ValueError if a container already exists for the given check. Updated docstring with Raises section and added clarifying code comments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Docker container existence check: Verify that the label query format (wafer.space.check_id) is consistent with existing labeling conventions and that error handling for Docker unavailability is correct.
  • ValueError behavior: Confirm the ValueError is raised at the appropriate point and that the error message is clear for debugging.

Poem

🐰 A container rises, but wait—
We check if one's already there!
No duplicates to frustrate,
Just clean checks with proper care. ✓

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title directly and clearly summarizes the main change: preventing duplicate container creation by checking if a container already exists before starting a new one.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/prevent-duplicate-containers

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds duplicate container detection to prevent race conditions where multiple Celery task retries could create multiple Docker containers for the same manufacturability check.

  • Introduces does_check_container_exist() function to query for any existing containers (running or stopped) with the check's label
  • Adds a safety check in _start_container() to abort container creation if a duplicate would be created
  • Leverages existing exception handling by raising ValueError when duplicates are detected

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
wafer_space/projects/verification.py Adds new does_check_container_exist() function to check for any existing containers (running or stopped) with the check's label, preventing duplicate container creation
wafer_space/projects/tasks.py Imports and calls the new duplicate check function at the start of _start_container(), raising ValueError if a container already exists for the check

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +543 to +544
if does_check_container_exist(context.check) is True:
msg = f"Container already exists for check {context.check.id}"
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for is True allows the function to proceed when Docker is unavailable (None return). This means duplicate containers could still be created if Docker connection temporarily fails during the check. Consider either:

  1. Treating None as a failure case: if does_check_container_exist(context.check) is not False:
  2. Or raising an error when Docker is unavailable to prevent any container creation without verification

This ensures containers are only created when we can definitively verify no duplicates exist.

Suggested change
if does_check_container_exist(context.check) is True:
msg = f"Container already exists for check {context.check.id}"
exists = does_check_container_exist(context.check)
if exists is not False:
msg = (
f"Cannot verify container does not already exist for check {context.check.id} (result: {exists})"
)

Copilot uses AI. Check for mistakes.
Comment on lines +542 to +545
# Safety check: prevent duplicate containers
if does_check_container_exist(context.check) is True:
msg = f"Container already exists for check {context.check.id}"
raise ValueError(msg)
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this check prevents some duplicates, a race condition still exists between checking for containers (line 543) and creating one (later in the function). Two simultaneous calls could both pass the check and create duplicate containers. Consider using Docker labels with a unique constraint or implementing distributed locking (e.g., Redis lock with the check_id as the key) to ensure atomicity between check and create operations.

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +127
def does_check_container_exist(check: ManufacturabilityCheck) -> bool | None:
"""Check if any Docker container exists for this check (running or stopped).
This prevents creating duplicate containers for the same check.
Unlike is_check_container_running(), this checks ALL containers.
Args:
check: ManufacturabilityCheck to verify
Returns:
True if any container found (running or stopped)
False if no container found
None if Docker unavailable
"""
try:
client = docker.from_env()
except docker.errors.DockerException:
logger.warning(
"Failed to connect to Docker - cannot check container for %s",
check.id,
)
return None

try:
containers = client.containers.list(
all=True, # Include stopped containers
filters={"label": f"wafer.space.check_id={check.id}"},
)
except docker.errors.DockerException:
logger.exception("Error querying Docker for check %s", check.id)
return None
finally:
if hasattr(client, "close"):
client.close()

if containers:
container_info = [(c.short_id, c.status) for c in containers]
logger.warning(
"Check %s already has container(s): %s",
check.id,
container_info,
)
return True

return False
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new does_check_container_exist() function follows the same pattern as is_check_container_running() but lacks test coverage. Since comprehensive tests exist for the similar function in test_verification.py (lines 670-796), tests should be added to cover:

  • Container found (returns True)
  • No container found (returns False)
  • Docker connection failure (returns None)
  • Docker query error (returns None)
  • Multiple containers found (returns True)

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
wafer_space/projects/tasks.py (1)

528-546: Duplicate-container guard is a good safety net; note remaining race semantics

The pre-check using does_check_container_exist(context.check) and documenting the ValueError in the docstring nicely prevents starting a new container when Docker already reports one (running or stopped) for this check, while preserving behavior when Docker is unavailable (only True short-circuits).

Two behavioral nuances to be aware of:

  • There is still a narrow race where two workers can both enter _start_container concurrently, each seeing no existing containers and both calling containers.run(). Preventing that scenario entirely would require some DB-side coordination (e.g., row-level locking or an invariant on “at most one active RUNNING/DISPATCHED check per id”) rather than relying solely on Docker state.
  • Raising ValueError here means a duplicate attempt will currently mark the check ERROR via the generic ValueError handler, even though another worker may be legitimately processing it. If you prefer duplicates to be treated as “already running elsewhere” rather than an error, you may want a more specific exception or a dedicated early-return path.

If these semantics are acceptable for now, the added guard is still a meaningful improvement; otherwise, consider tightening with a small state-machine/locking adjustment.

Also applies to: 915-916

wafer_space/projects/verification.py (1)

83-127: New does_check_container_exist helper matches intended semantics; minor DRY opportunity

The tri-state contract (True/False/None), Docker error handling, and client cleanup all align with the docstring and with the patterns already used in is_check_container_running, making this safe to call from the task layer.

If you find yourself adding more container-verification helpers, consider extracting a small internal helper to encapsulate the shared “create client → list containers by label → close client” pattern used here and in is_check_container_running, to keep the Docker access logic in one place. Not urgent, but would reduce duplication.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8753237 and be24577.

📒 Files selected for processing (2)
  • wafer_space/projects/tasks.py (3 hunks)
  • wafer_space/projects/verification.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{py,pyi}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint errors must be fixed, never suppressed. Never add # noqa, # type: ignore, or similar without explicit user permission. Always run make lint-fix && make lint && make type-check before committing.

Files:

  • wafer_space/projects/tasks.py
  • wafer_space/projects/verification.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Never create circular imports. Maintain layer separation: Models never import tasks/views/services; Services can import models and tasks; Views can import models and services; Tasks can import models only. Never use local imports as a workaround for circular dependencies.
Avoid print statements. Use logging.getLogger(__name__) instead (Ruff rule T201).
Lines must not exceed 88 characters (Ruff rule E501). Break long lines into multiple lines.
Never use f-strings directly in exception messages (Ruff rule EM102). Assign the message to a variable first before using it in the exception.
Avoid boolean positional arguments (Ruff rule FBT002/3). Use keyword-only arguments instead: def fn(*, flag=True).
Never hardcode passwords directly in code (Ruff rule S105). Use constants like TEST_PASSWORD = '...' instead.
All public functions must have type hints. Use from __future__ import annotations for forward references. Fix mypy errors, don't ignore them.
Use specific exception handling with exception chaining. Catch specific exception types and use raise NewException(msg) from exc to chain exceptions. Never use bare except: or overly broad except Exception:.
Celery task names in CELERY_BEAT_SCHEDULE must reference the actual decorated function name (e.g., 'app.tasks.process_check_queue'), not a variable assignment. Task registration comes from the @shared_task decorator, not Python variable assignments.

Files:

  • wafer_space/projects/tasks.py
  • wafer_space/projects/verification.py
🧬 Code graph analysis (2)
wafer_space/projects/tasks.py (1)
wafer_space/projects/verification.py (1)
  • does_check_container_exist (83-127)
wafer_space/projects/verification.py (1)
wafer_space/projects/models.py (1)
  • ManufacturabilityCheck (1114-1822)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Upload results
  • GitHub Check: pytest
🔇 Additional comments (1)
wafer_space/projects/tasks.py (1)

57-61: Import of duplicate-container helper is appropriate

Bringing does_check_container_exist in alongside the other verification helpers keeps related logic centralized and does not introduce a runtime circular import (verification only imports models under TYPE_CHECKING). No issues here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants