Merged
3 changes: 2 additions & 1 deletion docker/Dockerfile
@@ -71,7 +71,8 @@ COPY pyproject.toml uv.lock /opt/Curator/
COPY nemo_curator/__init__.py nemo_curator/package_info.py /opt/Curator/nemo_curator/

# Install Curator
Comment on lines 71 to 73
Contributor

Overbroad recursive deletion

find /opt/venv ... -exec rm -rf {} + deletes every matching directory under the virtualenv. Because -path tests the whole path and * also matches /, the pattern covers every directory whose name starts with aiohttp inside Ray’s vendored thirdparty_files tree, everything beneath those directories, and any other path that happens to contain the same ray/_private/runtime_env/agent/thirdparty_files/aiohttp segment (e.g., if the path layout changes). This is a destructive step during image build and can silently remove real installed dependencies. Consider constraining the deletion to the exact expected directory (or verifying that it exists before removing it), and avoid the aiohttp* wildcard unless you really need it.
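A dry run makes the scope of the pattern concrete. The sketch below is only an illustration: it builds a throwaway tree with mktemp and swaps -exec rm -rf {} + for -print, so nothing is deleted. The Python version and the directory names (aiohttp-3.9.0.dist-info, aiohttp_cors, yarl) are assumptions chosen for the demo, not the contents of the real image.

```shell
#!/bin/sh
# Throwaway tree that mimics the layout; all names below are hypothetical.
demo_root="$(mktemp -d)"
site="$demo_root/lib/python3.12/site-packages"
tp="$site/ray/_private/runtime_env/agent/thirdparty_files"

# Vendored copies under Ray's thirdparty_files tree.
mkdir -p "$tp/aiohttp/web" "$tp/aiohttp-3.9.0.dist-info" "$tp/aiohttp_cors" "$tp/yarl"
# A top-level install that the build step is not meant to touch.
mkdir -p "$site/aiohttp"

# Same test expression as the Dockerfile line, but -print instead of
# -exec rm -rf {} + so that this is a dry run.
matches="$(find "$demo_root" -type d \
    -path "*ray/_private/runtime_env/agent/thirdparty_files/aiohttp*" -print | sort)"

printf '%s\n' "$matches"
```

In this layout the pattern selects aiohttp, aiohttp/web, aiohttp-3.9.0.dist-info, and aiohttp_cors under thirdparty_files, and leaves yarl and the top-level site-packages/aiohttp alone. Running the same -print variant inside the built image would show what the real step removes.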

-RUN uv sync --link-mode copy --locked --extra all --all-groups --no-cache
+RUN uv sync --link-mode copy --locked --extra all --all-groups --no-cache && \
+find /opt/venv -type d -path "*ray/_private/runtime_env/agent/thirdparty_files/aiohttp*" -exec rm -rf {} +
Contributor

The find command could be more robust. Because -path tests the whole path and * also matches /, the current pattern matches any directory under /opt/venv whose path contains ray/_private/runtime_env/agent/thirdparty_files/aiohttp, which could unintentionally delete legitimate dependencies if the directory structure changes. Consider making the deletion more explicit and fail-safe:

Suggested change
-find /opt/venv -type d -path "*ray/_private/runtime_env/agent/thirdparty_files/aiohttp*" -exec rm -rf {} +
+for d in /opt/venv/lib/python*/site-packages/ray/_private/runtime_env/agent/thirdparty_files/aiohttp*; do \
+    if [ -d "$d" ]; then rm -rf "$d"; fi; \
+done

This approach:

  • Uses a more precise path pattern
  • Checks directory exists before deletion
  • Prevents build failure if path doesn't exist
  • Makes the intent clearer
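The points above can be sketched as a small shell function that can be tried outside the image build. This is a minimal sketch, not the PR's code: remove_vendored_aiohttp is a hypothetical helper name, and the virtualenv root is passed in as an argument so the function can be pointed at a scratch directory. The glob is deliberately left unquoted, since a * inside double quotes is not expanded by the shell.

```shell
#!/bin/sh
# remove_vendored_aiohttp is a hypothetical helper; $1 is the virtualenv root.
remove_vendored_aiohttp() {
    venv_root="$1"
    # Only the variable is quoted; python* and aiohttp* stay outside the
    # quotes so the shell expands them.
    for d in "$venv_root"/lib/python*/site-packages/ray/_private/runtime_env/agent/thirdparty_files/aiohttp*; do
        # When nothing matches, $d holds the literal pattern, -d is false,
        # and the loop finishes with exit status 0.
        if [ -d "$d" ]; then
            echo "removing $d"
            rm -rf "$d"
        fi
    done
}
```

Calling remove_vendored_aiohttp /opt/venv would correspond to the build step. Using if rather than a trailing [ -d "$d" ] && rm -rf "$d" keeps the loop's exit status at zero when the last candidate is not a directory, so the RUN step does not fail for that reason.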


COPY . /opt/Curator
