build: add .dockerignore and move docling from runtime to dev deps#9469

Merged
ogabrielluiz merged 3 commits into main from add-docling-as-dep
Aug 22, 2025
Conversation


@ogabrielluiz ogabrielluiz commented Aug 21, 2025

Summary by CodeRabbit

  • Chores
    • Optimized Docker builds by excluding unnecessary frontend assets, caches, and environment files from the build context.
    • Streamlined dependencies by moving the docling package to development-only dependencies, reducing the runtime footprint.

coderabbitai Bot commented Aug 21, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a new .dockerignore with 12 ignore patterns and updates pyproject.toml to move the docling package from optional/runtime dependencies to dev-only dependencies.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Docker build context ignores (./.dockerignore) | Added 12 ignore patterns for frontend artifacts and common cache/env files (e.g., src/frontend/* outputs, **/__pycache__, **/*.pyc, **/.venv, **/.env, .DS_Store). |
| Dependency configuration (./pyproject.toml) | Moved docling>=2.36.1 to development dependencies; removed it from the [project.optional-dependencies] docling group. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

size:M, lgtm

Suggested reviewers

  • jordanrfrazier

@ogabrielluiz ogabrielluiz enabled auto-merge August 21, 2025 13:24
@coderabbitai coderabbitai Bot changed the title from "@coderabbitai" to "build: add .dockerignore and move docling from runtime to dev deps" Aug 21, 2025
@github-actions github-actions Bot added the build label Aug 21, 2025
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
pyproject.toml (2)

127-129: Align docling/docling_core version expectations (avoid accidental incompatibility).

You now ship docling_core>=2.36.1 at runtime while docling>=2.36.1 is dev-only. If tests rely on features that require a newer docling_core than production gets, CI may pass while users hit mismatches.

Two lightweight options:

  • Pin compatible minor ranges explicitly (if upstream follows them), e.g.:
- "docling_core>=2.36.1",
+ "docling_core>=2.36.1,<3.0.0",
  • Or add a comment next to both lines documenting the known compatible major/minor window to help future bumps.

Also applies to: 182-183
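
To make the suggested `>=2.36.1,<3.0.0` window concrete, here is a minimal Python sketch of the bounds check. It assumes plain dotted numeric versions only; real resolvers apply the full PEP 440 rules, and the helper names `parse_release` and `in_window` are hypothetical, not part of any tool mentioned in this PR.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn '2.36.1' into (2, 36, 1). Assumes numeric dotted segments only."""
    return tuple(int(part) for part in version.split("."))


def in_window(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


# The window suggested in the comment above: >=2.36.1,<3.0.0
for candidate in ("2.36.0", "2.36.1", "2.40.0", "3.0.0"):
    print(candidate, in_window(candidate, "2.36.1", "3.0.0"))
```

The upper bound only helps if upstream actually follows semantic versioning for major releases, which the comment above also flags as a condition.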


112-118: Duplicate packages in main and dev groups (scrapegraph-py, pydantic-ai, ruff).

This predates the PR but worth cleaning up to avoid resolver surprises and drift:

  • In main deps: scrapegraph-py>=1.12.0, pydantic-ai>=0.0.19, ruff>=0.9.7.
  • In dev: scrapegraph-py>=1.10.2, pydantic-ai>=0.0.19, ruff>=0.9.7,<0.10.

Prefer keeping tools like ruff/dev-only libraries in the dev group only. Also ensure version ranges don’t diverge between groups. Example clean-up:

- "ruff>=0.9.7",
+ # ruff is dev-only below

...

- "scrapegraph-py>=1.10.2",
- "pydantic-ai>=0.0.19",
+ # duplicates removed; rely on main constraints

If any of these are truly needed at runtime, remove them from dev to prevent conflicting constraints.

Also applies to: 174-176

.dockerignore (1)

1-12: Great start. Add dist and broader globs to shrink build context and avoid leaking env files.

Given the frontend build output lives under src/frontend/dist/ (per our project conventions), add it to the ignore list. Also consider:

  • Glob node_modules everywhere, not just the top-level frontend folder.
  • Ignore .env variants like .env.local, .env.production, etc.
  • Ignore common caches: .mypy_cache, .ruff_cache, .tox, htmlcov, coverage.xml, .ipynb_checkpoints.
  • Optional: ignore Vite cache src/frontend/.vite/.

Apply something like:

 src/frontend/node_modules
 src/frontend/build
+src/frontend/dist
@@
 **/.DS_Store
 **/__pycache__
 **/*.pyc
 **/.pytest_cache
+**/.mypy_cache
+**/.ruff_cache
+**/.tox
+htmlcov
+coverage.xml
 **/.venv
-**/.env
+**/.env
+**/.env.*
+**/.secrets*
+**/node_modules
+**/venv
+**/env
+**/.direnv
+src/frontend/.vite
+**/.ipynb_checkpoints

This reduces Docker context size and prevents accidental inclusion of secrets.
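
To see why `**/.env` alone does not cover `.env.local`, here is a simplified Python approximation of how ignore patterns are matched. It is only a sketch: it handles `**`, `*`, and `?`, treats a matching directory as excluding everything below it, and deliberately ignores `!` negation and character classes. The function names are hypothetical, and Docker's real matcher may differ in edge cases.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate one ignore pattern into a regex (approximation only)."""
    pattern = pattern.strip().strip("/")
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if the path, or any parent directory of it, matches a pattern."""
    parts = path.strip("/").split("/")
    prefixes = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    regexes = [pattern_to_regex(p) for p in patterns]
    return any(rx.fullmatch(prefix) for rx in regexes for prefix in prefixes)


current = ["src/frontend/node_modules", "src/frontend/build", "**/__pycache__", "**/.env"]
print(is_ignored(".env.local", current))                     # not covered by **/.env
print(is_ignored(".env.local", current + ["**/.env.*"]))     # covered once the glob is added
print(is_ignored("src/frontend/dist/index.html", current))   # dist is not ignored yet
```

Under these assumptions, the env-file variants and `src/frontend/dist` stay in the build context until the extra patterns are added.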

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 43119d0 and f4d8571.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • .dockerignore (1 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-07-18T18:27:12.609Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/frontend_development.mdc:0-0
Timestamp: 2025-07-18T18:27:12.609Z
Learning: Applies to src/frontend/src/**/*.{ts,tsx,js,jsx} : All frontend TypeScript and JavaScript code should be located under src/frontend/src/ and organized into components, pages, icons, stores, types, utils, hooks, services, and assets directories as per the specified directory layout.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-21T14:16:14.125Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/frontend/**/*.@(test|spec).{ts,tsx,js,jsx} : Frontend test files should be located in 'src/frontend/' and use '.test.{ts,tsx,js,jsx}' or '.spec.{ts,tsx,js,jsx}' extensions.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-18T18:27:12.609Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/frontend_development.mdc:0-0
Timestamp: 2025-07-18T18:27:12.609Z
Learning: Frontend builds should output static files to src/frontend/dist/ for production deployment.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-18T18:27:12.609Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/frontend_development.mdc:0-0
Timestamp: 2025-07-18T18:27:12.609Z
Learning: Applies to src/frontend/@(package*.json|tsconfig.json|tailwind.config.*|vite.config.*) : Frontend configuration files such as package.json, tsconfig.json, and Tailwind/Vite configs must be maintained and updated as needed.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-18T18:27:12.609Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/frontend_development.mdc:0-0
Timestamp: 2025-07-18T18:27:12.609Z
Learning: Applies to src/frontend/src/**/__tests__/**/*.{ts,tsx,js,jsx} : All frontend code should be tested using appropriate component and integration tests.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-21T14:16:14.125Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/frontend/**/*.@(test|spec).{ts,tsx,js,jsx} : Test error handling and edge cases in frontend test files.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-21T14:16:14.125Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/frontend/**/*.@(test|spec).{ts,tsx,js,jsx} : Validate input/output behavior and test component initialization and configuration in frontend test files.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-21T14:16:14.125Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/frontend/**/*.@(test|spec).{ts,tsx,js,jsx} : Test both sync and async code paths in frontend test files.

Applied to files:

  • .dockerignore
📚 Learning: 2025-07-21T14:16:14.125Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/frontend/**/*.@(test|spec).{ts,tsx,js,jsx} : Mock external dependencies appropriately in frontend test files to isolate unit tests from external services.

Applied to files:

  • .dockerignore

Comment thread pyproject.toml
Comment on lines +182 to 183
"docling>=2.36.1"
]

💡 Verification agent

🧩 Analysis chain

Moving docling to dev-only: verify no runtime imports and no remaining extras references.

Looks fine, but this is a behavioral change. Please confirm:

  • No production code imports docling at runtime; only tests/dev tools should.
  • Docs, Dockerfiles, Makefiles, and CI don’t instruct users to install langflow[docling] anymore (since the extra group is gone).

Run the following checks from repo root:

If runtime imports exist, consider reintroducing an optional extra docling and guarding imports with graceful fallbacks. I can draft that if needed.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Search for runtime imports of docling/docling_core:"
rg -nC2 -g '!**/tests/**' -g '!**/docs/**' -P '\b(import|from)\s+docling\b|\bfrom\s+docling\b' src || true

echo "2) Search for any extras usage or install docs mentioning [docling]:"
rg -nC2 -P '\[docling\]' || true

echo "3) Search for any explicit `pip install docling` or `uv add docling` in scripts/CI:"
rg -nC2 -P '(pip|uv)\s+(install|add).*\bdocling\b' || true

Length of output: 3399
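
As an alternative to the regex search, the same import check could be done on the syntax tree, which avoids false hits in comments and strings. This is a minimal sketch using Python's standard `ast` module; `find_imports` is a hypothetical helper, and the sample source is illustrative rather than taken from the repository.

```python
import ast


def find_imports(source: str, targets: set[str]) -> list[tuple[int, str]]:
    """Return (line, module) pairs for absolute imports whose top-level package is in targets."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package, so docling and docling_core stay distinct.
            if name.split(".")[0] in targets:
                hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = (
    "import os\n"
    "from docling.document_converter import DocumentConverter\n"
    "import docling_core.types as t\n"
    "# import docling  (a comment, not an import)\n"
)

print(find_imports(SAMPLE, {"docling"}))
print(find_imports(SAMPLE, {"docling", "docling_core"}))
```

Because the comparison is on the top-level package name, `docling_core` is only reported when it is listed explicitly in `targets`.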


The initial CI search was skipped due to backticks in the echo. Let's re-run step 3 properly against Dockerfiles, Makefiles, CI workflows, and scripts, and also inspect the lazy-import fallback in the docling integration:


🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail

echo "3) Search for docling install references in Dockerfiles, Makefiles, CI workflows, and scripts/:"
rg -nC2 -g 'Dockerfile*' -g 'Makefile*' -g '.github/workflows/**/*.yml' -g 'scripts/**/*' -P '(pip|uv)\s+(install|add).*\bdocling\b' || true

echo
echo "4) Inspect lazy‐import/__getattr__ and try/except in the docling component:"
rg -nC5 '__getattr__' -P src/backend/base/langflow/components/docling/__init__.py || true
rg -nC5 'from docling' -P src/backend/base/langflow/components/docling/__init__.py || true

Length of output: 1794


Update installation references for Docling now that the extra group is removed

The dev‐dependency move is fine—Docling is only imported lazily within its integration module—but we still have stale install instructions in both code and docs. Please update the following:

• In src/backend/base/langflow/components/docling/__init__.py (lines 111–115), the fallback message still reads:
“Install with uv pip install 'langflow[docling]' …”
Change this to install Docling directly, e.g.:

msg = (
    "Docling is an optional dependency of Langflow. "
    "Install with `uv pip install docling` or refer to the documentation."
)

• In docs/docs/Integrations/Docling/integrations-docling.mdx (around line 21), replace

Install the Docling extra in Langflow OSS with `uv pip install 'langflow[docling]'`.

with

Install Docling separately with `uv pip install docling`.

• Verified no other pip install docling or langflow[docling] references remain in Dockerfiles, Makefiles, CI workflows, or scripts/ directories.
• Runtime imports of Docling occur only within the lazy-import logic of the docling integration module, as intended.
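
For context, a guarded optional import with a direct-install hint might look like the sketch below. This is not the code in the docling integration module; `optional_import` is a hypothetical helper that only illustrates the fallback-message pattern discussed above.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, install_hint: str) -> ModuleType:
    """Import a module lazily, raising an ImportError with an install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = (
            f"{module_name} is an optional dependency of Langflow. "
            f"Install with `{install_hint}` or refer to the documentation."
        )
        raise ImportError(msg) from exc


# Usage: the import only happens when the feature is actually used.
try:
    docling = optional_import("docling", "uv pip install docling")
except ImportError as err:
    print(err)
```

Calling the helper at use time rather than at module import keeps the package importable when Docling is not installed.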

@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Aug 22, 2025
@github-actions github-actions Bot added build and removed build labels Aug 22, 2025

codecov Bot commented Aug 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.92%. Comparing base (8af1909) to head (5b88d83).
⚠️ Report is 1 commit behind head on main.

❌ Your project status has failed because the head coverage (3.80%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #9469      +/-   ##
==========================================
- Coverage   33.93%   33.92%   -0.01%     
==========================================
  Files        1195     1195              
  Lines       55950    55950              
  Branches     5331     5331              
==========================================
- Hits        18984    18979       -5     
- Misses      36896    36901       +5     
  Partials       70       70              
Flag Coverage Δ
backend 56.57% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 4 files with indirect coverage changes
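
The percentages in the diff can be reproduced from the raw counts. This small Python sketch assumes coverage is computed as hits divided by hits plus misses plus partials, which matches the Lines total of 55950 in the table; `coverage_pct` is a hypothetical helper name.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Line coverage as a percentage of all tracked lines."""
    total = hits + misses + partials
    return 100.0 * hits / total


base = coverage_pct(hits=18984, misses=36896, partials=70)  # base (8af1909)
head = coverage_pct(hits=18979, misses=36901, partials=70)  # head (5b88d83)

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f})")
# prints "33.93% -> 33.92% (-0.01)"
```

The five-line drop in hits comes from the indirect coverage changes noted above, since the PR itself modifies no coverable lines.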

@ogabrielluiz ogabrielluiz added this pull request to the merge queue Aug 22, 2025
Merged via the queue into main with commit 72ad30e Aug 22, 2025
27 of 28 checks passed
@ogabrielluiz ogabrielluiz deleted the add-docling-as-dep branch August 22, 2025 14:11
lucaseduoli pushed a commit that referenced this pull request Aug 22, 2025
…9469)

* feat: add .dockerignore to exclude build artifacts and environment files

* feat: move docling to main dependencies and remove docling extra
lucaseduoli pushed a commit that referenced this pull request Aug 25, 2025
…9469)

* feat: add .dockerignore to exclude build artifacts and environment files

* feat: move docling to main dependencies and remove docling extra

Labels

build lgtm This PR has been approved by a maintainer

2 participants