fix(apply_tweaks): skip tweaks to code field and log warning#9467
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds a guard in apply_tweaks to skip tweaks targeting the "code" field and log a security warning. Existing tweak handling remains unchanged. Adds unit tests validating that "code" cannot be overridden while other fields can be tweaked, and that warnings are emitted. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant apply_tweaks
participant Logger
Caller->>apply_tweaks: apply_tweaks(node, tweaks)
loop For each (tweak_name, tweak_value)
alt tweak_name == "code"
apply_tweaks->>Logger: warn("Security: Code field cannot be overridden via tweaks.")
Note right of apply_tweaks: Skip applying "code" tweak
else Other fields
apply_tweaks->>apply_tweaks: Apply per existing logic (NestedDict/dict/scalar)
end
end
apply_tweaks-->>Caller: updated node (code unchanged)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/base/langflow/processing/process.py (1)
166-170: Missing guard in apply_tweaks_on_vertex allows code override via graph-level tweaks.Tweaks applied through process_tweaks_on_graph → apply_tweaks_on_vertex can still set vertex.params["code"]. This bypasses the protection added in apply_tweaks and reintroduces the same risk.
Patch:
def apply_tweaks_on_vertex(vertex: Vertex, node_tweaks: dict[str, Any]) -> None: - for tweak_name, tweak_value in node_tweaks.items(): - if tweak_name and tweak_value and tweak_name in vertex.params: - vertex.params[tweak_name] = tweak_value + for tweak_name, tweak_value in node_tweaks.items(): + if tweak_name == "code": + logger.warning("Security: Code field cannot be overridden via tweaks.") + continue + if tweak_name and tweak_value and tweak_name in vertex.params: + vertex.params[tweak_name] = tweak_valueOptionally, mirror the type-based check if you have access to parameter metadata here; otherwise the name-based block is a solid immediate fix.
src/backend/tests/unit/test_process.py (1)
347-377: Add coverage for graph-level path (apply_tweaks_on_vertex).apply_tweaks_on_vertex currently lacks the same guard as apply_tweaks, allowing code override at the vertex level. Once the guard is added, include a focused unit test like below to prevent regression.
Proposed test:
def test_apply_tweaks_on_vertex_code_prevention(): """Vertex-level tweaks must not allow overriding 'code'.""" from unittest.mock import patch from langflow.processing.process import apply_tweaks_on_vertex class FakeVertex: def __init__(self): self.params = {"code": "original_code", "param1": "v"} v = FakeVertex() with patch("langflow.processing.process.logger") as mock_logger: apply_tweaks_on_vertex(v, {"code": "bad", "param1": "new"}) mock_logger.warning.assert_called_once_with("Security: Code field cannot be overridden via tweaks.") assert v.params["code"] == "original_code" assert v.params["param1"] == "new"Optional: If you enrich the warning with node context (id/display_name), bind that context in the log and assert accordingly.
🧹 Nitpick comments (3)
src/backend/base/langflow/processing/process.py (2)
147-154: Small cleanup: redundant membership check.We early-continue when tweak_name not in template_data (Line 148-149). The subsequent if tweak_name in template_data (Line 153) is redundant.
You can remove the second membership check and directly branch on the field type/value handling to simplify the flow.
151-152: Consider adding node context to the warning.Including the node id/display_name in the warning improves triage (e.g., "… for node {node_id}"). If you do this, update the tests accordingly.
I can update both the log message and tests in this PR if you’d like.
src/backend/tests/unit/test_process.py (1)
311-345: Good coverage for name-based "code" protection.Test verifies both the warning and that benign params still apply. Looks correct and readable.
Two follow-ups to solidify coverage:
- Add a test where the field name isn’t "code" but its type is "code" (e.g., "script": {"type": "code"}). This will guard against regressions once we extend the check to be type-based.
- Add a test that uses process_tweaks with {"code": "..."} (no node id) to ensure broadcast tweaks can’t override code fields on any node and that warnings are emitted per affected node.
Proposed additional tests (append to this file):
def test_apply_tweaks_code_type_override_prevention(): """Fields with type='code' (even if not named 'code') must not be overridden.""" from unittest.mock import patch from langflow.processing.process import apply_tweaks node = { "id": "test_node", "data": { "node": { "template": { "script": {"value": "original_script", "type": "code"}, "param1": {"value": "original_value", "type": "str"}, } } }, } node_tweaks = {"script": "attempted_code_injection", "param1": "new_value"} with patch("langflow.processing.process.logger") as mock_logger: apply_tweaks(node, node_tweaks) mock_logger.warning.assert_called_once_with("Security: Code field cannot be overridden via tweaks.") assert node["data"]["node"]["template"]["script"]["value"] == "original_script" assert node["data"]["node"]["template"]["param1"]["value"] == "new_value" def test_process_tweaks_code_broadcast_prevention(): """Tweaks without node id ('code': '...') must not override any node code and should warn per node.""" from unittest.mock import patch from langflow.processing.process import process_tweaks graph_data = { "data": { "nodes": [ {"id": "n1", "data": {"node": {"template": {"code": {"value": "c1", "type": "code"}}}}}, {"id": "n2", "data": {"node": {"template": {"code": {"value": "c2", "type": "code"}}}}}, ] } } tweaks = {"code": "payload"} with patch("langflow.processing.process.logger") as mock_logger: result = process_tweaks(graph_data, tweaks) # Expect 2 warnings (one per node) assert mock_logger.warning.call_count == 2 assert result["data"]["node s"][0]["data"]["node"]["template"]["code"]["value"] == "c1" assert result["data"]["nodes"][1]["data"]["node"]["template"]["code"]["value"] == "c2"
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/backend/base/langflow/processing/process.py(1 hunks)src/backend/tests/unit/test_process.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/base/langflow/processing/process.pysrc/backend/tests/unit/test_process.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/test_process.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/test_process.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Ruff Style Check (3.13)
- GitHub Check: Run Ruff Check and Format
- GitHub Check: Ruff Style Check (3.13)
- GitHub Check: Update Starter Projects
- GitHub Check: Optimize new Python code in this PR
🔇 Additional comments (1)
src/backend/base/langflow/processing/process.py (1)
1-216: Ensure backend formatting and linting before mergeI’ve confirmed that the Makefile includes the required targets for formatting and linting:
format_backendrunsruff check . --fixandruff format .(Makefile lines 103–106)lintinstalls backend dependencies then runsmypy(Makefile lines 109–113)Since
makeisn’t available in this sandbox, please manually verify by running in your local environment:make format_backend make lintThis ensures the new code adheres to our style and type-safety guidelines.
| if tweak_name == "code": | ||
| logger.warning("Security: Code field cannot be overridden via tweaks.") | ||
| continue |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Also block fields with type "code", not only name "code".
Right now only the key named "code" is protected. Any template field with a different name but type "code" (e.g., "script", "python") can still be overridden. That’s a security gap.
Apply this minimal change to block by either name or type:
- if tweak_name == "code":
+ # Block overrides for fields named 'code' or any field with type 'code'
+ if tweak_name == "code" or template_data.get(tweak_name, {}).get("type") == "code":
logger.warning("Security: Code field cannot be overridden via tweaks.")
continueAs a follow-up, consider adding corresponding tests to cover a field like {"script": {"type": "code", ...}} (see test suggestions in the tests file comments).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if tweak_name == "code": | |
| logger.warning("Security: Code field cannot be overridden via tweaks.") | |
| continue | |
| # Block overrides for fields named 'code' or any field with type 'code' | |
| if tweak_name == "code" or template_data.get(tweak_name, {}).get("type") == "code": | |
| logger.warning("Security: Code field cannot be overridden via tweaks.") | |
| continue |
🤖 Prompt for AI Agents
In src/backend/base/langflow/processing/process.py around lines 150-152, the
tweak override only blocks fields named "code" but not fields whose schema type
is "code"; update the check to also inspect the field's schema type (e.g.,
lookup the field schema from the template/fields map for tweak_name and test if
field_schema.get("type") == "code") and if so log the same security warning and
continue, so overrides are blocked by either name or type.
There was a problem hiding this comment.
Nope. The code field is a specific one. Having a field of type code is field.
There was a problem hiding this comment.
@ogabrielluiz Thank you for the clarification! I understand now that the "code" field is a specific field name that needs protection, and fields with type "code" are legitimate fields that should remain tweakable. The current implementation correctly targets only the specific "code" field for security reasons.
∩───∩
( ◕ ◕ )
/ ● \
( ____/ )
\ /
_ /
| ___ |
\ _/ /
_/
learning noted
✏️ Learnings added
Learnt from: ogabrielluiz
PR: langflow-ai/langflow#9467
File: src/backend/base/langflow/processing/process.py:150-152
Timestamp: 2025-08-21T11:57:27.108Z
Learning: In Langflow's apply_tweaks function, the "code" field is a specific field name that contains executable code and must be blocked from tweaks for security reasons. Fields with type "code" are different and legitimate - they should remain tweakable. The security protection is specifically for the field named "code", not for all fields of type "code".
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
…n-tweaks`) Here is a highly optimized rewrite of your program, focusing on *major* runtime bottlenecks visible in your line profiler, while preserving behavioral and stylistic constraints. **Key optimizations:** - **`validate_and_repair_json`:** The `repair_json` call is the dominant bottleneck. If the input is already valid and parseable with `json.loads`, you can *skip* the expensive repair step. - **`apply_tweaks`:** - Move invariants like `template_data[tweak_name]` and `'type'` to local variables once per loop to avoid unnecessary repeated lookups. - Unify/flatten inner checks and avoid redundant dictionary accesses. - Optimize the repeated check for `"file"` type, do it once per tweak per subkey rather than per item. - Use local variables for looked-up dictionary entries to avoid repeated hash table ops. - Slight loop unrolling and early `continue` for the most frequent paths. - Avoid unnecessary checks: the presence of `tweak_name in template_data` is checked once at the start, and not inside branches. **Code below:** --- **Summary of improvements:** - `validate_and_repair_json` now avoids calling `repair_json` on already-valid JSON strings, drastically reducing unnecessary repair time. - `apply_tweaks` now pulls `tpl_entry` and its `"type"` to local variables, saving dictionary lookups. - Inner `"file"` type check is performed once per entry, not per subkey. - Short-circuit continues remove redundant checks. - All dictionary gets are batched/hoisted for speed. **Behavior and structure remain unchanged, so all logging, warnings, returns, and type annotations are preserved per your requirements.**
⚡️ Codeflash found optimizations for this PR📄 99% (0.99x) speedup for
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (3.80%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #9467 +/- ##
=======================================
Coverage 33.80% 33.81%
=======================================
Files 1196 1196
Lines 56386 56389 +3
Branches 5335 5335
=======================================
+ Hits 19063 19069 +6
+ Misses 37253 37250 -3
Partials 70 70
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
* fix: add security warning for overriding code field in tweaks * test: add tests for preventing code field overrides in tweaks
…ing (#9192) * feat: add dependency analysis utilities for custom components - Introduced `dependency_analyzer.py` to analyze and classify dependencies in Python code. - Implemented functions to extract import information and categorize dependencies as standard library, local, or external. - Enhanced `build_component_metadata` to include dependency analysis results in component metadata. - Added unit tests to validate the functionality of the dependency analysis features. * refactor: streamline dependency analysis by filtering out stdlib and local imports - Updated `dependency_analyzer.py` to focus on external dependencies only, removing standard library and local imports from analysis results. - Simplified the `DependencyInfo` class by eliminating unnecessary attributes and adjusting the deduplication logic. - Modified `build_component_metadata` to reflect changes in dependency structure, removing counts for stdlib and local dependencies. - Enhanced unit tests to validate the new filtering behavior and ensure no duplicates in external dependencies. * feat: update starter project metadata with dependency information - Added dependency sections to multiple starter project JSON files, specifying required packages and their versions. - Included `langflow` version `1.5.0.post1` and other relevant dependencies such as `orjson`, `fastapi`, and `pydantic` across various projects. - Enhanced project metadata to improve clarity on external dependencies for better maintainability and user guidance. * ⚡️ Speed up function `_classify_dependency` by 7,582% in PR #9192 (`add-deps-metadata`) (#9193) Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com> * fix: ensure distribution version is returned correctly in `_get_distribution_version` - Updated `_get_distribution_version` function to return the distribution version after successfully retrieving it, addressing a potential issue where `None` could be returned prematurely. * fix: improve distribution version lookup in `_get_distribution_version` * fix: handle distribution version lookup exceptions more gracefully * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * fix(apply_tweaks): skip tweaks to code field and log warning (#9467) * fix: add security warning for overriding code field in tweaks * test: add tests for preventing code field overrides in tweaks * ref: Refactor vectorstore components structure (#9486) * Refactor vectorstore components structure Moved vectorstore components for Chroma, ClickHouse, Couchbase, DataStax, Elastic, Milvus, MongoDB, Pinecone, Qdrant, Supabase, Upstash, Vectara, and Weaviate into dedicated subfolders with __init__.py files for each. Updated Redis vectorstore implementation to reside in redis.py and removed the old vectorstores/redis.py. Adjusted starter project JSONs and frontend constants to reflect new module paths and sidebar entries for these vectorstores. * Refactor vectorstore components and add lazy imports Moved Datastax-related files from vectorstores to a dedicated datastax directory. Added lazy import logic to __init__.py files for chroma, clickhouse, couchbase, elastic, milvus, mongodb, pinecone, qdrant, supabase, upstash, vectara, and weaviate components. Cleaned up vectorstores/__init__.py to only include local and faiss components, improving modularity and import efficiency. * [autofix.ci] apply automated fixes * Refactor vectorstore components structure Moved FAISS, Cassandra, and pgvector components to dedicated subdirectories with lazy-loading __init__.py files. Updated imports and references throughout the backend and frontend to reflect new locations. Removed obsolete datastax Cassandra component. Added new sidebar bundle entries for FAISS, Cassandra, and pgvector in frontend constants and style utilities. * Add lazy imports and Redis chat memory component Refactored the Redis module to support lazy imports for RedisIndexChatMemory and RedisVectorStoreComponent, improving import efficiency. Added a new redis_chat.py file implementing RedisIndexChatMemory for chat message storage and retrieval using Redis. * Fix vector store astra imports * Revert package lock changes * More test fixes * Update test_vector_store_rag.py * Update test_dynamic_imports.py * Update vector_store_rag.py * Update test_dynamic_imports.py * Refactor the cassandra chat component * Fix frontend tests for bundle * Mark Local DB as legacy * Update inputComponent.spec.ts * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Hare <ericrhare@gmail.com> Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com> * feat: add dependencies metadata to starter projects * feat: add caching for packages_distributions to improve performance * refactor: update test descriptions and remove unused imports in metadata tests --------- Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <edwin.jose@datastax.com> Co-authored-by: Eric Hare <ericrhare@gmail.com> Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>



Summary by CodeRabbit
Bug Fixes
Tests