fix: mcp component dyanamic tool#12779
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (49.81%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release-1.10.0 #12779 +/- ##
==================================================
- Coverage 52.78% 52.70% -0.09%
==================================================
Files 2025 2025
Lines 184225 183515 -710
Branches 27364 27654 +290
==================================================
- Hits 97244 96721 -523
+ Misses 85886 85699 -187
Partials 1095 1095
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
erichare
left a comment
There was a problem hiding this comment.
Hi @mansura-habiba ! This is a great improvement, just some things to address before merge:
Overall: The underlying intent (auth-aware cache keys, serialized session access, per-request tool refresh) is sound and aligned with the concurrency work on fix/mcp-session-concurrent-access. But the implementation layers many overlapping fixes and introduces several real correctness/safety regressions.
🔴 CRITICAL
-
_ttl_tool_cacheis a class-level dict, shared across ALL instances — [mcp_component.py:106](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:106) declares_ttl_tool_cache: dict = {}at class scope, and [init](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:108-120) never re-binds it to a fresh dict. The write at [:1027](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:1027) (self._ttl_tool_cache[ttl_key] = …) mutates the class attribute, so everyMCPToolsComponentin the process shares the same dict.- Doc-comment claims "per-instance". It isn't.
- Unbounded memory growth (no eviction, no max size).
- Two users on separate flows that happen to hash to the same
server:digestkey will receive each other's cached tool lists._mcp_servers_cache_keyis deterministic onserver_name + sha256(headers)— collisions across tenants with identical auth are fully possible. - Fix:
self._ttl_tool_cache: dict = {}inside__init__, plus an upper bound + LRU/TTL sweep.
-
_temporary_mcp_server_timeoutmutates a shared globalsettingsobject — [:136-147](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:136-147) doessettings.mcp_server_timeout = override_secon the singleton returned byget_settings_service().settings, yields, then restores. With concurrentupdate_tool_listcalls across different components (which is explicitly the fix target!), two coroutines interleave:A: save previous=20, set=120 B: save previous=120 (!), set=60 A: restore 20 ← clobbers B's override B: restore 120 ← wrong; should be 20Final global state is non-deterministic, and in-flight callers on other code paths will see wrong timeouts. The per-component lock at
_update_tool_list_lockis per instance so it doesn't serialize this. Pass the override down explicitly instead of mutating shared state.
🟠 HIGH
-
[to_toolkit](https://claude.ai/epitaxy/src/lfx/src/lfx/components/models_and_agents/mcp_component.py:1031-1043)override silently disablestools_metadata/enabled_toolsfiltering — The new to_toolkit doesreturn await self._get_tools(). The base implementation in [component.py:1400-1425](src/lfx/src/lfx/custom/custom_component/component.py:1400-1425) applies_filter_tools_by_statusand_update_tools_with_metadata. The component declarestools_metadatahandling at [:770](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:770), so users who disabled specific tools in the UI will have them silently re-enabled. This is a behavioral regression, not a fix. The "version drift" justification in the docstring needs concrete repro — if the base-class chain is actually broken for some deployment, fix the base class. -
Cookie-mirror heuristic is both over-eager and under-specified — [:393-397](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:393-397) takes the first header whose name isn't in an eight-item allow-list and writes
Cookie: {HeaderName}={value}.X-Request-ID,X-Organization-ID,X-Forwarded-For,Origin,Referer,X-Trace-Id,Accept-Encoding,Accept-Language— none are in the allow-list, all get mirrored.- Cookie names are case-sensitive and have their own grammar;
Cookie: X-Forwarded-For=10.1.2.3isn't a valid cookie and will confuse downstream parsers. - Dict iteration order is insertion order — the choice of "first" header becomes a subtle user-input-ordering dependency.
- If the user actually wants a cookie, they can set
Cookieexplicitly. The heuristic should be opt-in (a boolean input or a named list of "mirror-these" headers) rather than implicit.
-
Header hash becomes part of the shared cross-request cache key with no eviction —
_mcp_servers_cache_keypushes every distinct (server, header-set) pair into_shared_component_cache["servers"]. Each tenant/session token creates a new entry that never expires. For a tenant with rotating session tokens, memory grows monotonically until process restart. Needs a bound or TTL. -
Global
mcp_server_timeoutdefault bumped 20s → 90s in a release branch — [settings/base.py](src/lfx/src/lfx/services/settings/base.py) — this is a 4.5× default. Any call site that readssettings.mcp_server_timeout(not just MCP Tools) now waits 4.5× as long on unhealthy servers. If the per-nodemcp_operation_timeout_secondsalready covers the legitimate use case, revert the global bump and let operators setLANGFLOW_MCP_SERVER_TIMEOUTwhen they need it.
🟡 RECOMMENDED
-
Three-layer redundancy for disabling
Output.cache— [_build_tool_output](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:149), [map_outputs](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:161), and [_get_output_result](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:167) all setcache=False. If the base chain is respected, only_build_tool_outputshould be needed. Pick one; comment why. -
Logging volume on hot path —
await logger.ainfo(...)atupdate_tool_liststart/end,_get_toolsstart/end with fulltool_nameslist on every agent step. In a many-agent production run this produces a lot of INFO-level traffic. Demote toadebugor gate behind an explicit diagnostic flag. -
No unit tests added. The body's test plan is a manual checklist with unchecked boxes. For logic this subtle (cache key building, cookie mirroring, TTL eviction, settings override under concurrency), unit tests are essential — especially since two similar concurrency fixes are already in flight on [util.py](src/lfx/src/lfx/base/mcp/util.py) on this local branch.
✅ What's good
_mcp_servers_cache_keycorrectly separates auth contexts.asyncio.Lockto serializeupdate_tool_listper instance is the right idea.Output.cache=Falsereasoning is correct even if over-applied.- Generally well-commented.
Recommendation: Request changes. Must-fix before merge: #1 (instance dict), #2 (global settings race), #3 (tools_metadata bypass). Should-fix: #4 (cookie heuristic), #5 (cache growth), #6 (global default).
d57b312 to
a0b31d3
Compare
Problem
-------
When Langflow runs an agent that has an MCPTools component attached and an
API request supplies authentication headers via tweaks, the MCP server can
legitimately return an *expanded* tool list that is broader than the
unauthenticated base set (the server filters its listing by the caller's
identity). Today those extra tools never reach the agent at runtime: the
component always binds the first preloaded tool list and ignores the new
auth context for the lifetime of the flow.
Root cause
----------
Several caching / binding layers combine to prevent per-request tool
refresh:
1. Flow JSON persists ``component_as_tool`` with ``cache: true`` and
Langflow's ``Output`` memoization reuses the first ``to_toolkit()``
result for all subsequent requests regardless of headers.
2. The shared MCP server cache was keyed by server name only, so
requests with different auth contexts collided on the same cache slot.
3. In tool-mode, ``_get_tools()`` returned ``[]`` when
``_not_load_actions`` was true, starving the agent of MCP tools.
4. Concurrent ``update_tool_list`` calls raced on the same Streamable
HTTP client session, producing intermittent HTTP 404 /
"Session terminated" errors from the MCP SDK.
What changed
------------
``src/lfx/src/lfx/components/models_and_agents/mcp_component.py``:
- FIX 1 - Output cache bypass: ``_build_tool_output`` declares the
Toolset output with ``cache=False``, and ``map_outputs`` overrides any
``cache: true`` value persisted in saved flow JSON. Together these
guarantee every run resolves a fresh ``to_toolkit()`` call regardless
of whether the flow was loaded from disk. Independent of the
user-facing ``use_cache`` / "Use Cached Server" toggle.
- FIX 2 - Header-aware cache key: ``_mcp_servers_cache_key`` now hashes
the component headers into the shared server-cache key so requests
with different auth contexts land in distinct cache slots instead of
masking each other. The shared cache is bounded at
``SHARED_SERVERS_CACHE_MAX_ENTRIES`` = 64 with FIFO eviction so a
tenant rotating session tokens does not grow the map without limit.
- FIX 3 - ``_get_tools`` always fetches: removed the ``_not_load_actions``
short-circuit that returned ``[]`` in tool-mode, so the agent always
binds the current (possibly expanded) tool list. ``_not_load_actions``
still gates only the UI build_config dropdown.
- FIX 4 - Serialized ``update_tool_list``: an ``asyncio.Lock`` prevents
concurrent refreshes from racing on the same Streamable HTTP client
session, eliminating the intermittent HTTP 404 / "Session terminated"
errors the MCP SDK raised when session DELETE and POST overlapped.
- FIX 5 - TTL tool cache: ``_get_tools`` keeps a per-instance,
header-hash-keyed TTL cache (``TOOL_TTL_SECS`` = 30s, disable with 0)
bounded at ``TOOL_TTL_MAX_ENTRIES`` = 32 with FIFO eviction and
stale-on-read drop. Parallel agent steps that share the same auth
reuse the tool list instead of each paying for a fresh MCP round-trip.
This is deliberately distinct from ``use_cache`` / "Use Cached Server"
which controls the cross-request shared cache. The dict is initialized
in ``__init__`` so every instance gets its own cache.
Impact on users
---------------
- Backward compatible. Unauthenticated / non-tweaked flows behave
identically: the TTL cache is per-instance and scoped to a single
(server, header-hash) pair, and the base-class ``to_toolkit`` chain is
unchanged (no override of filtering via ``tools_metadata`` /
``enabled_tools``).
- New behaviour only triggers when tweaks/UI headers include an auth
context; the header-hash cache key then isolates per-caller tool lists
and the agent binds the correctly-filtered expanded set.
- No changes to ``lfx/services/settings/base.py``: the global
``mcp_server_timeout`` default stays at its existing value.
Test plan
---------
- ``python -m py_compile`` passes on the edited module.
- ``python -m ruff check`` reports no new violations introduced by this
change.
- Local verification against an MCP server that returns different tool
lists per caller identity:
* Unauthenticated request returns the base tools as before.
* Authenticated request (auth headers via tweaks) returns the
expanded tool set on the first call and on subsequent calls with
the same auth - TTL cache hits log
``MCP _get_tools: TTL cache hit``.
* Switching to a different auth context on the same flow returns the
correct per-caller tool list (header-hash cache key isolates the
two contexts).
* Parallel agent runs no longer surface HTTP 404 / "Session
terminated" from the MCP SDK.
* Disabling a tool via the UI (``tools_metadata``) correctly hides it
from the bound agent - base-class filtering is preserved.
0431ac0 to
0b9f2ed
Compare
erichare
left a comment
There was a problem hiding this comment.
Awesome work @mansura-habiba. LGTM.
Every CRITICAL and HIGH issue has been addressed from my prior review:
🔴 CRITICAL — all resolved:
- ✅
_ttl_tool_cachenow initialized in__init__as per-instance dict, withTOOL_TTL_MAX_ENTRIES=32bound + FIFO eviction — [mcp_component.py:108](src/lfx/src/lfx/components/models_and_agents/mcp_component.py:108) - ✅
_temporary_mcp_server_timeoutremoved entirely — no more sharedsettingssingleton mutation under concurrency - ✅
to_toolkitoverride removed — base-classtools_metadata/enabled_toolsfiltering is preserved
🟠 HIGH — all resolved:
4. ✅ Cookie-mirror heuristic removed entirely
5. ✅ Shared servers cache now bounded by SHARED_SERVERS_CACHE_MAX_ENTRIES=64 with FIFO eviction
6. ✅ Global mcp_server_timeout bump reverted (settings/base.py no longer in the diff)
🟡 RECOMMENDED:
7. ✅ Cache-disable redundancy reduced from 3 layers to 2 (_build_tool_output declares cache=False; map_outputs force-overrides persisted flow JSON — justified in docstring)
8. ✅ Logging demoted from ainfo → adebug
Add unit coverage for the fixes introduced in this PR so regressions in the cache-key, TTL cache, concurrency lock, shared-cache eviction, and Toolset Output.cache=False behaviour are caught by CI instead of by users: - Header normalization across list / dict / None / malformed shapes - `_mcp_servers_cache_key` determinism + header-hash scoping across auth contexts (different headers → different keys; order-independent) - Per-instance `_ttl_tool_cache` isolation (no cross-instance leakage) - `_get_tools` TTL cache: hit skips `update_tool_list`, expired entries are refetched, FIFO-eviction caps growth at `TOOL_TTL_MAX_ENTRIES`, and TTL=0 disables the cache - `_update_tool_list_lock` serialises concurrent calls (peak concurrency 1) - Shared `servers` cross-request cache evicts oldest entry when the map reaches `SHARED_SERVERS_CACHE_MAX_ENTRIES` - `_build_tool_output` declares `cache=False`; `map_outputs` overrides persisted `cache=True` from saved flow JSON
* fix(mcp): support real-time tool onboarding via per-request auth headers
Problem
-------
When Langflow runs an agent that has an MCPTools component attached and an
API request supplies authentication headers via tweaks, the MCP server can
legitimately return an *expanded* tool list that is broader than the
unauthenticated base set (the server filters its listing by the caller's
identity). Today those extra tools never reach the agent at runtime: the
component always binds the first preloaded tool list and ignores the new
auth context for the lifetime of the flow.
Root cause
----------
Several caching / binding layers combine to prevent per-request tool
refresh:
1. Flow JSON persists ``component_as_tool`` with ``cache: true`` and
Langflow's ``Output`` memoization reuses the first ``to_toolkit()``
result for all subsequent requests regardless of headers.
2. The shared MCP server cache was keyed by server name only, so
requests with different auth contexts collided on the same cache slot.
3. In tool-mode, ``_get_tools()`` returned ``[]`` when
``_not_load_actions`` was true, starving the agent of MCP tools.
4. Concurrent ``update_tool_list`` calls raced on the same Streamable
HTTP client session, producing intermittent HTTP 404 /
"Session terminated" errors from the MCP SDK.
What changed
------------
``src/lfx/src/lfx/components/models_and_agents/mcp_component.py``:
- FIX 1 - Output cache bypass: ``_build_tool_output`` declares the
Toolset output with ``cache=False``, and ``map_outputs`` overrides any
``cache: true`` value persisted in saved flow JSON. Together these
guarantee every run resolves a fresh ``to_toolkit()`` call regardless
of whether the flow was loaded from disk. Independent of the
user-facing ``use_cache`` / "Use Cached Server" toggle.
- FIX 2 - Header-aware cache key: ``_mcp_servers_cache_key`` now hashes
the component headers into the shared server-cache key so requests
with different auth contexts land in distinct cache slots instead of
masking each other. The shared cache is bounded at
``SHARED_SERVERS_CACHE_MAX_ENTRIES`` = 64 with FIFO eviction so a
tenant rotating session tokens does not grow the map without limit.
- FIX 3 - ``_get_tools`` always fetches: removed the ``_not_load_actions``
short-circuit that returned ``[]`` in tool-mode, so the agent always
binds the current (possibly expanded) tool list. ``_not_load_actions``
still gates only the UI build_config dropdown.
- FIX 4 - Serialized ``update_tool_list``: an ``asyncio.Lock`` prevents
concurrent refreshes from racing on the same Streamable HTTP client
session, eliminating the intermittent HTTP 404 / "Session terminated"
errors the MCP SDK raised when session DELETE and POST overlapped.
- FIX 5 - TTL tool cache: ``_get_tools`` keeps a per-instance,
header-hash-keyed TTL cache (``TOOL_TTL_SECS`` = 30s, disable with 0)
bounded at ``TOOL_TTL_MAX_ENTRIES`` = 32 with FIFO eviction and
stale-on-read drop. Parallel agent steps that share the same auth
reuse the tool list instead of each paying for a fresh MCP round-trip.
This is deliberately distinct from ``use_cache`` / "Use Cached Server"
which controls the cross-request shared cache. The dict is initialized
in ``__init__`` so every instance gets its own cache.
Impact on users
---------------
- Backward compatible. Unauthenticated / non-tweaked flows behave
identically: the TTL cache is per-instance and scoped to a single
(server, header-hash) pair, and the base-class ``to_toolkit`` chain is
unchanged (no override of filtering via ``tools_metadata`` /
``enabled_tools``).
- New behaviour only triggers when tweaks/UI headers include an auth
context; the header-hash cache key then isolates per-caller tool lists
and the agent binds the correctly-filtered expanded set.
- No changes to ``lfx/services/settings/base.py``: the global
``mcp_server_timeout`` default stays at its existing value.
Test plan
---------
- ``python -m py_compile`` passes on the edited module.
- ``python -m ruff check`` reports no new violations introduced by this
change.
- Local verification against an MCP server that returns different tool
lists per caller identity:
* Unauthenticated request returns the base tools as before.
* Authenticated request (auth headers via tweaks) returns the
expanded tool set on the first call and on subsequent calls with
the same auth - TTL cache hits log
``MCP _get_tools: TTL cache hit``.
* Switching to a different auth context on the same flow returns the
correct per-caller tool list (header-hash cache key isolates the
two contexts).
* Parallel agent runs no longer surface HTTP 404 / "Session
terminated" from the MCP SDK.
* Disabling a tool via the UI (``tools_metadata``) correctly hides it
from the bound agent - base-class filtering is preserved.
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* [autofix.ci] apply automated fixes (attempt 3/3)
* test(mcp): cover dynamic-tool onboarding fixes
Add unit coverage for the fixes introduced in this PR so regressions in the
cache-key, TTL cache, concurrency lock, shared-cache eviction, and Toolset
Output.cache=False behaviour are caught by CI instead of by users:
- Header normalization across list / dict / None / malformed shapes
- `_mcp_servers_cache_key` determinism + header-hash scoping across auth
contexts (different headers → different keys; order-independent)
- Per-instance `_ttl_tool_cache` isolation (no cross-instance leakage)
- `_get_tools` TTL cache: hit skips `update_tool_list`, expired entries are
refetched, FIFO-eviction caps growth at `TOOL_TTL_MAX_ENTRIES`, and TTL=0
disables the cache
- `_update_tool_list_lock` serialises concurrent calls (peak concurrency 1)
- Shared `servers` cross-request cache evicts oldest entry when the map
reaches `SHARED_SERVERS_CACHE_MAX_ENTRIES`
- `_build_tool_output` declares `cache=False`; `map_outputs` overrides
persisted `cache=True` from saved flow JSON
* [autofix.ci] apply automated fixes
---------
Co-authored-by: Mansura <mansura.nw@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Summary
Fixes stale MCP tool binding when a request supplies per-call auth headers (e.g. a session token via tweaks). The agent was always bound to the first preloaded tool list and never saw the expanded set that authenticated sessions legitimately unlock.
Seven targeted changes in
src/lfx/src/lfx/components/models_and_agents/mcp_component.py:cache=Falseon the Toolset output in_build_tool_output/map_outputs/_get_output_resultso saved flows don't memoize a stale list._mcp_servers_cache_keyhashes component headers into the shared server-cache key so different auth contexts land in distinct slots.Authorization/Content-Type/Accept/User-Agent/Cookie/Host/Connection/Cache-Control) is mirrored intoCookiewhen none is explicitly set, for middleware that only reads session auth from cookies._get_toolsalways fetches;_not_load_actionsnow gates only the UI dropdown, not agent binding.asyncio.Lockserializesupdate_tool_listso concurrent calls don't race on the Streamable HTTP client session (eliminates intermittent HTTP 404 / "Session terminated").to_toolkitoverride bypasses base-class memoization for version drift across lfx builds.TOOL_TTL_SECS=30s, header-hash-keyed) so parallel agent steps with identical auth don't each pay for a fresh MCP round-trip. Distinct fromuse_cache.src/lfx/src/lfx/services/settings/base.py: raises defaultmcp_server_timeout20s → 90s (still overridable viaLANGFLOW_MCP_SERVER_TIMEOUT).Test plan
python -m py_compileon the edited modulepython -m ruff check— no new violationsMCP _get_tools: TTL cache hitin logs)Session terminatedfrom the MCP SDK