
fix: improved better health check and stream URL check on MCP, improved JSON recognition#8982

Merged
ogabrielluiz merged 8 commits into main from fix/mcp_component_type
Jul 10, 2025

Conversation

@lucaseduoli
Collaborator

@lucaseduoli lucaseduoli commented Jul 10, 2025

This pull request introduces enhancements to error handling, session management, and validation logic in the src/backend/base/langflow/base/mcp/util.py file. Key changes include improved session health checks, retry logic for closed resources, and adjustments to HTTP validation for SSE endpoints.

This fixes connections to the Figma MCP server over both STDIO and SSE.

Session Management Improvements:

  • Added a comprehensive health check for session streams, ensuring both background tasks and streams are functional before reusing a session.
  • Implemented retry logic for ClosedResourceError during tool execution: the session is cleaned up automatically and the call is retried, for up to two attempts.

Error Handling Enhancements:

  • Improved error handling in run_tool by distinguishing between expected errors (e.g., ConnectionError, TimeoutError) and unexpected ones, with proper re-raising and logging.
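The split between expected and unexpected errors described above can be sketched roughly as follows. This is a minimal illustration, not the code in util.py: the helper name `run_with_error_policy`, the logger name, and the choice to wrap expected errors in `ValueError` are assumptions made for the example.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("mcp_sketch")  # hypothetical logger name

# Errors assumed to be "expected" transport failures in this sketch.
EXPECTED_ERRORS = (ConnectionError, TimeoutError, asyncio.TimeoutError)


async def run_with_error_policy(
    tool_name: str, call: Callable[[], Awaitable[Any]]
) -> Any:
    """Await call(); wrap expected errors, re-raise anything else unchanged."""
    try:
        return await call()
    except EXPECTED_ERRORS as exc:
        # Expected failure: log it and surface one consistent error type.
        msg = f"Failed to run tool '{tool_name}': {exc}"
        logger.error(msg)
        raise ValueError(msg) from exc
    except Exception:
        # Unexpected failure: log with traceback and re-raise as is.
        logger.exception("Unexpected error running tool '%s'", tool_name)
        raise
```

Chaining with `from exc` keeps the original exception available as `__cause__`, so callers can still inspect the transport error if they need to.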

HTTP Validation for SSE Endpoints:

  • Modified HTTP validation logic to use GET requests with SSE headers instead of HEAD, accommodating servers that don't support HEAD requests.
  • Added handling for specific HTTP status codes (e.g., 404, 400, 500) in SSE validation, allowing graceful fallback for unsupported endpoints.

Miscellaneous:

  • Introduced constants for HTTP status codes (HTTP_NOT_FOUND, HTTP_BAD_REQUEST, HTTP_INTERNAL_SERVER_ERROR) to improve code readability and maintainability.

Summary by CodeRabbit

  • New Features

    • Improved session health checks and error handling for more reliable client connections.
    • Added automatic retries for tool operations on connection or timeout errors.
    • Enhanced validation of streaming endpoints for better compatibility.
  • Bug Fixes

    • Prevented hangs by adding timeouts to session creation and tool calls.
    • Improved detection and handling of server switches during active sessions.

@lucaseduoli lucaseduoli requested a review from ogabrielluiz July 10, 2025 13:11
@lucaseduoli lucaseduoli self-assigned this Jul 10, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Jul 10, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The update enhances session management and error handling for MCP client sessions, introducing improved health checks, session recreation on server switch, robust retry logic for tool execution, and more accurate SSE endpoint validation. New internal methods and modifications to existing ones ensure better detection of session issues and more reliable client-server interactions.

Changes

| File(s) | Change Summary |
|---|---|
| src/backend/base/langflow/base/mcp/util.py | Added session connectivity validation, improved session health checks, forced session recreation on server switch, session creation with timeouts, enhanced session cleanup, retry logic for tool execution, and SSE validation. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MCPSessionManager
    participant MCPClient (Stdio/SSE)
    participant Server

    User->>MCPSessionManager: get_session(context_id, params, transport_type)
    MCPSessionManager->>MCPClient: Check background task and stream health
    alt Session healthy
        MCPSessionManager->>MCPClient: _validate_session_connectivity()
        alt Connectivity OK
            MCPSessionManager-->>User: Return session
        else Connectivity failed or server switched
            MCPSessionManager->>MCPClient: disconnect()
            MCPSessionManager->>MCPSessionManager: Create new session
            MCPSessionManager-->>User: Return new session
        end
    else Session unhealthy
        MCPSessionManager->>MCPClient: disconnect()
        MCPSessionManager->>MCPSessionManager: Create new session
        MCPSessionManager-->>User: Return new session
    end

    User->>MCPClient: run_tool(tool_name, arguments)
    MCPClient->>Server: Call tool (with 30s timeout)
    alt Success
        MCPClient-->>User: Return result
    else Connection/Timeout error
        MCPClient->>MCPSessionManager: Clear session cache, reset state
        MCPClient->>Server: Retry call tool (max 2 attempts)
        alt Success
            MCPClient-->>User: Return result
        else Failure
            MCPClient-->>User: Raise error
        end
    end
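
The session-reuse decision in the first half of the diagram can be sketched in a few lines of Python. All names here (`CachedSession`, `get_or_recreate`, `server_key`, and the `create` / `is_connected` callables) are hypothetical stand-ins for illustration; the real MCPSessionManager may be structured differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Tuple


@dataclass
class CachedSession:
    session: Any
    task: asyncio.Task  # background task that keeps the transport alive
    server_key: str     # hypothetical fingerprint of the server parameters


async def get_or_recreate(
    cache: Dict[str, CachedSession],
    context_id: str,
    server_key: str,
    create: Callable[[], Awaitable[Tuple[Any, asyncio.Task]]],
    is_connected: Callable[[Any], Awaitable[bool]],
) -> Any:
    """Reuse a cached session only if it is healthy, same-server, and reachable."""
    entry = cache.get(context_id)
    if entry is not None:
        reusable = (
            not entry.task.done()                   # background task still running
            and entry.server_key == server_key      # no server switch
            and await is_connected(entry.session)   # lightweight connectivity probe
        )
        if reusable:
            return entry.session
        # Otherwise disconnect: cancel the task and wait for it to finish.
        entry.task.cancel()
        await asyncio.gather(entry.task, return_exceptions=True)
        del cache[context_id]
    session, task = await create()
    cache[context_id] = CachedSession(session, task, server_key)
    return session
```

The checks are ordered from cheapest to most expensive, so the connectivity probe only runs when the cached session already looks healthy.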

Possibly related PRs

  • langflow-ai/langflow#8908: Refactors session health checks in get_session, but this PR adds more comprehensive checks and retry mechanisms.

Suggested labels

bug, size:L, lgtm

Suggested reviewers

  • phact

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jul 10, 2025
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jul 10, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
src/backend/base/langflow/base/mcp/util.py (1)

776-887: Critical: Extract common retry logic to eliminate code duplication.

The run_tool implementations in both MCPStdioClient and MCPSseClient are nearly identical (100+ lines of duplicated code). This violates the DRY principle and makes maintenance difficult.

Additionally, the error detection using string matching (e.g., "ClosedResourceError" in str(type(e))) is fragile and could break with error message changes.

Create a base class or mixin with the common retry logic:

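import asyncio
from collections.abc import Callable
from typing import Any

from loguru import logger  # assumed: the same logger util.py already uses


class MCPClientBase: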
    """Base class for MCP clients with common retry logic."""
    
    async def _run_tool_with_retry(
        self, 
        tool_name: str, 
        arguments: dict[str, Any],
        get_session_func: Callable,
        session_context: str,
        cleanup_func: Callable
    ) -> Any:
        """Common retry logic for running tools."""
        max_retries = 2
        last_error_type = None
        
        for attempt in range(max_retries):
            try:
                logger.debug(f"Attempting to run tool '{tool_name}' (attempt {attempt + 1}/{max_retries})")
                session = await get_session_func()
                
                result = await asyncio.wait_for(
                    session.call_tool(tool_name, arguments=arguments),
                    timeout=30.0
                )
                logger.debug(f"Tool '{tool_name}' completed successfully")
                return result
                
            except Exception as e:
                current_error_type = type(e).__name__
                logger.warning(f"Tool '{tool_name}' failed on attempt {attempt + 1}: {current_error_type} - {e}")
                
                # Better error detection
                is_connection_error = self._is_connection_error(e)
                is_timeout_error = isinstance(e, (asyncio.TimeoutError, TimeoutError))
                
                if last_error_type == current_error_type and attempt > 0:
                    logger.error(f"Repeated {current_error_type} error for tool '{tool_name}', not retrying")
                    break
                    
                last_error_type = current_error_type
                
                if is_connection_error and attempt < max_retries - 1:
                    logger.warning(f"MCP session connection issue for tool '{tool_name}', retrying...")
                    await cleanup_func(session_context)
                    await asyncio.sleep(0.5)
                    continue
                    
                if is_timeout_error and attempt < max_retries - 1:
                    logger.warning(f"Tool '{tool_name}' timed out, retrying...")
                    await asyncio.sleep(1.0)
                    continue
                    
                # Handle final error
                if is_connection_error or is_timeout_error:
                    msg = f"Failed to run tool '{tool_name}' after {attempt + 1} attempts: {e}"
                    logger.error(msg)
                    self._connected = False
                    raise ValueError(msg) from e
                raise
                
        msg = f"Failed to run tool '{tool_name}': Maximum retries exceeded"
        logger.error(msg)
        raise ValueError(msg)
    
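    def _is_connection_error(self, e: Exception) -> bool:
        """Check if exception is a connection-related error."""
        # Timeouts are handled separately by the caller. The built-in
        # TimeoutError subclasses OSError (and asyncio.TimeoutError is an alias
        # of it on Python 3.11+), so exclude it first; otherwise the caller's
        # timeout branch is never reached.
        if isinstance(e, (asyncio.TimeoutError, TimeoutError)):
            return False

        # Standard connection errors
        if isinstance(e, (ConnectionError, OSError, ValueError)):
            return True

        # MCP-specific errors: compare the exception class name
        error_type_name = type(e).__name__
        if error_type_name in ("ClosedResourceError", "McpError"):
            return True

        # Check error message as fallback
        error_str = str(e)
        connection_keywords = [
            "Connection closed", "Connection lost",
            "Transport closed", "Stream closed"
        ]
        return any(keyword in error_str for keyword in connection_keywords)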

Then simplify the client implementations:

class MCPStdioClient(MCPClientBase):
    async def run_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        # ... validation code ...
        
        return await self._run_tool_with_retry(
            tool_name=tool_name,
            arguments=arguments,
            get_session_func=self._get_or_create_session,
            session_context=self._session_context,
            cleanup_func=lambda ctx: self._get_session_manager()._cleanup_session(ctx)
        )

Also applies to: 1063-1174

🧹 Nitpick comments (3)
src/backend/base/langflow/base/mcp/util.py (3)

26-30: Consider using httpx status code constants directly.

The file already imports httpx_codes (line 13). Instead of defining duplicate constants, use the existing httpx constants for consistency.

-# HTTP status codes used in validation
-HTTP_NOT_FOUND = 404
-HTTP_BAD_REQUEST = 400
-HTTP_INTERNAL_SERVER_ERROR = 500

Then update the usage in lines 950, 957-958 to:

-if response.status_code == HTTP_NOT_FOUND:
+if response.status_code == httpx_codes.NOT_FOUND:

-if (
-    HTTP_BAD_REQUEST <= response.status_code < HTTP_INTERNAL_SERVER_ERROR
-    and response.status_code != HTTP_NOT_FOUND
-):
+if (
+    httpx_codes.BAD_REQUEST <= response.status_code < httpx_codes.INTERNAL_SERVER_ERROR
+    and response.status_code != httpx_codes.NOT_FOUND
+):

461-489: Consider extracting stream health check logic for better maintainability.

The stream health check logic is comprehensive but complex with multiple conditional branches. Consider extracting this into a separate method for better readability and testability.

async def _check_stream_health(self, session) -> bool:
    """Check if the session's write stream is still healthy."""
    try:
        if not hasattr(session, "_write_stream"):
            return True  # Can't check, assume healthy
            
        write_stream = session._write_stream
        
        # Check for explicit closed state
        if hasattr(write_stream, "_closed") and write_stream._closed:
            return False
            
        # Check anyio stream state for send channels
        if hasattr(write_stream, "_state") and hasattr(write_stream._state, "open_send_channels"):
            return write_stream._state.open_send_channels > 0
            
        # Check for other stream closed indicators
        if hasattr(write_stream, "is_closing") and callable(write_stream.is_closing):
            return not write_stream.is_closing()
            
        # Default to healthy if we can't determine state
        return True
        
    except (AttributeError, TypeError) as e:
        logger.debug(f"Could not check stream health: {e}")
        return True

Then use it in the main method:

-# Additional check for stream health
-stream_is_healthy = True
-try:
-    # Check if the session's write stream is still open
-    if hasattr(session, "_write_stream"):
-        write_stream = session._write_stream
-        # ... (rest of the checks)
-except (AttributeError, TypeError) as e:
-    # ...
-    stream_is_healthy = True
+stream_is_healthy = await self._check_stream_health(session)

888-899: Consider including disconnect in the base class refactor.

The disconnect method is also duplicated between MCPStdioClient and MCPSseClient. When creating the base class suggested earlier, include this common cleanup logic.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75e930d and e9c41cc.

📒 Files selected for processing (1)
  • src/backend/base/langflow/base/mcp/util.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/backend/**/*.py: Run make format_backend to format Python code early and often
Run make lint to check for linting issues in backend Python code

📄 Source: CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)

List of files the instruction was applied to:

  • src/backend/base/langflow/base/mcp/util.py
🧠 Learnings (1)
src/backend/base/langflow/base/mcp/util.py (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.464Z
Learning: Applies to src/backend/base/langflow/components/**/*.py : Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Update Starter Projects
  • GitHub Check: Run Ruff Check and Format
  • GitHub Check: Ruff Style Check (3.13)
  • GitHub Check: Optimize new Python code in this PR
🔇 Additional comments (4)
src/backend/base/langflow/base/mcp/util.py (4)

388-429: Well-implemented session connectivity validation.

The method provides a robust lightweight connectivity test with:

  • Appropriate 3-second timeout for fast failure
  • Comprehensive error handling for both standard and MCP-specific errors
  • Proper response validation
  • Clear debug logging
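
A lightweight connectivity check of the kind praised here could look roughly like the sketch below. It is an illustration under assumptions, not the method in util.py: `validate_connectivity` and the `probe` callable are hypothetical names, and the real method may issue a different request and catch additional MCP-specific exceptions.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("mcp_sketch")  # hypothetical logger name


async def validate_connectivity(
    probe: Callable[[], Awaitable[Any]], timeout: float = 3.0
) -> bool:
    """Return True if a cheap request to the server answers within `timeout`."""
    try:
        # Fail fast: a healthy session should answer a small request quickly.
        response = await asyncio.wait_for(probe(), timeout=timeout)
    except (asyncio.TimeoutError, ConnectionError, OSError) as exc:
        logger.debug("Connectivity check failed: %s", exc)
        return False
    if response is None:
        # Treat an empty response as a failed check.
        logger.debug("Connectivity check returned no response")
        return False
    return True
```

In practice `probe` would wrap some inexpensive session call; which call the PR uses is not stated in this conversation.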

580-598: Excellent timeout handling for session creation.

The 10-second timeout with proper cleanup ensures sessions don't hang indefinitely. The error handling and task cancellation logic is well-implemented in both STDIO and SSE session creation methods.

Also applies to: 643-661
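
The timeout-with-cleanup pattern described in this comment can be sketched as follows. The names `create_session_with_timeout` and `run_session`, and the use of a future to signal readiness, are assumptions for illustration; only the 10-second timeout and the task cancellation on failure come from the comment above.

```python
import asyncio
from typing import Any, Callable, Coroutine, Tuple


async def create_session_with_timeout(
    run_session: Callable[[asyncio.Future], Coroutine[Any, Any, None]],
    timeout: float = 10.0,
) -> Tuple[Any, asyncio.Task]:
    """Start a background session task and wait (bounded) until it is ready."""
    ready = asyncio.get_running_loop().create_future()
    # The background task owns the transport and resolves `ready` once the
    # session has been initialized.
    task = asyncio.create_task(run_session(ready))
    try:
        session = await asyncio.wait_for(ready, timeout=timeout)
    except asyncio.TimeoutError as exc:
        # Do not leave a half-started task behind: cancel and wait for it.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        raise ValueError(f"Session creation timed out after {timeout}s") from exc
    return session, task
```

This sketch only bounds the wait; a fuller version would also watch for the background task failing before it signals readiness.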


931-973: Excellent SSE endpoint validation improvements.

The changes properly handle SSE server compatibility:

  • Using GET with Accept: text/event-stream header instead of HEAD
  • Treating 404 as potentially valid (many SSE servers don't support non-SSE requests)
  • Interpreting timeouts as expected behavior for streaming endpoints

This should significantly improve compatibility with various SSE server implementations.
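
How the result of such a GET probe might be interpreted is sketched below. The function name `sse_probe_allows_connect` and the convention of passing `None` for a timed-out probe are hypothetical; rejecting other 4xx responses and letting 5xx responses through are assumptions inferred from the status-code range quoted in the nitpick above, not confirmed behavior.

```python
from http import HTTPStatus
from typing import Optional

# Header sent with the GET probe so the server treats it as an SSE request.
SSE_HEADERS = {"Accept": "text/event-stream"}


def sse_probe_allows_connect(status_code: Optional[int]) -> bool:
    """Decide whether to go on to a real SSE connection after a GET probe.

    `status_code` is None when the probe timed out before a response ended.
    """
    if status_code is None:
        # A streaming endpoint keeps the response open, so a timeout is expected.
        return True
    if status_code == HTTPStatus.NOT_FOUND:
        # Some SSE servers answer 404 to non-SSE requests; let the real
        # connection attempt decide.
        return True
    if HTTPStatus.BAD_REQUEST <= status_code < HTTPStatus.INTERNAL_SERVER_ERROR:
        # Other 4xx responses are treated as a rejected URL (assumption).
        return False
    # Everything else falls through as acceptable in this sketch (assumption).
    return True
```

Using `http.HTTPStatus` from the standard library avoids hand-written status constants in the same spirit as the `httpx_codes` suggestion earlier in this review.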


1-1174: Reminder: Run backend formatting and linting.

As per the coding guidelines for src/backend/**/*.py files, ensure you run:

  • make format_backend to format the code
  • make lint to check for any linting issues

@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jul 10, 2025
Comment thread src/backend/base/langflow/base/mcp/util.py Outdated
Comment thread src/backend/base/langflow/base/mcp/util.py Outdated
@lucaseduoli lucaseduoli changed the title fix: improved better health check and stream URL check on MCP fix: improved better health check and stream URL check on MCP, improved JSON recognition Jul 10, 2025
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jul 10, 2025
@lucaseduoli lucaseduoli requested a review from ogabrielluiz July 10, 2025 13:24
@github-actions github-actions Bot removed the bug Something isn't working label Jul 10, 2025
@github-actions github-actions Bot added the bug Something isn't working label Jul 10, 2025
@ogabrielluiz ogabrielluiz requested a review from mfortman11 July 10, 2025 15:16
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jul 10, 2025
@github-actions
Contributor

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 71% | 73.58% (117/159) | 57.46% (77/134) | 59.52% (25/42) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 45 | 0 💤 | 0 ❌ | 0 🔥 | 2.613s ⏱️ |

@ogabrielluiz ogabrielluiz added this pull request to the merge queue Jul 10, 2025
Merged via the queue into main with commit 8779593 Jul 10, 2025
77 of 78 checks passed
@ogabrielluiz ogabrielluiz deleted the fix/mcp_component_type branch July 10, 2025 17:20
smatiolids pushed a commit to smatiolids/langflow-dev that referenced this pull request Jul 10, 2025
…ed JSON recognition (langflow-ai#8982)

* Improved health check and stream URL check on MCP

* Improved health check by validating session connectivity

* Changed mcp servers from json checks

* Fixed imports

* Fixed mcp server tab test

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.


2 participants