
Add SupportedUnderstudyActions to observe system prompt#1609

Merged
miguelg719 merged 5 commits into main from
miguelgonzalez/stg-1206-fix-observe-returning-supported-methods
Jan 26, 2026

miguelg719 (Collaborator) commented Jan 26, 2026

why

Observe fails to return doubleClick as the method because it has no knowledge of the supported actions enum. Identified in #1608.

what changed

  • Renamed SupportedPlaywrightAction to SupportedUnderstudyAction
  • Included the list of supported actions in the observe system prompt (similar pattern to act's)
  • While testing dragAndDrop, I realized there was no conversion from elementId to an xpath/selector for the drop target (which the model consistently specifies in args), so I added logic to do that replacement for this specific method
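To make the rename and the two new members concrete, here is a minimal TypeScript sketch. It assumes a string enum and shows only a hypothetical subset of members (click, fill, type, plus the two added in this PR); member names such as `DOUBLE_CLICK` are illustrative, and the real enum in `packages/core/lib/v3/types/private/handlers.ts` may differ.

```typescript
// Hypothetical subset of SupportedUnderstudyAction (formerly
// SupportedPlaywrightAction). Member names and the full list are assumptions.
enum SupportedUnderstudyAction {
  CLICK = "click",
  FILL = "fill",
  TYPE = "type",
  DOUBLE_CLICK = "doubleClick", // added in this PR
  DRAG_AND_DROP = "dragAndDrop", // added in this PR
}

// String enums have no reverse mapping, so Object.values yields only the
// method names; this is the kind of list a handler can pass to the prompt.
const supportedActions: string[] = Object.values(SupportedUnderstudyAction);

console.log(supportedActions.join(", "));
// click, fill, type, doubleClick, dragAndDrop
```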

test plan


Summary by cubic

Makes observe return only supported methods by passing the SupportedUnderstudyAction values into the system prompt. Adds doubleClick and dragAndDrop, updates handlers to use the new enum, and fixes the dragAndDrop target id-to-xpath conversion; addresses STG-1206.

  • Bug Fixes

    • Inject supportedActions into observe inference and prompt so returned elements include a valid, supported method.
    • Convert dragAndDrop target id to xpath in act/observe and skip invalid targets while logging.
  • Refactors

    • Renamed SupportedPlaywrightAction to SupportedUnderstudyAction and updated act/observe handlers.
    • Expanded the supported actions with doubleClick and dragAndDrop.

Written for commit 3e70a5f.
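Issue #1608 was about the model emitting names like `dblclick` that no handler recognizes. Besides telling the model which methods exist, a handler could also defensively drop anything outside the enum. The sketch below is a hypothetical guard, not code from this PR; the `ObservedElement` shape and the function names are assumptions.

```typescript
enum SupportedUnderstudyAction {
  CLICK = "click",
  FILL = "fill",
  TYPE = "type",
  DOUBLE_CLICK = "doubleClick",
  DRAG_AND_DROP = "dragAndDrop",
}

// Hypothetical shape of one element returned by observe inference.
interface ObservedElement {
  elementId: string;
  description: string;
  method: string;
  arguments: string[];
}

const SUPPORTED = new Set<string>(Object.values(SupportedUnderstudyAction));

// Type guard: narrows an arbitrary model-returned string to the enum type.
function isSupportedMethod(method: string): method is SupportedUnderstudyAction {
  return SUPPORTED.has(method);
}

// Keep only elements whose method is in the enum; log the ones dropped.
function keepSupported(elements: ObservedElement[]): ObservedElement[] {
  return elements.filter((el) => {
    if (isSupportedMethod(el.method)) return true;
    console.warn(`Dropping element ${el.elementId}: unsupported method "${el.method}"`);
    return false;
  });
}

const kept = keepSupported([
  { elementId: "1-12", description: "Row", method: "dblclick", arguments: [] },
  { elementId: "1-13", description: "Row", method: "doubleClick", arguments: [] },
]);
console.log(kept.map((el) => el.elementId)); // [ '1-13' ]
```

Filtering is a fallback only: with the supported list in the prompt, the model should rarely produce an unknown method in the first place.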

changeset-bot bot commented Jan 26, 2026

🦋 Changeset detected

Latest commit: 3e70a5f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages:

  • @browserbasehq/stagehand: Patch
  • @browserbasehq/stagehand-evals: Patch
  • @browserbasehq/stagehand-server: Patch


@miguelg719 miguelg719 marked this pull request as ready for review January 26, 2026 02:12
greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

This PR fixes a bug where the observe method was unable to return doubleClick and dragAndDrop as valid action methods because the LLM had no knowledge of which actions were supported.

Changes:

  • Renamed SupportedPlaywrightAction enum to SupportedUnderstudyAction for better clarity
  • Added doubleClick and dragAndDrop to the supported actions enum
  • Injected the list of supported actions into the observe system prompt (matching the existing pattern in act)
  • Added logic in both actHandler and observeHandler to convert dragAndDrop target element IDs to xpath selectors

The implementation correctly mirrors the pattern already established in the act functionality, where supported actions are passed to the LLM prompt. The dragAndDrop target conversion logic is duplicated in both handlers, which maintains consistency but could be refactored in the future.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured, consistent with existing patterns in the codebase, and directly address the bug described in issue #1608 ("LLM generates unsupported method names, eg. dblclick"). The enum rename improves code clarity, and the dragAndDrop target conversion logic is properly implemented in both handlers using the same approach.
  • No files require special attention

Important Files Changed

  • packages/core/lib/inference.ts: Added a supportedActions parameter to the observe function and passed it to buildObserveSystemPrompt
  • packages/core/lib/prompt.ts: Enhanced buildObserveSystemPrompt to include the supported actions list in the prompt, matching the pattern used in act
  • packages/core/lib/v3/types/private/handlers.ts: Renamed SupportedPlaywrightAction to SupportedUnderstudyAction and added the doubleClick and dragAndDrop actions
  • packages/core/lib/v3/handlers/actHandler.ts: Updated to use the renamed SupportedUnderstudyAction enum and added dragAndDrop target element ID to xpath conversion logic
  • packages/core/lib/v3/handlers/observeHandler.ts: Added supportedActions to the observe call and implemented dragAndDrop target element ID to xpath conversion logic
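The prompt.ts change follows the same pattern as act: append a "Supported actions: ..." line built from the enum values. Below is a minimal sketch, assuming the builder returns a single system message and treats the list as optional; the base prompt text and the `ChatMessage` type are placeholders, not the library's actual definitions.

```typescript
// Hypothetical message type; the real prompt.ts may use the LLM client's own types.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Placeholder text: the actual base observe prompt is not reproduced here.
const BASE_OBSERVE_PROMPT =
  "Find the elements on the page that match the user's instruction and return them with a method and arguments.";

function buildObserveSystemPrompt(
  userInstructions?: string,
  supportedActions?: string[],
): ChatMessage {
  const sections: string[] = [BASE_OBSERVE_PROMPT];

  // Only add the list when the caller provides one, so callers that omit
  // supportedActions get a prompt without it.
  if (supportedActions && supportedActions.length > 0) {
    sections.push(`Supported actions: ${supportedActions.join(", ")}`);
  }

  if (userInstructions) {
    sections.push(userInstructions);
  }

  return { role: "system", content: sections.join("\n\n") };
}

const msg = buildObserveSystemPrompt(undefined, ["click", "doubleClick", "dragAndDrop"]);
console.log(msg.content);
```

Keeping the parameter optional lets inference.ts thread it through without changing other call sites.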

Sequence Diagram

sequenceDiagram
    participant Client
    participant ObserveHandler
    participant observe
    participant buildObserveSystemPrompt
    participant LLM
    participant ActHandler
    
    Client->>ObserveHandler: observe(instruction)
    ObserveHandler->>ObserveHandler: Get SupportedUnderstudyAction values
    ObserveHandler->>observe: observe({instruction, supportedActions})
    observe->>buildObserveSystemPrompt: buildObserveSystemPrompt(userInstructions, supportedActions)
    buildObserveSystemPrompt->>buildObserveSystemPrompt: Include "Supported actions: click, fill, type, ..." in prompt
    buildObserveSystemPrompt-->>observe: System prompt with actions
    observe->>LLM: Send prompt with supported actions list
    LLM-->>observe: Response with elements + valid methods (e.g., doubleClick, dragAndDrop)
    observe-->>ObserveHandler: observationResponse.elements
    ObserveHandler->>ObserveHandler: Map elementId to xpath selector
    ObserveHandler->>ObserveHandler: For dragAndDrop: convert target elementId to xpath
    ObserveHandler-->>Client: Return actions with selectors
    
    Client->>ActHandler: act(action)
    ActHandler->>ActHandler: Get SupportedUnderstudyAction values
    ActHandler->>ActHandler: For dragAndDrop: convert target elementId to xpath
    ActHandler->>ActHandler: performUnderstudyMethod(method, xpath, args)
    ActHandler-->>Client: Action completed

greptile-apps bot left a comment

No files reviewed, no comments

cubic-dev-ai bot left a comment

1 issue found across 6 files

Confidence score: 3/5

  • There is a concrete behavior inconsistency: in packages/core/lib/v3/handlers/observeHandler.ts, a failed xpath lookup for dragAndDrop keeps the original element ID instead of a selector, which could break downstream element resolution.
  • Given the medium severity (6/10) and user-facing impact when xpath lookup fails, this carries some regression risk despite being localized.
  • Pay close attention to packages/core/lib/v3/handlers/observeHandler.ts - failed xpath lookup keeps the raw element ID instead of a valid selector.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/handlers/observeHandler.ts">

<violation number="1" location="packages/core/lib/v3/handlers/observeHandler.ts:165">
P2: When the xpath lookup fails for a `dragAndDrop` target element, the original element ID (e.g., `"1-67"`) is silently kept as the argument instead of a valid xpath selector. This is inconsistent with how the main element handles failed lookups (returns `undefined` to filter it out). Consider either filtering out the element when target resolution fails, or at minimum logging a warning.</violation>
</file>
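One way to address this comment is to treat an unresolved dragAndDrop target the same way the main element is treated: log a warning and return `undefined` so the element is filtered out. The sketch below assumes the target element ID is the first entry in `arguments`, that IDs look like "1-67", and that the map stores bare xpaths which get an "xpath=" prefix (as in the architecture diagram); the types and helper names are hypothetical. Since the conversion is duplicated in actHandler and observeHandler, a shared helper along these lines could serve both.

```typescript
// Hypothetical shapes; the real handler types may differ.
type XPathMap = Record<string, string>;

interface ObservedElement {
  elementId: string;
  description: string;
  method: string;
  arguments: string[];
}

interface Action {
  selector: string;
  description: string;
  method: string;
  arguments: string[];
}

// Assumed format of encoded element IDs, e.g. "1-67".
const ELEMENT_ID_RE = /^\d+-\d+$/;

function toAction(el: ObservedElement, xpathMap: XPathMap): Action | undefined {
  const xpath = xpathMap[el.elementId];
  if (!xpath) return undefined; // main element unresolved: filter it out

  let args = el.arguments;
  const target = args[0];
  if (el.method === "dragAndDrop" && target !== undefined && ELEMENT_ID_RE.test(target)) {
    const targetXpath = xpathMap[target];
    if (!targetXpath) {
      // Do not pass the raw ID downstream; drop the element and say why.
      console.warn(`dragAndDrop target ${target} not in xpath map; skipping ${el.elementId}`);
      return undefined;
    }
    args = [`xpath=${targetXpath}`, ...args.slice(1)];
  }

  return { selector: `xpath=${xpath}`, description: el.description, method: el.method, arguments: args };
}

function toActions(elements: ObservedElement[], xpathMap: XPathMap): Action[] {
  return elements
    .map((el) => toAction(el, xpathMap))
    .filter((a): a is Action => a !== undefined);
}

const xpathMap: XPathMap = {
  "1-10": "/html/body/div[1]",
  "1-67": "/html/body/div[2]",
};

const actions = toActions(
  [
    { elementId: "1-10", description: "Card", method: "dragAndDrop", arguments: ["1-67"] },
    { elementId: "1-10", description: "Card", method: "dragAndDrop", arguments: ["1-99"] },
  ],
  xpathMap,
);
console.log(actions.length); // 1
```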
Architecture diagram
sequenceDiagram
    participant Client
    participant Obs as ObserveHandler
    participant Lib as Inference Lib
    participant LLM as LLM Client
    participant Snap as Snapshot/XPathMap

    Note over Client,Obs: Primary Flow: Observe
    
    Client->>Obs: observe(instruction)
    Obs->>Snap: Capture Hybrid Snapshot
    Snap-->>Obs: DOM Tree & XPath Map

    Note over Obs,Lib: CHANGED: Enum now includes 'doubleClick' & 'dragAndDrop'
    Obs->>Lib: observe(instruction, dom, supportedActions=[...])

    Note over Lib: NEW: System Prompt Injection
    Lib->>Lib: Append "Supported actions: click, doubleClick..."<br/>to System Prompt string
    
    Lib->>LLM: Send Chat Completion
    LLM-->>Lib: Return JSON (Method + Args)
    Lib-->>Obs: Raw Inference Result

    Note over Obs,Snap: Response Normalization (Also applied in ActHandler)
    
    loop Process Actions
        Obs->>Obs: Parse method & arguments
        
        alt NEW: method == "dragAndDrop" AND arg is Element ID (e.g. "1-67")
            Obs->>Snap: Lookup ID in XPath Map
            alt ID Found
                Snap-->>Obs: xpath string (e.g. "//div[@id='target']")
                Obs->>Obs: Replace ID argument with "xpath=..."
            end
        end
        
        Obs->>Obs: Finalize Action object
    end

    Obs-->>Client: Return Action[]

