Skip to content

feat: add CUA computer-use sandbox support#7828

Merged
zouyonghe merged 33 commits intoAstrBotDevs:masterfrom
zouyonghe:feat/cua-computer-use
Apr 27, 2026
Merged

feat: add CUA computer-use sandbox support#7828
zouyonghe merged 33 commits intoAstrBotDevs:masterfrom
zouyonghe:feat/cua-computer-use

Conversation

@zouyonghe
Copy link
Copy Markdown
Member

@zouyonghe zouyonghe commented Apr 27, 2026

Summary

  • Add an optional CUA sandbox booter for AstrBot Computer Use with shell, Python, filesystem, screenshot, mouse, and keyboard support.
  • Add CUA-specific GUI tools and expose them in main-agent and handoff runtime tool paths.
  • Add config metadata for selecting CUA sandbox settings, including telemetry disabled by default.

Test Plan

  • uv run pytest tests/unit/test_func_tool_manager.py tests/unit/test_cua_computer_use.py -q
  • uv run ruff check .

Notes

  • The CUA SDK remains an optional runtime dependency; real sandbox execution requires installing cua in the AstrBot environment.

Summary by Sourcery

Add CUA-based computer-use sandbox runtime with GUI tools and configuration, and improve shell and config behavior to support it.

New Features:

  • Introduce a CUA sandbox booter with shell, Python, filesystem, and GUI components for screenshots, mouse clicks, and keyboard typing.
  • Expose new CUA GUI tools to the main agent and handoff runtime so agents can control desktop environments via sandbox.
  • Add configuration metadata and defaults for selecting and tuning CUA sandbox settings, including image, OS type, TTL, telemetry, local/remote mode, and API key.
  • Document CUA sandbox usage, configuration, and scenarios in the Chinese computer-use and agent sandbox docs.

Bug Fixes:

  • Ensure ExecuteShellTool defaults to foreground execution, avoids double-backgrounding self-detached commands, copies user env per call, and reports error types even when exception messages are blank.
  • Prevent configuration integrity logs from leaking default secret-like values by removing keys and values from log messages.
  • Ensure sandbox booters are cleanly shut down when sandbox boot or skill sync fails.

Enhancements:

  • Extend the computer booter abstraction with a GUI capability and update the overlay layer to include a GUI component type.
  • Enhance sandbox tool selection and main-agent prompts to add CUA-specific GUI guidance and tools when using the CUA booter.
  • Improve CUA filesystem behavior by providing POSIX shell fallbacks for file operations while clearly erroring on non-POSIX images and normalizing SDK result shapes.
  • Normalize CUA shell and Python result shapes across different SDK representations and support both sync and async SDK methods.

Tests:

  • Add extensive unit tests for CUA booter, GUI components, filesystem and shell fallbacks, tool registration, config metadata exposure, and screenshot tool behavior.
  • Add new tests validating ExecuteShellTool foreground/background behavior, env handling, error reporting, and self-detached command detection.
  • Add a test ensuring config integrity logging omits secret values and keys.

@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Apr 27, 2026
@dosubot
Copy link
Copy Markdown

dosubot Bot commented Apr 27, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

AstrBotTeam's Space

pr4697的改动
View Suggested Changes
@@ -109,7 +109,9 @@
 
 **运行时工具解析**
 
-新增 `_get_runtime_computer_tools()` 方法,根据 `provider_settings.computer_use_runtime` 配置动态解析运行时工具。
+新增 `_get_runtime_computer_tools()` 方法,根据 `provider_settings.computer_use_runtime` 配置动态解析运行时工具。该方法现在接受 `booter` 参数,用于确定当运行时为 `"sandbox"` 时需要包含哪些工具集。
+
+当 `runtime="sandbox"` 且 `booter="cua"` 时,除了标准的沙盒工具外,该方法还会额外注册 CUA 特有的 GUI 工具(`CuaScreenshotTool`、`CuaMouseClickTool`、`CuaKeyboardTypeTool`),确保这些工具在 SubAgent handoff 场景下也可正常使用。
 
 **Booter 选择优化(PR #6064)**
 
@@ -117,7 +119,7 @@
 
 - 如果 `computer_use_runtime` 设置为 `"local"`,立即返回本地 booter,无需检查 sandbox booter 配置
 - 如果 `computer_use_runtime` 设置为 `"none"`,抛出 `RuntimeError` 提示沙箱运行时已禁用
-- 只有在以上条件均不满足时,才继续检查 sandbox booter 配置(如 `shipyard_neo` 等)
+- 只有在以上条件均不满足时,才继续检查 sandbox booter 配置(如 `shipyard_neo`、`cua` 等)
 
 该优化避免了在运行时类型已明确时进行不必要的沙箱配置检查,提高了边缘场景的性能。
 
@@ -133,6 +135,11 @@
   - `FILE_WRITE_TOOL`:写入 UTF-8 文本文件
   - `FILE_EDIT_TOOL`:通过精确字符串替换编辑文件
   - `GREP_TOOL`:使用 ripgrep 搜索文件内容
+  
+  当 `provider_settings.sandbox.booter` 设置为 `"cua"` 时,sandbox 运行时还包含以下 CUA 特有的 GUI 交互工具:
+  - `CuaScreenshotTool` (`astrbot_cua_screenshot`):捕获 CUA 沙盒桌面的屏幕截图,可选择是否将图片发送给用户
+  - `CuaMouseClickTool` (`astrbot_cua_mouse_click`):在 CUA 沙盒桌面的指定坐标执行鼠标点击操作(支持 left、right、middle 按钮)
+  - `CuaKeyboardTypeTool` (`astrbot_cua_keyboard_type`):在 CUA 沙盒桌面中键入文本内容
 
 - **`runtime="local"`** 或 **`runtime="local_sandboxed"`** 提供以下工具:
   - `LOCAL_EXECUTE_SHELL_TOOL`:在本地环境中执行 Shell 命令
@@ -179,6 +186,35 @@
 - `shipyard_neo_access_token`:Bay 的 API 密钥(`sk-bay-...`),留空时自动从 `credentials.json` 发现
 - `shipyard_neo_profile`:沙箱 profile 名称,如 `python-default`(留空时自动选择能力最多的 profile)
 - `shipyard_neo_ttl`:沙箱生存时间(秒),默认 3600
+
+**CUA 沙箱支持(PR #7828)**
+
+[PR #7828](https://github.com/AstrBotDevs/AstrBot/pull/7828) 新增了 `cua` 沙箱运行时选项,基于 CUA SDK 提供 Computer-Use Automation 能力,支持 GUI 交互和桌面自动化:
+
+- **Booter 类型**:`provider_settings.sandbox.booter` 现支持 `"cua"` 选项
+- **能力支持**:python、shell、filesystem、gui(screenshot、mouse、keyboard)
+- **可选依赖**:需要在 AstrBot 环境中安装 `cua` 包(`pip install cua`),该包为可选运行时依赖
+- **遥测默认禁用**:`cua_telemetry_enabled` 配置项默认为 `False`,用户可选择是否启用 CUA SDK 的遥测功能
+
+**配置项(provider_settings.sandbox)**:
+
+- `cua_image`:CUA 沙箱镜像/系统类型,默认 `"linux"`(可填写 linux、macos、windows、android,具体取决于 CUA SDK 支持)
+- `cua_os_type`:CUA 沙箱操作系统类型,默认 `"linux"`(可选:linux、macos、windows、android)
+- `cua_ttl`:CUA 沙箱生存时间(秒),默认 3600(作为会话配置保存,具体生效取决于 CUA SDK)
+- `cua_telemetry_enabled`:是否允许 CUA SDK 发送遥测数据,默认 `False`
+
+**CUA GUI 工具使用方式**:
+
+主代理在启用 CUA booter 时,系统提示词中会自动注入以下指引:
+
+```
+[CUA Desktop Control]
+Use `astrbot_cua_screenshot` to inspect the current desktop before 
+clicking or typing. Use coordinates from screenshots for 
+`astrbot_cua_mouse_click`, then `astrbot_cua_keyboard_type` for text input.
+```
+
+该指引引导 LLM 先使用截图工具查看桌面状态,再根据截图中的坐标执行鼠标点击和键盘输入操作,确保 GUI 自动化的准确性。
 
 **本地沙箱增强模式(local_sandboxed)**
 

[Accept] [Decline]

How did I do? Any feedback?  Join Discord

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In _screenshot_to_bytes, isinstance(raw, bytes | bytearray) will raise at runtime because isinstance does not accept PEP 604 union types; switch this to isinstance(raw, (bytes, bytearray)) (or similar tuple form) to avoid a TypeError.
  • Consider normalizing the shape of filesystem responses in CuaFileSystemComponent.list_dir (string vs. list of entries depending on whether a native filesystem is present) so downstream consumers see a consistent type for entries regardless of the backend.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_screenshot_to_bytes`, `isinstance(raw, bytes | bytearray)` will raise at runtime because `isinstance` does not accept PEP 604 union types; switch this to `isinstance(raw, (bytes, bytearray))` (or similar tuple form) to avoid a `TypeError`.
- Consider normalizing the shape of filesystem responses in `CuaFileSystemComponent.list_dir` (string vs. list of entries depending on whether a native filesystem is present) so downstream consumers see a consistent type for `entries` regardless of the backend.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="321-322" />
<code_context>
+        return {"success": bool(payload.get("success", True)), **payload}
+
+
+def _screenshot_to_bytes(raw: Any) -> bytes:
+    if isinstance(raw, bytes | bytearray):
+        return bytes(raw)
+    if isinstance(raw, str):
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix `isinstance` usage with union types in `_screenshot_to_bytes`.

`isinstance(raw, bytes | bytearray)` will raise a `TypeError` at runtime because `isinstance` doesn’t accept PEP 604 unions. This will break when a bytes-like value is returned. Please change it to `if isinstance(raw, (bytes, bytearray)):` so it works correctly.
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="413-416" />
<code_context>
+            kwargs["telemetry_enabled"] = self.telemetry_enabled
+        return kwargs
+
+    async def shutdown(self) -> None:
+        if self._sandbox_cm is not None:
+            await self._sandbox_cm.__aexit__(None, None, None)
+            self._sandbox_cm = None
+            self._sandbox = None
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Reset cached components on shutdown to avoid using a closed sandbox.

Currently `shutdown` only clears `_sandbox_cm` and `_sandbox`, leaving `_shell`, `_python`, `_fs`, and `_gui` non-None. This allows callers to keep using these properties against a closed sandbox, which can lead to subtle failures. Please also clear these component fields and/or have `available()` reflect post-shutdown unusability (e.g., by checking `_sandbox_cm` or the components).
</issue_to_address>

### Comment 3
<location path="astrbot/core/astr_agent_tool_exec.py" line_range="258-261" />
<code_context>
         runtime_computer_tools = cls._get_runtime_computer_tools(
             runtime,
             tool_mgr,
+            str(provider_settings.get("sandbox", {}).get("booter", "")),
         )

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid coercing a potentially `None` booter to the string "None".

Here `booter` is wrapped in `str(...)`, so an explicit `None` becomes the literal "None", which won’t match any runtime and is hard to debug. Instead, pass the raw value (e.g. `booter=provider_settings.get("sandbox", {}).get("booter")`) and handle `None` inside `_get_runtime_computer_tools`, or explicitly normalize falsy values to `""` there.

Suggested implementation:

```python
        runtime_computer_tools = cls._get_runtime_computer_tools(
            runtime,
            tool_mgr,
            provider_settings.get("sandbox", {}).get("booter"),
        )

```

To fully implement the suggestion, you should also update `_get_runtime_computer_tools` in the same file (or wherever it is defined) to:

1. Accept a possibly-None `booter` parameter without assuming it is a string.
2. Normalize/handle `booter` internally, for example:
   - If `booter` is `None` or another falsy value, treat it as the default (e.g. `""` or a default booter constant).
   - Otherwise ensure it is in the expected format (e.g. `str(booter)` if string is required, after the `None` check).
3. Ensure any runtime selection logic uses this normalized value instead of relying on `str(None) == "None"`.

These changes will avoid the accidental "None" string and keep the normalization logic localized to `_get_runtime_computer_tools`.
</issue_to_address>

### Comment 4
<location path="astrbot/core/computer/booters/cua.py" line_range="60" />
<code_context>
+    return ""
+
+
+class CuaShellComponent(ShellComponent):
+    def __init__(self, sandbox: Any) -> None:
+        self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider resolving sandbox method variants and filesystem capabilities once during construction and delegating via small adapters instead of doing per-call duck-typing and branching.

You can localize most of the duck-typing without losing the flexibility by doing the method resolution once at construction time and by introducing tiny adapters.

### 1) Resolve `run` vs `exec` once instead of `_call_first`

Instead of calling `_call_first` on every invocation, choose the method once in `__init__` and keep the rest of the logic unchanged:

```python
class CuaShellComponent(ShellComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        shell = sandbox.shell
        self._shell_exec = getattr(shell, "run", None) or getattr(shell, "exec", None)
        if self._shell_exec is None:
            raise RuntimeError("CUA shell must provide `run` or `exec`")

    async def exec(...):
        ...
        result = await self._shell_exec(command, **kwargs)
        payload = _maybe_model_dump(result)
        ...
```

Similarly for `CuaPythonComponent`:

```python
class CuaPythonComponent(PythonComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        python = getattr(sandbox, "python", None)
        if python is not None:
            self._python_exec = getattr(python, "run", None) or getattr(python, "exec", None)
        else:
            self._python_exec = None

    async def exec(...):
        _ = kernel_id
        if self._python_exec is not None:
            result = await self._python_exec(code, timeout=timeout)
            payload = _maybe_model_dump(result)
        else:
            shell = CuaShellComponent(self._sandbox)
            ...
```

You can then remove `_call_first` entirely. This keeps compatibility with old/new SDKs but eliminates per-call introspection and branching.

### 2) Small filesystem adapter to centralize capability resolution

The per-method `hasattr` branching in `CuaFileSystemComponent` (especially `delete_file` and `list_dir`) can be moved into a tiny adapter built once:

```python
class CuaFilesystemAdapter:
    def __init__(self, sandbox: Any, shell: CuaShellComponent) -> None:
        self._fs = getattr(sandbox, "filesystem", None)
        self._shell = shell
        self._delete = None
        self._list_dir = None

        if self._fs is not None:
            if hasattr(self._fs, "delete"):
                self._delete = self._fs.delete
            elif hasattr(self._fs, "delete_file"):
                self._delete = self._fs.delete_file

            if hasattr(self._fs, "list_dir"):
                self._list_dir = self._fs.list_dir

    async def delete(self, path: str) -> None:
        if self._delete is not None:
            await self._delete(path)
        else:
            await self._shell.exec(f"rm -rf {path!r}")

    async def list_dir(self, path: str, show_hidden: bool) -> dict[str, Any]:
        if self._list_dir is not None:
            entries = await self._list_dir(path)
            return {"success": True, "path": path, "entries": entries}

        flags = "-la" if show_hidden else "-l"
        result = await self._shell.exec(f"ls {flags} {path!r}")
        return {
            "success": not bool(result.get("stderr")),
            "path": path,
            "entries": result.get("stdout", ""),
            "error": result.get("stderr", ""),
        }
```

Then `CuaFileSystemComponent` uses this adapter:

```python
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        self._shell = CuaShellComponent(sandbox)
        self._adapter = CuaFilesystemAdapter(sandbox, self._shell)

    async def delete_file(self, path: str) -> dict[str, Any]:
        await self._adapter.delete(path)
        return {"success": True, "path": path}

    async def list_dir(self, path: str = ".", show_hidden: bool = False) -> dict[str, Any]:
        return await self._adapter.list_dir(path, show_hidden)
```

This keeps all existing fallbacks (filesystem vs shell, `delete` vs `delete_file`, etc.) but removes repeated nested conditionals and runtime introspection from each call site, making the component methods much easier to read.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py Outdated
Comment thread astrbot/core/computer/booters/cua.py Outdated
Comment thread astrbot/core/astr_agent_tool_exec.py Outdated
Comment thread astrbot/core/computer/booters/cua.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the CUA (Computer Use Agent) sandbox booter, enabling desktop GUI automation capabilities. It adds new tools for taking screenshots, clicking, and typing within the sandbox, along with the necessary booter implementation, configuration options, and unit tests. The review feedback suggests improving the robustness of the Python execution heredoc delimiter and ensuring consistency in the directory listing output format.

Comment thread astrbot/core/computer/booters/cua.py Outdated
payload = _maybe_model_dump(result)
else:
shell = CuaShellComponent(self._sandbox)
result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The heredoc delimiter PY is too short and common, which could lead to premature termination of the Python code if the string PY appears at the beginning of a line within the code block. Using a more unique and descriptive delimiter (e.g., _ASTRBOT_PYTHON_CODE_) would be safer.

Suggested change
result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
result = await shell.exec(f"python3 - <<'_ASTRBOT_PYTHON_CODE_'\n{code}\n_ASTRBOT_PYTHON_CODE_", timeout=timeout)

Comment thread astrbot/core/computer/booters/cua.py Outdated
Comment on lines +283 to +290
flags = "-la" if show_hidden else "-l"
result = await self._shell.exec(f"ls {flags} {path!r}")
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": result.get("stdout", ""),
"error": result.get("stderr", ""),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list_dir fallback implementation returns a formatted string from ls -l, whereas the SDK-based branch (line 281) likely returns a list of entries. This inconsistency can cause issues for callers expecting a structured list. It's better to use ls -1 and split the output into a list of filenames to maintain consistency with the protocol.

Suggested change
flags = "-la" if show_hidden else "-l"
result = await self._shell.exec(f"ls {flags} {path!r}")
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": result.get("stdout", ""),
"error": result.get("stderr", ""),
}
flags = "-1a" if show_hidden else "-1"
result = await self._shell.exec(f"ls {flags} {path!r}")
stdout = result.get("stdout", "")
entries = [line.strip() for line in stdout.splitlines() if line.strip()]
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": entries,
"error": result.get("stderr", ""),
}

@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The helper _call_first in the CUA booter assumes all target methods are async and awaitable; to make this more robust against different CUA SDK versions, consider detecting and handling synchronous callables (e.g., via inspect.iscoroutinefunction / iscoroutine) instead of always await-ing.
  • The shell-based fallback implementation of CuaFileSystemComponent.list_dir returns raw ls -l lines as entries, which may be hard for downstream tools to consume as filenames; you might want to normalize this to a more structured or filename-only format to keep behavior closer to the filesystem-backed variant.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The helper `_call_first` in the CUA booter assumes all target methods are async and awaitable; to make this more robust against different CUA SDK versions, consider detecting and handling synchronous callables (e.g., via `inspect.iscoroutinefunction` / `iscoroutine`) instead of always `await`-ing.
- The shell-based fallback implementation of `CuaFileSystemComponent.list_dir` returns raw `ls -l` lines as `entries`, which may be hard for downstream tools to consume as filenames; you might want to normalize this to a more structured or filename-only format to keep behavior closer to the filesystem-backed variant.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="162" />
<code_context>
+        }
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+    def __init__(self, sandbox: Any) -> None:
+        self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralising the filesystem backend branching and replacing the generic `_call_first` helper with explicit shell/python adapters to make control flow clearer and easier to follow.

You can reduce a fair bit of cognitive load in this file by localising the “multi-path” logic instead of spreading it across multiple public methods and helpers.

### 1. Centralise filesystem backend selection

`CuaFileSystemComponent` repeats the same “fs vs shell” branching in `read_file`, `write_file`, `delete_file`, and `list_dir`. Pulling this into a couple of small internal helpers keeps behaviour but makes each public method linear and easier to read.

For example:

```python
class CuaFileSystemComponent(FileSystemComponent):
    ...

    async def _fs_read_text(
        self,
        path: str,
        encoding: str = "utf-8",
    ) -> str:
        fs = self._filesystem
        if fs is not None and hasattr(fs, "read_file"):
            content = await fs.read_file(path)
        else:
            result = await self._shell.exec(f"cat {path!r}")
            if result.get("stderr"):
                raise RuntimeError(result["stderr"])
            content = result.get("stdout", "")

        if isinstance(content, bytes):
            return content.decode(encoding, errors="replace")
        return str(content)

    async def _fs_write_text(
        self,
        path: str,
        content: str,
        encoding: str = "utf-8",
    ) -> None:
        fs = self._filesystem
        if fs is not None and hasattr(fs, "write_file"):
            await fs.write_file(path, content)
            return

        encoded = base64.b64encode(content.encode(encoding)).decode()
        await self._shell.exec(f"base64 -d > {path!r} <<'EOF'\n{encoded}\nEOF")

    async def _fs_delete(self, path: str) -> None:
        fs = self._filesystem
        if fs is None:
            await self._shell.exec(f"rm -rf {path!r}")
            return

        if hasattr(fs, "delete"):
            await fs.delete(path)
        elif hasattr(fs, "delete_file"):
            await fs.delete_file(path)
        else:
            await self._shell.exec(f"rm -rf {path!r}")

    async def _fs_list(self, path: str, show_hidden: bool) -> dict[str, Any]:
        fs = self._filesystem
        if fs is not None and hasattr(fs, "list_dir"):
            entries = await fs.list_dir(path)
            return {"success": True, "path": path, "entries": entries}

        flags = "-la" if show_hidden else "-l"
        result = await self._shell.exec(f"ls {flags} {path!r}")
        return {
            "success": not bool(result.get("stderr")),
            "path": path,
            "entries": _split_listing_entries(result.get("stdout", "")),
            "error": result.get("stderr", ""),
        }
```

Then the public methods become straightforward, without changing semantics:

```python
async def read_file(...):
    try:
        content = await self._fs_read_text(path, encoding=encoding)
    except RuntimeError as exc:
        return {"success": False, "path": path, "error": str(exc)}

    return {
        "success": True,
        "path": path,
        "content": _slice_content_by_lines(
            content, offset=offset, limit=limit
        ),
    }

async def write_file(...):
    await self._fs_write_text(path, content, encoding=encoding)
    return {"success": True, "path": path}

async def delete_file(self, path: str) -> dict[str, Any]:
    await self._fs_delete(path)
    return {"success": True, "path": path}

async def list_dir(...):
    return await self._fs_list(path, show_hidden)
```

This keeps all current fallbacks (FS API, shell, base64 path) but makes each method’s control flow much more obvious.

### 2. Replace `_call_first` for shell/python with explicit CUA adapters

For `CuaShellComponent` and `CuaPythonComponent`, `_call_first` is quite generic and obscures what shapes you actually support (`run` vs `exec`). You can keep the multi-version support but move it into small, typed adapters, which also makes error modes clearer:

```python
class CuaShellComponent(ShellComponent):
    ...

    async def _run_shell(
        self,
        command: str,
        **kwargs: Any,
    ) -> Any:
        shell = self._sandbox.shell
        if hasattr(shell, "run"):
            return await shell.run(command, **kwargs)
        if hasattr(shell, "exec"):
            return await shell.exec(command, **kwargs)
        raise AttributeError("sandbox.shell has neither 'run' nor 'exec'")
```

and then:

```python
result = await self._run_shell(command, **kwargs)
payload = _maybe_model_dump(result)
...
```

Similarly for `CuaPythonComponent`:

```python
async def _run_python(self, code: str, timeout: int) -> dict[str, Any]:
    python = getattr(self._sandbox, "python", None)
    if python is not None:
        if hasattr(python, "run"):
            result = await python.run(code, timeout=timeout)
        elif hasattr(python, "exec"):
            result = await python.exec(code, timeout=timeout)
        else:
            raise AttributeError("sandbox.python has neither 'run' nor 'exec'")
        return _maybe_model_dump(result)

    shell = CuaShellComponent(self._sandbox)
    result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
    return {
        "output": result.get("stdout", ""),
        "error": result.get("stderr", ""),
    }
```

`exec` then just becomes:

```python
async def exec(...):
    payload = await self._run_python(code, timeout)
    ...
```

These changes keep compatibility with multiple CUA SDK versions, but make the contracts per component explicit and reduce the need to mentally follow generic helper logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py Outdated
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In the CuaGUIComponent methods (click/type_text), accessing sandbox.mouse/keyboard directly will raise AttributeError if the SDK object is missing those attributes; consider guarding with hasattr or a clearer error message so misconfigured CUA SDKs fail more gracefully.
  • The shell-based base64 upload/write logic in CuaFileSystemComponent.write_file and CuaBooter.upload_file is duplicated; extracting a shared helper for this pattern would reduce repetition and keep the behavior consistent if you need to tweak the shell pipeline later.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the CuaGUIComponent methods (click/type_text), accessing sandbox.mouse/keyboard directly will raise AttributeError if the SDK object is missing those attributes; consider guarding with hasattr or a clearer error message so misconfigured CUA SDKs fail more gracefully.
- The shell-based base64 upload/write logic in CuaFileSystemComponent.write_file and CuaBooter.upload_file is duplicated; extracting a shared helper for this pattern would reduce repetition and keep the behavior consistent if you need to tweak the shell pipeline later.

## Individual Comments

### Comment 1
<location path="tests/unit/test_cua_computer_use.py" line_range="103-112" />
<code_context>
+    ] is False
+
+
+@pytest.mark.asyncio
+async def test_screenshot_tool_returns_image_and_sends_file(monkeypatch, tmp_path):
+    from astrbot.core.tools.computer_tools import cua as cua_tools
+    from astrbot.core.tools.computer_tools.cua import CuaScreenshotTool
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for admin-permission failures in CUA tools to cover the negative path of `check_admin_permission`.

Currently these tools are only tested for the success case where `check_admin_permission` returns `None`. Please add tests where `check_admin_permission` returns an error (e.g., monkeypatch it to return a `ToolExecResult` or string) and assert that the tools exit early without calling `_get_gui_component` or sending messages/files. This will verify that permission checks are actually enforced and guard against regressions.
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="70" />
<code_context>
+    return [line for line in output.splitlines() if line.strip()]
+
+
+class CuaShellComponent(ShellComponent):
+    def __init__(self, sandbox: Any) -> None:
+        self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the CUA booter into smaller adapters and environment objects so that normalization, filesystem variants, and screenshot handling are centralized instead of being inlined with conditionals throughout the file.

You can keep the same feature set while making this file easier to reason about by pushing most of the “defensive” logic into small, focused adapters and normalizers, instead of sprinkling conditionals at every call site.

Below are concrete refactors that preserve functionality but reduce cognitive load.

---

### 1. Centralize `shell` / `python` normalization instead of using `_call_first` everywhere

Right now `CuaShellComponent` and `CuaPythonComponent` both have to know about `run` vs `exec` and various payload shapes. You can normalize this once, at sandbox-adapter level, and then keep the components simple.

For example:

```python
class CuaSandboxAdapter:
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        self.shell = self._wrap_shell(getattr(sandbox, "shell", None))
        self.python = self._wrap_python(getattr(sandbox, "python", None))
        self.filesystem = getattr(sandbox, "filesystem", None)
        self.screenshot = getattr(sandbox, "screenshot", None)
        self.upload_file = getattr(sandbox, "upload_file", None)
        self.download_file = getattr(sandbox, "download_file", None)

    def _wrap_shell(self, shell: Any) -> Any:
        if shell is None:
            return None

        async def run(command: str, **kwargs: Any) -> dict[str, Any]:
            raw = await _call_first(shell, ("run", "exec"), command, **kwargs)
            payload = _maybe_model_dump(raw)
            if not payload and isinstance(raw, str):
                payload = {"stdout": raw}
            return {
                "stdout": _result_text(payload, "stdout", "output"),
                "stderr": _result_text(payload, "stderr", "error"),
                "exit_code": payload.get(
                    "exit_code", payload.get("returncode", 0 if not payload.get("stderr") else 1)
                ),
                "success": bool(
                    payload.get("success", not payload.get("stderr") and payload.get("exit_code") in (0, None))
                ),
            }

        return type("ShellAdapter", (), {"run": run})

    def _wrap_python(self, python: Any) -> Any:
        if python is None:
            return None

        async def run(code: str, timeout: int) -> dict[str, Any]:
            raw = await _call_first(python, ("run", "exec"), code, timeout=timeout)
            return _maybe_model_dump(raw)

        return type("PythonAdapter", (), {"run": run})
```

Then `CuaShellComponent.exec` becomes much more straightforward:

```python
class CuaShellComponent(ShellComponent):
    def __init__(self, adapter: CuaSandboxAdapter) -> None:
        self._adapter = adapter

    async def exec(...):
        if not shell:
            return { ... }

        kwargs = {...}
        if self._adapter.shell is None:
            return { "stdout": "", "stderr": "shell not available", "exit_code": 1, "success": False }

        result = await self._adapter.shell.run(command, **kwargs)

        # background-specific post-processing stays here
        ...
```

Similarly, `CuaPythonComponent` can just rely on `adapter.python.run`, and only implement the “fallback via shell” branch.

This keeps the “run vs exec / output vs stdout / returncode vs exit_code” logic in **one place** instead of being re-applied in each component.

---

### 2. Split filesystem into two implementations instead of branching in every method

`CuaFileSystemComponent` currently has multiple branches per method (SDK vs shell, multiple method names). You can instead choose an implementation once in `__init__` and delegate.

For example:

```python
class CuaFSViaSDK(FileSystemComponent):
    def __init__(self, fs: Any) -> None:
        self._fs = fs

    async def read_file(self, path: str, **kwargs: Any) -> dict[str, Any]:
        content = await self._fs.read_file(path)
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(str(content), **kwargs),
        }

    async def write_file(self, path: str, content: str, **kwargs: Any) -> dict[str, Any]:
        await self._fs.write_file(path, content)
        return {"success": True, "path": path}

    async def delete_file(self, path: str) -> dict[str, Any]:
        if hasattr(self._fs, "delete"):
            await self._fs.delete(path)
        else:
            await self._fs.delete_file(path)
        return {"success": True, "path": path}

    async def list_dir(self, path: str = ".", show_hidden: bool = False) -> dict[str, Any]:
        entries = await self._fs.list_dir(path)
        return {"success": True, "path": path, "entries": entries}
```

```python
class CuaFSViaShell(FileSystemComponent):
    def __init__(self, shell: CuaShellComponent) -> None:
        self._shell = shell

    async def read_file(...):
        result = await self._shell.exec(f"cat {path!r}")
        ...

    async def write_file(...):
        ...

    async def delete_file(...):
        ...

    async def list_dir(...):
        ...
```

Then `CuaFileSystemComponent` becomes a thin delegator:

```python
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox_adapter: CuaSandboxAdapter) -> None:
        if sandbox_adapter.filesystem is not None:
            self._impl: FileSystemComponent = CuaFSViaSDK(sandbox_adapter.filesystem)
        else:
            shell = CuaShellComponent(sandbox_adapter)
            self._impl = CuaFSViaShell(shell)

    async def read_file(self, *args, **kwargs):
        return await self._impl.read_file(*args, **kwargs)

    async def write_file(self, *args, **kwargs):
        return await self._impl.write_file(*args, **kwargs)

    async def delete_file(self, *args, **kwargs):
        return await self._impl.delete_file(*args, **kwargs)

    async def list_dir(self, *args, **kwargs):
        return await self._impl.list_dir(*args, **kwargs)
```

This preserves all current behavior but removes 2–3 branches from each public filesystem method.

---

### 3. Narrow `_screenshot_to_bytes` by explicit format handlers

The current `_screenshot_to_bytes` recursively handles many shapes. You can make this more legible by separating the recognition and conversion logic.

For example:

```python
def _screenshot_to_bytes(raw: Any) -> bytes:
    # 1) direct bytes-like
    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw)

    # 2) known dict payloads
    payload = _maybe_model_dump(raw)
    for key in ("data", "base64", "image"):
        if key in payload:
            return _screenshot_to_bytes(payload[key])

    # 3) string formats (data URL / base64 / file path)
    if isinstance(raw, str):
        return _screenshot_str_to_bytes(raw)

    # 4) PIL-like objects
    if hasattr(raw, "save"):
        return _screenshot_image_to_bytes(raw)

    raise TypeError(f"Unsupported CUA screenshot result: {type(raw)!r}")


def _screenshot_str_to_bytes(value: str) -> bytes:
    if value.startswith("data:image"):
        value = value.split(",", 1)[1]

    try:
        return base64.b64decode(value, validate=True)
    except Exception:
        path = Path(value)
        if path.is_file():
            return path.read_bytes()
        return value.encode("utf-8")


def _screenshot_image_to_bytes(image: Any) -> bytes:
    import io
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return buf.getvalue()
```

This keeps the same supported shapes but flattens the control flow and makes it easier to see “what formats we actually support” at a glance.

---

### 4. Replace nullable booter attributes with a small environment object

Instead of tracking `_sandbox`, `_sandbox_cm`, `_shell`, `_python`, `_fs`, `_gui` separately, you can group them in a single struct-like object whose presence/absence encodes the state. This reduces the number of nullable fields and manual resets.

```python
@dataclass
class CuaEnvironment:
    sandbox_cm: Any
    sandbox: Any
    shell: CuaShellComponent
    python: CuaPythonComponent
    fs: CuaFileSystemComponent
    gui: CuaGUIComponent | None
```

```python
class CuaBooter(ComputerBooter):
    def __init__(...):
        ...
        self._env: CuaEnvironment | None = None

    async def boot(self, session_id: str) -> None:
        ...
        sandbox_cm = Sandbox.ephemeral(image_obj, **ephemeral_kwargs)
        sandbox = await sandbox_cm.__aenter__()
        adapter = CuaSandboxAdapter(sandbox)
        env = CuaEnvironment(
            sandbox_cm=sandbox_cm,
            sandbox=sandbox,
            shell=CuaShellComponent(adapter),
            python=CuaPythonComponent(adapter),
            fs=CuaFileSystemComponent(adapter),
            gui=CuaGUIComponent(sandbox),
        )
        self._env = env
        ...

    async def shutdown(self) -> None:
        if self._env is None:
            return
        await self._env.sandbox_cm.__aexit__(None, None, None)
        self._env = None

    @property
    def fs(self) -> FileSystemComponent:
        if self._env is None:
            raise RuntimeError("CuaBooter is not initialized.")
        return self._env.fs

    # same for shell/python/gui
```

This keeps the lifecycle semantics unchanged but collapses the state space from many nullable attributes to a single “env present or not” check.

---

All of these suggestions preserve your current compatibility shims while making the code more modular and easier to audit. You can adopt them incrementally (e.g., start with the filesystem split or the screenshot refactor) without changing the external `ComputerBooter` contract.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/unit/test_cua_computer_use.py
Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In CuaBooter.capabilities you always advertise GUI/mouse/keyboard support, but CuaGUIComponent can raise at runtime if the underlying sandbox lacks those devices; consider deriving capabilities dynamically from the initialized sandbox (e.g., presence of mouse, keyboard, screenshot) to avoid mismatches between reported and actual features.
  • Helpers like _write_base64_via_shell assume base64 -d is available, while cua_image/cua_os_type can be non-Linux; it may be safer to either gate these shell fallbacks to Linux images or choose a more portable approach (or OS-specific flags) so macOS/Windows images don’t fail on basic file operations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `CuaBooter.capabilities` you always advertise GUI/mouse/keyboard support, but `CuaGUIComponent` can raise at runtime if the underlying sandbox lacks those devices; consider deriving capabilities dynamically from the initialized sandbox (e.g., presence of `mouse`, `keyboard`, `screenshot`) to avoid mismatches between reported and actual features.
- Helpers like `_write_base64_via_shell` assume `base64 -d` is available, while `cua_image`/`cua_os_type` can be non-Linux; it may be safer to either gate these shell fallbacks to Linux images or choose a more portable approach (or OS-specific flags) so macOS/Windows images don’t fail on basic file operations.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="27" />
<code_context>
+    data: bytes,
+) -> dict[str, Any]:
+    encoded = base64.b64encode(data).decode("ascii")
+    return await shell.exec(f"base64 -d > {path!r} <<'EOF'\n{encoded}\nEOF")
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider using a more portable base64 decoding strategy for non-Linux images (e.g., macOS).

Hardcoding `base64 -d` assumes GNU coreutils and will fail on BSD/macOS, which use `-D`. Since `cua_os_type` includes `macos`, this will break file writes on those sandboxes. Consider detecting the supported flag, using a portable approach like `python -m base64 -d`, or using the CUA filesystem API instead of a shell-specific command.
</issue_to_address>

### Comment 2
<location path="astrbot/core/astr_agent_tool_exec.py" line_range="192-195" />
<code_context>
         cls,
         runtime: str,
         tool_mgr,
+        booter: str | None = None,
     ) -> dict[str, FunctionTool]:
+        booter = "" if booter is None else str(booter)
         if runtime == "sandbox":
             shell_tool = tool_mgr.get_builtin_tool(ExecuteShellTool)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Normalize the booter name to avoid case-sensitivity issues when enabling CUA tools.

Since `booter` is only coerced to `str` before `if booter == "cua":`, passing values like "CUA" or "Cua" would skip registering the extra tools. Consider normalizing, e.g. `booter = "" if booter is None else str(booter).lower()`, so case variants still enable CUA tools while preserving current behavior.

```suggestion
        booter: str | None = None,
    ) -> dict[str, FunctionTool]:
        booter = "" if booter is None else str(booter).lower()
        if runtime == "sandbox":
```
</issue_to_address>

### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="94" />
<code_context>
+    return method
+
+
+class CuaShellComponent(ShellComponent):
+    def __init__(self, sandbox: Any) -> None:
+        self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing process-result and shell-call normalization into shared helpers so components don’t each re-implement the same introspective logic.

You can cut a fair bit of cognitive load by centralizing the “shape‑normalization” instead of sprinkling generic helpers across components.

### 1. Normalize process results once via a small dataclass

Right now both `CuaShellComponent.exec` and `CuaPythonComponent.exec` call:

- `_call_first`
- `_maybe_model_dump`
- `_result_text`
- custom fallback logic

You can keep all the compatibility behavior but move it into a single normalization function + dataclass that every component uses.

Example:

```python
from dataclasses import dataclass

@dataclass
class ProcessResult:
    stdout: str
    stderr: str
    exit_code: int | None
    success: bool

def _normalize_process_result(raw: Any) -> ProcessResult:
    payload = _maybe_model_dump(raw)

    # Allow pure string fallback (e.g. shell returns str)
    if not payload and isinstance(raw, str):
        payload = {"stdout": raw}

    stdout = _result_text(payload, "stdout", "output")
    stderr = _result_text(payload, "stderr", "error")
    exit_code = payload.get("exit_code", payload.get("returncode", 0 if not stderr else 1))
    success = bool(payload.get("success", not stderr and exit_code in (0, None)))

    return ProcessResult(stdout=stdout, stderr=stderr, exit_code=exit_code, success=success)
```

Then `CuaShellComponent.exec` becomes more declarative and composable:

```python
async def exec(...):
    ...
    result = await _call_first(self._sandbox.shell, ("run", "exec"), command, **kwargs)
    proc = _normalize_process_result(result)

    response: dict[str, Any] = {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.exit_code,
        "success": proc.success,
    }

    if background:
        try:
            response["pid"] = int(proc.stdout.strip().splitlines()[-1])
        except Exception:
            response["pid"] = None

    return response
```

And `CuaPythonComponent.exec` can reuse the same normalization, reducing duplicated, introspective logic:

```python
async def exec(...):
    _ = kernel_id

    python = getattr(self._sandbox, "python", None)
    if python is not None:
        raw = await _call_first(python, ("run", "exec"), code, timeout=timeout)
        proc = _normalize_process_result(raw)
    else:
        shell = CuaShellComponent(self._sandbox)
        shell_result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
        # shell_result is already normalized; just map fields
        proc = ProcessResult(
            stdout=shell_result.get("stdout", ""),
            stderr=shell_result.get("stderr", ""),
            exit_code=shell_result.get("exit_code"),
            success=shell_result.get("success", False),
        )

    output_text = "" if silent else proc.stdout
    error_text = proc.stderr

    return {
        "success": proc.success if not silent else bool(not error_text),
        "data": {
            "output": {"text": output_text, "images": []},
            "error": error_text,
        },
        "output": output_text,
        "error": error_text,
    }
```

This keeps all the existing flexibility (`stdout` vs `output`, `returncode` vs `exit_code`, string results, etc.) but:

- Consumers no longer need to know about the different shapes.
- The normalization logic is in one place, making it easier to audit and extend.

### 2. Consider a small shell adapter to avoid `_call_first` at call sites

If you want to retain support for both `run` and `exec` on the CUA shell, you can still avoid scattering `_call_first` by adapting once:

```python
class _ShellAdapter:
    def __init__(self, shell: Any) -> None:
        if hasattr(shell, "run"):
            self._exec = shell.run
        elif hasattr(shell, "exec"):
            self._exec = shell.exec
        else:
            raise AttributeError("Shell has neither `run` nor `exec`")

    async def exec(self, *args: Any, **kwargs: Any) -> Any:
        return await _maybe_await(self._exec(*args, **kwargs))
```

Then `CuaShellComponent` can wrap the CUA shell in this adapter in `__init__` and call a single `exec`, instead of `_call_first`:

```python
class CuaShellComponent(ShellComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        self._shell_adapter = _ShellAdapter(sandbox.shell)

    async def exec(...):
        ...
        result = await self._shell_adapter.exec(command, **kwargs)
        proc = _normalize_process_result(result)
        ...
```

This preserves the current behavior (try `run`, then `exec`) but makes the control flow much easier to follow in the main component methods.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py Outdated
Comment thread astrbot/core/astr_agent_tool_exec.py
Comment thread astrbot/core/computer/booters/cua.py
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 27, 2026
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Apr 27, 2026
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Comment thread astrbot/core/config/astrbot_config.py Fixed
Comment thread astrbot/core/config/astrbot_config.py Fixed
Comment thread astrbot/core/config/astrbot_config.py Fixed
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The CuaBooter.download_file shell fallback always uses base64 without the same POSIX / python3 capability checks you added elsewhere, so it may be worth aligning this path with the other filesystem/shell fallbacks (e.g., OS-type guard and clearer error messaging) to avoid confusing failures on non-POSIX images.
  • Several helpers encode similar logic for reporting missing python3 or shell-based filesystem fallbacks (e.g., _python3_requirement_error, filesystem write/read fallbacks, background launcher); consider centralizing this behavior to reduce the chance of drift and keep error messages consistent across CUA components.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The CuaBooter.download_file shell fallback always uses `base64` without the same POSIX / python3 capability checks you added elsewhere, so it may be worth aligning this path with the other filesystem/shell fallbacks (e.g., OS-type guard and clearer error messaging) to avoid confusing failures on non-POSIX images.
- Several helpers encode similar logic for reporting missing `python3` or shell-based filesystem fallbacks (e.g., `_python3_requirement_error`, filesystem write/read fallbacks, background launcher); consider centralizing this behavior to reduce the chance of drift and keep error messages consistent across CUA components.

## Individual Comments

### Comment 1
<location path="tests/unit/test_cua_computer_use.py" line_range="124-129" />
<code_context>
+    assert sent_messages == []
+
+
+@pytest.mark.asyncio
+async def test_cua_tools_include_exception_type_for_blank_error(monkeypatch):
+    from astrbot.core.tools.computer_tools import cua as cua_tools
+    from astrbot.core.tools.computer_tools.cua import CuaMouseClickTool
+
+    class BlankError(Exception):
+        def __str__(self):
+            return ""
+
+    class FakeEvent:
+        unified_msg_origin = "umo"
+        role = "admin"
+
+    class FakeAstrContext:
+        event = FakeEvent()
+        context = FakeContext(
+            {"provider_settings": {"computer_use_require_admin": True}}
+        )
+
+    class FakeWrapper:
+        context = FakeAstrContext()
+
+    async def fail_gui_lookup(context):
+        raise BlankError()
+
+    monkeypatch.setattr(cua_tools, "_get_gui_component", fail_gui_lookup)
+
+    assert await CuaMouseClickTool().call(FakeWrapper(), x=1, y=2) == (
+        "Error clicking CUA desktop: BlankError"
+    )
</code_context>
<issue_to_address>
**suggestion (testing):** Add happy-path tests for CUA mouse and keyboard tools to ensure correct wiring to the GUI component.

Current tests emphasize error handling, but none cover successful execution of `CuaMouseClickTool` or `CuaKeyboardTypeTool` through `_get_gui_component` to the underlying GUI component. Please add happy-path tests (e.g., injecting a fake GUI with `click`/`type_text`, asserting `_get_gui_component` is used, arguments are forwarded correctly, and the JSON response matches the fake payload) to verify correct wiring and serialization.

```suggestion
class SyncPython:
    def run(self, code: str, **kwargs):
        return {"output": "sync", "error": ""}


@pytest.mark.asyncio
async def test_cua_mouse_click_tool_happy_path_forwards_args_and_serializes_json(
    monkeypatch,
):
    import json

    from astrbot.core.tools.computer_tools import cua as cua_tools
    from astrbot.core.tools.computer_tools.cua import CuaMouseClickTool

    class FakeEvent:
        unified_msg_origin = "umo"
        role = "admin"

    class FakeAstrContext:
        event = FakeEvent()
        context = FakeContext(
            {"provider_settings": {"computer_use_require_admin": True}}
        )

    class FakeWrapper:
        context = FakeAstrContext()

    class FakeGui:
        def __init__(self):
            self.clicked_args = None

        async def click(self, x, y, button=None):
            self.clicked_args = (x, y, button)
            return {
                "status": "ok",
                "x": x,
                "y": y,
                "button": button,
            }

    fake_gui = FakeGui()
    get_gui_called = {"value": False}

    async def fake_get_gui_component(context):
        get_gui_called["value"] = True
        # Ensure the context passed into the helper is the wrapper's context
        assert context is FakeWrapper().context  # smoke check for wiring
        return fake_gui

    # Ensure tools use our fake GUI component
    monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component)

    tool = CuaMouseClickTool()

    # Call the tool and validate behavior
    result = await tool.call(FakeWrapper(), x=10, y=20, button="left")

    assert get_gui_called["value"] is True
    assert fake_gui.clicked_args == (10, 20, "left")
    assert json.loads(result) == {
        "status": "ok",
        "x": 10,
        "y": 20,
        "button": "left",
    }


@pytest.mark.asyncio
async def test_cua_keyboard_type_tool_happy_path_forwards_args_and_serializes_json(
    monkeypatch,
):
    import json

    from astrbot.core.tools.computer_tools import cua as cua_tools
    from astrbot.core.tools.computer_tools.cua import CuaKeyboardTypeTool

    class FakeEvent:
        unified_msg_origin = "umo"
        role = "admin"

    class FakeAstrContext:
        event = FakeEvent()
        context = FakeContext(
            {"provider_settings": {"computer_use_require_admin": True}}
        )

    class FakeWrapper:
        context = FakeAstrContext()

    class FakeGui:
        def __init__(self):
            self.typed_text_args = None

        async def type_text(self, text):
            self.typed_text_args = (text,)
            return {
                "status": "ok",
                "text": text,
            }

    fake_gui = FakeGui()
    get_gui_called = {"value": False}

    async def fake_get_gui_component(context):
        get_gui_called["value"] = True
        # Ensure the context passed into the helper is the wrapper's context
        assert context is FakeWrapper().context  # smoke check for wiring
        return fake_gui

    # Ensure tools use our fake GUI component
    monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component)

    tool = CuaKeyboardTypeTool()

    # Call the tool and validate behavior
    text = "Hello CUA"
    result = await tool.call(FakeWrapper(), text=text)

    assert get_gui_called["value"] is True
    assert fake_gui.typed_text_args == (text,)
    assert json.loads(result) == {
        "status": "ok",
        "text": text,
    }


def _agent_computer_use_items():
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/unit/test_cua_computer_use.py
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In CuaBooter.download_file, the fallback path always shells out to base64 without checking os_type, unlike the other filesystem fallbacks which guard on POSIX; consider reusing _is_posix_os_type and returning a structured error for non-POSIX images to keep behavior consistent and avoid broken downloads on Windows-like images.
  • The CUA-related tests define many repeated FakeContext/FakeEvent/FakeWrapper/FakeBooter scaffolds inline; extracting these into reusable fixtures or helper factories would make the tests easier to maintain and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `CuaBooter.download_file`, the fallback path always shells out to `base64` without checking `os_type`, unlike the other filesystem fallbacks which guard on POSIX; consider reusing `_is_posix_os_type` and returning a structured error for non-POSIX images to keep behavior consistent and avoid broken downloads on Windows-like images.
- The CUA-related tests define many repeated `FakeContext`/`FakeEvent`/`FakeWrapper`/`FakeBooter` scaffolds inline; extracting these into reusable fixtures or helper factories would make the tests easier to maintain and reduce duplication.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="679-688" />
<code_context>
+            **result,
+        }
+
+    async def download_file(self, remote_path: str, local_path: str) -> None:
+        sandbox = None if self._runtime is None else self._runtime.sandbox
+        if sandbox is not None and hasattr(sandbox, "download_file"):
+            await sandbox.download_file(remote_path, local_path)
+            return
+        result = await self.shell.exec(f"base64 {shlex.quote(remote_path)}")
+        if result.get("stderr"):
+            raise RuntimeError(result["stderr"])
+        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
+        Path(local_path).write_bytes(base64.b64decode(result.get("stdout", "")))
+
+    async def available(self) -> bool:
</code_context>
<issue_to_address>
**issue (bug_risk):** CUA download_file assumes POSIX `base64` availability even for non-POSIX images.

Unlike the other filesystem fallbacks, this method unconditionally shells out to `base64`, which will likely fail on Windows or other non-POSIX images and surface as a confusing runtime error. Please either guard this path with `_is_posix_os_type(self.os_type)` and return a `_non_posix_filesystem_result`-style error for non-POSIX, or invoke `python3 - <<` with a small base64 decode helper (and clear error messaging if `python3` is missing) so behavior is consistent across OS types.
</issue_to_address>

### Comment 2
<location path="tests/unit/test_func_tool_manager.py" line_range="52-61" />
<code_context>
+    return CONFIG_METADATA_3["ai_group"]["metadata"]["agent_computer_use"]["items"]
+
+
+@pytest.mark.asyncio
+async def test_get_booter_creates_cua_booter(monkeypatch):
+    from astrbot.core.computer import computer_client
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case to ensure user-provided env dicts are not mutated by ExecuteShellTool

The existing test only covers the `env=None` case. Since `call` now does `env=dict(env or {})`, please add a test that passes a shared `env` (e.g. `{"FOO": "bar"}`) and asserts that the dict passed into `FakeShell.exec` includes both the original entries and the mutation (e.g. `MUTATED_BY_FAKE_SHELL`), while the original `env` object remains unchanged after the call. This guards against `ExecuteShellTool` mutating caller-owned env dicts.
</issue_to_address>

### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="465" />
<code_context>
+    }
+
+
+class CuaGUIComponent(GUIComponent):
+    def __init__(self, sandbox: Any) -> None:
+        self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider resolving and caching GUI component methods in CuaGUIComponent.__init__ so the click/type/press_key methods become straight-line calls instead of doing reflective lookups on every invocation.

You can reduce a noticeable layer of indirection in the GUI path by resolving component methods once in `__init__` instead of doing late-bound lookups on every call.

Today each GUI call goes through `_require_component_method` + `getattr` chains:

```python
async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
    click = _require_component_method(self._sandbox, "mouse", "click")
    result = await _maybe_await(click(x, y, button=button))
    payload = _maybe_model_dump(result)
    return {"success": bool(payload.get("success", True)), **payload}
```

That makes the call path harder to follow and repeats the same reflection logic.

You can keep the same functionality but fail-fast and simplify call sites by resolving and caching these methods in `CuaGUIComponent.__init__`:

```python
class CuaGUIComponent(GUIComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox

        mouse = getattr(sandbox, "mouse", None)
        keyboard = getattr(sandbox, "keyboard", None)

        self._click = getattr(mouse, "click", None) if mouse is not None else None
        self._type_text = getattr(keyboard, "type", None) if keyboard is not None else None

        # preserve the current fallback behaviour for press_key
        self._press_key = None
        if keyboard is not None:
            for name in ("press", "key_press", "press_key"):
                method = getattr(keyboard, name, None)
                if method is not None:
                    self._press_key = method
                    break
```

Then the methods become straight-line and easier to reason about:

```python
    async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
        if self._click is None:
            raise RuntimeError(
                "CUA sandbox does not provide mouse.click; "
                "please check the installed CUA SDK version and sandbox backend."
            )
        result = await _maybe_await(self._click(x, y, button=button))
        payload = _maybe_model_dump(result)
        return {"success": bool(payload.get("success", True)), **payload}

    async def type_text(self, text: str) -> dict[str, Any]:
        if self._type_text is None:
            raise RuntimeError(
                "CUA sandbox does not provide keyboard.type; "
                "please check the installed CUA SDK version and sandbox backend."
            )
        result = await _maybe_await(self._type_text(text))
        payload = _maybe_model_dump(result)
        return {"success": bool(payload.get("success", True)), **payload}

    async def press_key(self, key: str) -> dict[str, Any]:
        if self._press_key is None:
            raise RuntimeError(
                "CUA sandbox does not provide keyboard.press/key_press/press_key; "
                "please check the installed CUA SDK version and sandbox backend."
            )
        result = await _maybe_await(self._press_key(key))
        payload = _maybe_model_dump(result)
        return {"success": bool(payload.get("success", True)), **payload}
```

This keeps all existing behaviour (same fallbacks and error semantics) but:

- Removes repeated `_require_component_method` calls and method scans on every invocation.
- Makes each GUI operation’s implementation locally obvious.
- Aligns with the “resolve at boot/init time, not at each call” simplification direction highlighted by the other reviewer.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py
Comment thread tests/unit/test_func_tool_manager.py
Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The changes in AstrBotConfig.check_config_integrity now emit very generic log messages (e.g., no key/path in the "配置项不存在" / "未知配置项" logs), which may make config debugging harder; consider including a redacted or partial path (or a non-sensitive identifier) so that operators can still identify which setting was inserted/reordered without exposing secrets.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The changes in `AstrBotConfig.check_config_integrity` now emit very generic log messages (e.g., no key/path in the "配置项不存在" / "未知配置项" logs), which may make config debugging harder; consider including a redacted or partial path (or a non-sensitive identifier) so that operators can still identify which setting was inserted/reordered without exposing secrets.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="65" />
<code_context>
+    )
+
+
+@dataclass(slots=True)
+class ProcessResult:
+    stdout: str
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the CUA booter by standardizing per-component result shapes and resolving GUI methods explicitly instead of using generic normalization helpers and reflective utilities.

You can reduce a noticeable amount of indirection in this file by **standardizing the internal schema per component** and replacing some of the “generic normalization” helpers with small, explicit adapters.

Below are two focused refactors that keep all existing behavior but make the code easier to follow:

---

### 1. Narrow `ProcessResult` normalization to known CUA shapes

Right now `_normalize_process_result` + `_maybe_model_dump` + `_result_text` attempt to handle arbitrary objects/dicts/strings. If the CUA SDK only actually returns a small set of shapes (e.g. current SDK plus maybe one legacy variant), you can encode that explicitly instead of handling “any dict-like”.

That lets you delete `_maybe_model_dump` and `_result_text` entirely and make `_normalize_process_result` much more obvious:

```python
@dataclass(slots=True)
class ProcessResult:
    stdout: str
    stderr: str
    exit_code: int | None
    success: bool


def _normalize_process_result(raw: Any) -> ProcessResult:
    # 1. Already in internal shape
    if isinstance(raw, ProcessResult):
        return raw

    # 2. Known SDK shapes
    # Example shape A: {"stdout": ..., "stderr": ..., "exit_code": ..., "success": ...}
    if isinstance(raw, dict):
        stdout = str(raw.get("stdout") or raw.get("output") or "")
        stderr = str(raw.get("stderr") or raw.get("error") or "")
        exit_code = raw.get("exit_code")
        if exit_code is None:
            exit_code = raw.get("returncode")
        if exit_code is None:
            exit_code = 0 if not stderr else 1

        success = raw.get("success")
        if success is None:
            success = not stderr and exit_code in (0, None)

        return ProcessResult(
            stdout=stdout,
            stderr=stderr,
            exit_code=exit_code,
            success=bool(success),
        )

    # 3. String fallback (CUA shell sometimes returns plain stdout)
    if isinstance(raw, str):
        return ProcessResult(stdout=raw, stderr="", exit_code=0, success=True)

    # 4. Pydantic-style objects (only if you actually need to support this)
    if hasattr(raw, "model_dump"):
        return _normalize_process_result(raw.model_dump())
    if hasattr(raw, "dict"):
        return _normalize_process_result(raw.dict())

    # 5. Last-resort fallback
    return ProcessResult(stdout=str(raw), stderr="", exit_code=None, success=True)
```

Usage in `CuaShellComponent` and `CuaPythonComponent` stays the same:

```python
result = await _maybe_await(self._exec_raw(command, **kwargs))
proc = _normalize_process_result(result)
```

Benefits:

- All logic is in one place; you can see exactly which shapes are supported.
- You can easily add/remove branches when CUA SDK changes, instead of trying to handle arbitrary “dict-like” payloads.
- `_maybe_model_dump` and `_result_text` are no longer needed.

---

### 2. Simplify GUI component method resolution

`_resolve_component_method`, `_missing_component_method_error`, and `_has_component_method` add a lot of generic plumbing to support multiple method names. If the supported CUA SDK variants are known, you can resolve and freeze the exact methods once in `__init__`, and drop the generic helpers.

```python
class CuaGUIComponent(GUIComponent):
    def __init__(self, sandbox: Any) -> None:
        self._sandbox = sandbox
        mouse = getattr(sandbox, "mouse", None)
        keyboard = getattr(sandbox, "keyboard", None)

        # Resolve once and keep explicit attributes
        self._click = getattr(mouse, "click", None) if mouse else None

        # Support a small, explicit set of variants
        self._type_text = None
        if keyboard is not None:
            self._type_text = getattr(keyboard, "type", None)
            if self._type_text is None:
                self._type_text = getattr(keyboard, "type_text", None)

            self._press_key = (
                getattr(keyboard, "press", None)
                or getattr(keyboard, "key_press", None)
                or getattr(keyboard, "press_key", None)
            )
        else:
            self._press_key = None

    async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
        if self._click is None:
            raise RuntimeError("CUA sandbox does not provide mouse.click")
        result = await _maybe_await(self._click(x, y, button=button))
        payload = result if isinstance(result, dict) else {}
        return {"success": bool(payload.get("success", True)), **payload}

    async def type_text(self, text: str) -> dict[str, Any]:
        if self._type_text is None:
            raise RuntimeError("CUA sandbox does not provide keyboard.type/_type_text")
        result = await _maybe_await(self._type_text(text))
        payload = result if isinstance(result, dict) else {}
        return {"success": bool(payload.get("success", True)), **payload}

    async def press_key(self, key: str) -> dict[str, Any]:
        if self._press_key is None:
            raise RuntimeError(
                "CUA sandbox does not provide keyboard.press/key_press/press_key"
            )
        result = await _maybe_await(self._press_key(key))
        payload = result if isinstance(result, dict) else {}
        return {"success": bool(payload.get("success", True)), **payload}
```

With this refactor:

- `_resolve_component_method`, `_missing_component_method_error`, and `_has_component_method` can be removed or inlined where strictly needed.
- The capabilities detection in `CuaBooter.capabilities` can directly inspect `self._runtime.gui._click` / `_type_text` instead of walking the sandbox again.

---

These two changes alone:

- Preserve your support for multiple CUA SDK layouts.
- Remove several generic helpers and reflective paths.
- Make the booter behavior easier to read and test because the accepted shapes and method variants are explicitly encoded instead of “any dict/model with some keys”.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The revised config integrity logging now omits both the key path and value, which makes troubleshooting harder; consider logging a sanitized path (e.g., provider_settings.sandbox.*) while still redacting concrete key names and values so you retain some debuggability without leaking secrets.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The revised config integrity logging now omits both the key path and value, which makes troubleshooting harder; consider logging a sanitized path (e.g., `provider_settings.sandbox.*`) while still redacting concrete key names and values so you retain some debuggability without leaking secrets.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="125-132" />
<code_context>
+    return ""
+
+
+def _normalize_process_result(raw: Any) -> ProcessResult:
+    payload = _maybe_model_dump(raw)
+    if not payload and isinstance(raw, str):
+        payload = {"stdout": raw}
+
+    stdout = _result_text(payload, "stdout", "output")
+    stderr = _result_text(payload, "stderr", "error")
+    exit_code = payload.get(
+        "exit_code", payload.get("returncode", 0 if not stderr else 1)
+    )
</code_context>
<issue_to_address>
**issue (bug_risk):** Exit code normalization treats explicit `None` the same as a missing value, ignoring `returncode` when present.

In `_normalize_process_result`, `exit_code` is pulled with `payload.get("exit_code", payload.get("returncode", ...))`. This means a payload like `{"exit_code": None, "returncode": 1}` will yield `None` and never use the available `returncode`. To distinguish between a missing key and an explicit `None`, consider an explicit key check, e.g. `exit_code = payload["exit_code"] if "exit_code" in payload else payload.get("returncode", 0 if not stderr else 1)`.
</issue_to_address>

### Comment 2
<location path="astrbot/core/tools/computer_tools/shell.py" line_range="98" />
<code_context>
+    )
+
+
+def _strip_shell_comment(command: str) -> str:
+    in_single = False
+    in_double = False
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the custom comment-stripping state machine with `shlex` and simplifying the `env` handling to reduce complexity while preserving behavior.

You can reduce the added complexity while keeping the same behavior by leaning on standard library parsing and simplifying some plumbing.

**1. Replace the manual state machine with `shlex`**

The current `_strip_shell_comment` is a custom FSM for quotes/escapes, which is hard to maintain. Python’s `shlex` already handles POSIX shell quoting and comments:

```python
import shlex

def _strip_shell_comment(command: str) -> str:
    # shlex will treat `#` as a comment starter and respect quotes/escapes
    lexer = shlex.shlex(command, posix=True)
    lexer.commenters = '#'
    lexer.whitespace_split = False
    return ''.join(list(lexer))
```

This keeps the “strip trailing comments but respect quotes/escapes” behavior, without custom state handling logic.

Then `_is_self_detached_command` stays the same structurally, but becomes easier to reason about:

```python
def _is_self_detached_command(command: str) -> bool:
    stripped = _strip_shell_comment(command).strip()
    lowered = stripped.lower()

    return (
        lowered.startswith("nohup ")
        or lowered.startswith("setsid ")
        or lowered.startswith("disown ")
        or lowered.startswith("start ")
        or lowered.startswith("start-process ")
        or re.search(r"(?:^|\s)&\s*$", stripped) is not None
    )
```

**2. Simplify `env` copying**

The defensive copy can be expressed more directly:

```python
async def call(..., env: dict[str, Any] | None = None, ...) -> ToolExecResult:
    ...
    env = dict(env or {})
    result = await sb.shell.exec(
        command,
        cwd=cwd,
        background=effective_background,
        env=env,
    )
```

This avoids the slightly opaque `env=dict(env or {})` in the call site and keeps the behavior identical.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py Outdated
Comment thread astrbot/core/tools/computer_tools/shell.py Outdated
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new CUA booter implementation (astrbot/core/computer/booters/cua.py) is quite large and mixes shell, Python, filesystem, GUI, and helper concerns; consider splitting it into smaller, focused modules/classes (e.g., separate shell/fs/gui helpers) to keep responsibilities clearer and maintenance easier.
  • The changes to AstrBotConfig.check_config_integrity now log only generic messages for missing/unknown keys; consider including the config path or key name (without values) so operators can still diagnose which entries were inserted or removed while keeping secrets out of logs.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new CUA booter implementation (`astrbot/core/computer/booters/cua.py`) is quite large and mixes shell, Python, filesystem, GUI, and helper concerns; consider splitting it into smaller, focused modules/classes (e.g., separate shell/fs/gui helpers) to keep responsibilities clearer and maintenance easier.
- The changes to `AstrBotConfig.check_config_integrity` now log only generic messages for missing/unknown keys; consider including the config path or key name (without values) so operators can still diagnose which entries were inserted or removed while keeping secrets out of logs.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="337" />
<code_context>
+    return {"success": True, "path": path, **result}
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+    def __init__(
+        self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]
</code_context>
<issue_to_address>
**issue (complexity):** Consider splitting CuaFileSystemComponent into separate rich-FS and shell-fallback implementations and delegating at construction time to reduce branching and improve clarity.

You can flatten a lot of branching in `CuaFileSystemComponent` by splitting the “rich FS” and “shell fallback” paths into separate implementations, and choosing the implementation at construction time. This keeps all behavior (including POSIX checks and base64 writes) but localizes it.

For example:

```python
class _CuaFilesystemAPI(FileSystemComponent):
    def __init__(self, fs: Any) -> None:
        self._fs = fs

    async def read_file(
        self,
        path: str,
        encoding: str = "utf-8",
        offset: int | None = None,
        limit: int | None = None,
    ) -> dict[str, Any]:
        content = await _maybe_await(self._fs.read_file(path))
        if isinstance(content, bytes):
            content = content.decode(encoding, errors="replace")
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
        }

    async def write_file(
        self,
        path: str,
        content: str,
        mode: str = "w",
        encoding: str = "utf-8",
    ) -> dict[str, Any]:
        _ = mode
        await _maybe_await(self._fs.write_file(path, content))
        return {"success": True, "path": path}

    async def delete_file(self, path: str) -> dict[str, Any]:
        delete = getattr(self._fs, "delete", None) or getattr(self._fs, "delete_file", None)
        await _maybe_await(delete(path))
        return {"success": True, "path": path}

    async def list_dir(
        self,
        path: str = ".",
        show_hidden: bool = False,
    ) -> dict[str, Any]:
        entries = await _maybe_await(self._fs.list_dir(path))
        return {"success": True, "path": path, "entries": entries}
```

Shell fallback stays in a dedicated class, which centralizes POSIX enforcement and shell plumbing:

```python
class _CuaShellFilesystemFallback(FileSystemComponent):
    def __init__(self, shell: CuaShellComponent, os_type: str) -> None:
        self._shell = shell
        self._os_type = os_type.lower()

    async def read_file(...):
        if not _is_posix_os_type(self._os_type):
            return _non_posix_filesystem_result(path, self._os_type)
        result = await self._shell.exec(f"cat {shlex.quote(path)}")
        if result.get("stderr"):
            return {"success": False, "path": path, "error": result["stderr"]}
        content = result.get("stdout", "")
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(content, offset=offset, limit=limit),
        }

    async def write_file(...):
        if not _is_posix_os_type(self._os_type):
            return _non_posix_filesystem_result(path, self._os_type)
        result = await _write_base64_via_shell(
            self._shell, path, content.encode(encoding),
        )
        return _write_result(path, result)

    # delete_file, list_dir, search_files, edit_file similarly delegate to existing helpers
```

Then `CuaFileSystemComponent` becomes a thin chooser that reuses these:

```python
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]) -> None:
        shell = CuaShellComponent(sandbox, os_type=os_type)
        fs = getattr(sandbox, "filesystem", None)

        if fs is not None:
            self._impl: FileSystemComponent = _CuaFilesystemAPI(fs)
        else:
            self._impl = _CuaShellFilesystemFallback(shell, os_type)

    async def read_file(...):
        return await self._impl.read_file(path, encoding=encoding, offset=offset, limit=limit)

    async def write_file(...):
        return await self._impl.write_file(path, content, mode=mode, encoding=encoding)

    async def delete_file(self, path: str) -> dict[str, Any]:
        return await self._impl.delete_file(path)

    async def list_dir(...):
        return await self._impl.list_dir(path, show_hidden=show_hidden)

    async def search_files(...):
        return await self._impl.search_files(
            pattern=pattern,
            path=path,
            glob=glob,
            after_context=after_context,
            before_context=before_context,
        )

    async def edit_file(...):
        return await self._impl.edit_file(
            path, old_string, new_string, replace_all=replace_all, encoding=encoding
        )
```

This keeps:

- The same shell-based behavior (base64 writes, `ls`, `rm`, `search_files_via_shell`).
- The same POSIX guarding via `_non_posix_filesystem_result`.
- The same public `CuaFileSystemComponent` interface.

But removes repeated `if self._fs is None` / POSIX checks from every method and makes each implementation easier to read and test.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/unit/test_astr_main_agent.py" line_range="1564-1573" />
<code_context>
+    def test_apply_sandbox_tools_with_cua_adds_gui_guidance(self, mock_context):
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert that CUA GUI tools are actually attached to `req.func_tool`, not just mentioned in the prompt.

To better validate the CUA integration, extend this test to assert that `req.func_tool` actually registers `astrbot_cua_screenshot`, `astrbot_cua_mouse_click`, and `astrbot_cua_keyboard_type` when `booter='cua'`, and that `astrbot_cua_key_press` is omitted if that’s intentional. This ensures we’re testing the actual tool wiring, not just the prompt text.

Suggested implementation:

```python
    def test_apply_sandbox_tools_with_cua_adds_gui_guidance(self, mock_context):
        """Test that CUA sandbox guidance nudges reliable GUI workflows."""
        module = ama
        config = module.MainAgentBuildConfig(
            tool_call_timeout=60,
            computer_use_runtime="sandbox",
            sandbox_cfg={"booter": "cua"},
        )
        req = ProviderRequest(prompt="Test", system_prompt="Original prompt")

        module._apply_sandbox_tools(config, req, "session-123")

        # Ensure a function tool is actually attached
        assert getattr(req, "func_tool", None) is not None

        # Introspect registered tool names in a way that tolerates different container shapes
        tools_container = getattr(req.func_tool, "tools", None) or getattr(
            req.func_tool, "func_tools", None
        )
        assert tools_container is not None

        if isinstance(tools_container, dict):
            tool_names = set(tools_container.keys())
        else:
            tool_names = {
                getattr(tool, "name", None)
                for tool in tools_container
                if getattr(tool, "name", None) is not None
            }

        # CUA GUI tools should be registered when booter='cua'
        assert "astrbot_cua_screenshot" in tool_names
        assert "astrbot_cua_mouse_click" in tool_names
        assert "astrbot_cua_keyboard_type" in tool_names

        # Key press tool should not be wired if that is intentional
        assert "astrbot_cua_key_press" not in tool_names

```

If the actual attribute on `req.func_tool` that holds the tools differs (e.g. it is named `_tools`, `registry`, or similar), update the `tools_container = ...` line to reference the correct attribute so that `tool_names` is computed from the real registry used in your codebase.
Also, if the tool objects or registry use a different field than `.name` (for example, `tool_name` or `id`), adjust the comprehension that builds `tool_names` accordingly.
</issue_to_address>

### Comment 2
<location path="docs/zh/use/computer.md" line_range="103" />
<code_context>
+沙盒环境驱动器可在 `配置 -> 普通配置 -> 使用电脑能力` 的沙箱配置中选择。当前常用选项包括:
+
+- `Shipyard Neo`:AstrBot 推荐的远程/独立部署沙盒服务,适合长期运行和多人使用。
+- `CUA`:基于 [Cua](https://github.com/trycua/cua) 的本地或云端电脑使用沙盒,可提供桌面截图、鼠标、键盘、Shell、Python 和文件系统能力。
+
+使用 `Shipyard Neo` 时,沙盒 workspace 根目录通常是:
</code_context>
<issue_to_address>
**nitpick (typo):** “CUA” 与链接文本“Cua”的大小写不一致,建议统一。

前面名词使用大写 `CUA`,而链接文本为 “Cua”。建议将链接文本改为 `[CUA](https://github.com/trycua/cua)` 以保持术语大小写一致。

```suggestion
- `CUA`:基于 [CUA](https://github.com/trycua/cua) 的本地或云端电脑使用沙盒,可提供桌面截图、鼠标、键盘、Shell、Python 和文件系统能力。
```
</issue_to_address>

### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="337" />
<code_context>
+    return {"success": True, "path": path, **result}
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+    def __init__(
+        self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the POSIX shell-based filesystem fallback logic from CuaFileSystemComponent into a dedicated helper/adapter class to simplify each method’s control flow and responsibilities.

You can reduce complexity in `CuaFileSystemComponent` by extracting the POSIX shell fallback into a dedicated adapter and delegating to it, instead of inlining all the branching in each method.

Right now each method has to juggle:

- `self._fs` vs shell fallback
- `_is_posix_os_type` checks
- special error/result shaping

That makes every method longer and harder to follow. You can keep all current behavior by pushing the fallback logic into a small helper class with a stable interface, e.g.:

```python
class _PosixShellFileSystemFallback:
    def __init__(self, shell: CuaShellComponent, os_type: str) -> None:
        self._shell = shell
        self._os_type = os_type.lower()

    def _ensure_posix(self, path: str) -> dict[str, Any] | None:
        if not _is_posix_os_type(self._os_type):
            return _non_posix_filesystem_result(path, self._os_type)
        return None

    async def read_file(
        self, path: str, *, encoding: str, offset: int | None, limit: int | None
    ) -> dict[str, Any]:
        err = self._ensure_posix(path)
        if err:
            return err
        result = await self._shell.exec(f"cat {shlex.quote(path)}")
        if result.get("stderr"):
            return {"success": False, "path": path, "error": result["stderr"]}
        content = result.get("stdout", "")
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
        }

    async def write_file(
        self, path: str, content: str, *, encoding: str
    ) -> dict[str, Any]:
        err = self._ensure_posix(path)
        if err:
            return err
        result = await _write_base64_via_shell(
            self._shell, path, content.encode(encoding)
        )
        return _write_result(path, result)

    # similarly: delete_file, list_dir, search_files
```

Then `CuaFileSystemComponent` methods become straight-line logic with clearer responsibilities:

```python
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]) -> None:
        self._shell = CuaShellComponent(sandbox, os_type=os_type)
        self._fs = getattr(sandbox, "filesystem", None)
        self._fallback = _PosixShellFileSystemFallback(self._shell, os_type)

    async def read_file(self, path: str, encoding="utf-8", offset=None, limit=None):
        if self._fs is not None and hasattr(self._fs, "read_file"):
            content = await _maybe_await(self._fs.read_file(path))
            if isinstance(content, bytes):
                content = content.decode(encoding, errors="replace")
            return {
                "success": True,
                "path": path,
                "content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
            }
        return await self._fallback.read_file(
            path, encoding=encoding, offset=offset, limit=limit
        )

    async def write_file(self, path: str, content: str, mode="w", encoding="utf-8"):
        _ = mode
        if self._fs is not None and hasattr(self._fs, "write_file"):
            await _maybe_await(self._fs.write_file(path, content))
            return {"success": True, "path": path}
        return await self._fallback.write_file(path, content, encoding=encoding)
```

This keeps:

- The same POSIX/non‑POSIX behavior and error messages.
- The same shell commands and `_write_base64_via_shell` / `_write_result` behavior.
- The same public API and return shapes.

But it removes repeated branching and `_is_posix_os_type` checks from every `CuaFileSystemComponent` method, making the core class easier to read and maintain.
</issue_to_address>

### Comment 4
<location path="astrbot/core/tools/computer_tools/shell.py" line_range="52" />
<code_context>
         context: ContextWrapper[AstrAgentContext],
         command: str,
         background: bool = False,
-        env: dict = {},
+        env: dict[str, Any] | None = None,
     ) -> ToolExecResult:
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the new environment handling by avoiding optional typing and normalizing the env value at the call site.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +1564 to +1573
def test_apply_sandbox_tools_with_cua_adds_gui_guidance(self, mock_context):
"""Test that CUA sandbox guidance nudges reliable GUI workflows."""
module = ama
config = module.MainAgentBuildConfig(
tool_call_timeout=60,
computer_use_runtime="sandbox",
sandbox_cfg={"booter": "cua"},
)
req = ProviderRequest(prompt="Test", system_prompt="Original prompt")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Also assert that CUA GUI tools are actually attached to req.func_tool, not just mentioned in the prompt.

To better validate the CUA integration, extend this test to assert that req.func_tool actually registers astrbot_cua_screenshot, astrbot_cua_mouse_click, and astrbot_cua_keyboard_type when booter='cua', and that astrbot_cua_key_press is omitted if that’s intentional. This ensures we’re testing the actual tool wiring, not just the prompt text.

Suggested implementation:

    def test_apply_sandbox_tools_with_cua_adds_gui_guidance(self, mock_context):
        """Test that CUA sandbox guidance nudges reliable GUI workflows."""
        module = ama
        config = module.MainAgentBuildConfig(
            tool_call_timeout=60,
            computer_use_runtime="sandbox",
            sandbox_cfg={"booter": "cua"},
        )
        req = ProviderRequest(prompt="Test", system_prompt="Original prompt")

        module._apply_sandbox_tools(config, req, "session-123")

        # Ensure a function tool is actually attached
        assert getattr(req, "func_tool", None) is not None

        # Introspect registered tool names in a way that tolerates different container shapes
        tools_container = getattr(req.func_tool, "tools", None) or getattr(
            req.func_tool, "func_tools", None
        )
        assert tools_container is not None

        if isinstance(tools_container, dict):
            tool_names = set(tools_container.keys())
        else:
            tool_names = {
                getattr(tool, "name", None)
                for tool in tools_container
                if getattr(tool, "name", None) is not None
            }

        # CUA GUI tools should be registered when booter='cua'
        assert "astrbot_cua_screenshot" in tool_names
        assert "astrbot_cua_mouse_click" in tool_names
        assert "astrbot_cua_keyboard_type" in tool_names

        # Key press tool should not be wired if that is intentional
        assert "astrbot_cua_key_press" not in tool_names

If the actual attribute on req.func_tool that holds the tools differs (e.g. it is named _tools, registry, or similar), update the tools_container = ... line to reference the correct attribute so that tool_names is computed from the real registry used in your codebase.
Also, if the tool objects or registry use a different field than .name (for example, tool_name or id), adjust the comprehension that builds tool_names accordingly.

Comment thread docs/zh/use/computer.md Outdated
Comment thread astrbot/core/computer/booters/cua.py
Comment thread astrbot/core/tools/computer_tools/shell.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In CuaBooter.upload_file and download_file the non-POSIX fallback path returns an error dict in one case and raises RuntimeError in the other; consider standardizing the error mechanism so callers can handle CUA filesystem fallbacks consistently.
  • The CUA background shell launcher (_CUA_BACKGROUND_LAUNCHER and _build_cua_background_command) inlines a fairly large Python script into a shell command string; it might be worth extracting this into a reusable module/script inside the sandbox image (or at least centralizing the string) to reduce duplication and make future tweaks less error‑prone.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `CuaBooter.upload_file` and `download_file` the non-POSIX fallback path returns an error dict in one case and raises `RuntimeError` in the other; consider standardizing the error mechanism so callers can handle CUA filesystem fallbacks consistently.
- The CUA background shell launcher (`_CUA_BACKGROUND_LAUNCHER` and `_build_cua_background_command`) inlines a fairly large Python script into a shell command string; it might be worth extracting this into a reusable module/script inside the sandbox image (or at least centralizing the string) to reduce duplication and make future tweaks less error‑prone.

## Individual Comments

### Comment 1
<location path="docs/zh/use/astrbot-agent-sandbox.md" line_range="72" />
<code_context>
+- macOS VM 通常依赖 CUA/Lume 相关运行时。
+- 云端 CUA 需要可用的 CUA API Key。
+
+具体宿主机要求、镜像支持情况和本地运行时安装方式,请参考 [Cua 官方文档](https://cua.ai/docs)+
+### 在 AstrBot 中配置 CUA
</code_context>
<issue_to_address>
**suggestion (typo):** 链接文本中的“Cua”建议统一为“CUA”以保持专有名词大小写一致。

本节及全文中此前均使用全大写“CUA”,这里写成“Cua 官方文档”会造成显得不一致,建议改为“CUA 官方文档”统一专有名词写法。

```suggestion
具体宿主机要求、镜像支持情况和本地运行时安装方式,请参考 [CUA 官方文档](https://cua.ai/docs)。
```
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="337" />
<code_context>
+    return {"success": True, "path": path, **result}
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+    def __init__(
+        self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the repeated POSIX filesystem fallback and python3 error-handling logic into shared helpers/classes so CUA components delegate instead of re-implementing similar branching in many methods.

You can keep all current behavior but reduce cognitive load a bit by tightening a few abstractions and deduplicating some branching.

### 1. Filesystem POSIX fallback branching

`CuaFileSystemComponent` currently repeats:

- “if self._fs has method -> use it; else if POSIX -> shell fallback; else error”

across `read_file`, `write_file`, `delete_file`, `list_dir`, `search_files`. You can push this into a dedicated POSIX fallback implementation and then make `CuaFileSystemComponent` a thin delegator.

Example of a minimal refactor:

```python
class _PosixShellFileSystem(FileSystemComponent):
    def __init__(self, shell: CuaShellComponent, os_type: str) -> None:
        self._shell = shell
        self._os_type = os_type.lower()

    def _ensure_posix(self, path: str) -> dict[str, Any] | None:
        if not _is_posix_os_type(self._os_type):
            return _non_posix_filesystem_result(path, self._os_type)
        return None

    async def read_file(self, path: str, encoding: str = "utf-8",
                        offset: int | None = None, limit: int | None = None) -> dict[str, Any]:
        err = self._ensure_posix(path)
        if err:
            return err
        result = await self._shell.exec(f"cat {shlex.quote(path)}")
        if result.get("stderr"):
            return {"success": False, "path": path, "error": result["stderr"]}
        content = result.get("stdout", "")
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
        }

    async def write_file(self, path: str, content: str,
                         mode: str = "w", encoding: str = "utf-8") -> dict[str, Any]:
        _ = mode
        err = self._ensure_posix(path)
        if err:
            return err
        result = await _write_base64_via_shell(self._shell, path, content.encode(encoding))
        return _write_result(path, result)

    # delete_file, list_dir, search_files can be moved here similarly
```

Then `CuaFileSystemComponent` can resolve which backend to use once:

```python
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]) -> None:
        self._shell = CuaShellComponent(sandbox, os_type=os_type)
        self._fs_native = getattr(sandbox, "filesystem", None)
        self._fallback = _PosixShellFileSystem(self._shell, os_type)

    async def read_file(...):
        if self._fs_native is not None and hasattr(self._fs_native, "read_file"):
            content = await _maybe_await(self._fs_native.read_file(path))
            # existing normalization
        else:
            return await self._fallback.read_file(path, encoding, offset, limit)
```

This removes repeated `_is_posix_os_type` / `_non_posix_filesystem_result` / shell-command wiring per method while fully preserving behavior.

You can similarly reuse `_PosixShellFileSystem` from `upload_file` / `download_file` to avoid re-implementing the base64 + shell IO pattern there.

---

### 2. Shell/Python python3-fallback handling

Both `CuaShellComponent` (background launcher) and `CuaPythonComponent` duplicate “run python3, detect missing python3, rewrite error message” logic. You can encapsulate this into a small helper that keeps behavior identical but makes the execution path easier to follow.

For example:

```python
async def _exec_python3_or_error(shell: CuaShellComponent, code: str,
                                 *, operation: str, timeout: int | None = 30) -> ProcessResult:
    result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
    stderr = result.get("stderr", "")
    if stderr and _is_missing_python3_error(stderr):
        stderr = _python3_requirement_error(operation, stderr)
    return ProcessResult(
        stdout=result.get("stdout", ""),
        stderr=stderr,
        exit_code=result.get("exit_code"),
        success=bool(result.get("success", False)),
    )
```

Then `CuaPythonComponent.exec` reduces to:

```python
async def exec(self, code: str, kernel_id: str | None = None,
               timeout: int = 30, silent: bool = False) -> dict[str, Any]:
    _ = kernel_id
    if self._python_exec is not None:
        raw = await _maybe_await(self._python_exec(code, timeout=timeout))
        proc = _normalize_process_result(raw)
    else:
        shell = CuaShellComponent(self._sandbox, os_type=self._os_type)
        proc = await _exec_python3_or_error(
            shell, code, operation="Python execution fallback", timeout=timeout
        )

    output_text = "" if silent else proc.stdout
    error_text = proc.stderr
    return {
        "success": proc.success if not silent else not bool(error_text),
        "data": {"output": {"text": output_text, "images": []}, "error": error_text},
        "output": output_text,
        "error": error_text,
    }
```

You can apply the same helper (or a variant taking a command string instead of inline `-c`/heredoc) to the background-launch path, so the python3 requirement messaging is centralized and consistent.

These two changes keep all your current flexibility but localize complexity so individual methods are easier to read and maintain.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread docs/zh/use/astrbot-agent-sandbox.md Outdated
Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/tools/computer_tools/shell.py" line_range="86" />
<code_context>
+            return f"Error executing command: {detail}"
+
+
+def _is_self_detached_command(command: str) -> bool:
+    stripped = _strip_shell_comment(command).strip()
+    lowered = stripped.lower()
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the custom parsing and regex in `_is_self_detached_command` with `shlex`-based tokenization to detect self-detaching commands more simply and robustly.

You can simplify the background-detection logic and drop the custom parser while keeping behavior (quote/escape-aware, `#` comments) by delegating to `shlex`, which already handles shell-like tokenization and comments.

Specifically:

- Remove `_strip_shell_comment` and the manual `in_single`/`in_double`/`escaped` state machine.
- Use `shlex` to strip comments and get tokens, then:
  - Check the first token for `nohup`, `setsid`, `disown`, `start`, `start-process`.
  - Check if the last token is a bare `&`.

Example refactor:

```python
import shlex
import re  # you may no longer need this if & detection is via shlex

def _is_self_detached_command(command: str) -> bool:
    # shlex handles quotes, escapes, and comments for us
    lex = shlex.shlex(command, posix=True)
    lex.whitespace_split = True
    lex.commenters = "#"

    tokens = list(lex)
    if not tokens:
        return False

    first = tokens[0].lower()
    if first in ("nohup", "setsid", "disown", "start", "start-process"):
        return True

    # detect trailing & (unquoted, not in a comment)
    return tokens[-1] == "&"
```

Then you can delete `_strip_shell_comment` entirely.

This keeps:
- Comment stripping with `#` outside quotes.
- Correct handling of quotes/escapes.
- The same “self-detached” cases you currently check.

But it removes the ad-hoc state machine and regex, making the logic easier to read, test, and maintain.
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="73" />
<code_context>
+    success: bool
+
+
+def _maybe_model_dump(value: Any) -> dict[str, Any]:
+    if isinstance(value, dict):
+        return value
</code_context>
<issue_to_address>
**issue (complexity):** Consider consolidating the scattered helper functions (for process normalization, POSIX checks, shell filesystem helpers, and screenshot handling) into a few cohesive, well-documented utilities to reduce mental overhead while preserving behavior.

You can keep all current behavior but reduce cognitive load by collapsing some of the tiny helpers into a few cohesive, explicit normalizers.

### 1. Process result normalization & python3 error handling

Right now `_maybe_model_dump` + `_result_text` + `_normalize_process_result` + `_process_result_with_python3_error` form a small framework with a lot of cross‑references.

You can keep all existing shapes and error detection, but make it much easier to follow by:

- Inlining `_result_text` and the exit code logic into `_normalize_process_result`.
- Folding `_process_result_with_python3_error` into a single “normalize + maybe wrap python3 error” function.
- Keeping the “generic guessing” in a single well‑documented place.

Example:

```python
@dataclass(slots=True)
class ProcessResult:
    stdout: str
    stderr: str
    exit_code: int | None
    success: bool


def _normalize_process_result(raw: Any) -> ProcessResult:
    """
    Best-effort normalization of CUA process results.

    Supports:
    - dict-like results
    - dataclasses / pydantic-like with .model_dump() / .dict()
    - objects with stdout/stderr/exit_code/returncode/return_code/success
    - plain string (treated as stdout)
    """
    # === 1. Get a dict-like view ===
    if isinstance(raw, dict):
        payload = raw
    else:
        if is_dataclass(raw) and not isinstance(raw, type):
            payload = asdict(raw)
        elif hasattr(raw, "model_dump"):
            dumped = raw.model_dump()
            payload = dumped if isinstance(dumped, dict) else {}
        elif hasattr(raw, "dict"):
            dumped = raw.dict()
            payload = dumped if isinstance(dumped, dict) else {}
        else:
            # attribute-based adapter
            keys = (
                "stdout", "stderr", "output", "error",
                "returncode", "return_code", "exit_code", "success",
            )
            payload = {k: getattr(raw, k) for k in keys if hasattr(raw, k)}

    if not payload and isinstance(raw, str):
        payload = {"stdout": raw}

    # === 2. Normalize text fields ===
    def _first_text(*keys: str) -> str:
        for k in keys:
            v = payload.get(k)
            if v is not None:
                return str(v)
        return ""

    stdout = _first_text("stdout", "output")
    stderr = _first_text("stderr", "error")

    # === 3. Normalize exit code ===
    exit_code = (
        payload.get("exit_code")
        or payload.get("returncode")
        or payload.get("return_code")
    )
    if exit_code is None:
        exit_code = 0 if not stderr else 1

    # === 4. Success flag ===
    success = bool(payload.get("success", not stderr and exit_code in (0, None)))

    return ProcessResult(stdout=stdout, stderr=stderr, exit_code=exit_code, success=success)


def _normalize_with_python3_requirement(raw: Any, operation: str) -> ProcessResult:
    proc = _normalize_process_result(raw)
    if proc.stderr and _is_missing_python3_error(proc.stderr):
        return ProcessResult(
            stdout=proc.stdout,
            stderr=_python3_requirement_error(operation, proc.stderr),
            exit_code=proc.exit_code,
            success=proc.success,
        )
    return proc
```

Then usage points become simpler and more uniform:

```python
# CuaShellComponent.exec
result = await _maybe_await(self._exec_raw(command, **kwargs))
proc = (
    _normalize_with_python3_requirement(result, "background execution")
    if background
    else _normalize_process_result(result)
)
```

```python
# _exec_python3_or_error
result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
return _normalize_with_python3_requirement(result, operation)
```

```python
# _write_result
stderr = result.get("stderr", "")
if stderr and _is_missing_python3_error(stderr):
    result = {
        **result,
        "stderr": _python3_requirement_error("filesystem write fallback", stderr),
    }
```

This keeps all existing branches, but concentrates the complexity into two well‑named functions with local helpers instead of many top‑level utilities.

---

### 2. POSIX filesystem helpers

The POSIX / non‑POSIX error handling is currently spread across:

- `_is_posix_os_type`
- `_non_posix_filesystem_error`
- `_non_posix_filesystem_result`
- `_raise_non_posix_filesystem_error`
- `_PosixShellFileSystem._ensure_posix`

You can keep all the same messages and behavior but collapse the result/error building into one helper and one message builder. That reduces the surface area while retaining clarity.

Example:

```python
_POSIX_OS_TYPES = {"linux", "darwin", "macos"}


def _is_posix_os_type(os_type: str) -> bool:
    return os_type.lower() in _POSIX_OS_TYPES


def _posix_fs_error_message(os_type: str) -> str:
    return (
        "CUA filesystem shell fallback is only supported for POSIX images; "
        f"os_type={os_type!r} does not support the required shell commands."
    )


def _non_posix_filesystem_result(path: str, os_type: str) -> dict[str, Any]:
    error = _posix_fs_error_message(os_type)
    return {"success": False, "path": path, "error": error, "message": error}


def _raise_non_posix_filesystem_error(os_type: str) -> None:
    raise RuntimeError(_posix_fs_error_message(os_type))
```

Then `_PosixShellFileSystem._ensure_posix` is trivial and doesn’t need its own error builders:

```python
class _PosixShellFileSystem(FileSystemComponent):
    def __init__(self, shell: CuaShellComponent, os_type: str) -> None:
        self._shell = shell
        self._os_type = os_type.lower()

    def _ensure_posix(self, path: str) -> dict[str, Any] | None:
        if _is_posix_os_type(self._os_type):
            return None
        return _non_posix_filesystem_result(path, self._os_type)
```

This preserves all existing behavior (same message text, same failure shape) while removing three separate “message wrapper” helpers.

---

### 3. Shell-based filesystem helpers

`_write_base64_via_shell`, `_write_result`, `_split_listing_entries`, and `_list_dir_via_shell` are straightforward and only used for the POSIX fallback. You can make the flow easier to follow by inlining the split helper and keeping the write helpers close together.

For example, `_split_listing_entries` can be inlined into `_list_dir_via_shell`:

```python
async def _list_dir_via_shell(
    shell: CuaShellComponent,
    path: str,
    show_hidden: bool,
) -> dict[str, Any]:
    flags = "-1A" if show_hidden else "-1"
    result = await shell.exec(f"ls {flags} {shlex.quote(path)}")
    stdout = result.get("stdout", "")
    entries = [line for line in stdout.splitlines() if line.strip()]
    return {
        "success": not bool(result.get("stderr")),
        "path": path,
        "entries": entries,
        "error": result.get("stderr", ""),
    }
```

This keeps the behavior but reduces the number of “micro” helpers you have to mentally chase.

---

### 4. Screenshot normalization

`_screenshot_to_bytes` currently supports many shapes (bytes, data URI, base64 string, file path, PIL‑like, and dict variants) via nested recursion. You can keep all of those but structure it into clearly separated “cases” with a small inner function, which makes the branching easier to read and test.

```python
def _screenshot_to_bytes(raw: Any) -> bytes:
    def _from_str(s: str) -> bytes:
        # data URI
        if s.startswith("data:image"):
            s = s.split(",", 1)[1]
        # base64 or path
        try:
            return base64.b64decode(s, validate=True)
        except Exception:
            candidate = Path(s)
            if candidate.is_file():
                return candidate.read_bytes()
            return s.encode("utf-8")

    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw)

    if isinstance(raw, str):
        return _from_str(raw)

    if hasattr(raw, "save"):
        import io
        buf = io.BytesIO()
        raw.save(buf, format="PNG")
        return buf.getvalue()

    payload = _maybe_model_dump(raw)
    for key in ("data", "base64", "image"):
        value = payload.get(key)
        if value is not None:
            # recurse but through the same clear entry points
            return _screenshot_to_bytes(value)

    raise TypeError(f"Unsupported CUA screenshot result: {type(raw)!r}")
```

Behavior is unchanged, but the structure is more linear and self‑documenting.

---

Overall, these small refactors keep all the flexibility you added, while consolidating related logic and removing several tiny, cross‑referencing helpers. That should address the “too generic / too many helpers” concern without sacrificing compatibility with varying CUA SDK shapes.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/tools/computer_tools/shell.py
Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe
Copy link
Copy Markdown
Member Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/unit/test_cua_computer_use.py" line_range="149-158" />
<code_context>
+    )
+
+
+@pytest.mark.asyncio
+async def test_screenshot_tool_returns_image_and_sends_file(monkeypatch, tmp_path):
+    from astrbot.core.tools.computer_tools import cua as cua_tools
+    from astrbot.core.tools.computer_tools.cua import CuaScreenshotTool
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests to cover additional screenshot result formats handled by `_screenshot_to_bytes`

Currently only the raw-bytes path is tested, but `_screenshot_to_bytes` also handles base64/data URL strings, filesystem paths, PIL-like objects with `.save`, and nested dict payloads. Please add a parametrized test that covers these shapes (e.g., `data:image/png;base64,...` string, temp-file path string, simple `.save` object, dict with a `base64` field) and verifies that `CuaScreenshotTool` returns a valid PNG and removes the `base64` field from the JSON payload.
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="73" />
<code_context>
+    success: bool
+
+
+def _maybe_model_dump(value: Any) -> dict[str, Any]:
+    if isinstance(value, dict):
+        return value
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the process result normalization and inlining the POSIX filesystem fallback logic to reduce indirection and make failure modes clearer.

You can meaningfully reduce complexity in two spots without changing real‑world behavior: the generic result normalization and the POSIX filesystem fallback.

---

### 1. Narrow `_maybe_model_dump` / process normalization surface

`_maybe_model_dump` + `_normalize_process_result` currently act like a mini “universal adapter” for any random object:

```py
def _maybe_model_dump(value: Any) -> dict[str, Any]:
    ...
    attr_payload = {
        key: getattr(value, key)
        for key in (
            "stdout",
            "stderr",
            "output",
            "error",
            "returncode",
            "return_code",
            "exit_code",
            "success",
        )
        if hasattr(value, key)
    }
    if attr_payload:
        return attr_payload
    return {}
```

and then `_normalize_process_result` relies on that.

If the CUA SDK actually returns a small, documented set of types (e.g. plain `dict`, one or two dataclasses, maybe Pydantic models), you can simplify this by:

* Handling only those concrete shapes explicitly.
* Removing the broad attribute introspection heuristic, which makes debugging harder (silent coercion of arbitrary objects).

Example refactor (pattern, not exact types):

```py
def _maybe_model_dump(value: Any) -> dict[str, Any]:
    # 1. Already-normalized dict
    if isinstance(value, dict):
        return value

    # 2. Dataclass result
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)

    # 3. Pydantic-style result
    for attr in ("model_dump", "dict"):
        fn = getattr(value, attr, None)
        if callable(fn):
            dumped = fn()
            if isinstance(dumped, dict):
                return dumped

    # 4. Fallback: fail fast, don't silently coerce arbitrary objects
    raise TypeError(f"Unsupported process result type: {type(value)!r}")
```

Then make `_normalize_process_result` clearly expect this:

```py
def _normalize_process_result(raw: Any) -> ProcessResult:
    if isinstance(raw, str):
        payload = {"stdout": raw}
    else:
        payload = _maybe_model_dump(raw)

    stdout = str(payload.get("stdout") or payload.get("output") or "")
    stderr = str(payload.get("stderr") or payload.get("error") or "")

    exit_code = (
        payload.get("exit_code")
        if "exit_code" in payload
        else payload.get("returncode", payload.get("return_code"))
    )
    if exit_code is None:
        exit_code = 0 if not stderr else 1

    success = bool(payload.get("success", not stderr and exit_code in (0, None)))
    return ProcessResult(stdout=stdout, stderr=stderr, exit_code=exit_code, success=success)
```

Actionable steps:

1. Enumerate the actual CUA shell/python/GUI result types you expect.
2. Replace the generic `attr_payload` logic with explicit handling for those types only.
3. Make `_maybe_model_dump` raise instead of silently returning `{}` for unsupported objects so unexpected SDK changes fail loudly.

This shrinks the number of hidden code paths and makes failures much easier to reason about, while preserving the behavior for the real CUA result shapes.

---

### 2. Inline `_PosixShellFileSystem` into `CuaFileSystemComponent`

The current filesystem path has multiple layers:

- `CuaFileSystemComponent` chooses between `self._fs` and `_fallback` (`_PosixShellFileSystem`).
- `_PosixShellFileSystem` itself has `_ensure_posix` + shell helpers + POSIX error helpers.

For simple operations like `read_file`/`write_file`, that makes the call stack harder to follow.

Since `_PosixShellFileSystem` is only ever constructed inside `CuaFileSystemComponent` and only for shell fallback, you can fold it into `CuaFileSystemComponent` and keep the same behavior:

**Current pattern (simplified):**

```py
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(...):
        self._shell = CuaShellComponent(...)
        self._fs = getattr(sandbox, "filesystem", None)
        self._os_type = os_type.lower()
        self._fallback = _PosixShellFileSystem(self._shell, self._os_type)

    async def read_file(...):
        read_file = None if self._fs is None else getattr(self._fs, "read_file", None)
        if read_file is None:
            return await self._fallback.read_file(path, encoding, offset, limit)
        ...
```

**Suggested pattern: inline fallback and shared POSIX guard:**

```py
class CuaFileSystemComponent(FileSystemComponent):
    def __init__(self, sandbox: Any, os_type: str = CUA_DEFAULT_CONFIG["os_type"]) -> None:
        self._shell = CuaShellComponent(sandbox, os_type=os_type)
        self._fs = getattr(sandbox, "filesystem", None)
        self._os_type = os_type.lower()

    def _ensure_posix(self, path: str) -> dict[str, Any] | None:
        if _is_posix_os_type(self._os_type):
            return None
        return _non_posix_filesystem_result(path, self._os_type)

    async def _read_file_via_shell(
        self,
        path: str,
        encoding: str,
        offset: int | None,
        limit: int | None,
    ) -> dict[str, Any]:
        if error := self._ensure_posix(path):
            return error
        result = await self._shell.exec(f"cat {shlex.quote(path)}")
        if result.get("stderr"):
            return {"success": False, "path": path, "error": result["stderr"]}
        return {
            "success": True,
            "path": path,
            "content": _slice_content_by_lines(
                str(result.get("stdout", "")), offset=offset, limit=limit
            ),
        }

    async def read_file(...):
        read_file = getattr(self._fs, "read_file", None) if self._fs else None
        if read_file is not None:
            content = await _maybe_await(read_file(path))
            ...
        # single fallback path
        return await self._read_file_via_shell(path, encoding, offset, limit)
```

You can apply the same pattern for `write_file`, `delete_file`, `list_dir`, and `search_files`:

- A single `_ensure_posix` method.
- One private “via shell” method per operation.
- No `_PosixShellFileSystem` wrapper class.

This keeps all existing behaviors (POSIX gating, shell implementation, non‑POSIX error messages) but removes a whole level of indirection, making the filesystem flow easier to inspect and debug.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +149 to +158
@pytest.mark.asyncio
async def test_get_booter_creates_cua_booter(monkeypatch):
from astrbot.core.computer import computer_client

created = []

class FakeCuaBooter:
def __init__(
self,
image: str,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests to cover additional screenshot result formats handled by _screenshot_to_bytes

Currently only the raw-bytes path is tested, but _screenshot_to_bytes also handles base64/data URL strings, filesystem paths, PIL-like objects with .save, and nested dict payloads. Please add a parametrized test that covers these shapes (e.g., data:image/png;base64,... string, temp-file path string, simple .save object, dict with a base64 field) and verifies that CuaScreenshotTool returns a valid PNG and removes the base64 field from the JSON payload.

Comment thread astrbot/core/computer/booters/cua.py
@zouyonghe zouyonghe merged commit cb5c172 into AstrBotDevs:master Apr 27, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants