CUA Sample Agent #261
Pull request overview
Adds a new .NET 8 “W365 Computer Use” sample agent that connects to the W365 Computer Use MCP server and drives a computer-use (CUA) loop via the OpenAI Responses API, with accompanying configuration, telemetry, and documentation.
Changes:
- Introduces a new sample agent project/solution for Windows 365 computer-use via MCP + OpenAI Responses API.
- Implements model providers (Azure OpenAI + custom endpoint), the computer-use orchestrator, and an Agent Framework-based bot.
- Adds OpenTelemetry/observability helpers, local dev configuration assets, and a full README with setup/run instructions.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| dotnet/w365-computer-use/sample-agent/appsettings.json | Adds sample configuration for auth, connections, AI providers, and computer-use settings |
| dotnet/w365-computer-use/sample-agent/W365ComputerUseSample.csproj | New .NET 8 web project with A365/Agent Framework + OTEL dependencies |
| dotnet/w365-computer-use/sample-agent/ToolingManifest.json | Declares MCP server dependency for tooling discovery |
| dotnet/w365-computer-use/sample-agent/Telemetry/AgentMetrics.cs | Adds custom ActivitySource + metrics helpers for request/turn instrumentation |
| dotnet/w365-computer-use/sample-agent/Telemetry/A365OtelWrapper.cs | Wraps agent operations to attach baggage and register observability token cache |
| dotnet/w365-computer-use/sample-agent/ServiceExtensions.cs | Adds OpenTelemetry wiring (traces + metrics) to the web host |
| dotnet/w365-computer-use/sample-agent/README.md | Documents sample purpose, architecture, setup, configuration, and troubleshooting |
| dotnet/w365-computer-use/sample-agent/Properties/launchSettings.json | Adds local debug profile and URL binding |
| dotnet/w365-computer-use/sample-agent/Program.cs | App composition: DI registrations, routing, auth, endpoints, shutdown hook |
| dotnet/w365-computer-use/sample-agent/ComputerUse/Models/ComputerUseModels.cs | Adds JSON request/response models and tool definitions for Responses API |
| dotnet/w365-computer-use/sample-agent/ComputerUse/ICuaModelProvider.cs | Defines abstraction for calling a CUA-capable model endpoint |
| dotnet/w365-computer-use/sample-agent/ComputerUse/CustomEndpointProvider.cs | Implements certificate/MSAL-secured custom endpoint model provider |
| dotnet/w365-computer-use/sample-agent/ComputerUse/ComputerUseOrchestrator.cs | Core CUA loop translating model actions into W365 MCP tool calls |
| dotnet/w365-computer-use/sample-agent/ComputerUse/AzureOpenAIModelProvider.cs | Implements Azure OpenAI Responses API provider using API key |
| dotnet/w365-computer-use/sample-agent/AspNetExtensions.cs | Adds configurable JWT token validation wiring for ASP.NET |
| dotnet/w365-computer-use/sample-agent/Agent/MyAgent.cs | Agent entrypoint: auth selection, tool loading, streaming updates, orchestration |
| dotnet/w365-computer-use/sample-agent/.gitignore | Ignores dev settings and screenshots output |
| dotnet/w365-computer-use/W365ComputerUseSample.sln | New solution to open/build the sample project |

    if (_cachedTools != null)
        return (_cachedTools, _cachedMcpClient);

    var httpClient = _httpClient;

Setting _httpClient.DefaultRequestHeaders.Authorization on a long-lived HttpClient can leak/overwrite bearer tokens between users and affect unrelated requests from this orchestrator. Prefer per-request authorization headers (or a dedicated client per cached connection) rather than mutating DefaultRequestHeaders.
Suggested change:

    - var httpClient = _httpClient;
    + // Use a dedicated HttpClient instance for the MCP connection to avoid
    + // mutating authorization headers on a shared, long-lived HttpClient.
    + var httpClient = new HttpClient();

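Where the caller builds its own requests, the per-request alternative mentioned in the comment can be sketched as follows. This is a minimal illustration, not the sample's code: `McpRequestFactory` and `CreateAuthorizedRequest` are hypothetical names, and it assumes the access token has already been acquired for the current user.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class McpRequestFactory
{
    // Hypothetical helper: builds a request that carries its own bearer token,
    // leaving the shared HttpClient's DefaultRequestHeaders untouched.
    public static HttpRequestMessage CreateAuthorizedRequest(
        HttpMethod method, Uri uri, string accessToken)
    {
        var request = new HttpRequestMessage(method, uri);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }
}
```

Because the token lives on the `HttpRequestMessage`, two users sharing one `HttpClient` never see each other's credentials. If the MCP client library owns the transport and only accepts an `HttpClient`, the dedicated-client approach in the suggested change is probably the more practical option.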
    }

    /// <summary>
    /// Run the CUA loop. Session must already be started by the caller.

XML doc says RunAsync “Session must already be started by the caller”, but this method starts the session itself when _sessionStarted is false. Update the comment to match behavior (or enforce the contract by removing the internal start).
Suggested change:

    - /// Run the CUA loop. Session must already be started by the caller.
    + /// Run the CUA loop. Starts a W365 session if one is not already active and
    + /// reuses the same session across calls for this application instance.

    var driveBase = string.IsNullOrEmpty(_oneDriveUserId)
        ? "https://graph.microsoft.com/v1.0/me/drive"
        : $"https://graph.microsoft.com/v1.0/users/{_oneDriveUserId}/drive";
    var url = $"{driveBase}/root:/{_oneDriveFolder.TrimStart('/')}/{fileName}:/content";

The OneDrive upload doc comment says files go to /CUA-Sessions/{date}/, but the code builds a URL without any date-based subfolder. Either implement the dated folder structure or adjust the comment to match actual behavior.

    request.Content = new ByteArrayContent(Convert.FromBase64String(base64Data));
    request.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("image/png");

    var response = await _httpClient.SendAsync(request);

_httpClient.SendAsync(request) isn’t passed a CancellationToken, so cancelled requests/shutdown may hang until the HTTP call completes. Thread the token through and pass it into SendAsync here.
Suggested change:

    - var response = await _httpClient.SendAsync(request);
    + var response = await _httpClient.SendAsync(request, System.Threading.CancellationToken.None);

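Passing `CancellationToken.None` leaves the call effectively uncancellable, so the threading described in the comment needs a real token from the caller. A minimal sketch, assuming a hypothetical `ScreenshotUploader` class and `UploadAsync` method (neither name comes from the sample):

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public sealed class ScreenshotUploader
{
    private readonly HttpClient _httpClient;

    public ScreenshotUploader(HttpClient httpClient) => _httpClient = httpClient;

    // Hypothetical method: the caller's token is passed straight through to
    // SendAsync, so a cancelled turn or host shutdown aborts the upload.
    public async Task<bool> UploadAsync(
        string url, byte[] pngBytes, CancellationToken cancellationToken)
    {
        using var request = new HttpRequestMessage(HttpMethod.Put, url);
        request.Content = new ByteArrayContent(pngBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("image/png");

        using var response = await _httpClient.SendAsync(request, cancellationToken);
        return response.IsSuccessStatusCode;
    }
}
```

Callers would typically forward the token they already receive from the turn handler or the host, rather than creating a new one.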
    agentId = agentId ?? Guid.Empty.ToString();
    string? tempTenantId = turnContext?.Activity?.Conversation?.TenantId ?? turnContext?.Activity?.Recipient?.TenantId;
    string tenantId = tempTenantId ?? Guid.Empty.ToString();

agentId = agentId ?? Guid.Empty.ToString(); won’t replace an empty string, so agentId can remain "" and be propagated into baggage/observability. Use string.IsNullOrEmpty(agentId) (or make it nullable) and fall back to a stable placeholder when it’s missing.
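
The difference between `??` and an empty-string check can be sketched with a small helper. `BaggageDefaults` and `OrPlaceholder` are hypothetical names used only for illustration; the placeholder value mirrors the `Guid.Empty` fallback already used in the snippet.

```csharp
using System;

public static class BaggageDefaults
{
    // Hypothetical helper: treats both null and "" as missing, which the
    // null-coalescing operator (??) alone does not do.
    public static string OrPlaceholder(string? value) =>
        string.IsNullOrEmpty(value) ? Guid.Empty.ToString() : value;
}
```

With this helper, `BaggageDefaults.OrPlaceholder("")` yields the all-zero GUID string, whereas `"" ?? Guid.Empty.ToString()` would keep the empty string.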

    finally
    {
        stopwatch.Stop();
        FinalizeMessageHandlingActivity(activity, context, stopwatch.ElapsedMilliseconds, true);
    }

FinalizeMessageHandlingActivity(..., success: true) is always called with true, even after an exception path. This can overwrite the Activity status to OK and misreport duration metrics. Track a success flag based on whether func() completed without throwing and pass that value.
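
One way to track the outcome is a flag that is set only after the awaited call returns. The sketch below is an assumption-laden illustration: `MessageHandling.InvokeAsync` is a hypothetical wrapper, and the `onFinished` callback stands in for `FinalizeMessageHandlingActivity`, whose full signature is not shown in the diff.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class MessageHandling
{
    // Runs func and reports the real outcome to onFinished, a stand-in for
    // FinalizeMessageHandlingActivity.
    public static async Task InvokeAsync(Func<Task> func, Action<long, bool> onFinished)
    {
        var stopwatch = Stopwatch.StartNew();
        var success = false;
        try
        {
            await func();
            success = true; // reached only if func completed without throwing
        }
        finally
        {
            stopwatch.Stop();
            onFinished(stopwatch.ElapsedMilliseconds, success);
        }
    }
}
```

The exception still propagates to the caller; the `finally` block only records whether the call succeeded.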

    if (_cachedTools != null)
        return (_cachedTools, _cachedMcpClient);

This global cache returns the same _cachedTools/_cachedMcpClient for all callers. If multiple users/tenants hit the agent, sessions and tool state can be unintentionally shared. Cache per conversation/agent identity instead of a single global instance.

    ## Overview

    This sample demonstrates how to build an agent that controls a Windows 365 Cloud PC using the OpenAI Responses API and the W365 Computer Use MCP server.

    The agent receives a natural language task from the user, provisions a W365 desktop session via MCP tools, then runs a CUA (Computer Use Agent) loop: the model sees screenshots, decides actions (click, type, scroll), and the MCP server executes them on the VM.

The README doesn’t include an explicit “Demonstrates” section (used by other Agent 365 samples to quickly summarize what the sample teaches). Add a short “Demonstrates” section near the top so the learning goals are scannable.

    using Microsoft.Agents.A365.Observability;
    using Microsoft.Agents.A365.Observability.Extensions.AgentFramework;
    using Microsoft.Agents.A365.Observability.Runtime;

These using directives appear unused in this file and will generate build warnings. Remove the unused Microsoft.Agents.A365.Observability* usings (or start using the referenced APIs) to keep the sample warning-free.
Suggested change:

    - using Microsoft.Agents.A365.Observability;
    - using Microsoft.Agents.A365.Observability.Extensions.AgentFramework;
    - using Microsoft.Agents.A365.Observability.Runtime;

    // Register the Computer Use orchestrator
    builder.Services.AddSingleton<ComputerUseOrchestrator>();

ComputerUseOrchestrator is registered as a singleton but it holds mutable per-conversation state (_conversationHistory, _sessionStarted, cached MCP client/tools, screenshot counter). This can cause cross-user data leakage and races if multiple conversations/messages are processed concurrently. Consider making it scoped/per-conversation (or keying state by conversation/user id with locking).
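
Keying state by conversation ID, as suggested, might look like the sketch below. `ConversationState` and `ConversationStateStore` are hypothetical types written for illustration and may not match what the sample ends up using.

```csharp
using System.Collections.Concurrent;

// Hypothetical per-conversation state holder.
public sealed class ConversationState
{
    public string? SessionId { get; set; }
    public int ScreenshotCount { get; set; }
}

public sealed class ConversationStateStore
{
    private readonly ConcurrentDictionary<string, ConversationState> _states = new();

    // Returns the state for a conversation, creating it on first use.
    public ConversationState GetOrCreate(string conversationId) =>
        _states.GetOrAdd(conversationId, _ => new ConversationState());

    // Drops the state, for example when the conversation's session ends.
    public bool Remove(string conversationId) => _states.TryRemove(conversationId, out _);
}
```

The dictionary only makes lookup and insertion thread-safe. If two messages from the same conversation can be processed concurrently, the fields inside `ConversationState` would still need their own synchronization.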
- Add ConversationSession class to track per-conversation W365 sessions
- Refactor ComputerUseOrchestrator to use ConcurrentDictionary keyed by conversationId
- Parse sessionId from QuickStartSession response and pass to all MCP tool calls
- Pass conversationId from turnContext to orchestrator
- Add deployment artifacts to .gitignore (a365 configs, app.zip, publish/)
…loop

Instead of sending the full conversation history (including all base64 screenshots) on every model call, use the OpenAI Responses API's previous_response_id to let the server reconstruct prior context. Only new items (computer_call_output, function_call_output) are sent per iteration, reducing API payload by ~15x.

Between user messages, computer actions and screenshots are pruned from history while text context is preserved for conversational continuity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
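
A follow-up request in this style could be built roughly as shown below. This is a sketch under assumptions: `FollowUpRequest.Build` is a hypothetical helper, the `computer_screenshot` / `image_url` output shape follows OpenAI's published computer-use examples and may differ for the model used here, and fields such as `tools` that a real request also needs are omitted.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class FollowUpRequest
{
    // Builds the body for a loop iteration after the first: it references the
    // prior response by id and carries only the newest screenshot.
    public static string Build(
        string model, string previousResponseId, string callId, string screenshotBase64)
    {
        var body = new Dictionary<string, object>
        {
            ["model"] = model,
            ["previous_response_id"] = previousResponseId,
            ["input"] = new object[]
            {
                new Dictionary<string, object>
                {
                    ["type"] = "computer_call_output",
                    ["call_id"] = callId,
                    ["output"] = new Dictionary<string, object>
                    {
                        ["type"] = "computer_screenshot", // assumed output shape
                        ["image_url"] = $"data:image/png;base64,{screenshotBase64}",
                    },
                },
            },
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The payload size stays roughly constant per iteration because earlier screenshots are referenced through `previous_response_id` instead of being resent.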
feat: use previous_response_id to avoid resending screenshots in CUA …
…r screenshots

Two fixes:

1. Keep the last computer_call + computer_call_output pair when pruning history between user messages. The API requires a matching computer_call for every computer_call_output (linked by call_id); dropping one causes BadRequest: "No tool call found for computer call with call_id". This also gives the model visual context for simple follow-ups.
2. Add session recovery to the screenshot capture path, matching the existing pattern used by action tools (click, type, scroll). If CaptureScreenshot returns "no active session", recover the session and retry once instead of failing outright.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
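
The pruning rule in the first fix can be sketched over a simplified history model. `HistoryItem` and `HistoryPruner` are hypothetical; the sample's real history holds JSON items, so this only illustrates the pairing logic.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified history entry.
public sealed record HistoryItem(string Type, string? CallId = null);

public static class HistoryPruner
{
    // Removes computer_call / computer_call_output items except the most recent
    // output and the call it answers (matched by call_id). Other items are kept.
    public static List<HistoryItem> Prune(IReadOnlyList<HistoryItem> history)
    {
        // Find the call_id of the last output that has a matching earlier call.
        string? keepCallId = null;
        for (var outputIdx = history.Count - 1; outputIdx >= 0 && keepCallId is null; outputIdx--)
        {
            var output = history[outputIdx];
            if (output.Type != "computer_call_output" || output.CallId is null) continue;

            for (var searchIdx = outputIdx - 1; searchIdx >= 0; searchIdx--)
            {
                var earlier = history[searchIdx];
                if (earlier.Type == "computer_call" && earlier.CallId == output.CallId)
                {
                    keepCallId = output.CallId;
                    break;
                }
            }
        }

        var pruned = new List<HistoryItem>();
        foreach (var entry in history)
        {
            var isComputerItem = entry.Type is "computer_call" or "computer_call_output";
            if (!isComputerItem || (keepCallId is not null && entry.CallId == keepCallId))
                pruned.Add(entry);
        }
        return pruned;
    }
}
```

Keeping the call and its output together avoids leaving an orphaned `computer_call_output` in the history, which is the condition the commit message associates with the BadRequest error.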
Use the W365 session ID instead of truncated conversation ID for the OneDrive screenshot subfolder. Updated in both initial session start and session recovery paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review feedback: rename cryptic loop variables (i, j, t, ct, cid, ccid) to descriptive names (histIdx, searchIdx, entryType, earlierType, outputCallId, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Support multiple MCP server URLs (W365 + MailTools) with per-server HttpClient
- Fix _cachedTools overwrite bug that dropped mail tools on second message
- Add function tool instructions to system prompt so model prefers them over CUA
- Per-server error handling so one server failure doesn't block others
- Fix CancellationTokenSource disposal races in typing indicator
- Make CUA session start message ephemeral (informative update)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
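
Per-server error handling of the kind listed above can be sketched as a loop that isolates each server's failure. `MultiServerToolLoader` is hypothetical, tools are represented as plain strings, and the `loadTools` delegate stands in for the real MCP client call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class MultiServerToolLoader
{
    // Loads tools from each server independently. A failing server is recorded
    // and skipped so the remaining servers still contribute their tools.
    public static async Task<(List<string> Tools, List<string> FailedServers)> LoadAllAsync(
        IEnumerable<string> serverUrls,
        Func<string, Task<IReadOnlyList<string>>> loadTools)
    {
        var tools = new List<string>();
        var failedServers = new List<string>();
        foreach (var url in serverUrls)
        {
            try
            {
                tools.AddRange(await loadTools(url));
            }
            catch (Exception)
            {
                failedServers.Add(url); // keep going with the next server
            }
        }
        return (tools, failedServers);
    }
}
```

A production version would likely log the exception and let `OperationCanceledException` propagate instead of treating cancellation as a server failure.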
Fix CUA orchestrator session recovery and screenshot handling
- ToolingManifest.json: add mcp_MailTools entry so Mail MCP is discovered in prod
- ComputerUseOrchestrator: restore onFolderLinkReady callback, session-ID-based screenshot subfolder, ShareConversationFolderAsync, FolderShared flag, and have UploadScreenshotToOneDriveAsync return the share URL
- ComputerUseOrchestrator: log function_call args and returned output for troubleshooting MCP tool invocations
- ComputerUseOrchestrator: update system prompt so model tells the user "I can't" when no matching tool exists (instead of silently calling OnTaskComplete)
- MyAgent.cs: pass onFolderLinkReady callback that posts a View-folder link

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Additive port of 407b43c features that don't conflict with our multi-mcp work:

- System prompt: mention EndSession and casual-chat behavior
- Add EndSession as a model-callable function tool
- Handle EndSession in function_call: tear down W365 session, drop state, return "Session ended..." text to the user
- EndSessionAsync: add catch for HttpRequestException 404 (MCP transport already expired; no need to warn)
- Transparent session recovery during CUA actions: detect session-not-found tool responses, end the stale session, start a fresh one, and retry
- Pin A365 SDK package versions to 0.1.72-beta (was beta.*)
- Add nuget.config (clear + nuget.org only)

Skipped: HasActiveSession / EndConversationSessionAsync public methods (not called anywhere on the target branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
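
The transparent recovery step can be sketched as a recover-and-retry-once wrapper. `SessionRecovery` is a hypothetical helper, and matching on the text "no active session" is an assumption about what the MCP server returns; the real detection logic may inspect a different message or field.

```csharp
using System;
using System.Threading.Tasks;

public static class SessionRecovery
{
    // Assumed detection rule: the tool result mentions a missing session.
    public static bool IsSessionLost(string toolResult) =>
        toolResult.Contains("no active session", StringComparison.OrdinalIgnoreCase);

    // Invokes a tool; if the result reports a lost session, recovers and retries once.
    public static async Task<string> InvokeWithRecoveryAsync(
        Func<Task<string>> invokeTool, Func<Task> recoverSession)
    {
        var result = await invokeTool();
        if (!IsSessionLost(result)) return result;

        await recoverSession();    // end the stale session and start a fresh one
        return await invokeTool(); // single retry; a second failure is returned as-is
    }
}
```

Limiting the retry to one attempt keeps a persistently failing server from trapping the CUA loop in repeated recoveries.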
Two targeted changes to eliminate a post-response crash on the `computer`
tool path (gpt-5.4-mini):
fail: Kestrel[13] ConnectionAbortedException
→ InvalidOperationException: Reading is already in progress
Unhandled: ObjectDisposedException (HttpRequestPipeReader)
1. MyAgent.cs: drop the background typing-indicator Task.Run loop. The
loop fired SendActivityAsync every ~4s concurrently with the main
reply path, and the resulting race against StreamingResponse.EndStream
triggered the crash. Keep a single initial typing activity; informative
updates via onStatusUpdate/onCuaStarting already cover visual feedback.
2. Program.cs: call request.EnableBuffering() at the /api/messages
endpoint so observability/tracing middleware can re-read the body
without hitting "Reading is not allowed after reader was completed".
Validated against a long CUA session (10+ iterations), a mixed
CUA/email/chat exchange, and parallel requests — no crashes observed in
the test window that previously reproduced the bug within minutes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Register the full stock-MCP set so the A365 SDK gateway surfaces tools from each server at runtime: mcp_W365ComputerUse, mcp_MailTools, mcp_MeServer, mcp_CalendarTools, mcp_TeamsServer, mcp_ODSPRemoteServer, mcp_SharepointListsTools, mcp_AdminTools, mcp_WordServer, mcp_m365copilot

Loads ~149 function tools in prod when the blueprint has the matching McpServers.*.All inheritable scopes consented.

Note: mcp_SharepointListsTools currently fails to load with an ObjectDisposedException inside the A365 SDK's MCP client factory (CancellationTokenSource). Appears to be an SDK-side issue; the other nine servers load cleanly. Not addressed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-MCP tools, mail + EndSession, OneDrive folder link restored