@Gkrumbach07
Collaborator

- Introduced a new endpoint for retrieving runner capabilities at `/agentic-sessions/:sessionName/agui/capabilities`.
- Implemented the `HandleCapabilities` function to authenticate users, verify permissions, and proxy requests to the runner.
- Enhanced AGUI event handling by adding support for custom events and persisting message snapshots for faster reconnections.
- Updated the frontend to utilize the new capabilities endpoint and replaced the existing chat component with `CopilotChatPanel` for improved user experience.

This update improves the overall functionality and performance of the AG-UI system, allowing for better integration with the runner's capabilities and enhancing user interactions.
@Gkrumbach07 Gkrumbach07 marked this pull request as draft February 11, 2026 00:39
@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 4.54545% with 105 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...onents/runners/claude-code-runner/observability.py | 4.54% | 105 Missing ⚠️ |

- Fixed a typo in the event type constant from `EventTypStateDelta` to `EventTypeStateDelta`.
- Added a new event type constant `EventTypeCustom` for platform extensions.
- Refactored message extraction logic from snapshots to improve handling of messages from persisted snapshots.
- Removed the deprecated `loadCompactedMessages` function and updated the event streaming logic to utilize persisted message snapshots for better performance and reliability.

These changes enhance the overall stability and functionality of the AG-UI event handling system.
@github-actions
Contributor

github-actions bot commented Feb 11, 2026

Claude Code Review

Summary

This PR introduces a new capabilities endpoint and significantly refactors the AGUI event handling system. The changes replace custom event compaction logic with runner-emitted snapshots and integrate CopilotKit for the frontend chat UI. Overall, the implementation demonstrates strong security practices and architectural clarity, with a few areas requiring attention before merge.

Key Changes:

  • ✅ New /capabilities endpoint with proper RBAC validation
  • ✅ MESSAGES_SNAPSHOT persistence for fast reconnect
  • ✅ Removal of complex compaction logic (~400 lines deleted)
  • ✅ CopilotKit integration for chat UI
  • ⚠️ Large dependency additions (16K+ lines in package-lock.json)
  • ⚠️ Frontend uses interface instead of type (violates guidelines)

Issues by Severity

🚫 Blocker Issues

None - No critical security or correctness issues that block merge.


🔴 Critical Issues

1. Frontend Type Definitions Violate Standards

Location: components/frontend/src/types/agui.ts

The codebase standard is to always use type over interface (see CLAUDE.md line 1144 and frontend-development.md line 73-76).

Problem:

// Added in this PR - violates guidelines
interface Capabilities { ... }

Fix Required:

// Should be:
type Capabilities = { ... }

Reference: CLAUDE.md lines 1141-1145, frontend-development.md lines 73-76


2. Missing Type Safety in Capabilities Response

Location: components/backend/websocket/agui_proxy.go:454-462

var result map[string]interface{}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
    log.Printf("Capabilities: Failed to decode response: %v", err)
    c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to parse runner response"})
    return
}
c.JSON(http.StatusOK, result)

Issues:

  • No type validation on result before returning to user
  • Could return arbitrary JSON from runner without structure validation
  • Returning 500 Internal Server Error with a parse-failure message blames the backend for a malformed runner response and exposes an implementation detail

Recommendation:

  1. Define a CapabilitiesResponse struct with expected fields
  2. Unmarshal into typed struct
  3. Return 503 Service Unavailable (not 500) if runner response is malformed

Pattern: See error-handling.md lines 199-220 for proper error exposure patterns.


3. Large Dependency Additions Without Justification

Location: components/frontend/package.json and package-lock.json

Added Dependencies:

  • @copilotkit/react-core + @copilotkit/react-ui + @copilotkit/runtime + @copilotkit/runtime-client-gql
  • @ag-ui/client

Impact:

  • +16,085 lines added to package-lock.json
  • Substantial increase in bundle size
  • Potential security surface area expansion

Missing:

  • Dependency audit results
  • Bundle size impact analysis
  • Justification for why CopilotKit is preferred over the custom implementation

Recommendation:

  • Add comment to PR description explaining why CopilotKit was chosen
  • Include bundle size comparison (before/after)
  • Run npm audit and document any vulnerabilities

🟡 Major Issues

4. Fallback Capabilities Response May Hide Errors

Location: components/backend/websocket/agui_proxy.go:431-439

if err != nil {
    log.Printf("Capabilities: Request failed: %v", err)
    // Runner not ready — return minimal default
    c.JSON(http.StatusOK, gin.H{
        "framework":         "unknown",
        "agent_features":    []interface{}{},
        "platform_features": []interface{}{},
        "file_system":       false,
        "mcp":               false,
    })
    return
}

Issue:

  • Returns 200 OK when runner is actually unavailable
  • Frontend cannot distinguish between "runner truly has no features" vs. "runner is not responding"
  • Could lead to confusing UI state

Recommendation:
Return 503 Service Unavailable with structured error:

c.JSON(http.StatusServiceUnavailable, gin.H{
    "error":   "Runner not available",
    "message": "Session is starting or runner is unavailable",
})

Frontend can then show appropriate loading/error state.


5. Missing Error Context in Logs

Location: components/backend/websocket/agui.go:52

if eventType == types.EventTypeMessagesSnapshot {
    go persistMessagesSnapshot(sessionID, event)
}

Issue:

  • persistMessagesSnapshot runs in goroutine but errors are only logged
  • No way to know if snapshot persistence failed
  • Could lead to users losing conversation history on reconnect

Recommendation:
Consider adding metrics/alerting for snapshot persistence failures. At minimum, log failures with an explicit ERROR severity and the session ID, since a bare log.Printf call carries no level.


6. Deleted Compaction Logic Without Migration Path

Location: components/backend/websocket/compaction.go (deleted)

Issue:

  • 401 lines of compaction logic deleted
  • Existing sessions with events in old format may not have MESSAGES_SNAPSHOT
  • No migration documented for sessions created before this PR

Questions:

  1. What happens to sessions created before this PR that don't have messages-snapshot.json?
  2. Is there a migration script to backfill snapshots?

Recommendation:
Add migration logic or document the breaking change in CHANGELOG.


🔵 Minor Issues

7. Frontend Component Missing Loading States

Location: components/frontend/src/components/session/CopilotChatPanel.tsx

Issue:

  • No loading state while CopilotKit initializes
  • No error boundary for when runtime connection fails

Recommendation:

export function CopilotChatPanel({ projectName, sessionName }: Props) {
  const { data: capabilities, isLoading, error } = useCapabilities(projectName, sessionName);
  
  if (isLoading) return <div>Initializing chat...</div>;
  if (error) return <div>Failed to connect: {error.message}</div>;
  
  return <CopilotKit runtimeUrl={...}>...</CopilotKit>;
}

Reference: frontend-development.md line 156 (all buttons/components need loading states)


8. Typo Fixed But Inconsistent Naming

Location: components/backend/types/agui.go:23-24

-EventTypStateDelta     = "STATE_DELTA"  // Typo fixed
+EventTypeStateDelta    = "STATE_DELTA"

Good: Typo fixed ✅

Issue: Existing code may reference EventTypStateDelta - should verify no usages remain:

grep -r "EventTypStateDelta" components/backend components/operator

9. Missing Test Coverage for New Endpoint

Location: components/backend/websocket/agui_proxy.go:416-462

Issue:

  • New HandleCapabilities endpoint has no unit or integration tests
  • RBAC validation logic should be tested (unauthorized access scenarios)

Recommendation:
Add tests following pattern in tests/integration/:

func TestHandleCapabilities_Unauthorized(t *testing.T) { ... }
func TestHandleCapabilities_RunnerUnavailable(t *testing.T) { ... }
func TestHandleCapabilities_Success(t *testing.T) { ... }
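
The bodies of such tests might follow a table-driven shape like the one below. It is written as a runnable program rather than a `_test.go` file, and it exercises a stand-in handler: `newStandInHandler`, its `allowed` and `runnerUp` switches, and the 503 expectation (which follows the recommendation in issue 4, not the current behavior) are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newStandInHandler imitates the status codes HandleCapabilities would be
// expected to return. allowed and runnerUp are hypothetical switches that
// replace the real RBAC check and the real runner call.
func newStandInHandler(allowed, runnerUp bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.Header.Get("Authorization") == "":
			w.WriteHeader(http.StatusUnauthorized)
		case !allowed:
			w.WriteHeader(http.StatusForbidden)
		case !runnerUp:
			w.WriteHeader(http.StatusServiceUnavailable)
		default:
			w.WriteHeader(http.StatusOK)
		}
	})
}

// scenario is one row of the table: request setup plus the expected status.
type scenario struct {
	name     string
	token    string
	allowed  bool
	runnerUp bool
	want     int
}

var scenarios = []scenario{
	{"Unauthorized", "", true, true, http.StatusUnauthorized},
	{"Forbidden", "Bearer example", false, true, http.StatusForbidden},
	{"RunnerUnavailable", "Bearer example", true, false, http.StatusServiceUnavailable},
	{"Success", "Bearer example", true, true, http.StatusOK},
}

// status sends one in-memory request and returns the recorded status code.
func status(s scenario) int {
	req := httptest.NewRequest(http.MethodGet, "/agui/capabilities", nil)
	if s.token != "" {
		req.Header.Set("Authorization", s.token)
	}
	rec := httptest.NewRecorder()
	newStandInHandler(s.allowed, s.runnerUp).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	for _, s := range scenarios {
		got := status(s)
		fmt.Printf("%s: got %d, want %d, pass=%t\n", s.name, got, s.want, got == s.want)
	}
}
```

In an actual test file, the loop body would move into `t.Run(s.name, ...)` and the stand-in would be replaced by the real handler wired to fake Kubernetes clients and a fake runner.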

10. Runner Endpoint Uses Global State

Location: components/runners/claude-code-runner/endpoints/capabilities.py:40

has_langfuse = state._obs is not None and state._obs.langfuse_client is not None

Issue:

  • Direct access to global state._obs is fragile
  • Underscore prefix suggests private implementation detail

Recommendation:
Add accessor method:

def has_observability() -> bool:
    return state._obs is not None and state._obs.langfuse_client is not None

Positive Highlights

✅ Security Done Right

  1. User Token Authentication: HandleCapabilities correctly uses GetK8sClientsForRequest (agui_proxy.go:421)
  2. RBAC Validation: Proper permission check before proxying (agui_proxy.go:430-446)
  3. No Token Leaks: All logging uses safe patterns

Reference Compliance: Follows k8s-client-usage.md patterns exactly. ✅


✅ Excellent Code Organization

  1. Snapshot Persistence: Clean separation of concerns (agui.go:46-81)
  2. Error Handling: Consistent patterns with proper context logging
  3. Removal of Dead Code: Deleted 401 lines of unused compaction logic

✅ React Query Usage

The new useCapabilities hook follows all best practices:

  • ✅ Proper query keys with parameters (use-capabilities.ts:6-8)
  • ✅ Conditional polling with dynamic interval (lines 29-38)
  • ✅ Stale time configuration (line 26)
  • ✅ Proper TypeScript types

Reference Compliance: Follows react-query-usage.md patterns exactly. ✅


✅ Backend Proxy Pattern

The HandleCapabilities function follows established proxy patterns:

  • ✅ Auth validation before proxy
  • ✅ RBAC check
  • ✅ Proper error propagation
  • ✅ Timeout handling (10s timeout)

✅ Documentation

Added comprehensive ADR (0006-ambient-runner-sdk-architecture.md, +423 lines) explaining architectural decisions. Strong documentation culture. 📚


Recommendations

Priority 1 (Before Merge)

  1. Fix Frontend Type Violation: Change interface to type in agui.ts
  2. Add Typed Capabilities Response: Define Go struct for capabilities validation
  3. Document Breaking Changes: Add migration notes for pre-existing sessions

Priority 2 (Nice to Have)

  1. Bundle Size Analysis: Document impact of CopilotKit dependencies
  2. Test Coverage: Add integration tests for capabilities endpoint
  3. Error Handling: Return 503 instead of 200 OK when runner unavailable
  4. Loading States: Add loading/error states to CopilotChatPanel

Priority 3 (Future Work)

  1. Metrics: Add Prometheus metrics for snapshot persistence failures
  2. Refactor Global State: Add accessor methods for runner state
  3. Verify Typo Fix: Grep for old EventTypStateDelta usage

Overall Assessment

Code Quality: 8.5/10 — Strong security practices, good architectural clarity, minor issues with frontend standards compliance.

Readiness: APPROVE with changes — Fix the frontend interface → type issue and add typed capabilities response. All other issues are non-blocking improvements.

The removal of compaction logic is a significant architectural improvement (MESSAGES_SNAPSHOT from runner is cleaner). The new capabilities endpoint follows all security patterns correctly.

Great work on the refactor! 🚀


References:

  • CLAUDE.md (Backend/Operator Development Standards)
  • .claude/context/security-standards.md
  • .claude/patterns/k8s-client-usage.md
  • .claude/patterns/error-handling.md
  • .claude/patterns/react-query-usage.md

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.
