@jeremyeder
Collaborator

Summary

  • Add MODELS_JSON key to the existing operator-config ConfigMap (all overlays: kind, e2e, production, local-dev, minikube)
  • Backend parses MODELS_JSON env var and includes models array in GET /api/cluster-info response
  • Frontend reads models from useClusterInfo() hook and populates the session creation dropdown dynamically (falls back to hardcoded list if empty)
  • Operator passes MODELS_JSON through to runner pods as an env var
  • Runner builds VERTEX_MODEL_MAP from MODELS_JSON at startup (falls back to hardcoded map)

Adding or removing a model is now: edit ConfigMap → kubectl rollout restart → done. No code changes or image rebuilds needed.
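
As a rough illustration of the runner step above, the sketch below parses a sample MODELS_JSON value and derives a name-to-Vertex-ID map, falling back to a hardcoded map when the variable is unset or unusable. The function name `build_vertex_model_map`, the sample payload, and all Vertex ID strings are placeholders chosen for this example; the real contents of the runner's hardcoded map are not shown in this PR description.

```python
import json
import os

# Placeholder fallback; the runner's real hardcoded map is not reproduced here.
_HARDCODED_VERTEX_MAP = {"claude-sonnet-4-5": "example-vertex-id"}


def build_vertex_model_map(environ=os.environ):
    """Return {model name: Vertex ID} from MODELS_JSON, or the fallback map."""
    raw = environ.get("MODELS_JSON", "")
    if not raw:
        return dict(_HARDCODED_VERTEX_MAP)
    try:
        models = json.loads(raw)
        # Entries without a vertexId are skipped rather than treated as errors.
        return {m["name"]: m["vertexId"] for m in models if m.get("vertexId")}
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return dict(_HARDCODED_VERTEX_MAP)


# Hypothetical ConfigMap value: one Vertex-backed default model, one without vertexId.
sample = json.dumps([
    {"name": "claude-sonnet-4-5", "displayName": "Claude Sonnet 4.5",
     "vertexId": "sonnet-vertex-id", "default": True},
    {"name": "other-model", "displayName": "Other Model"},
])

from_config = build_vertex_model_map({"MODELS_JSON": sample})
from_fallback = build_vertex_model_map({})
```

Passing the environment as a parameter is only a convenience for trying the function without touching the real process environment.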

Test plan

  • GET /api/cluster-info returns models array populated from ConfigMap
  • Frontend model dropdown shows models from API response
  • Default model selection respects default: true flag
  • Frontend falls back to hardcoded list when models is empty
  • Runner resolves Vertex model IDs from MODELS_JSON env var
  • Runner falls back to hardcoded map when MODELS_JSON is unset
  • Edit ConfigMap + rollout restart updates models without image rebuilds
  • go vet, gofmt, tsc --noEmit, next build all pass

🤖 Generated with Claude Code

Models were hardcoded in the frontend dropdown and runner's Vertex
mapping, requiring code changes + image rebuilds to add a model.
Now models are defined in the existing operator-config ConfigMap
and flow through the stack: backend exposes them via /api/cluster-info,
frontend populates the dropdown dynamically, operator passes them to
runner pods, and the runner builds VERTEX_MODEL_MAP from the env var.

Edit ConfigMap + rollout restart = models updated, no image rebuilds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Claude Code Review

Summary

This PR introduces dynamic model configuration via a MODELS_JSON ConfigMap key, allowing operators to add/remove models without code changes or image rebuilds. The implementation spans backend (Go), frontend (TypeScript/React), and runner (Python), with graceful fallback to hardcoded defaults.

Overall Assessment: Strong implementation with good fallback patterns and type safety. A few areas need attention before merge.


Issues by Severity

🔴 Critical Issues

1. Backend: Untyped JSON parsing exposes type safety risks

  • Location: components/backend/handlers/projects.go:141-149
  • Issue: Parsing MODELS_JSON into []map[string]interface{} without validation
  • Risk: Malformed ConfigMap data could cause frontend crashes or unexpected behavior
  • Impact: Medium - graceful degradation exists, but could lead to confusing UX
  • Fix: Add struct type and validation:
type ModelInfo struct {
    Name        string `json:"name"`
    DisplayName string `json:"displayName"`
    VertexID    string `json:"vertexId,omitempty"`
    Default     bool   `json:"default,omitempty"`
}

var models []ModelInfo
if raw := os.Getenv("MODELS_JSON"); raw != "" {
    if err := json.Unmarshal([]byte(raw), &models); err != nil {
        log.Printf("Warning: failed to parse MODELS_JSON: %v", err)
        models = []ModelInfo{}
    }
}
// Validate required fields
for _, m := range models {
    if m.Name == "" || m.DisplayName == "" {
        log.Printf("Warning: invalid model entry (missing name or displayName)")
    }
}

2. Frontend: Defaulting model value may cause form inconsistency

  • Location: components/frontend/src/components/create-session-dialog.tsx:79-92
  • Issue: defaultModel is computed but not reactive to clusterModels changes
  • Risk: If useClusterInfo() loads after form initialization, default stays claude-sonnet-4-5 even if cluster default differs
  • Fix: Use useEffect to reset form when clusterModels loads:
useEffect(() => {
  if (clusterModels.length > 0) {
    const defaultModel = clusterModels.find((m) => m.default)?.name ?? "claude-sonnet-4-5";
    form.setValue("model", defaultModel);
  }
}, [clusterModels, form]);

🟡 Major Issues

3. Python: Broad exception catching masks errors

  • Location: components/runners/claude-code-runner/auth.py:65
  • Issue: except Exception: catches all errors, including KeyError, TypeError
  • Best Practice: Catch specific exceptions for better debugging
  • Fix:
try:
    models = _json.loads(raw)
    return {m["name"]: m["vertexId"] for m in models if m.get("vertexId")}
except (_json.JSONDecodeError, KeyError, TypeError, AttributeError) as e:  # AttributeError: a non-dict entry has no .get()
    logger.warning(f"Failed to parse MODELS_JSON: {e}")
    return dict(_HARDCODED_VERTEX_MAP)

4. Missing validation: Empty string handling

  • Location: Backend projects.go:142, Runner auth.py:59
  • Issue: Both check that raw != "" but do not guard against values that contain only whitespace
  • Risk: Whitespace-only env vars pass the check but fail parsing
  • Fix: Add strings.TrimSpace() in Go, raw.strip() in Python
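
A minimal sketch of the Python side of this fix, assuming a hypothetical helper named `parse_models_json` (not part of the PR). It treats unset and whitespace-only values the same way, so the caller can fall back to its hardcoded defaults without logging a parse error.

```python
import json


def parse_models_json(raw):
    """Return the parsed list, or None when the value is unset, blank, or invalid."""
    text = (raw or "").strip()
    if not text:
        return None  # unset or whitespace-only: caller uses its hardcoded defaults
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Anything other than a JSON array is treated as invalid.
    return parsed if isinstance(parsed, list) else None
```

Note that `json.loads` already tolerates whitespace around a valid document, so the trim mainly changes how the blank case is handled.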

5. No default model validation in frontend

  • Location: create-session-dialog.tsx:79
  • Issue: If no model has default: true, hardcoded fallback may not exist in cluster
  • Risk: Form defaults to a model not available in the cluster
  • Fix:
const defaultModel = clusterModels.find((m) => m.default)?.name 
  ?? clusterModels[0]?.name  // Use first model if no default
  ?? "claude-sonnet-4-5";   // Final fallback

🔵 Minor Issues

6. Python: Dictionary comprehension could filter models more explicitly

  • Location: auth.py:64
  • Issue: if m.get("vertexId") silently skips models without vertexId
  • Suggestion: Log skipped models for operator visibility:
vertex_map = {}
for m in models:
    if "vertexId" not in m or not m["vertexId"]:
        logger.info(f"Skipping model without vertexId: {m.get('name', 'unknown')}")
    else:
        vertex_map[m["name"]] = m["vertexId"]
return vertex_map

7. ConfigMap comments could be more descriptive

  • Location: All operator-config.yaml files
  • Current: # Available models for the platform (consumed by backend, operator, runner)
  • Suggestion: Add example showing required/optional fields:
# Available models for the platform (consumed by backend, operator, runner)
# Format: [{"name":"...","displayName":"...","vertexId":"...","default":true}]
# Required: name, displayName | Optional: vertexId (for Vertex AI), default (boolean)
MODELS_JSON: '...'

8. No error boundary for model dropdown rendering

  • Location: create-session-dialog.tsx:75-77
  • Issue: If clusterModels contains malformed entries, .map() could throw (for example on a null entry) or render options with undefined values or labels
  • Suggestion: Add validation before mapping:
const models = clusterModels.length > 0 && clusterModels.every(m => m.name && m.displayName)
  ? clusterModels.map((m) => ({ value: m.name, label: m.displayName }))
  : fallbackModels;

Positive Highlights

  • Excellent fallback patterns - All three components gracefully degrade to hardcoded defaults
  • Type safety in frontend - ModelInfo type properly defined in cluster.ts
  • Consistent implementation - Backend, frontend, and runner all follow similar patterns
  • Deployment coverage - ConfigMap updated across all overlays (kind, e2e, prod, minikube, local-dev)
  • Non-breaking change - Existing deployments continue working with empty/missing MODELS_JSON
  • Security conscious - ConfigMap marked optional: true in deployments
  • Good separation of concerns - API layer (cluster.ts) separated from hook (use-cluster-info.ts)


Recommendations

Pre-Merge (Priority Order)

  1. Add backend validation - Structured type for ModelInfo in Go (Critical Issue #1)
  2. Fix frontend default model - useEffect to handle async cluster info load (Critical Issue #2)
  3. Narrow Python exception handling - Catch specific exceptions (Major Issue #3)
  4. Add whitespace validation - Trim env vars before parsing (Major Issue #4)
  5. Improve default model logic - Use first model if no default flag (Major Issue #5)

Post-Merge Enhancements

  1. Add E2E test validating model dropdown populates from ConfigMap
  2. Add backend endpoint for validating MODELS_JSON schema (for operator tooling)
  3. Document MODELS_JSON schema in components/manifests/README.md
  4. Consider adding Prometheus metric for model selection distribution
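
One possible shape for the schema check mentioned in item 2, sketched as a standalone function rather than an endpoint. The required and optional fields follow the ConfigMap comment suggested in Minor Issue 7; the function name `validate_models_json` and the rule that at most one entry may set default: true are assumptions made for this example, not behavior confirmed by the PR.

```python
import json


def validate_models_json(raw):
    """Return a list of human-readable problems; an empty list means no problems found."""
    try:
        models = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(models, list):
        return ["top-level value must be a JSON array"]

    problems = []
    defaults = 0
    for i, m in enumerate(models):
        if not isinstance(m, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        # Required string fields.
        for field in ("name", "displayName"):
            if not isinstance(m.get(field), str) or not m[field].strip():
                problems.append(f"entry {i}: missing or empty {field}")
        # Optional fields are only type-checked when present.
        if "vertexId" in m and not isinstance(m["vertexId"], str):
            problems.append(f"entry {i}: vertexId must be a string")
        if "default" in m and not isinstance(m["default"], bool):
            problems.append(f"entry {i}: default must be a boolean")
        if m.get("default") is True:
            defaults += 1
    if defaults > 1:
        # Assumed rule: a single default keeps the dropdown's initial value unambiguous.
        problems.append("more than one entry has default: true")
    return problems
```

Returning a list of messages instead of raising lets operator tooling report every problem in a ConfigMap value in one pass.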

Testing Checklist Progress

Based on PR description test plan:

  • ⚠️ Partially addressed: Frontend default model (see Critical Issue #2)
  • ⚠️ Missing: Validation that selected model exists in cluster after ConfigMap change
  • Covered: Fallback behavior (code review confirms pattern)
  • ⚠️ Recommend: Add test for malformed JSON in ConfigMap
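
The recommended malformed-JSON test could look roughly like the sketch below for the runner side. `vertex_map_from` is a stand-in written for this example, not the runner's actual function, and `FALLBACK` holds a placeholder Vertex ID. Each input is valid or invalid JSON that fails in a different way, which is why the stand-in catches four exception types.

```python
import json

FALLBACK = {"claude-sonnet-4-5": "example-vertex-id"}  # placeholder values


def vertex_map_from(raw):
    """Stand-in for the runner's parsing: fall back on any malformed input."""
    try:
        models = json.loads(raw)
        return {m["name"]: m["vertexId"] for m in models if m.get("vertexId")}
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return dict(FALLBACK)


MALFORMED = [
    "{not json",              # JSONDecodeError: not parseable
    "42",                     # TypeError: an int is not iterable
    "null",                   # TypeError: None is not iterable
    '["claude-sonnet-4-5"]',  # AttributeError: str entries have no .get()
    '[{"vertexId": "x"}]',    # KeyError: entry has no name
]


def test_malformed_models_json_falls_back():
    for raw in MALFORMED:
        assert vertex_map_from(raw) == FALLBACK, raw


test_malformed_models_json_falls_back()
```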

Adherence to Project Standards

Backend (Go):

  • ✅ Error handling patterns followed
  • ⚠️ Type safety needs improvement (use structs, not map[string]interface{})
  • ✅ No panic in production code
  • ✅ Proper logging with context

Frontend (TypeScript/React):

  • ✅ React Query patterns followed correctly
  • ✅ Shadcn components used
  • ✅ Type safety with ModelInfo type
  • ✅ No any types
  • ⚠️ Missing useEffect for reactive default handling

Security:

  • ✅ No token leakage concerns
  • ✅ Public endpoint (/cluster-info) appropriate for this data
  • ✅ No RBAC bypass risks

Overall: 8/10 - Solid implementation, needs type safety and reactive default handling improvements before merge.


What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.
