feat: workspace-rbac-quota-design#607

Draft
jeremyeder wants to merge 3 commits into ambient-code:main from jeremyeder:feature/workspace-rbac-quota-design

Conversation

@jeremyeder
Collaborator

No description provided.

@jeremyeder jeremyeder changed the title from "feat: add session-config repo support (spec.configRepo)" to "feat: workspace-rbac-quota-design" on Feb 10, 2026

MVP design documentation for workspace permissions and quota management system.

Documents included:

1. WORKSPACE_RBAC_AND_QUOTA_DESIGN.md (15 KB)
   - Complete technical specification with 10 detailed parts
   - Owner/admin hierarchy (5-tier model)
   - ProjectSettings CR enhancements with full schema
   - Kueue integration for quota enforcement
   - Langfuse tracing strategy (privacy-first masking)
   - Delete project safety pattern
   - Implementation phases (Phase 1 full scope, Phase 2 deferred)
   - Backward compatibility approach

2. MVP_IMPLEMENTATION_CHECKLIST.md (8 KB)
   - Week-by-week implementation plan (8-10 weeks)
   - Actionable tasks with checkboxes for Jira
   - Effort breakdown: 13 person-days (4 backend + 3 operator + 2 frontend + 2 testing + 2 ops)
   - Step-by-step progression from CRD design to deployment

3. ROLES_VS_OWNER_HIERARCHY.md (7 KB)
   - Clarification of governance vs. technical permissions
   - Difference between Kubernetes RBAC roles and owner/admin fields
   - Scenario walkthroughs

4. …
   - Success criteria for MVP
   - Risk mitigation and next steps

5. QUICK_REFERENCE.md (3 KB)
   - Navigation guide for different audiences
   - Links to choose your path (architect/engineer/PM/infra)
   - Document statistics and qu…

…ked In):
- 5-tier hierarchy: Root User → Owner → Admin(s) → User/Editor → Viewer
- Owner i… …ers, quota, kueueWorkloadProf… (add/remove)
- Delete with confirmation …
- Audit trail (createdAt, createdBy, lastModifiedAt, lastModifiedBy)
- Migration script …
- … reserved, prepaid)
- Cost attribution and chargeback
… system

- LEARNING_GUIDE.md (10KB): Beginner-friendly guide for all roles
  * PMs: 5-min overview of the 5-tier hierarchy
  * Engineers: 20-min detailed architecture walkthrough
  * Operators: 15-min deployment & configuration guide
  * Includes FAQ, scenarios, testing strategy

- ARCHITECTURE_DIAGRAMS.md (8KB): 14 Mermaid diagrams
  * Permission hierarchy (5-tier overview)
  * Admin management lifecycle
  * ProjectSettings CR structure
  * Kueue integration architecture
  * Kubernetes RBAC integration
  * User journeys (create workspace, create session)
  * Implementation timeline

- QUICK_SLIDES.md (6KB): Executive summary in 14 slides
  * Problem statement
  * Permission matrix
  * Common workflows
  * Key takeaways
  * Learning paths by role
  * Next steps

Total learning time: ~90 minutes for complete understanding
@jeremyeder jeremyeder force-pushed the feature/workspace-rbac-quota-design branch from 6f89d57 to d4e348d Compare February 10, 2026 07:11
@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Claude Code Review - PR #607

Summary

This PR adds comprehensive design documentation for a new Workspace RBAC & Quota System feature. 9 new markdown files (~4,380 lines) in docs/design/.

Type: Design documentation (no code changes)
Quality: High-quality technical writing
Risk: Low (documentation only)

🔴 Critical Issues

1. Violates CLAUDE.md Documentation Standards

CLAUDE.md states: "Default to improving existing documentation rather than creating new files" and "Avoid top-level proliferation".

This PR creates 9 new files in docs/design/ instead of:

  • Using docs/adr/ for design decisions (ADR-0006)
  • Updating docs/architecture/diagrams/ for diagrams
  • Enhancing .claude/context/ files with patterns

Recommendation: Reorganize into existing documentation structure per CLAUDE.md.

2. Backward-Incompatible Changes Without Migration Strategy

Part 10 shows shell-based migration script. Issues:

  • No validation for safe migration
  • No rollback plan
  • Assumes first admin becomes owner (brittle)

Recommendation: Implement migration as Go code in operator with validation and rollback.

3. Kueue Integration Assumes Cluster-Wide Installation

Part 4 assumes Kueue is installed. No fallback if unavailable.

Impact: Won't work in multi-tenant clusters, restricted environments, air-gapped deployments.

Recommendation: Make Kueue optional with KUEUE_ENABLED flag and graceful degradation.
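
A minimal sketch of what such a flag could look like in the Go backend or operator. Only the `KUEUE_ENABLED` name comes from the recommendation above. The `selectQuotaBackend` function, the `QuotaBackend` type, and the choice to fall back to native ResourceQuota/LimitRange are assumptions made for illustration, and the `kueueInstalled` input stands in for a real CRD discovery check.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// QuotaBackend names the mechanism used to enforce workspace quotas (hypothetical type).
type QuotaBackend string

const (
	BackendKueue         QuotaBackend = "kueue"
	BackendResourceQuota QuotaBackend = "resourcequota"
)

// selectQuotaBackend picks Kueue only when the flag is true AND Kueue was detected.
// In every other case it degrades to namespace quotas and reports why.
func selectQuotaBackend(flagValue string, kueueInstalled bool) (QuotaBackend, string) {
	enabled, err := strconv.ParseBool(flagValue)
	if err != nil {
		// An unset or malformed flag is treated as "disabled" rather than as a fatal error.
		return BackendResourceQuota, "KUEUE_ENABLED unset or invalid; using namespace quotas"
	}
	if !enabled {
		return BackendResourceQuota, "KUEUE_ENABLED=false; using namespace quotas"
	}
	if !kueueInstalled {
		return BackendResourceQuota, "KUEUE_ENABLED=true but Kueue not found; falling back to namespace quotas"
	}
	return BackendKueue, "Kueue enabled and detected"
}

func main() {
	// kueueInstalled is hard-coded here; a real operator would probe the cluster instead.
	backend, reason := selectQuotaBackend(os.Getenv("KUEUE_ENABLED"), false)
	fmt.Printf("quota backend: %s (%s)\n", backend, reason)
}
```

Returning a reason string alongside the backend keeps the degradation visible in logs or status conditions instead of failing silently.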

🟡 Major Issues

4. Langfuse Integration Language Confusion

Backend is Go but design references Python observability.py. Shows Python syntax for Go handlers.

Recommendation: Use HTTP API to Langfuse from Go, reference existing runner integration.

5. Owner Role Has Too Much Power

Owner is immutable with delete privileges. No transfer in Phase 1.

Risks: Owner leaves company → workspace orphaned. Owner compromised → full deletion.

Recommendation: Add soft-delete, co-owners, or move transfer to Phase 1.

6. Missing Security Review

No discussion of: token exposure, RBAC escalation, audit retention, emergency access.

Recommendation: Add security threat model and compliance documentation.

7. Quota Split Across 6 Systems

Enforcement across Kueue, ResourceQuota, LimitRange, Operator, Backend, Langfuse.

Recommendation: Consolidate to one enforcement point with clear precedence order.

🔵 Minor Issues

  1. Inconsistent terminology (Workspace vs Project)
  2. Missing E2E test scenarios
  3. No rollback plan for 8-10 week implementation
  4. Mermaid diagrams lack alt text
  5. File naming (SCREAMING_SNAKE_CASE vs kebab-case convention)

✅ Positive Highlights

  • Excellent technical writing
  • Thorough design coverage
  • Good use of examples and diagrams
  • Clear phase planning
  • Week-by-week implementation guidance

Verdict

⚠️ REQUEST CHANGES

Merge Blockers:

  1. Reorganize per CLAUDE.md standards
  2. Make Kueue optional
  3. Fix Langfuse Go/Python confusion
  4. Add migration safety

After fixes: Excellent reference documentation.


Reviewer: Claude Code (Sonnet 4.5)
Standards: CLAUDE.md, Backend/Frontend/Security patterns


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Claude Code Review

Summary

This PR introduces comprehensive design documentation for implementing a workspace RBAC and quota management system in the Ambient Code Platform. The documentation is exceptionally well-structured with 9 detailed markdown files (~4,300 lines total) covering architecture, implementation plans, learning guides, and reference materials.

PR Scope: Documentation-only (no code changes)
Quality: Excellent - professional, thorough, actionable
Readiness: Ready to merge with minor suggestions


Issues by Severity

🚫 Blocker Issues

None - This is documentation-only, no blocking issues found.


🔴 Critical Issues

1. Kueue Dependency Referenced Throughout, But Design Changed to Namespace Quotas

Files affected:

  • docs/design/WORKSPACE_RBAC_AND_QUOTA_DESIGN.md (Part 4, multiple sections)
  • docs/design/MVP_IMPLEMENTATION_CHECKLIST.md (Week 3-4, Week 5-6)
  • docs/design/ARCHITECTURE_SUMMARY.md
  • docs/design/LEARNING_GUIDE.md

Issue: The design documents originally referenced Kueue (Kubernetes queuing system) throughout, but the main design doc (Part 4) was updated to use native Kubernetes ResourceQuota + LimitRange instead. However, many references to Kueue remain in other documents, creating inconsistency.

Evidence:

  • Main design (Part 4) now says: "Why namespace quotas? ... For MVP we prefer to use native Kubernetes primitives"
  • But checklist still says: "Install Kueue operator", "Create LocalQueue", "Kueue enforces via ClusterQueue limits"
  • Architecture summary mentions: "Kueue integrated (first-class component, not optional)"

Impact: High confusion for implementers - they won't know whether to install Kueue or use native K8s quotas.

Recommendation:

  1. Do a global search-replace of Kueue references:
    • Replace Kueue → Namespace ResourceQuota/LimitRange
    • Replace ClusterQueue/LocalQueue → ResourceQuota and LimitRange
    • Replace Workload → Pod or Job
  2. Update all diagrams/examples to show ResourceQuota/LimitRange instead of Kueue components
  3. Remove Kueue installation steps from checklist
  4. Add ResourceQuota/LimitRange manifest examples

🟡 Major Issues

1. Inconsistent Terminology: "Admin" Role Overloaded

Files: All design docs

Issue: The term "Admin" is used for two different concepts:

  1. Governance role (in ProjectSettings.spec.adminUsers) - managed by Owner
  2. Kubernetes RBAC role (ambient-project-admin ClusterRole) - technical permissions

This creates confusion when reading statements like "admins can't remove each other" (governance) vs "admins can delete sessions" (RBAC).

Recommendation:

  • Consider using "Workspace Admin" for governance role
  • Use "Admin Role" or "ClusterRole admin" for RBAC
  • Or add a glossary section early in each doc clarifying the two meanings

Mitigating factor: ROLES_VS_OWNER_HIERARCHY.md does explain this distinction well, but readers won't always read that first.


2. Migration Script Details Missing

File: docs/design/WORKSPACE_RBAC_AND_QUOTA_DESIGN.md Part 10

Issue: Migration script is sketched but lacks:

  • Error handling (what if namespace has no admins?)
  • Rollback strategy
  • Dry-run mode details
  • What happens to existing RoleBindings?

Recommendation:
Add to implementation checklist:

  • Test migration script on dev cluster with edge cases
  • Document rollback procedure
  • Add validation step before migration (check all namespaces have at least one admin)
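
The validation step in the last bullet could be sketched roughly as follows. This is a pre-flight check only, under the assumption that the admin list per namespace has already been collected from existing RoleBindings. The `validateMigration` name and the input shape are hypothetical and do not come from the design documents.

```go
package main

import (
	"fmt"
	"sort"
)

// validateMigration returns the namespaces that cannot be migrated safely
// because they have no admin who could be promoted to owner.
func validateMigration(adminsByNamespace map[string][]string) []string {
	var blocked []string
	for ns, admins := range adminsByNamespace {
		if len(admins) == 0 {
			blocked = append(blocked, ns)
		}
	}
	// Sort so that the report is stable across runs (map iteration order is random).
	sort.Strings(blocked)
	return blocked
}

func main() {
	// Sample input; a real run would build this map from the cluster.
	input := map[string][]string{
		"team-a": {"alice@company.com"},
		"team-b": {},
	}
	if blocked := validateMigration(input); len(blocked) > 0 {
		fmt.Println("migration blocked; namespaces without an admin:", blocked)
		return
	}
	fmt.Println("validation passed; safe to proceed")
}
```

Running the check before any write makes the "namespace has no admins" edge case an explicit stop condition rather than a mid-migration failure.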

3. Langfuse Trace Event Schema Not Specified

File: docs/design/WORKSPACE_RBAC_AND_QUOTA_DESIGN.md Part 5

Issue: Events listed (project_created, admin_added, etc.) but no schema specified:

  • What exact fields are in each event?
  • Are they structured as traces, generations, or spans?
  • What's the format of userId, sessionId, metadata?

Recommendation:
Add a subsection "Event Schema" with example JSON for each event type:

{
  "name": "admin_added",
  "input": { "project_name": "...", "admin_email": "..." },
  "output": { "status": "success", "rolebinding_created": "..." },
  "userId": "alice@company.com",
  "sessionId": "project-settings-update-123",
  "metadata": { "timestamp": "..." }
}
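
On the Go side, the example payload above could be modelled with a plain struct, sketched here under the assumption that events are serialized as JSON before being sent to Langfuse. `TraceEvent` and `newAdminAddedEvent` are illustrative names, not Langfuse SDK types, and all sample values in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceEvent mirrors the field names of the example JSON above (hypothetical type).
type TraceEvent struct {
	Name      string            `json:"name"`
	Input     map[string]string `json:"input"`
	Output    map[string]string `json:"output"`
	UserID    string            `json:"userId"`
	SessionID string            `json:"sessionId"`
	Metadata  map[string]string `json:"metadata"`
}

// newAdminAddedEvent fills in the admin_added event with the fields from the example.
func newAdminAddedEvent(project, adminEmail, roleBinding, actor, sessionID, timestamp string) TraceEvent {
	return TraceEvent{
		Name:      "admin_added",
		Input:     map[string]string{"project_name": project, "admin_email": adminEmail},
		Output:    map[string]string{"status": "success", "rolebinding_created": roleBinding},
		UserID:    actor,
		SessionID: sessionID,
		Metadata:  map[string]string{"timestamp": timestamp},
	}
}

func main() {
	ev := newAdminAddedEvent("demo-workspace", "bob@company.com", "demo-admin-binding",
		"alice@company.com", "project-settings-update-123", "2026-02-10T07:11:00Z")
	out, err := json.MarshalIndent(ev, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

One constructor per event type keeps the field set of each event fixed in code, which is the property the schema subsection is meant to pin down.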

🔵 Minor Issues

1. File Structure Section Lists Non-Existent Paths

File: docs/design/README.md, ARCHITECTURE_SUMMARY.md

Issue: Documents reference:

  • components/manifests/quota/ - doesn't exist yet
  • components/backend/observability.py - platform uses Go backend, not Python (runner uses Python)

Recommendation:

  • Mark these as "to be created" or use [TBD] prefix
  • Clarify that observability.py is in runner component, not backend

2. Root User Implementation Details Vague

File: docs/design/WORKSPACE_RBAC_AND_QUOTA_DESIGN.md Part 8

Issue: Root user concept introduced but:

  • How is root user authenticated? (Environment var? Group membership?)
  • What endpoints expose root-only operations?
  • Is there a UI for root user tasks?

Recommendation:

  • Phase 1: Explicitly defer root user implementation to Phase 2
  • Or: Add "Root User Implementation Details" section to checklist

3. Documentation Standards Violation (from CLAUDE.md)

Issue: Per CLAUDE.md documentation standards:

"Default to improving existing documentation rather than creating new files... Colocate new docs: When feasible, documentation should live in the subdirectory that has the relevant code"

This PR creates 9 top-level docs files in docs/design/ for a single feature.

Recommendation:
Consider consolidating into fewer files:

  • Option A: Single comprehensive doc with chapters (current approach is acceptable but verbose)
  • Option B: Move implementation-specific docs closer to code:
    • Backend guide → components/backend/docs/rbac-quota-implementation.md
    • Operator guide → components/operator/docs/projectsettings-reconciliation.md
    • Frontend guide → components/frontend/docs/admin-management-ui.md

Mitigating factor: The docs/design/README.md navigation guide helps, and this is a cross-cutting architectural change affecting multiple components, so top-level placement is justifiable.


4. Typos and Grammar Issues

Files: Multiple

Issues found:

  1. WORKSPACE_RBAC_AND_QUOTA_DESIGN.md:532 - Duplicate section header "QUOTA EVENTS:"
  2. LEARNING_GUIDE.md:42 - Stray line: "Q: How do namespace quotas prevent starvation? A: Per-namespace..." appears mid-section
  3. QUICK_SLIDES.md:287 - "Weeks 1-10" but timeline shows 8-10 weeks

Recommendation: Proofread before final merge.


5. Missing Cross-References to Existing ADRs

Issue: Design introduces major architectural changes but doesn't reference existing ADRs:

  • ADR-0001 (Kubernetes-Native Architecture) - relevant to quota decisions
  • ADR-0002 (User Token Authentication) - relevant to RBAC validation

Recommendation:
Add "Related Decisions" section to main design doc linking to:

  • Existing ADRs that influenced this design
  • Future ADR that should be created from this design (e.g., ADR-0006: Workspace Governance Model)

Positive Highlights

✅ Exceptional Documentation Quality

  1. Multi-Audience Approach: Separate guides for engineers, PMs, operators, and executives
  2. Progressive Disclosure: Quick reference → Summary → Deep dive → Implementation checklist
  3. Visual Aids: ASCII diagrams, permission matrices, flow charts
  4. Action-Oriented: Week-by-week checklists with checkboxes ready for Jira
  5. Realistic Effort Estimates: 13 person-days, 8-10 weeks - grounded in reality

✅ Comprehensive Coverage

  • 10-part main design doc covers every aspect: current state, new model, CRD changes, quota integration, observability, safety patterns, phases, root user, config examples, backward compat
  • Scenarios and walkthroughs make complex concepts understandable
  • FAQ sections anticipate common questions
  • Glossary and permission matrices serve as quick lookup

✅ Standards Compliance (Mostly)

Security Standards:

  • Correctly identifies user token authentication requirements
  • Proposes owner validation before governance operations
  • Langfuse privacy-first masking (messages redacted by default)
  • Delete confirmation safety pattern

Error Handling:

  • Acknowledges need for graceful handling of deleted resources during reconciliation
  • Proposes status updates for quota limit exceeded scenarios

Backend Patterns:

  • Uses GetK8sClientsForRequest for user operations
  • Correctly identifies when to use service account (CR writes after validation)
  • Proposes RBAC checks before admin operations

Frontend Patterns:

  • Confirmation dialog uses Shadcn UI components
  • Proposes React Query mutations for admin management
  • Loading states and error handling considered

✅ Phasing Strategy

  • Phase 1 (MVP): Focused scope - governance + safety + quota enforcement
  • Phase 2: Deferred complexity - transfers, advanced policies
  • Phase 3: Future vision - cost attribution, chargeback

This phased approach reduces risk and allows for iteration based on feedback.

✅ Backward Compatibility Considered

  • Migration script for existing projects
  • Operator handles legacy CRs gracefully (empty owner field)
  • Existing Kubernetes RBAC ClusterRoles unchanged
  • No breaking changes to existing API endpoints

Recommendations

High Priority

  1. Resolve Kueue vs. Namespace Quota Inconsistency (Critical #1)

    • Decision: Stick with ResourceQuota/LimitRange approach
    • Action: Global cleanup of Kueue references across all docs
    • Estimated effort: 1-2 hours
  2. Add Concrete Examples

    • ResourceQuota + LimitRange YAML manifests for each tier (development, production, unlimited)
    • Example API requests/responses for admin management endpoints
    • Example Langfuse trace payloads
  3. Clarify Root User Scope

    • Either fully spec root user implementation in Phase 1
    • Or explicitly defer to Phase 2 and remove from MVP scope

Medium Priority

  1. Create ADR from This Design

    • Once approved, convert main design doc into ADR-0006
    • Link from docs/decisions.md
    • Follow ADR template structure
  2. Add Implementation Notes

    • Section in checklist for "Blockers Discovered During Implementation"
    • Template for weekly status updates
    • Definition of "done" for each week

Low Priority

  1. Consider Consolidation

    • 9 separate files → could be 3-4 files
    • But current structure is navigable with README.md
  2. Improve Diagrams

    • Consider using Mermaid instead of ASCII art (some docs already do)
    • Add sequence diagrams for critical flows (some included, add more)

Architecture Review

Design Decisions - Alignment with Platform Standards

Kubernetes-Native: Uses CRDs, RBAC, ResourceQuota - aligns with ADR-0001
User Token Auth: Backend validates owner using user tokens - aligns with ADR-0002
Multi-Tenant: Namespace isolation, per-workspace quotas - consistent with existing approach
Operator Pattern: Reconciliation loop, status subresource updates - follows established patterns
Observability: Langfuse integration with privacy masking - consistent with existing setup


Scalability Considerations

Operator Reconciliation: Idempotent RoleBinding creation prevents issues at scale
Status Updates: Using UpdateStatus subresource (correct pattern)
⚠️ Watch Performance: No mention of informer caching or watch bookmark strategies - should be addressed in implementation

Security Considerations

Owner Immutability: Prevents unauthorized transfers
Delete Confirmation: Reduces accidental deletions
Audit Trail: Full attribution (createdBy, lastModifiedBy)
Token Redaction: Langfuse masking prevents token leaks
⚠️ Input Validation: Design mentions email pattern validation, but no regex specified
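
One way to close the input-validation gap without hand-writing a regex is to lean on Go's standard `net/mail` parser. This swaps the regex the design mentions for a parser-based check; that choice is an assumption of this sketch, not something the design specifies. The `isValidAdminEmail` name is hypothetical.

```go
package main

import (
	"fmt"
	"net/mail"
)

// isValidAdminEmail accepts only a bare address such as "alice@company.com".
// Inputs with a display name ("Alice <alice@company.com>") parse successfully,
// but are rejected because the parsed address differs from the raw input.
func isValidAdminEmail(s string) bool {
	addr, err := mail.ParseAddress(s)
	if err != nil {
		return false
	}
	return addr.Address == s
}

func main() {
	for _, candidate := range []string{
		"alice@company.com",
		"Alice <alice@company.com>",
		"not-an-email",
	} {
		fmt.Printf("%q valid: %v\n", candidate, isValidAdminEmail(candidate))
	}
}
```

If the CRD schema also needs a `pattern` for server-side validation, that regex would still have to be specified separately in the design.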


Testing Recommendations

The checklist includes testing (Week 8-10), but should expand:

Unit Tests (Backend)

// Add these test cases to checklist:
- TestOwnerFieldImmutable (update should be rejected)
- TestNonOwnerCannotAddAdmin (403 Forbidden)
- TestOwnerCanAddMultipleAdmins (idempotent)
- TestDeleteWorkspaceWithoutConfirmation (400 Bad Request)
- TestDeleteWorkspaceNonOwner (403 Forbidden)
- TestAdminRoleBindingCreatedOnAdminAdd
- TestAdminRoleBindingDeletedOnAdminRemove
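
The first few cases in this list could be driven by small pure functions like the ones below. This is a sketch under stated assumptions: `checkOwnerUpdate` and `deleteWorkspaceStatus` are hypothetical names, the rule that a legacy CR with an empty owner may be assigned one is inferred from the backward-compatibility notes, and treating "confirmation" as retyping the workspace name is an assumption about the delete safety pattern.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// checkOwnerUpdate rejects any change to an owner that is already set.
// An empty old owner (legacy CR) may be assigned once.
func checkOwnerUpdate(oldOwner, newOwner string) error {
	if oldOwner == "" || oldOwner == newOwner {
		return nil
	}
	return errors.New("spec.owner is immutable")
}

// deleteWorkspaceStatus returns the HTTP status a delete request should receive.
// Authorization is checked before confirmation so that non-owners always see 403.
func deleteWorkspaceStatus(owner, requester, workspaceName, confirmation string) int {
	if requester != owner {
		return http.StatusForbidden
	}
	if confirmation != workspaceName {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	fmt.Println(checkOwnerUpdate("alice@company.com", "bob@company.com"))
	fmt.Println(deleteWorkspaceStatus("alice@company.com", "bob@company.com", "demo", "demo"))
	fmt.Println(deleteWorkspaceStatus("alice@company.com", "alice@company.com", "demo", ""))
	fmt.Println(deleteWorkspaceStatus("alice@company.com", "alice@company.com", "demo", "demo"))
}
```

Keeping these rules free of Kubernetes client calls lets TestOwnerFieldImmutable, TestDeleteWorkspaceWithoutConfirmation, and TestDeleteWorkspaceNonOwner run as plain table-driven tests.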

Integration Tests

// Scenarios to add:
- Owner adds admin → Operator creates RoleBinding → Admin creates session → Success
- Quota exceeded → Session creation rejected → Langfuse trace emitted
- Concurrent admin additions (race condition test)
- Namespace deleted during operator reconciliation (should not crash)

E2E Tests (Cypress)

// User journeys to add:
- Create workspace → Add 2 admins → Remove 1 admin → Verify UI reflects changes
- Non-owner tries to see "Delete Workspace" button → Should be hidden
- Owner deletes workspace → Workspace disappears from list
- Quota exceeded → UI shows "Quota exceeded" message with queue position

Documentation Completeness Check

Against CLAUDE.md standards:

Colocated Documentation: Justifiable at top-level for architectural design
ADR Integration: Should create ADR-0006 after approval
Decision Log: Should add entry to docs/decisions.md
⚠️ Memory System Update: After merge, update .claude/context/backend-development.md with ProjectSettings patterns
⚠️ Pattern Catalog: Add new pattern file .claude/patterns/projectsettings-reconciliation.md


Final Verdict

Ready to Merge: ✅ YES (with suggestions)

Strengths:

  • Exceptional documentation quality
  • Comprehensive coverage of all aspects
  • Well-structured for multiple audiences
  • Realistic effort estimates
  • Strong alignment with platform standards

Blocker Resolution Required:

  • None - This is documentation, not code

Post-Merge Actions Required:

  1. ✅ Resolve Kueue vs. Namespace Quota inconsistency before implementation starts
  2. ✅ Add ResourceQuota/LimitRange manifest examples
  3. ✅ Create ADR-0006 from approved design
  4. ✅ Update .claude/context/ files with new patterns
  5. ✅ Add concrete Langfuse event schema examples

Estimated Rework Effort

  • High Priority Issues: 2-4 hours (Kueue cleanup, add examples)
  • Medium Priority Issues: 1-2 hours (ADR creation)
  • Low Priority Issues: 1 hour (typos, cross-references)
  • Total: 4-7 hours to address all feedback

Conclusion

This is exemplary design documentation that sets a high bar for future architectural work. The multi-document approach with progressive disclosure, multiple audience targeting, and actionable checklists demonstrates professional software engineering practices.

The main issue (Kueue/ResourceQuota inconsistency) is straightforward to fix and doesn't block merge. All other issues are minor improvements that can be addressed before or during implementation.

Recommendation: APPROVE with post-merge cleanup of Kueue references.


Review completed by: Claude Sonnet 4.5
Review date: February 10, 2026
PR: #607
Files reviewed: 9 markdown files (~4,300 lines)
Review time: ~45 minutes
Memory context loaded: backend-development.md, frontend-development.md, security-standards.md, k8s-client-usage.md, error-handling.md, react-query-usage.md, CLAUDE.md
