
Conversation

@jeremyeder
Collaborator

Summary

  • Increase MinIO CPU/memory requests (250m/512Mi → 2000m/4Gi) and limits (1000m/2Gi → 4000m/8Gi); the resulting resources block is sketched below
  • Relax liveness and readiness probes with higher initialDelaySeconds, failureThreshold: 5, and timeoutSeconds: 5 to tolerate slow MinIO startups
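
For reference, a minimal sketch of the resulting resources block in components/manifests/base/minio-deployment.yaml (field layout assumed from a standard container spec; only the values above come from this PR):

```yaml
resources:
  requests:
    cpu: 2000m      # was 250m
    memory: 4Gi     # was 512Mi
  limits:
    cpu: 4000m      # was 1000m
    memory: 8Gi     # was 2Gi
```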

Test plan

  • Deploy to a cluster and verify the MinIO pod starts without probe-triggered restarts
  • Confirm the MinIO service becomes ready and accepts connections

🤖 Generated with Claude Code

Increase CPU/memory requests and limits to handle heavier workloads,
and make liveness/readiness probes more tolerant of slow startups by
adding timeoutSeconds, failureThreshold, and increasing delays.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Claude Code Review

Summary

PR #610 increases MinIO resource allocation and relaxes health probe settings to prevent probe-triggered restarts during slow startups. The changes are infrastructure-focused (Kubernetes deployment configuration) and do not involve code logic, authentication, or security patterns.

Issues by Severity

🚫 Blocker Issues

None found.

🔴 Critical Issues

None found.

🟡 Major Issues

1. Resource Allocation Increase - Justification Needed

  • Location: components/manifests/base/minio-deployment.yaml:76-81
  • Issue: CPU and memory increases are substantial:
    • CPU requests: 250m → 2000m (8x increase)
    • CPU limits: 1000m → 4000m (4x increase)
    • Memory requests: 512Mi → 4Gi (~8x increase)
    • Memory limits: 2Gi → 8Gi (4x increase)
  • Impact: This significantly increases the CPU and memory reserved (requests) and allowed (limits) per MinIO deployment, which affects scheduling and cluster capacity planning.
  • Recommendation: Document the reasoning for these specific values. Were these based on observed resource usage patterns, load testing, or vendor recommendations? Consider whether these values are appropriate for all deployment scenarios (dev, staging, prod).

2. Probe Timing Changes - Risk of Delayed Failure Detection

  • Location: components/manifests/base/minio-deployment.yaml:62-73
  • Issue:
    • Liveness probe: initialDelaySeconds: 30 → 60, periodSeconds: 10 → 15
    • Readiness probe: initialDelaySeconds: 10 → 20, periodSeconds: 5 → 10
    • Both probes: Added failureThreshold: 5
  • Impact: Longer detection times for failures:
    • Liveness failure detection: 30s + 3×10s = 60s (old) → 60s + 5×15s = 135s (new)
    • Readiness failure detection: 10s + 3×5s = 25s (old) → 20s + 5×10s = 70s (new)
  • Recommendation: While these changes prevent false positives during slow startups, they also increase the time to detect legitimate failures. Consider whether failureThreshold: 5 is too high—could failureThreshold: 3 be sufficient?
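
To make the detection math above concrete, here is a hedged sketch of the new liveness probe with the worst-case timing annotated. The httpGet path and port are assumptions (MinIO commonly exposes /minio/health/live on port 9000); only the timing values come from this PR:

```yaml
livenessProbe:
  httpGet:
    path: /minio/health/live   # assumed endpoint, not shown in the diff
    port: 9000                 # assumed port
  initialDelaySeconds: 60      # was 30
  periodSeconds: 15            # was 10
  timeoutSeconds: 5            # newly added
  failureThreshold: 5          # newly added (Kubernetes default is 3)
# Worst case from pod start to restart of a hung container:
#   initialDelaySeconds + failureThreshold × periodSeconds
#   = 60s + 5 × 15s = 135s   (previously 30s + 3 × 10s = 60s)
# The readiness probe follows the same arithmetic: 20s + 5 × 10s = 70s.
```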

🔵 Minor Issues

1. Missing Documentation Update

  • Issue: No documentation explaining when these higher resource values are appropriate.
  • Recommendation: Add a comment in the YAML file or update components/manifests/README.md with guidance on:
    • When to use these resource values vs. lower values
    • How to adjust for different deployment sizes
    • Observed MinIO resource usage patterns that motivated these changes

2. Environment-Specific Configuration Consideration

  • Issue: These values are in the base deployment, so they'll apply to all environments.
  • Recommendation: Consider using Kustomize overlays to allow different resource values for dev/staging/prod environments. Not blocking for this PR, but worth considering for future work.
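
A minimal sketch of what such an overlay could look like, assuming the Deployment and container are both named minio (the overlay path and the dev-sized values are illustrative, not part of this PR):

```yaml
# overlays/dev/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: minio            # assumed Deployment name
      spec:
        template:
          spec:
            containers:
              - name: minio    # assumed container name
                resources:
                  requests: {cpu: 250m, memory: 512Mi}
                  limits: {cpu: 1000m, memory: 2Gi}
```

A prod overlay would do the same with the higher values from this PR.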

Positive Highlights

Appropriate Response to Operational Issues

  • Addresses real-world startup issues with probe-triggered restarts
  • Follows Kubernetes best practices by tuning probes rather than disabling them

Conservative Probe Tuning

  • Added timeoutSeconds: 5 to both probes, giving slow health-check responses headroom before they count as failures
  • Used reasonable failureThreshold values to tolerate transient failures

Clear PR Description

  • Well-documented changes with specific before/after values
  • Includes test plan (though not yet executed per checklist)

Follows Git Workflow Standards

  • Feature branch name follows convention
  • Commit message follows conventional commits style
  • PR title uses proper fix: prefix

Recommendations

Priority 1 (Address Before Merge):

  1. Execute Test Plan: Check the box in the PR description after verifying:

    • MinIO pod starts without probe-triggered restarts
    • MinIO service becomes ready and accepts connections
    • Confirm the pod actually uses these resources (check metrics)
  2. Justify Resource Values: Add a comment explaining:

    • What workload/scenario requires 2000m CPU and 4Gi memory?
    • Were these values based on production metrics or load testing?
    • Example comment to add above resources section:
    # Resource values tuned for [describe scenario]:
    # - Based on observed peak usage of [X] during [Y]
    # - CPU: 2000m request handles [Z] concurrent operations
    # - Memory: 4Gi handles [N] buckets with [M] objects

Priority 2 (Consider for Follow-up):

  1. Environment-Specific Values: Create Kustomize overlays:

    • overlays/dev/: Lower resource values for local development
    • overlays/prod/: Current higher values for production
  2. Monitoring: Add Prometheus metrics/alerts (a starting-point rule is sketched after this list) for:

    • MinIO CPU/memory usage vs. limits
    • Probe failure rates
    • Startup duration
  3. Documentation: Update docs/deployment/ with MinIO sizing guidance.
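
As a starting point for the monitoring item above, a hedged sketch of a PrometheusRule that fires when MinIO memory usage approaches its limit. It assumes the Prometheus Operator CRDs are installed and that cAdvisor and kube-state-metrics are scraped; the container selector is a guess at the deployment's naming:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: minio-resource-alerts        # hypothetical name
spec:
  groups:
    - name: minio.resources
      rules:
        - alert: MinioMemoryNearLimit
          expr: |
            max by (namespace, pod) (container_memory_working_set_bytes{container="minio"})
              /
            max by (namespace, pod) (kube_pod_container_resource_limits{container="minio", resource="memory"})
              > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: MinIO memory usage has been above 90% of its limit for 10 minutes
```

CPU and restart-count rules (e.g. kube_pod_container_status_restarts_total as a proxy for liveness-probe failures) would follow the same pattern.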


Overall Assessment: This PR appropriately addresses operational issues with MinIO startup reliability. The changes are infrastructure-focused and don't raise security or code quality concerns. The main recommendation is to document the justification for these specific resource values and complete the test plan before merging.

Suggested Action: Execute the test plan, add a comment documenting the rationale for resource values, then merge.


🔍 View AI decision process (logs available for 90 days)

📋 Memory system files loaded

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@jeremyeder jeremyeder merged commit e9f3d45 into ambient-code:main Feb 10, 2026
13 checks passed