Conversation

@pec1985
Contributor

@pec1985 pec1985 commented Sep 18, 2025

Summary by CodeRabbit

  • New Features

    • Interactive machine creation: prompts for cluster, provider, and region when not provided.
    • Added AWS support for cluster setup and machine provisioning, with guided fallback when local tooling is unavailable.
    • Dynamic region selection per provider across GCP, AWS, Azure, and VMware.
    • Post-creation setup now runs automatically using the newly created cluster.
  • Bug Fixes

    • Setup steps are more resilient; non-critical “skip” errors no longer halt execution.
    • Certain “already exists” responses are treated as success to avoid failures.
    • Improved messaging and error reporting during command execution.

@coderabbitai
Contributor

coderabbitai bot commented Sep 18, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces provider-aware region selection; moves cluster infrastructure setup to post-creation using the new cluster ID; adds interactive machine creation prompts; extends the infrastructure interface with CreateMachine and integrates provider-specific machine setup, including a full AWS setup implementation; adjusts execution/spec skipping and command utilities; minor TUI formatting tweak.

Changes

| Cohort | File(s) | Summary of changes |
| --- | --- | --- |
| CLI: Cluster flow and regions | cmd/cluster.go | Added providerRegions and getRegionsForProvider; region prompt now provider-driven; moved infrastructure.Setup to run after cluster creation using the new cluster ID; removed early os.Exit calls; JSON path unchanged; applied across providers (gcp, aws, azure, vmware). |
| CLI: Machine interactive create | cmd/machine.go | Enabled interactive machine creation when args are incomplete and TTY is present; prompts for cluster and region; added helper prompt functions; updated arg handling (MaximumNArgs(3)); errors in non-interactive mode when args missing; passes collected clusterID/provider/region to CreateMachine. |
| Infrastructure interface | internal/infrastructure/cluster.go | Extended ClusterSetup interface with CreateMachine(ctx, logger, region, token, clusterID) error. |
| Provider: AWS setup | internal/infrastructure/aws.go | Added AWS implementation with Setup and CreateMachine; environment assembly; JSON execution specs for IAM/VPC/SG/EC2 and secrets; execution context with detection of AWS tooling/auth; interactive fallback guidance; registered provider setup. |
| Provider: GCP stub | internal/infrastructure/gcp.go | Implemented no-op CreateMachine method on gcpSetup. |
| Infra orchestration: Machine hook | internal/infrastructure/infrastructure.go | After API machine creation, invokes provider setup CreateMachine; on failure, deletes created machine and returns error; otherwise returns creation data as before. |
| Spec execution behavior | internal/infrastructure/spec.go | Adjusted SkipIf error handling: non-ErrInvalidMatch errors now treated as “don’t skip” and non-fatal. |
| TUI tweak | internal/infrastructure/tui.go | Changed success rendering to ShowSuccess("%s", success). |
| Command execution utils | internal/infrastructure/util.go | Special-cased sh -c to avoid pipe parsing; enhanced runCommand to treat certain AWS “already exists” errors as success; improved error messages with command output; standardized trimmed output handling. |
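
The two resilience changes in the last rows (SkipIf handling and “already exists” tolerance) could look roughly like the sketch below. This is not the code from the PR: `shouldSkip`, `normalizeCommandError`, `errExample`, and the marker list are hypothetical, `ErrInvalidMatch` is a stand-in for the sentinel named above, and the summary does not say how `ErrInvalidMatch` itself is handled, so the sketch assumes it also means "run the step".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidMatch is a stand-in for the sentinel named in the summary; the
// real definition lives in internal/infrastructure/spec.go and may differ.
var ErrInvalidMatch = errors.New("invalid match")

// errExample is a made-up failure used only to exercise the sketch.
var errExample = errors.New("exit status 254")

// shouldSkip interprets the result of a step's skip check (hypothetical helper).
// nil means the check passed, so the step is skipped. Any error means the step
// runs; non-ErrInvalidMatch errors are reported as a note instead of aborting.
func shouldSkip(checkErr error) (skip bool, note string) {
	switch {
	case checkErr == nil:
		return true, ""
	case errors.Is(checkErr, ErrInvalidMatch):
		// Assumption: a non-matching check simply means "do not skip".
		return false, ""
	default:
		return false, "skip check failed, running step anyway: " + checkErr.Error()
	}
}

// alreadyExistsMarkers holds AWS error codes that signal a resource was created
// on an earlier run. The set actually matched by the PR is not shown.
var alreadyExistsMarkers = []string{
	"EntityAlreadyExists",     // IAM
	"InvalidGroup.Duplicate",  // EC2 security groups
	"ResourceExistsException", // Secrets Manager
}

// normalizeCommandError treats "already exists" failures as success and
// attaches the trimmed command output to every other error.
func normalizeCommandError(err error, output string) error {
	if err == nil {
		return nil
	}
	trimmed := strings.TrimSpace(output)
	for _, marker := range alreadyExistsMarkers {
		if strings.Contains(trimmed, marker) {
			return nil
		}
	}
	return fmt.Errorf("command failed: %w (output: %s)", err, trimmed)
}

func main() {
	skip, note := shouldSkip(errExample)
	fmt.Println(skip, note)
	fmt.Println(normalizeCommandError(errExample, "An error occurred (EntityAlreadyExists) when calling the CreateRole operation"))
	fmt.Println(normalizeCommandError(errExample, "An error occurred (AccessDenied) when calling the CreateRole operation\n"))
}
```

Matching on error codes rather than free-form message text keeps the tolerance narrow, which matters because a swallowed error here hides a failed setup step.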

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (cluster create)
  participant API as Control Plane API
  participant Infra as Infrastructure Setup

  User->>CLI: cluster create (provider, region)
  Note over CLI: Regions sourced via providerRegions
  CLI->>API: POST /cli/cluster (create)
  API-->>CLI: 201 { clusterID, ... }
  Note right of CLI: New flow: setup runs after ID available
  CLI->>Infra: Setup(ctx, logger, provider, region, clusterID, token)
  Infra-->>CLI: Setup result (success/error)
  CLI-->>User: Output (JSON or spinner result)
sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (machine create)
  participant API as Control Plane API
  participant Provider as Provider Setup

  alt Interactive (TTY, missing args)
    CLI->>CLI: Prompt for cluster
    CLI->>CLI: Derive provider, prompt for region
  else Non-interactive
    CLI->>User: Error: missing args
  end

  CLI->>API: POST /cli/machine (clusterID, provider, region)
  API-->>CLI: 201 { machineID, token, provider }
  opt Provider-specific setup
    CLI->>Provider: CreateMachine(ctx, logger, region, token, clusterID)
    alt Setup fails
      CLI->>API: DELETE /cli/machine (machineID)
      Provider-->>CLI: Error returned
      CLI-->>User: Error (includes provider setup failure)
    else Setup succeeds
      Provider-->>CLI: OK
      CLI-->>User: Machine created
    end
  end
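
The create-then-roll-back flow in the diagram above can be sketched as follows. This is a simplified illustration, not the code in internal/infrastructure/infrastructure.go: the `machineAPI` and `providerSetup` interfaces, the fakes, and `runDemo` are hypothetical, and the logger parameter from the real `CreateMachine` signature is omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errSetupFailed is a made-up failure used only to exercise the sketch.
var errSetupFailed = errors.New("provider setup failed")

// machineAPI and providerSetup are hypothetical stand-ins for the control
// plane client and the provider-specific setup hook.
type machineAPI interface {
	Create(ctx context.Context, clusterID, provider, region string) (machineID, token string, err error)
	Delete(ctx context.Context, machineID string) error
}

type providerSetup interface {
	CreateMachine(ctx context.Context, region, token, clusterID string) error
}

// createMachine registers the machine with the API, runs provider setup, and
// deletes the API record again if setup fails.
func createMachine(ctx context.Context, api machineAPI, setup providerSetup, clusterID, provider, region string) (string, error) {
	id, token, err := api.Create(ctx, clusterID, provider, region)
	if err != nil {
		return "", fmt.Errorf("create machine: %w", err)
	}
	if err := setup.CreateMachine(ctx, region, token, clusterID); err != nil {
		setupErr := fmt.Errorf("provider setup: %w", err)
		// Roll back so a half-provisioned machine is not left registered.
		if delErr := api.Delete(ctx, id); delErr != nil {
			return "", errors.Join(setupErr, fmt.Errorf("rollback: %w", delErr))
		}
		return "", setupErr
	}
	return id, nil
}

// In-memory fakes so the sketch runs without any cloud access.
type fakeAPI struct{ deleted []string }

func (f *fakeAPI) Create(ctx context.Context, clusterID, provider, region string) (string, string, error) {
	return "machine-1", "example-token", nil
}

func (f *fakeAPI) Delete(ctx context.Context, machineID string) error {
	f.deleted = append(f.deleted, machineID)
	return nil
}

type fakeSetup struct{ err error }

func (f fakeSetup) CreateMachine(ctx context.Context, region, token, clusterID string) error {
	return f.err
}

// runDemo runs one create call and reports the ID, the deleted IDs, and the error.
func runDemo(setupErr error) (string, []string, error) {
	api := &fakeAPI{}
	id, err := createMachine(context.Background(), api, fakeSetup{err: setupErr}, "cluster-1", "aws", "us-east-1")
	return id, api.deleted, err
}

func main() {
	fmt.Println(runDemo(nil))
	fmt.Println(runDemo(errSetupFailed))
}
```

Joining the rollback error with the setup error keeps both visible to the caller when the delete also fails.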

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit taps keys in a meadow of code,
Regions now bloom by provider’s road.
Clusters get IDs before roots are laid,
Machines hop in with prompts well-played.
AWS burrows deep with scripted might—
If “already exists,” we nod, “alright.”
Carrots up—ship it tonight! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.14% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately highlights the primary new feature—an initial AWS provider integration—which is the core focus of the changeset and aligns with the substantial AWS-specific code additions in the pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

@pec1985
Contributor Author

pec1985 commented Oct 3, 2025

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Oct 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

# Conflicts:
#	cmd/cluster.go
#	go.mod
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
internal/infrastructure/cluster.go (1)

272-274: Restore the real cluster token when calling Setup.

We’re throwing away the API-provided bootstrap token by creating a new Cluster with Token: "". Providers like GCP pipe CLUSTER_TOKEN straight into their launch commands (e.g., --metadata=user-data={CLUSTER_TOKEN}), so this change now boots nodes without any credentials and they fail to register. Please pass the actual cluster returned from CreateCluster (or at least its real Token) into infrastructure.Setup.

-           if err := infrastructure.Setup(ctx, logger, &infrastructure.Cluster{ID: cluster.ID, Token: "", Provider: provider, Name: name, Type: size, Region: region}, format); err != nil {
+           if err := infrastructure.Setup(ctx, logger, cluster, format); err != nil {
cmd/machine.go (1)

325-332: Add structured and quiet output modes to machine-create to prevent token exposure

  • Avoid unconditionally printing resp.Token in cmd/machine.go (machineCreateCmd).
  • Add --format=json and --quiet flags to machine-create; emit token only in JSON or when --quiet is specified.
  • By default, print minimal info (ID only) or route token to stderr/mask it.
  • Confirm the returned token is one-time or short-lived.
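
One possible shape for the output gating suggested above, as a sketch only: `createResult` and `renderCreateResult` are hypothetical names, and the `format` and `quiet` parameters mirror the proposed `--format=json` and `--quiet` flags, which do not exist in the command yet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createResult is a hypothetical view of the machine-create response.
type createResult struct {
	ID    string `json:"id"`
	Token string `json:"token"`
}

// renderCreateResult decides what reaches stdout. The token appears only in
// JSON output or in quiet mode; the human-readable default shows the ID alone.
func renderCreateResult(res createResult, format string, quiet bool) (string, error) {
	switch {
	case format == "json":
		b, err := json.Marshal(res)
		if err != nil {
			return "", fmt.Errorf("encode result: %w", err)
		}
		return string(b), nil
	case quiet:
		// Quiet mode is meant for scripts that capture the token directly.
		return res.Token, nil
	default:
		return "Machine created: " + res.ID, nil
	}
}

func main() {
	res := createResult{ID: "machine-1", Token: "example-token"}
	modes := []struct {
		format string
		quiet  bool
	}{{"", false}, {"json", false}, {"", true}}
	for _, m := range modes {
		out, err := renderCreateResult(res, m.format, m.quiet)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```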
cmd/cluster.go (1)

492-492: Help text lists unsupported provider "other".

Validator rejects "other" but help suggests it. Fix the flag help to avoid a broken UX.

Apply this diff:

-	clusterNewCmd.Flags().String("provider", "", "The infrastructure provider (gcp, aws, azure, vmware, other)")
+	clusterNewCmd.Flags().String("provider", "", "The infrastructure provider (gcp, aws, azure, vmware)")
🧹 Nitpick comments (9)
cmd/machine.go (5)

284-299: Place “create” under the management group; docs look good

Create is a mutating command and should live under the “management” group for consistency with remove. Suggest switching GroupID.

 var machineCreateCmd = &cobra.Command{
   Use:     "create [cluster_id] [provider] [region]",
-  GroupID: "info",
+  GroupID: "management",
   Short:   "Create a new machine for a cluster",
   Long: `Create a new machine for a cluster.

307-324: Handle partial args (1 or 2) + interactive prompts; don’t ignore provided cluster_id

Currently, any call with fewer than 3 args ignores provided values and re-prompts, which is surprising UX. Support 1- and 2-arg forms and only prompt for missing pieces. Error clearly in non-interactive mode.

-    var clusterID, provider, region string
-
-    // If all arguments provided, use them directly
-    if len(args) == 3 {
-      clusterID = args[0]
-      provider = args[1]
-      region = args[2]
-    } else if tui.HasTTY {
-      // Interactive mode - prompt for missing values
-      cluster := promptForClusterSelection(ctx, logger, apiUrl, apikey)
-      provider = cluster.Provider
-      region = promptForRegionSelection(ctx, logger, provider)
-      clusterID = cluster.ID
-    } else {
-      // Non-interactive mode - require all arguments
-      errsystem.New(errsystem.ErrMissingRequiredArgument, fmt.Errorf("cluster_id, provider, and region are required in non-interactive mode"), errsystem.WithContextMessage("Missing required arguments")).ShowErrorAndExit()
-    }
+    var clusterID, provider, region string
+    switch len(args) {
+    case 3:
+      clusterID, provider, region = args[0], args[1], args[2]
+    case 2:
+      clusterID, provider = args[0], args[1]
+      if tui.HasTTY {
+        region = promptForRegionSelection(ctx, logger, provider)
+      } else {
+        errsystem.New(errsystem.ErrMissingRequiredArgument, fmt.Errorf("region is required in non-interactive mode"), errsystem.WithContextMessage("Missing required arguments")).ShowErrorAndExit()
+      }
+    case 1:
+      clusterID = args[0]
+      // Resolve cluster to derive provider
+      var selected infrastructure.Cluster
+      tui.ShowSpinner(fmt.Sprintf("Resolving cluster %s...", clusterID), func() {
+        clusters, err := infrastructure.ListClusters(ctx, logger, apiUrl, apikey)
+        if err != nil {
+          errsystem.New(errsystem.ErrApiRequest, err, errsystem.WithContextMessage("Failed to resolve cluster")).ShowErrorAndExit()
+        }
+        for _, c := range clusters {
+          if c.ID == clusterID || c.Name == clusterID {
+            selected = c
+            break
+          }
+        }
+      })
+      if selected.ID == "" {
+        errsystem.New(errsystem.ErrInvalidArgumentProvided, fmt.Errorf("cluster %q not found", clusterID), errsystem.WithContextMessage("Invalid cluster")).ShowErrorAndExit()
+      }
+      provider = selected.Provider
+      if tui.HasTTY {
+        region = promptForRegionSelection(ctx, logger, provider)
+      } else {
+        errsystem.New(errsystem.ErrMissingRequiredArgument, fmt.Errorf("region is required in non-interactive mode"), errsystem.WithContextMessage("Missing required arguments")).ShowErrorAndExit()
+      }
+    default:
+      if tui.HasTTY {
+        cluster := promptForClusterSelection(ctx, logger, apiUrl, apikey)
+        provider = cluster.Provider
+        region = promptForRegionSelection(ctx, logger, provider)
+        clusterID = cluster.ID
+      } else {
+        errsystem.New(errsystem.ErrMissingRequiredArgument, fmt.Errorf("cluster_id, provider, and region are required in non-interactive mode"), errsystem.WithContextMessage("Missing required arguments")).ShowErrorAndExit()
+      }
+    }

395-401: Region selection: remove stray print and guard empty option set

Avoid extra stdout noise and fail fast if the provider has no regions configured.

 func promptForRegionSelection(ctx context.Context, logger logger.Logger, provider string) string {
   // Get regions for the provider (reuse the same logic from cluster.go)
-  fmt.Println("Provider:", provider)
   opts := getRegionsForProvider(provider)
+  if len(opts) == 0 {
+    errsystem.New(errsystem.ErrInvalidArgumentProvided, fmt.Errorf("no regions available for provider %q", provider), errsystem.WithContextMessage("Region selection failed")).ShowErrorAndExit()
+  }
   return tui.Select(logger, "Which region should we use?", "The region to deploy the machine", opts)
 }

Also verify getRegionsForProvider exists in this package; otherwise this won’t compile. See verification script below.


327-330: Use errsystem for consistency instead of logger.Fatal

Other paths use errsystem.New(...).ShowErrorAndExit(). Keep error handling consistent here.

-    resp, err := infrastructure.CreateMachine(ctx, logger, apiUrl, apikey, clusterID, orgId, provider, region)
-    if err != nil {
-      logger.Fatal("error creating machine: %s", err)
-    }
+    resp, err := infrastructure.CreateMachine(ctx, logger, apiUrl, apikey, clusterID, orgId, provider, region)
+    if err != nil {
+      errsystem.New(errsystem.ErrApiRequest, err, errsystem.WithContextMessage("Failed to create machine")).ShowErrorAndExit()
+    }

15-16: Import alias “logger” collides with local variable name “logger”

Local vars named logger shadow the imported package name; this is easy to confuse. Consider aliasing the package (e.g., logx) or renaming locals (e.g., log).

cmd/cluster.go (1)

85-93: Avoid silently defaulting to GCP regions for unknown providers.

Returning GCP options can confuse users and lead to wrong regions. Prefer empty options and fall back to a free-form input prompt.
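
A minimal sketch of that fallback, assuming plain string options. The real `providerRegions` in cmd/cluster.go is not shown in this review and probably holds TUI option values; the region lists here are examples, the vmware entry is a placeholder, and `chooseRegion` with its two callbacks is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// providerRegions is an illustrative table, not the one from cmd/cluster.go.
var providerRegions = map[string][]string{
	"gcp":    {"us-central1", "us-east1"},
	"aws":    {"us-east-1", "us-west-2"},
	"azure":  {"eastus", "westus2"},
	"vmware": {"datacenter-1"}, // placeholder value
}

// getRegionsForProvider returns the known regions for a provider, or an empty
// slice for unknown providers instead of defaulting to GCP.
func getRegionsForProvider(provider string) []string {
	return providerRegions[strings.ToLower(strings.TrimSpace(provider))]
}

// chooseRegion shows a selection list when regions are known and falls back
// to free-form input otherwise. Both callbacks stand in for TUI prompts.
func chooseRegion(provider string, selectFrom func([]string) string, askFreeForm func() string) string {
	opts := getRegionsForProvider(provider)
	if len(opts) == 0 {
		return askFreeForm()
	}
	return selectFrom(opts)
}

func main() {
	first := func(opts []string) string { return opts[0] }
	typed := func() string { return "custom-region" }
	fmt.Println(chooseRegion("aws", first, typed))
	fmt.Println(chooseRegion("other", first, typed))
}
```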

internal/infrastructure/aws.go (3)

490-499: Remove stray fmt.Println(spec); avoid noisy output.

This prints the execution spec every run. Drop it or downgrade to debug logging.

Apply this diff:

 func getAWSClusterSpecification(envs map[string]any) string {
 	spec := awsClusterSpecification
-	fmt.Println(spec)
 	// Replace variables in the JSON string

234-239: Scope down IAM policy Resource to the current account.

Using arn ... :*:secret:... grants cross-account scope. Prefer the caller’s account ID.

As a follow-up, capture AWS_ACCOUNT_ID via sts get-caller-identity and template:

  • Add AWS_ACCOUNT_ID to envs in Setup.
  • Update the policy Resource to arn:aws:secretsmanager:{AWS_REGION}:{AWS_ACCOUNT_ID}:secret:{AWS_SECRET_NAME}*
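
The templated Resource could be built along these lines. This is a sketch under assumptions: `secretResourceARN` is a hypothetical helper, and the account ID is expected to come from something like `aws sts get-caller-identity --query Account --output text`, which the PR may obtain differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// AWS account IDs are 12-digit numbers.
var accountIDPattern = regexp.MustCompile(`^[0-9]{12}$`)

// secretResourceARN builds an account-scoped Secrets Manager ARN pattern for
// use as an IAM policy Resource. The trailing * covers the random suffix that
// Secrets Manager appends to secret ARNs.
func secretResourceARN(region, accountID, secretName string) (string, error) {
	if !accountIDPattern.MatchString(accountID) {
		return "", fmt.Errorf("invalid AWS account ID %q: want 12 digits", accountID)
	}
	if region == "" || secretName == "" {
		return "", fmt.Errorf("region and secret name are required")
	}
	return fmt.Sprintf("arn:aws:secretsmanager:%s:%s:secret:%s*", region, accountID, secretName), nil
}

func main() {
	arn, err := secretResourceARN("us-east-1", "123456789012", "example-secret")
	fmt.Println(arn, err)
	_, err = secretResourceARN("us-east-1", "*", "example-secret")
	fmt.Println(err)
}
```

Rejecting a wildcard account ID at build time prevents the policy from silently regressing to cross-account scope.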

188-196: Open SSH (0.0.0.0/0). Consider restricting.

Port 22 from anywhere is risky. Restrict to a CIDR input or make SSH optional.

Is global SSH ingress required for typical deployments, or can we scope it to the operator’s IP/CIDR?
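
If SSH ingress becomes a CIDR input, a guard like the following could reject the open-to-the-world case before any security group rule is created. `validateSSHCIDR` is a hypothetical helper, not part of the PR.

```go
package main

import (
	"fmt"
	"net"
)

// validateSSHCIDR accepts a syntactically valid CIDR block and rejects blocks
// with a zero-length prefix (0.0.0.0/0 or ::/0), which match every address.
func validateSSHCIDR(cidr string) error {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	if ones, _ := network.Mask.Size(); ones == 0 {
		return fmt.Errorf("CIDR %q allows SSH from any address", cidr)
	}
	return nil
}

func main() {
	for _, cidr := range []string{"203.0.113.7/32", "0.0.0.0/0", "::/0", "not-a-cidr"} {
		fmt.Printf("%s: %v\n", cidr, validateSSHCIDR(cidr))
	}
}
```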

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecccac6 and a8b3747.

📒 Files selected for processing (9)
  • cmd/cluster.go (5 hunks)
  • cmd/machine.go (4 hunks)
  • internal/infrastructure/aws.go (1 hunks)
  • internal/infrastructure/cluster.go (1 hunks)
  • internal/infrastructure/gcp.go (1 hunks)
  • internal/infrastructure/infrastructure.go (1 hunks)
  • internal/infrastructure/spec.go (1 hunks)
  • internal/infrastructure/tui.go (1 hunks)
  • internal/infrastructure/util.go (3 hunks)
🔇 Additional comments (2)
cmd/machine.go (1)

325-326: Helper functions verified.

promptForClusterOrganization and getRegionsForProvider are defined in cmd/cluster.go with signatures matching their usage; no changes needed.

internal/infrastructure/aws.go (1)

321-333: Verify user-data semantics.

run-instances --user-data is passed the raw {CLUSTER_TOKEN}. Confirm your AMI expects this exact payload (not a script/cloud-init multipart). If not, encode or use file://.

Member

@robindiddams robindiddams left a comment

seems fine to me

@pec1985 pec1985 merged commit 8f6c6e3 into infra Oct 8, 2025
1 check passed
@pec1985 pec1985 deleted the infra-aws branch October 8, 2025 14:48
robindiddams added a commit that referenced this pull request Oct 24, 2025
* Add infra commands

* fixes

* more visual improvements

* reorder org

* wip changes

* wip changes

* fixes

* machine create

* use ecdsa

* more cleanup

* clean up go mod 🧹

* fix thing

* AWS initial implementation (#448)

* AWS initil implementation

* more work on getting aws working from scratch

* refactor awsSpecification commands

* More work on gettting AWS working

* cluster create and machine create commands fixes

* checks if clustering is enabled for the authenticated user

* remove debug logs

* Small nits from Coder Rabbit

* cleanup

* fix create image command for aws

* latest

* indirect vuln

* fixes

* more fixes

* errrors!

* make the clipboard mode work better

* hidden

* some feedback

---------

Co-authored-by: Robin Diddams <robindiddams@gmail.com>
Co-authored-by: Pedro Enrique <pedro.tma@gmail.com>