Add specification and implementation plan for external Kubernetes deployments #11644

Open

zachcasper wants to merge 5 commits into radius-project:main from zachcasper:003-external-k8s-deploy

Conversation

@zachcasper
Contributor

This pull request introduces the foundational design documents and implementation plan for supporting deployment to external Kubernetes clusters (EKS and AKS) in Radius. It defines the new data model, validation rules, and API contract, and outlines the required code changes and project structure. The documentation covers how users can configure environments and credentials to target external clusters, and provides a quickstart guide for both EKS and AKS scenarios.

The most important changes are:

Data Model & API Changes

  • Extended the ProvidersKubernetes model with new fields: target, clusterType, and clusterName, including detailed validation rules and cross-entity dependencies.
  • Updated the API contract for Radius.Core/environments to support these new fields, with examples and validation error responses for incorrect configurations.
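For orientation, the extended provider model might look roughly like the following Go sketch. The field names come from the spec; the package, struct shape, and JSON tags are assumptions for illustration, not the actual Radius type definition.

package datamodel

// ProvidersKubernetes sketches the extended provider model described above.
// Field names follow the spec; everything else here is assumed.
type ProvidersKubernetes struct {
    Namespace   string `json:"namespace"`
    Target      string `json:"target,omitempty"`      // "current" (default) or "external"
    ClusterType string `json:"clusterType,omitempty"` // "eks" or "aks"; used when Target is "external"
    ClusterName string `json:"clusterName,omitempty"` // required when Target is "external"
}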

Implementation Plan & Project Structure

  • Added an implementation plan detailing technical context, constitution checks, affected code areas, and the introduction of a new pkg/kubernetes/kubeconfig/ package for cloud-specific kubeconfig acquisition.
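One plausible shape for that new package, with every identifier here hypothetical rather than taken from the plan:

package kubeconfig

import (
    "context"

    "k8s.io/client-go/rest"
)

// Acquirer obtains a client-go rest.Config for an external cluster using
// the cloud credentials already registered with Radius. One implementation
// per cluster type (EKS via STS tokens, AKS via listClusterUserCredential
// plus Entra ID) would live alongside this interface.
type Acquirer interface {
    AcquireConfig(ctx context.Context, clusterName string) (*rest.Config, error)
}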

User-Facing Documentation

  • Provided a quickstart guide with step-by-step instructions for deploying to external EKS and AKS clusters, covering credential registration, environment creation, and verification.
  • Added a specification quality checklist to ensure requirements are complete, clear, and ready for planning.

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document PR is created in the design-notes repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for the samples repository is created, if existing samples are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable
  • A PR for the recipes repository is created, if existing recipes are affected by the changes in this PR.
    • Yes
    • Not applicable

Signed-off-by: Zach Casper <zachcasper@microsoft.com>
Copilot AI review requested due to automatic review settings April 14, 2026 18:59
@zachcasper zachcasper requested review from a team as code owners April 14, 2026 18:59
@zachcasper zachcasper requested a deployment to external-contributor-approval April 14, 2026 18:59 — with GitHub Actions Waiting
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a set of design/plan documents for enabling Radius recipe execution against external Kubernetes clusters (EKS and AKS), including the proposed environment schema/API additions (target, clusterType, clusterName) and a phased implementation task plan.

Changes:

  • Added feature spec covering user stories, requirements, and validation rules for external cluster targeting.
  • Added implementation plan + research notes describing the intended EKS/AKS kubeconfig acquisition approach and impacted code areas.
  • Added API contract examples, a quickstart, and a task breakdown/checklist for executing the work.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Reviewed files:

  • specs/003-external-k8s-deploy/spec.md: Defines user stories, requirements, validation rules, and scope for external EKS/AKS targeting
  • specs/003-external-k8s-deploy/plan.md: Implementation plan, impacted components, and project structure for the feature
  • specs/003-external-k8s-deploy/research.md: Technical research on EKS token generation, AKS credential flow, and Terraform/Bicep integration
  • specs/003-external-k8s-deploy/data-model.md: Proposed data model changes, validation rules, and flow diagram
  • specs/003-external-k8s-deploy/contracts/environments-api.md: API contract changes and example payloads/error responses
  • specs/003-external-k8s-deploy/quickstart.md: Step-by-step usage guide for EKS/AKS scenarios
  • specs/003-external-k8s-deploy/tasks.md: Phased work plan with explicit file-level tasks
  • specs/003-external-k8s-deploy/checklists/requirements.md: Spec quality checklist for readiness

Comment thread specs/003-external-k8s-deploy/quickstart.md Outdated
Comment thread specs/003-external-k8s-deploy/spec.md Outdated
Comment thread specs/003-external-k8s-deploy/plan.md Outdated
Comment thread specs/003-external-k8s-deploy/quickstart.md Outdated
Comment thread specs/003-external-k8s-deploy/tasks.md Outdated
Comment thread specs/003-external-k8s-deploy/checklists/requirements.md
Comment thread specs/003-external-k8s-deploy/spec.md Outdated
Comment thread specs/003-external-k8s-deploy/spec.md Outdated
Comment thread specs/003-external-k8s-deploy/research.md Outdated
Comment thread specs/003-external-k8s-deploy/data-model.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zach Casper <zachcasper@microsoft.com>
@zachcasper zachcasper requested a deployment to external-contributor-approval April 15, 2026 19:55 — with GitHub Actions Waiting
Signed-off-by: Zach Casper <zachcasper@microsoft.com>
@zachcasper zachcasper requested a deployment to external-contributor-approval April 15, 2026 20:05 — with GitHub Actions Waiting
Comment thread specs/003-external-k8s-deploy/spec.md

**Acceptance Scenarios**:

1. **Given** an environment with `providers.kubernetes.target = external`, `providers.kubernetes.clusterType = eks`, `providers.kubernetes.clusterName = my-eks-cluster`, and valid AWS credentials registered, **When** a recipe that creates a Kubernetes ConfigMap is executed, **Then** the ConfigMap is created in the specified namespace on the external EKS cluster regardless of recipe engine.
Contributor

Shouldn't this config be part of the AWS provider now that we are deploying to an external cluster in AWS? It feels fragmented to configure the EKS cluster with the Kubernetes provider and the AWS account/region with the AWS provider.


As a platform engineer, I want to configure a Radius environment to deploy workloads to an external Amazon EKS cluster so that Radius can manage applications on clusters other than the one it is installed on.

I configure my environment's Kubernetes provider with `target: external`, `clusterType: eks`, and the EKS cluster name. Radius uses the existing registered AWS credentials along with the AWS region from `providers.aws.region` to obtain a kubeconfig for the target EKS cluster before executing any recipe (Terraform or Bicep). All Kubernetes resources created by the recipe land on the external cluster.
Contributor

Do we worry only about recipes? UCP directly talks to AWS today using the Cloud Control API too. Would we want this enhanced as well, or do we plan to deprecate this ability, similar to the applications RP?

3. **Given** an environment with `providers.kubernetes.clusterType = eks` and `providers.kubernetes.clusterName` set to a non-existent cluster, **When** a recipe is executed, **Then** the operation fails with a clear error indicating the cluster was not found.
4. **Given** `providers.kubernetes.target = external` and `clusterType = eks` without `clusterName`, **When** the environment is created, **Then** validation fails stating `clusterName` is required.
5. **Given** `providers.kubernetes.clusterType = eks` without a corresponding `providers.aws` configuration, **When** the environment is created, **Then** validation fails stating that AWS provider configuration is required for EKS clusters.

Contributor

With this ability, would we worry about a Radius instance local to EKS simultaneously managing the same application, in addition to Radius on kind/k3d, or is that out of scope?

- Only one AWS credential and one Azure credential are supported (named `"default"` per the current design). Multi-credential support is a future enhancement.
- The EKS kubeconfig acquisition follows the same mechanism as `aws eks update-kubeconfig` — using AWS STS to generate a bearer token for the cluster's authentication endpoint.
- The AKS kubeconfig acquisition uses user credentials (`listClusterUserCredential`) combined with Entra ID (AAD) token acquisition using the registered Azure service principal or workload identity. This is equivalent to `az aks get-credentials` followed by `kubelogin convert-kubeconfig --login spn`.
- The target namespace (`providers.kubernetes.namespace`) is expected to already exist on the external cluster. Radius does not auto-create namespaces on external clusters.
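For illustration, the EKS mechanism in the second bullet above can be sketched with the AWS SDK for Go v2: presign an STS GetCallerIdentity request that carries the cluster name in the x-k8s-aws-id header, then base64url-encode the URL with the k8s-aws-v1. prefix. This is a minimal sketch under those assumptions, not the planned implementation:

package main

import (
    "context"
    "encoding/base64"
    "fmt"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/sts"
    smithyhttp "github.com/aws/smithy-go/transport/http"
)

// eksBearerToken builds a bearer token for an EKS cluster the same way
// `aws eks get-token` does: a presigned STS GetCallerIdentity URL tagged
// with the cluster name, base64url-encoded with the k8s-aws-v1. prefix.
func eksBearerToken(ctx context.Context, region, clusterName string) (string, error) {
    cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
    if err != nil {
        return "", err
    }
    presigner := sts.NewPresignClient(sts.NewFromConfig(cfg))
    req, err := presigner.PresignGetCallerIdentity(ctx, &sts.GetCallerIdentityInput{},
        func(po *sts.PresignOptions) {
            // The EKS API server identifies the target cluster via this header.
            po.ClientOptions = append(po.ClientOptions, func(o *sts.Options) {
                o.APIOptions = append(o.APIOptions,
                    smithyhttp.AddHeaderValue("x-k8s-aws-id", clusterName))
            })
        })
    if err != nil {
        return "", err
    }
    return "k8s-aws-v1." + base64.RawURLEncoding.EncodeToString([]byte(req.URL)), nil
}

func main() {
    // Hypothetical region and cluster name; the resulting token is valid
    // for roughly 15 minutes, matching the spec's assumption.
    tok, err := eksBearerToken(context.Background(), "us-west-2", "my-eks-cluster")
    if err != nil {
        panic(err)
    }
    fmt.Println(tok)
}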
Contributor

I think we do auto-create namespaces by default, but we might want to change this behavior
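If the spec's no-auto-create stance stands, a preflight check along these lines (a client-go sketch under that assumption, not the actual implementation) would surface the clear missing-namespace error the spec calls for:

package preflight

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// ensureNamespaceExists verifies the configured namespace is present on the
// external cluster before any recipe runs, turning a late, generic recipe
// failure into a clear configuration error.
func ensureNamespaceExists(ctx context.Context, client kubernetes.Interface, ns string) error {
    _, err := client.CoreV1().Namespaces().Get(ctx, ns, metav1.GetOptions{})
    if apierrors.IsNotFound(err) {
        return fmt.Errorf("namespace %q does not exist on the external cluster; create it before deploying", ns)
    }
    return err
}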


Comment thread specs/003-external-k8s-deploy/spec.md

As an existing Radius user, I want my current environments to continue working without modification so that the external cluster feature does not break my existing workflows.

When `providers.kubernetes.target` is omitted or set to `current`, Radius behaves exactly as it does today: recipes execute against the local/in-cluster Kubernetes using the existing kubeconfig resolution.
Contributor

nit: I don't really like `current` as this keyword; maybe `local` or something else would be better.

Contributor Author

All are bad. Kubernetes has no concept of a name or symbol that references a cluster outside of the kube context. The closest thing to a reference is that when connecting to the API server from within the cluster, the URL is https://kubernetes.default.svc.cluster.local. I worry about `local` being read as the user's local workstation. Possibly `cluster.local` is better.


### Functional Requirements

- **FR-001**: The `providers.kubernetes` model MUST support a new `target` property with allowed values `current` and `external`. When omitted, the default MUST be `current`.
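To make the cross-field rules concrete (including acceptance scenarios 4 and 5 quoted earlier), here is a hedged validation sketch; the function signature and names are hypothetical:

package validation

import "fmt"

// validateKubernetesProvider enforces the rules in FR-001 and the related
// acceptance scenarios: target defaults to "current", an external target
// requires clusterName, and clusterType "eks" requires an AWS provider block.
func validateKubernetesProvider(target, clusterType, clusterName string, hasAWSProvider bool) error {
    if target == "" {
        target = "current" // FR-001: omitted target defaults to current
    }
    if target != "external" {
        return nil // existing local-cluster behavior is unchanged
    }
    if clusterName == "" {
        return fmt.Errorf("clusterName is required when target is external")
    }
    if clusterType == "eks" && !hasAWSProvider {
        return fmt.Errorf("AWS provider configuration is required for EKS clusters")
    }
    return nil
}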
Contributor

Would like to discuss the naming of the target property and whether current and external are good choices.


### User Story 2 - Deploy to an External AKS Cluster (Priority: P1)

As a platform engineer, I want to configure a Radius environment to deploy workloads to an external Azure AKS cluster so that I can manage applications across multiple Azure-hosted clusters from a single Radius installation.
Contributor

Why aren't we supporting external deployments across regions and resource groups? Are we scoping Azure resources to the resource group and AWS resources to the region (both denoted in the providers block) because of the scope of the credentials?

Contributor Author

Is this a question about deploying to external Kubernetes clusters specifically, or a general question about environments? Environments today are scoped to an AWS {accountId, region} or an Azure {subscription, resource group}.

**Acceptance Scenarios**:

1. **Given** an existing environment with only `providers.kubernetes.namespace` set (no `target` property), **When** a recipe is executed, **Then** it deploys to the local cluster exactly as it does today.
2. **Given** an environment with `providers.kubernetes.target = current`, **When** a recipe is executed, **Then** it deploys to the local cluster.
Member

local sounds more intuitive than current to me

Member

+1


**Acceptance Scenarios**:

1. **Given** an environment with `providers.kubernetes.target = external`, `providers.kubernetes.clusterType = eks`, `providers.kubernetes.clusterName = my-eks-cluster`, and valid AWS credentials registered, **When** a recipe that creates a Kubernetes ConfigMap is executed, **Then** the ConfigMap is created in the specified namespace on the external EKS cluster regardless of recipe engine.
Member

What do we mean by recipe engine here?

Contributor Author

Bicep or Terraform.

Member

Can you make that explicit here? Recipe engine is agnostic to how the recipe is getting deployed.

Member

@brooke-hamilton left a comment

🚀

Member

It might be helpful to diagram where the credentials are set and where they flow. For example, there is going to be an existing credential configured on the client machine in the local kubectl context, but there will also be credentials configured in the cluster. Those creds are for the current cluster and cloud environment.

This spec is adding new credentials for an external cloud environment and cluster, using "existing registered AWS credentials along with the AWS region", but where do those come from and how are they set?

Member

Thinking about this more, we really need an overall architecture diagram that depicts where credentials exist in the current version of Radius, plus the planned delta in this PR.

Member

I assume that there is no restriction on AWS clusters deploying to Azure clusters and vice versa, but I want to ask the question in case this is an invalid assumption.

We should add this scenario to the acceptance criteria.

- What happens when the external cluster's API server is temporarily unreachable? Radius returns a clear connectivity error rather than a generic failure.
- What happens when the dynamically-obtained kubeconfig token expires mid-recipe-execution? A fresh kubeconfig is obtained per recipe execution. For EKS tokens (~15 min validity), this is sufficient for most recipes. Token refresh during execution is out of scope.
- What happens when the external cluster's namespace specified in `providers.kubernetes.namespace` does not exist? Radius reports a clear error about the missing namespace.
- What happens when both AWS and Azure providers are configured but `clusterType` is `eks`? Only the AWS credentials are used for kubeconfig acquisition; the Azure provider is used for any Azure-targeted resources in the recipe, not for Kubernetes access.
Member

Only the AWS credentials are used for kubeconfig acquisition; the Azure provider is used for any Azure-targeted resources in the recipe, not for Kubernetes access.

Could you say more on the user scenario for this one?

## Requirements *(mandatory)*

### Functional Requirements

Contributor

@willtsai May 6, 2026

Would like to discuss this:

      kubernetes: {
        namespace: 'my-app'
        target: 'external'
        clusterType: 'eks'
        clusterName: 'my-eks-cluster'
      }

and whether something like this would be better for the user:

      kubernetes: {
        namespace: 'my-app'
        target: {
          type: 'external'
          clusterType: 'eks'
          clusterName: 'my-eks-cluster'
        }
      }

Comment thread specs/003-external-k8s-deploy/quickstart.md
Comment thread specs/003-external-k8s-deploy/spec.md Outdated
Comment on lines +49 to +50
target: 'external'
clusterType: 'eks'
Member

Is clusterType required if target is current? If not, could we merge the two properties? For example, `target: current` would mean the local cluster, and `target: eks` would mean an external EKS cluster.

Co-authored-by: Karishma Chawla <kachawla@microsoft.com>
Signed-off-by: Zach Casper <zachcasper@microsoft.com>
@zachcasper
Contributor Author

zachcasper commented May 6, 2026

In today's review, the general consensus was:

  1. We should not mix cloud provider details and Kubernetes cluster details; e.g. if clusterName is an AWS/Azure name then it should be in the AWS/Azure block.
  2. current is ambiguous and local is preferred.

Given this, one possibility is to model the environment as:

resource env 'Radius.Core/environments@2025-08-01-preview' = {
  name: 'my-radius-env'
  properties: {
    providers: {
      aws: {
        accountId: '<AWS_ACCOUNT_ID>'
        region: '<AWS_REGION>'
        eksClusterName: '<EKS_CLUSTER_NAME>'
      }
      // Can only specify aws or azure block
      azure: {
        subscriptionId: '<SUBSCRIPTION_ID>'
        resourceGroupName: '<RESOURCE_GROUP_NAME>'
        aksClusterName: '<AKS_CLUSTER_NAME>'
      }
      kubernetes: {
        namespace: '<KUBERNETES_NAMESPACE>'
      }
    }
  }
}

In the future, this can be generalized to support generic Kubernetes cluster with the addition of a Kubernetes credential resource such as:

resource env 'Radius.Core/environments@2025-08-01-preview' = {
  name: 'my-radius-env'
  properties: {
    providers: {
      kubernetes: {
        apiServerUrl: 'https://kubernetes.default.svc.cluster.local'
        namespace: '<KUBERNETES_NAMESPACE>'
        secretName: '<SECRET_NAME>'
      }
    }
  }
}



Development

Successfully merging this pull request may close these issues.

Manage applications in multiple environments on separate Kubernetes clusters

10 participants