Add specification and implementation plan for external Kubernetes deployments #11644

zachcasper wants to merge 5 commits
Conversation
Signed-off-by: Zach Casper <zachcasper@microsoft.com>
Pull request overview
This PR adds a set of design/plan documents for enabling Radius recipe execution against external Kubernetes clusters (EKS and AKS), including the proposed environment schema/API additions (target, clusterType, clusterName) and a phased implementation task plan.
Changes:
- Added feature spec covering user stories, requirements, and validation rules for external cluster targeting.
- Added implementation plan + research notes describing the intended EKS/AKS kubeconfig acquisition approach and impacted code areas.
- Added API contract examples, a quickstart, and a task breakdown/checklist for executing the work.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| specs/003-external-k8s-deploy/spec.md | Defines user stories, requirements, validation rules, and scope for external EKS/AKS targeting |
| specs/003-external-k8s-deploy/plan.md | Implementation plan, impacted components, and project structure for the feature |
| specs/003-external-k8s-deploy/research.md | Technical research on EKS token generation, AKS credential flow, Terraform/Bicep integration |
| specs/003-external-k8s-deploy/data-model.md | Proposed data model changes + validation rules + flow diagram |
| specs/003-external-k8s-deploy/contracts/environments-api.md | API contract changes and example payloads/error responses |
| specs/003-external-k8s-deploy/quickstart.md | Step-by-step usage guide for EKS/AKS scenarios |
| specs/003-external-k8s-deploy/tasks.md | Phased work plan with explicit file-level tasks |
| specs/003-external-k8s-deploy/checklists/requirements.md | Spec quality checklist for readiness |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Zach Casper <zachcasper@microsoft.com>
Signed-off-by: Zach Casper <zachcasper@microsoft.com>
**Acceptance Scenarios**:

1. **Given** an environment with `providers.kubernetes.target = external`, `providers.kubernetes.clusterType = eks`, `providers.kubernetes.clusterName = my-eks-cluster`, and valid AWS credentials registered, **When** a recipe that creates a Kubernetes ConfigMap is executed, **Then** the ConfigMap is created in the specified namespace on the external EKS cluster regardless of recipe engine.
Shouldn't this config be part of the AWS provider now that we are deploying to an external cluster in AWS? It feels fragmented to configure the EKS cluster with the Kubernetes provider and the AWS account/region with the AWS provider.
As a platform engineer, I want to configure a Radius environment to deploy workloads to an external Amazon EKS cluster so that Radius can manage applications on clusters other than the one it is installed on.
I configure my environment's Kubernetes provider with `target: external`, `clusterType: eks`, and the EKS cluster name. Radius uses the existing registered AWS credentials along with the AWS region from `providers.aws.region` to obtain a kubeconfig for the target EKS cluster before executing any recipe (Terraform or Bicep). All Kubernetes resources created by the recipe land on the external cluster.
Do we worry only about recipes? UCP directly talks to AWS today using Cloud Control APIs too. Would we want this enhanced as well, or do we plan to deprecate this ability, similar to the Applications RP?
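For reference, a minimal sketch of the STS token mechanism referenced in this flow (an editor's illustration using `aws-sdk-go-v2`, not code from this PR; the middleware wiring and the omission of expiry handling are assumptions):

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sts"
	"github.com/aws/smithy-go/middleware"
	smithyhttp "github.com/aws/smithy-go/transport/http"
)

// eksToken builds a bearer token for an EKS cluster the same way
// `aws eks get-token` does: presign an STS GetCallerIdentity call with the
// cluster name in the x-k8s-aws-id header, then base64url-encode the URL.
// The region comes from the loaded AWS config (providers.aws.region here).
func eksToken(ctx context.Context, clusterName string) (string, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return "", err
	}
	presigner := sts.NewPresignClient(sts.NewFromConfig(cfg))
	out, err := presigner.PresignGetCallerIdentity(ctx, &sts.GetCallerIdentityInput{},
		func(o *sts.PresignOptions) {
			o.ClientOptions = append(o.ClientOptions, sts.WithAPIOptions(
				func(stack *middleware.Stack) error {
					// Inject the cluster name header before signing so it is
					// covered by the SigV4 signature.
					return stack.Build.Add(middleware.BuildMiddlewareFunc("AddClusterIDHeader",
						func(ctx context.Context, in middleware.BuildInput, next middleware.BuildHandler) (middleware.BuildOutput, middleware.Metadata, error) {
							if req, ok := in.Request.(*smithyhttp.Request); ok {
								req.Header.Set("x-k8s-aws-id", clusterName)
							}
							return next.HandleBuild(ctx, in)
						}), middleware.Before)
				}))
		})
	if err != nil {
		return "", err
	}
	// Kubernetes expects the presigned URL prefixed with "k8s-aws-v1.".
	return "k8s-aws-v1." + base64.RawURLEncoding.EncodeToString([]byte(out.URL)), nil
}

func main() {
	token, err := eksToken(context.Background(), "my-eks-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```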
3. **Given** an environment with `providers.kubernetes.clusterType = eks` and `providers.kubernetes.clusterName` set to a non-existent cluster, **When** a recipe is executed, **Then** the operation fails with a clear error indicating the cluster was not found.
4. **Given** `providers.kubernetes.target = external` and `clusterType = eks` without `clusterName`, **When** the environment is created, **Then** validation fails stating `clusterName` is required.
5. **Given** `providers.kubernetes.clusterType = eks` without a corresponding `providers.aws` configuration, **When** the environment is created, **Then** validation fails stating that AWS provider configuration is required for EKS clusters.
With this ability, would we worry about a Radius instance local to EKS simultaneously managing the same application, in addition to Radius on kind/k3d, or is that out of scope?
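Scenarios 4 and 5 above imply cross-field validation. A minimal sketch of what that could look like (an editor's illustration; the Go type and field names are hypothetical, not the PR's actual model):

```go
package environments

import "errors"

// KubernetesProvider mirrors the proposed providers.kubernetes shape.
// Field names here are illustrative, not taken from this PR.
type KubernetesProvider struct {
	Namespace   string
	Target      string // "" (defaults to current), "current", or "external"
	ClusterType string // "eks" or "aks" when Target == "external"
	ClusterName string
}

// Validate enforces the cross-field rules from the acceptance scenarios:
// clusterName is required for external targets, and an EKS target requires
// an AWS provider configuration.
func Validate(k KubernetesProvider, hasAWSProvider bool) error {
	switch k.Target {
	case "", "current":
		return nil // existing behavior: deploy to the local cluster
	case "external":
		if k.ClusterName == "" {
			return errors.New("clusterName is required when target is external")
		}
		if k.ClusterType == "eks" && !hasAWSProvider {
			return errors.New("AWS provider configuration is required for EKS clusters")
		}
		return nil
	default:
		return errors.New("target must be 'current' or 'external'")
	}
}
```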
- Only one AWS credential and one Azure credential are supported (named `"default"` per the current design). Multi-credential support is a future enhancement.
- The EKS kubeconfig acquisition follows the same mechanism as `aws eks update-kubeconfig`: using AWS STS to generate a bearer token for the cluster's authentication endpoint.
- The AKS kubeconfig acquisition uses user credentials (`listClusterUserCredential`) combined with Entra ID (AAD) token acquisition using the registered Azure service principal or workload identity. This is equivalent to `az aks get-credentials` followed by `kubelogin convert-kubeconfig --login spn`.
- The target namespace (`providers.kubernetes.namespace`) is expected to already exist on the external cluster. Radius does not auto-create namespaces on external clusters.
I think we do auto-create namespaces by default, but we might want to change this behavior
I see no evidence of creating namespaces in the Containers recipe. https://github.com/radius-project/resource-types-contrib/blob/main/Compute/containers/recipes/kubernetes/terraform/main.tf
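For reference, a sketch of the `listClusterUserCredential` plus Entra ID token flow described in the assumptions above (an editor's illustration with the Azure SDK for Go; the module version, placeholder inputs, and use of `DefaultAzureCredential` are assumptions):

```go
package main

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4"
)

func aksCredentials(ctx context.Context, subscriptionID, resourceGroup, clusterName string) ([]byte, string, error) {
	// The registered Azure service principal or workload identity; resolved
	// from the environment here for brevity.
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return nil, "", err
	}

	// Equivalent of `az aks get-credentials`: fetch the user kubeconfig.
	client, err := armcontainerservice.NewManagedClustersClient(subscriptionID, cred, nil)
	if err != nil {
		return nil, "", err
	}
	res, err := client.ListClusterUserCredentials(ctx, resourceGroup, clusterName, nil)
	if err != nil {
		return nil, "", err
	}
	kubeconfig := res.Kubeconfigs[0].Value

	// Equivalent of `kubelogin convert-kubeconfig --login spn`: acquire an
	// Entra ID token for the well-known AKS AAD server application and use it
	// as the bearer token in place of the kubeconfig's exec plugin.
	tok, err := cred.GetToken(ctx, policy.TokenRequestOptions{
		Scopes: []string{"6dae42f8-4368-4678-94ff-3960e28e3630/.default"},
	})
	if err != nil {
		return nil, "", err
	}
	return kubeconfig, tok.Token, nil
}

func main() {
	kc, token, err := aksCredentials(context.Background(), "<subscription-id>", "<resource-group>", "my-aks-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Printf("kubeconfig: %d bytes, token: %d chars\n", len(kc), len(token))
}
```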
As an existing Radius user, I want my current environments to continue working without modification so that the external cluster feature does not break my existing workflows.
When `providers.kubernetes.target` is omitted or set to `current`, Radius behaves exactly as it does today: recipes execute against the local/in-cluster Kubernetes using the existing kubeconfig resolution.
nit: I don't really like `current` as this keyword; maybe `local` or something else is better.
All are bad. Kubernetes has no concept of a name or symbol that references a cluster outside of the kube context. The closest thing to a reference is that, when connecting to the API server from within the cluster, the URL is `https://kubernetes.default.svc.cluster.local`. I worry about `local` referring to the user's local workstation. Possibly `cluster.local` is better.
### Functional Requirements
- **FR-001**: The `providers.kubernetes` model MUST support a new `target` property with allowed values `current` and `external`. When omitted, the default MUST be `current`.
Would like to discuss the naming of the `target` property and whether `current` and `external` are good choices.
### User Story 2 - Deploy to an External AKS Cluster (Priority: P1)
As a platform engineer, I want to configure a Radius environment to deploy workloads to an external Azure AKS cluster so that I can manage applications across multiple Azure-hosted clusters from a single Radius installation.
Why aren't we supporting external deployments across regions and resource groups? Are we scoping Azure resources to the resource group and AWS resources to the region (both denoted in the providers block) because of the scope of the credentials?
Is this a question relative to deploying to external Kubernetes clusters or a question in general about environments? Environments today are scoped to an AWS {accountId, region} or an Azure {subscription, resource group}.
**Acceptance Scenarios**:

1. **Given** an existing environment with only `providers.kubernetes.namespace` set (no `target` property), **When** a recipe is executed, **Then** it deploys to the local cluster exactly as it does today.
2. **Given** an environment with `providers.kubernetes.target = current`, **When** a recipe is executed, **Then** it deploys to the local cluster.
`local` sounds more intuitive than `current` to me.
**Acceptance Scenarios**:
1. **Given** an environment with `providers.kubernetes.target = external`, `providers.kubernetes.clusterType = eks`, `providers.kubernetes.clusterName = my-eks-cluster`, and valid AWS credentials registered, **When** a recipe that creates a Kubernetes ConfigMap is executed, **Then** the ConfigMap is created in the specified namespace on the external EKS cluster regardless of recipe engine.
What do we mean by recipe engine here?
Bicep or Terraform.
Can you make that explicit here? Recipe engine is agnostic to how the recipe is getting deployed.
It might be helpful to diagram where the credentials are set and where they flow. For example, there is going to be an existing credential configured on the client machine in the local kubectl context, but there will also be credentials configured in the cluster. Those creds are for the current cluster and cloud environment.
This spec is adding new credentials for an external cloud environment and cluster, using "existing registered AWS credentials along with the AWS region", but where do those come from and how are they set?
Thinking about this more, we really need an overall architecture diagram that depicts where credentials exist in the current version of Radius, plus the planned delta in this PR.
I assume that there is no restriction on AWS clusters deploying to Azure clusters and vice versa, but I want to ask the question in case this is an invalid assumption.
We should add this scenario to the acceptance criteria.
- What happens when the external cluster's API server is temporarily unreachable? Radius returns a clear connectivity error rather than a generic failure.
- What happens when the dynamically-obtained kubeconfig token expires mid-recipe-execution? A fresh kubeconfig is obtained per recipe execution. For EKS tokens (~15 min validity), this is sufficient for most recipes. Token refresh during execution is out of scope.
- What happens when the namespace specified in `providers.kubernetes.namespace` does not exist on the external cluster? Radius reports a clear error about the missing namespace.
- What happens when both AWS and Azure providers are configured but `clusterType` is `eks`? Only the AWS credentials are used for kubeconfig acquisition; the Azure provider is used for any Azure-targeted resources in the recipe, not for Kubernetes access.
> Only the AWS credentials are used for kubeconfig acquisition; the Azure provider is used for any Azure-targeted resources in the recipe, not for Kubernetes access.
Could you say more on the user scenario for this one?
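The edge cases above state that a fresh kubeconfig is obtained per recipe execution. As an illustrative sketch (client-go; the function and its inputs are hypothetical, not from this PR), the acquired endpoint, CA bundle, and short-lived token would be assembled into a client once per execution:

```go
package kubeconfig

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// NewExternalClient builds a Kubernetes client from per-execution values:
// the cluster endpoint and CA bundle from the cloud API, plus a freshly
// minted bearer token (EKS STS token or AKS Entra ID token). No token
// refresh is attempted mid-execution, matching the stated scope.
func NewExternalClient(endpoint string, caData []byte, token string) (*kubernetes.Clientset, error) {
	cfg := &rest.Config{
		Host:            endpoint,
		BearerToken:     token,
		TLSClientConfig: rest.TLSClientConfig{CAData: caData},
	}
	return kubernetes.NewForConfig(cfg)
}
```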
## Requirements *(mandatory)*

### Functional Requirements
Would like to discuss this:

```bicep
kubernetes: {
  namespace: 'my-app'
  target: 'external'
  clusterType: 'eks'
  clusterName: 'my-eks-cluster'
}
```

and whether something like this would be better for the user:

```bicep
kubernetes: {
  namespace: 'my-app'
  target: {
    type: 'external'
    clusterType: 'eks'
    clusterName: 'my-eks-cluster'
  }
}
```
```bicep
target: 'external'
clusterType: 'eks'
```
Is `clusterType` required if `target` is `current`? If not, then could we merge the two properties? For example, `target: current` would mean the local cluster, and `target: eks` would mean an external EKS cluster.
Co-authored-by: Karishma Chawla <kachawla@microsoft.com> Signed-off-by: Zach Casper <zachcasper@microsoft.com>
In today's review, the general consensus was:

Given this, one possibility is to model the environment as:

In the future, this can be generalized to support a generic Kubernetes cluster with the addition of a Kubernetes credential resource such as:
This pull request introduces the foundational design documents and implementation plan for supporting deployment to external Kubernetes clusters (EKS and AKS) in Radius. It defines the new data model, validation rules, API contract, and outlines the required code changes and project structure. The documentation covers how users can configure environments and credentials to target external clusters, and provides a quickstart guide for both EKS and AKS scenarios.
The most important changes are:
**Data Model & API Changes**

- Extends the `ProvidersKubernetes` model with new fields (`target`, `clusterType`, and `clusterName`), including detailed validation rules and cross-entity dependencies.
- Updates `Radius.Core/environments` to support these new fields, with examples and validation error responses for incorrect configurations.

**Implementation Plan & Project Structure**

- Adds a new `pkg/kubernetes/kubeconfig/` package for cloud-specific kubeconfig acquisition (see the interface sketch at the end of this description).

**User-Facing Documentation**
Type of change
Fixes: #6934 (Manage applications in multiple environments on separate Kubernetes clusters)
Contributor checklist
Please verify that the PR meets the following requirements, where applicable:
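For the proposed `pkg/kubernetes/kubeconfig/` package mentioned in the description above, one possible shape is sketched below (an editor's illustration; the `Provider` interface and names are hypothetical, not taken from the plan documents):

```go
package kubeconfig

import (
	"context"

	"k8s.io/client-go/rest"
)

// Provider acquires a rest.Config for a named external cluster. One
// implementation per clusterType (eks, aks) would live in this package.
type Provider interface {
	// RESTConfig returns a ready-to-use client config for the cluster,
	// including endpoint, CA bundle, and a short-lived bearer token.
	RESTConfig(ctx context.Context, clusterName string) (*rest.Config, error)
}
```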