feat: pivot psql-stack to EKS Auto Mode defaults by patrickleet · Pull Request #10 · hops-ops/psql-stack

patrickleet · 2026-04-28T04:49:43Z

Summary

Pivots the stack from StackGres → CNPG → Mayastor → Longhorn → EKS Auto Mode primitives. The journey is preserved on feat/cnpg-pivot (Mayastor) and feat/longhorn-pivot (Longhorn V2) checkpoint branches.

The end state is a dramatically slimmer stack that runs cleanly on EKS Auto Mode without dedicated NodePools, custom AMIs, or in-cluster storage clusters. Composes 4 things:

CloudNativePG operator (Helm)
Atlas operator (Helm) — declarative schema migrations
cnpg-i-scale-to-zero plugin (set of 9 Crossplane Objects)
A psql VolumeSnapshotClass (driver: ebs.csi.eks.amazonaws.com) for PSQLBranch's CoW fork target

PSQLClusters target whatever StorageClass the cluster already provides (gp3 on Auto Mode); the stack stops opining about SCs.

Why the pivot

Mayastor — chart hardcodes kubernetes.io/arch: amd64 on the io_engine DaemonSet; broken on arm64 Graviton (which we use for cost-optimal NVMe instance types). Also brings ~4 vCPU + 6 GiB platform tax (etcd×3, NATS, minio, loki, alloy) that duplicates other stacks.
Longhorn V2 — even with V2 enabled, longhorn-manager env-checks for iscsiadm at startup unconditionally. Bottlerocket (EKS Auto Mode's OS) has an immutable rootfs — no path to install iscsiadm. Verified live on pat-local: manager CrashLoopBackOff with the env-check failure.
Auto Mode primitives — gp3 SC + EBS CSI driver + composed VolumeSnapshotClass. AWS-blessed, zero node-side prep, no in-cluster storage cluster. Tradeoff: branch volumes are full-cost EBS clones rather than CoW deltas; manageable with TTL-bounded ephemeral branches at the typical Vercel-like multi-tenant shape. Revisit Longhorn-on-self-managed-nodes if/when CoW economics warrant it (separate `aws-storage-stack` planned).

Schema diff

Drops:

`spec.nodePool` (NodePool sub-pools removed — Auto Mode handles node provisioning)
`spec.nodePrep` (DaemonSet removed — no host-level setup needed)
`spec.storage.{chartVersion, namespace, replicationFactor, thin, values, ...}` (storage cluster gone)

Adds:

`spec.snapshotClass.{enabled, name, driver, deletionPolicy, parameters}` — minimal config for the composed VSC

XRD shrinks ~290 → ~170 lines.

New dependency

`volume-snapshot-stack` — installs the cluster-wide snapshot-controller via the piraeus-charts Helm chart. EKS Auto Mode ships the snapshot CRDs but not the controller; without it our composed VSC is inert.

Test plan

`make render` — clean output for all examples
`make test` — 11/11 KCL render tests passing
`hops config install` builds + pushes the package to colima
Live install on pat-local (EKS Auto Mode + Bottlerocket + arm64 Graviton): CNPG + Atlas + S2Z all 1/1 Ready, composed VSC present
Smoke test: PVC bound on `ebs.csi.eks.amazonaws.com` → VolumeSnapshot via composed `psql` VSC reached `readyToUse=true` with backing EBS snapshot

Suggested merge style

Squash — the 16 commits on this branch include the StackGres→CNPG→Mayastor→Longhorn→Auto Mode journey; the experiment branches are pushed (`feat/cnpg-pivot`, `feat/longhorn-pivot`) for historical reference. Squash gives main a single clean pivot commit.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added PSQLCluster composite for managed PostgreSQL cluster deployment with HA, scale-to-zero, and monitoring support
- Added PSQLBranch composite for creating ephemeral database forks via volume snapshots
- Introduced CloudNativePG-based platform stack replacing prior StackGres architecture
- Added scale-to-zero plugin for automatic idle cost reduction
- Added comprehensive example configurations for local, minimal, and production deployments
Documentation
- Updated README with new CloudNativePG architecture, design, and prerequisites

Add opt-in Karpenter NodePool composed resource. When spec.nodePool.enabled: true, renders a NodePool targeting arm64 spot on r7g.large/r7g.xlarge/m7g.large/m7g.xlarge (memory-optimized Graviton), tainted with psql=true:NoSchedule and labeled workload-type: psql. StackGres (operator/restapi/jobs) and Atlas get nodeSelector + tolerations injected into their Helm values when the NodePool is enabled. Usages pin both releases to be drained before the NodePool is deleted. Adds provider-kubernetes to upbound.yaml for the NodePool Object wrapper. Implements [[tasks/psql-stack-vela-simplyblock]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- storageClass (default on): creates a gp3 StorageClass backed by the EKS Auto Mode EBS CSI driver (ebs.csi.eks.amazonaws.com). The legacy gp2 in-tree provisioner does not work on EKS Auto Mode. - externalSecrets (opt-in): for each entry in externalSecrets.secrets[], composes a kubernetes.m.crossplane.io/Object wrapping an ESO ExternalSecret that syncs an AWS Secrets Manager value (published via hops secrets sync aws) into a Kubernetes Secret on the target cluster. Requires a ClusterSecretStore (e.g. from SecretStack); defaults to clusterSecretStoreName: hops-aws-secrets-manager. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors observe stack where StorageClasses are named loki/prometheus/tempo per-component. Keeps the name specific to the stack so it doesn't collide with cluster-provided defaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the generic externalSecrets.secrets[] passthrough with a Postgres-specific connections[] API. The user publishes just a password via hops secrets (single JSON key, default 'password'); the stack combines it with non-secret host/port/database/username/sslmode/namespace and emits a K8s Secret with a ready-to-use 'url' key plus discrete fields. Downstream consumers reference whichever key they need: AtlasSchema.devURLFrom → url SGCluster credentials.users.superuser.password → password applications → url (or discrete fields) Breaking change to the (locally-only) externalSecrets API; redeploy with the new shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Without managing the SGCluster, every field in externalSecrets.connections[] (host/port/database/namespace) is passthrough — no abstraction. Users write ExternalSecret CRs directly (or as a Crossplane Object wrapper in local/) against the ClusterSecretStore provisioned by SecretStack. PSQLStack now = platform only: StackGres + Atlas operators + NodePool + StorageClass. If we later add an instances[] that composes SGClusters, ESO wiring can come back for free since the stack will then know host/port/database/namespace without the user restating them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rewrites PSQLStack schema to remove stackgresOperator, add cnpg and scaleToZeroPlugin blocks. Default namespace: cnpg-system. CNPG 1.29 (chart 0.27.1) replaces StackGres 1.18 as the operator. Atlas operator renumbered 210 → 220 to make room for the scale-to-zero plugin install (added in a later phase). Storage (psql StorageClass on EBS gp3) and NodePool blocks preserved unchanged — phases 2 and 3 will rewrite them for the three-profile (mayastor / lvm / ebs) storage model and the branches/primary NodePool split with hugepages + nvme-tcp pre-configured. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the single storageClass block with a storage block exposing three independent profiles: - storage.mayastor (replicated NVMe-oF via OpenEBS Mayastor) — enterprise default for primary serving clusters; CoW + HA across N replicas. Default enabled=false until phase 3 lands the NodePool with hugepages + nvme-tcp. - storage.lvm (single-node CoW via OpenEBS LVM LocalPV) — branches and dev clusters. Default enabled=false until phase 3 lands NodePool LVM volume groups. - storage.ebs (EBS gp3 via EKS Auto Mode CSI) — durable fallback, no CoW; always-on default. Render templates: - 120-storageclass.yaml.gotmpl deleted (was the single 'psql' SC) - 160-openebs-lvm.yaml.gotmpl: Helm release for OpenEBS LVM LocalPV - 165-openebs-mayastor.yaml.gotmpl: Helm release for OpenEBS Mayastor - 170-storageclass-mayastor.yaml.gotmpl: psql-mayastor SC + VolumeSnapshotClass - 175-storageclass-lvm.yaml.gotmpl: psql-lvm SC + VolumeSnapshotClass - 180-storageclass-ebs.yaml.gotmpl: psql-ebs SC (renamed from 'psql') state-init defaults the three profile blocks; state-status observes the new resource keys. Mayastor + LVM Helm releases + their StorageClasses are gated on storage.{mayastor,lvm}.enabled — only EBS materializes by default until phase 3 unblocks the others. standard.yaml example patched to drop the removed storageClass and stackgresOperator fields (full example rewrite is phase 5). Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the single Karpenter NodePool with two sub-pools targeting NVMe arm64 instance-store nodes (i4g.2xlarge / i4g.4xlarge / im4gn.2xlarge): - nodePool.branches: spot — for ephemeral PSQLBranch workloads. Spot is acceptable since branches are reproducible. - nodePool.primary: on-demand — for PSQLCluster primaries and operators (CNPG, Atlas, scale-to-zero). Spot preemption would lose a Mayastor replica, so on-demand is the right default for serving workloads. Each sub-pool has its own labels (sub-pool=branches | sub-pool=primary) and matching taints so workloads can target one specifically. Operators ride the primary sub-pool via nodeSelector + tolerations injected from state-init. Render templates: - 150-nodepool.yaml.gotmpl deleted - 140-nodepool-branches.yaml.gotmpl: spot sub-pool - 145-nodepool-primary.yaml.gotmpl: on-demand sub-pool + Usage protection for CNPG/Atlas Releases against premature NodePool deletion. state-init defaults the new sub-pool blocks; state-status observes both. nodePool.enabled stays default-false — existing claims unchanged. Out of scope for this commit: node-side prep for Mayastor + LVM (hugepages, nvme-tcp module, LVM volume group on instance-store NVMe). That's phase 3b (a separate concern that needs careful image / runtime choices) — without it, mayastor.enabled and lvm.enabled won't bind PVCs. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Inlines the upstream cnpg-i-scale-to-zero v0.1.7 release manifest as 9 Crossplane Kubernetes Objects: - ServiceAccount cnpg-scale-to-zero-plugin - ClusterRole cnpg-scale-to-zero-sidecar-role + ClusterRoleBinding - Secret scale-to-zero-config (sidecar image reference, paired to plugin version via stringData — k8s base64-encodes on apply) - Self-signed cert-manager Issuer + 2 Certificates (server + client) for the gRPC TLS material the CNPG operator uses to reach the plugin - Service scale-to-zero (cnpg.io/pluginPort=9090, cnpg.io/pluginName annotations) - Deployment scale-to-zero (the plugin gRPC server) Plugin and sidecar images both pin to spec.scaleToZeroPlugin.version (default v0.1.7). Secret is renamed scale-to-zero-config (was scale-to-zero-config-c2c2544fbk in upstream — drops the kustomize hash suffix since we emit the resources as separate Objects). All resources are gated on spec.scaleToZeroPlugin.enabled (default true — the plugin is zero-cost when no PSQLCluster opts in). Source URL is annotated for renovate tracking: source: https://github.com/xataio/cnpg-i-scale-to-zero/releases/download/$VER/manifest.yaml renovate: datasource=github-releases depName=xataio/cnpg-i-scale-to-zero Prereq: cert-manager must be installed (provided by the dns-stack in hops-ops). Without it, the Issuer + Certificate resources won't reconcile and the plugin Deployment won't have its TLS volumes available. When PSQLClusters opt into scale-to-zero, they add: metadata.annotations: xata.io/scale-to-zero-enabled: "true" xata.io/scale-to-zero-inactivity-minutes: "10" spec.plugins: - name: cnpg-i-scale-to-zero.xata.io Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…(phase 5) - README rewritten: CNPG architecture, three-stage journey (EBS-only → +LVM CoW → +Mayastor HA), full Spec Reference table, prereq notes (cert-manager via cert-manager-stack; node prep deferred to phase 3b) - examples refreshed: - minimal: just clusterName, EBS-only baseline - standard: full production posture (NodePool sub-pools, Mayastor + LVM + EBS, S2Z plugin, Atlas) - local: dev cluster with LVM CoW only (no Mayastor since replication needs >1 node, no NodePool, default Helm provider config) - tests/test-render/main.k: rewritten against the CNPG schema - dropped stackgresOperator-specific tests (field removed) - 11 tests covering: minimal renders platform operators; custom labels propagate; cnpg.overrideAllValues replaces defaults; atlas values merge; namespace propagation; per-component namespace override; helmProviderConfigRef defaults; scaleToZeroPlugin can be disabled; storage.{mayastor,lvm}.enabled compose Helm + StorageClass + VolumeSnapshotClass; nodePool.enabled composes both sub-pools All 11 tests pass; render + validate green on minimal + standard. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a privileged DaemonSet that runs on each NVMe NodePool node and configures the host-level state Mayastor + OpenEBS LVM LocalPV expect to find: - Hugepages (vm.nr_hugepages, default 1024 → 2GiB of 2MiB pages, required by Mayastor SPDK) - nvme-tcp kernel module loaded (Mayastor NVMe-oF transport) - LVM volume group on the first instance-store NVMe device (OpenEBS LVM LocalPV expects the VG to pre-exist) Auto-gated: composed only when nodePool.enabled AND (storage.mayastor.enabled OR storage.lvm.enabled). Inside the script, each step is conditional on the relevant storage backend. Schema additions: - spec.nodePrep.enabled (default true; auto-gated by storage backends) - spec.nodePrep.hugepages.count (default 1024) - spec.nodePrep.image (default alpine:3.20; apk-installs lvm2 + util-linux at startup) Render template 155-node-prep-daemonset.yaml.gotmpl: - DaemonSet with nodeSelector workload-type=psql + tolerations for both psql=true:NoSchedule and sub-pool=*:NoSchedule taints - hostPID + hostNetwork + privileged init container - Init script falls back gracefully on Bottlerocket / Auto Mode where modprobe inside containers is restricted (warns and continues) - Tiny pause container keeps the DS Ready state-init / state-status / 010-state-status updated for the new resource. Validate clean (22 resources on standard), 11/11 render tests still pass. Caveat: live verification on pat-local deferred until Mayastor or LVM is enabled in a PSQLStack claim there. Schema/composition is sound; e2e testing follows when storage backends are turned on. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-claim cost on Mayastor is controlled via replicationFactor (3 for primaries, 1 for ephemeral branches). LVM was a redundant single-node CoW backend; EBS was just wrapping the cluster's existing default SC and contradicted the stack's CoW-by-default identity. Schema changes: - spec.storage flattened — no more {mayastor,lvm,ebs} profile selector. Top-level fields now: chartVersion, namespace, storageClassName (default "psql"), replicationFactor, thin, reclaimPolicy, volumeBindingMode, allowVolumeExpansion, values, overrideAllValues. - spec.nodePool.enabled defaults to TRUE now (was false). The stack's whole point is dedicated NVMe nodes; the default should make it work. - Mayastor + StorageClass + node-prep DaemonSet all gated on nodePool.enabled. Disable nodePool to opt out (gets you only CNPG + Atlas + S2Z plugin running on the cluster's default SC). Render templates removed: - 160-openebs-lvm.yaml.gotmpl - 175-storageclass-lvm.yaml.gotmpl - 180-storageclass-ebs.yaml.gotmpl Render templates updated: - 165-openebs-mayastor.yaml.gotmpl: gated on nodePool.enabled (was storage.mayastor.enabled), uses flat $state.storage shape - 170-storageclass-mayastor.yaml.gotmpl: same gating + shape - 155-node-prep-daemonset.yaml.gotmpl: dropped LVM VG step, gated on just nodePool.enabled - 010-state-status.yaml.gotmpl: dropped LVM/EBS observed keys PSQLClusters that don't want CoW can specify any other StorageClass that exists on the target cluster (e.g., the EKS Auto Mode default gp3 SC). The stack does NOT compose a non-CoW SC — that's outside its identity. Examples + README rewritten. KCL tests updated: 10/10 passing. Validate clean: 18 resources on minimal, 18 on standard. Live verification on pat-local pending — would need to delete the existing claim (with ebs/lvm enabled) and reapply the simplified one. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…iskPools Three changes prompted by review: 1. Primary sub-pool default is now spot (was on-demand). Mayastor's replicationFactor=3 absorbs preemption — losing one replica triggers a rebuild on a fresh node, not data loss. Override to on-demand via spec.nodePool.primary.requirements when needed. 2. Karpenter NodePool inner names lose the cluster prefix. Was `<clusterName>-psql-{branches,primary}`, now `<XR.name>-{branches,primary}` (e.g. `psql-branches`, `psql-primary`). Less repetition; uses the stack's XR name for disambiguation when multiple PSQLStacks share a cluster (which is rare). Wrapper Crossplane Object names unchanged. 3. node-prep DaemonSet now also registers the local NVMe instance-store device with Mayastor by creating a per-node DiskPool CR. New ServiceAccount + ClusterRole/Binding granting get/create/list/watch on diskpools.openebs.io. Init script: detects /dev/nvme[1-9]n1 by lsblk model match, kubectl-applies a DiskPool named psql-pool-<NODE_NAME> (idempotent — skip if already present). Closes the gap that left Mayastor pools empty and PVCs stuck Pending. This means no separate ObservedObjectCollection or custom function for DiskPool registration — the same DS that handles host-level prereqs (hugepages, nvme-tcp) also handles pool registration. One declarative artifact per node-prep concern. Also fixes a YAML colon issue in the primary sub-pool description. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds spec.ha block: a single toggle that enables production-style HA defaults across every HA-able platform component without users needing to know each chart's specific values keys. When spec.ha.enabled: true (default false): - CNPG operator: replicaCount=3 + topologySpreadConstraints by zone - Atlas operator: same - cnpg-i-scale-to-zero plugin Deployment (directly composed): same - OpenEBS Mayastor: agents.core / csi.controller / etcd replicaCount=3 Per-component values can still override via the existing values block — HA values land in chartDefaults, user values mergeOverwrite them. Schema: - spec.ha.enabled (bool, default false) - spec.ha.replicas (int, default 3) - spec.ha.topologySpreadByZone (bool, default true) Standard example now demonstrates HA. New KCL test asserts replicaCount flows through to CNPG + Atlas Releases when ha.enabled=true. README updated with the new fields + a Components table entry. Render + validate clean (21 resources). 11/11 KCL tests pass. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Avoids Helm ownership conflicts when the cluster already has these CRDs from another source (e.g., a previous OpenEBS LVM install, the cluster's snapshot-controller, or another chart that bundles them). The CRDs (volumesnapshotclasses.snapshot.storage.k8s.io and friends) need to come from somewhere on the cluster — the assumption is that snapshot-controller is installed separately as a cluster-level concern, not bundled with each storage backend. Encountered live during pat-local install: leftover LVM CRD annotations blocked Mayastor's upgrade. Skip CRD install in our defaults to make this robust across re-installs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

csi-node DaemonSet was running on every cluster node, but workers without the node-prep DS don't have nvme_tcp loaded, so csi-node crashloops there ("Failed to detect nvme_tcp kernel module"). Restrict csi-node via spec.csi.node.{nodeSelector,tolerations} to the same psql NodePool nodes where node-prep loads nvme_tcp. PSQLCluster workloads always schedule on workload-type=psql nodes via the existing NodePool selectors, so this doesn't restrict anything that actually consumes Mayastor PVCs. Also fixes a chartDefaults merge bug introduced when HA mode landed: both nodePool and HA were calling \`set chartDefaults "csi"\` which clobbered each other. Now build the csi sub-dict incrementally before adding it to chartDefaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- 165-longhorn.yaml.gotmpl: longhorn chart 1.10.0, V2 data engine + SPDK - 170-storageclass-longhorn.yaml.gotmpl: driver.longhorn.io SC + VSC, dataEngine=v2, diskSelector=psql - 155-node-prep-daemonset.yaml.gotmpl: modprobe nvme_tcp / vfio_pci / uio_pci_generic / ublk_drv; replace DiskPool registration with node.longhorn.io/default-disks-config annotation - definition.yaml + state-init: chart defaults bump (1.10.0, longhorn-system); doc strings updated - 010-state-status: observe `longhorn` + `storageclass-longhorn` - README + examples updated; KCL tests updated and all 11 pass Mayastor checkpoint preserved on feat/cnpg-pivot. Longhorn V2 is "Experimental" upstream as of 1.10 — drops bundled etcd/NATS/minio/ loki/alloy footprint and unblocks arm64 Graviton (Mayastor's chart hardcodes amd64 on io_engine).

Drops Mayastor/Longhorn experimentation entirely: - Remove NodePool sub-pools (branches + primary) - Remove node-prep DaemonSet - Remove Mayastor / Longhorn Helm release templates - Remove Mayastor/Longhorn StorageClass Slim spec.storage block down to spec.snapshotClass {enabled, name, driver, deletionPolicy, parameters}. Default driver ebs.csi.aws.com (EKS Auto Mode default); override for non-AWS providers. The stack no longer composes a StorageClass — PSQLClusters target whatever SC the cluster already provides. Stack now composes 4 things: CNPG operator, Atlas operator, S2Z plugin (9 Objects), and one VolumeSnapshotClass. CNPG/Atlas/S2Z templates lose nodeSelector/tolerations refs; they run wherever Auto Mode schedules them. XRD shrinks from ~290 lines to ~170. README rewritten — no more NVMe/SPDK/iscsi noise. Examples collapsed to clean shapes. Replicated CoW storage (Longhorn et al) is now a separate concern, to be provided by aws-storage-stack (self-managed ASG nodes with proper userData) when the multi-tenant CoW economics justify the operational cost. Bottlerocket/Auto Mode is incompatible with iscsi- based engines (longhorn-manager env-checks for iscsiadm even with V2; Bottlerocket's immutable rootfs blocks installation). Branch checkpoints preserved on: feat/cnpg-pivot — Mayastor experiment feat/longhorn-pivot — Longhorn V2 experiment

EKS Auto Mode uses the managed `ebs.csi.eks.amazonaws.com` CSI driver, not the upstream `ebs.csi.aws.com`. Fix the default in state-init, XRD, README, and the minimal example so out-of-the-box installs on Auto Mode produce a working VolumeSnapshotClass without override. Self-managed EBS users can still override `spec.snapshotClass.driver` to `ebs.csi.aws.com`; non-AWS users override to their CSI driver name. Verified live on pat-local: psql VSC reconciles to driver ebs.csi.eks.amazonaws.com, XR Synced=True Ready=True, CNPG + Atlas + S2Z plugin all running cleanly on Auto Mode without dedicated nodes.

EKS Auto Mode ships the snapshot.storage.k8s.io CRDs but does NOT ship the snapshot-controller. Without a controller, our composed VolumeSnapshotClass is inert — PSQLBranch snapshots will sit forever without ever reaching ReadyToUse. Document it as a prerequisite (like cert-manager). The stack itself does not compose snapshot-controller — it's a foundational cluster concern that belongs in a separate stack (TBD: extend aws-cert-stack or create a focused snapshot-stack). Verified end-to-end on pat-local with the upstream kubernetes-csi/external-snapshotter v8.2.0 manifests: - PVC bound on ebs.csi.eks.amazonaws.com - VolumeSnapshot via the composed `psql` VSC reached readyToUse=true with a real EBS snapshot backing it.

The dependency now exists as a real stack (xrs/stacks/k8s/volume-snapshot/, ghcr.io/hops-ops/volume-snapshot-stack). Replace the manual upstream-YAML install instructions with a pointer at the stack.

coderabbitai · 2026-04-28T04:49:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR migrates a PostgreSQL management stack from StackGres-based operator control to CloudNativePG-based cluster lifecycle management, introducing new PSQLCluster and PSQLBranch composite resources, storage class/snapshot class composition, scale-to-zero plugin integration, and HA configuration surfaces across the CRDs, composition functions, examples, and test suites.

Changes

CloudNativePG Platform Migration

Layer / File(s)	Summary
CRD Schema & Public API `apis/psqlstacks/definition.yaml`, `apis/psqlclusters/definition.yaml`, `apis/psqlbranches/definition.yaml`	New CRDs `PSQLCluster` and `PSQLBranch` define cluster sizing, credentials, branching, HA, and CNPG override surfaces. `PSQLStack` CRD replaces `stackgresOperator` config with `namespace`, `kubernetesProviderConfigRef`, `ha`, `scaleToZeroPlugin`, `storageClass`, and `snapshotClass`; `status.ready` becomes component-aware.
Core Composition Functions (Stack) `functions/stack/000-state-init.yaml.gotmpl`, `functions/stack/010-state-status.yaml.gotmpl`, `functions/stack/200-cnpg-operator.yaml.gotmpl`, `functions/stack/210-cnpg-scale-to-zero.yaml.gotmpl`, `functions/stack/220-atlas-operator.yaml.gotmpl`, `functions/stack/180-storageclass.yaml.gotmpl`, `functions/stack/170-volumesnapshotclass.yaml.gotmpl`	Stack composition initializes state from CNPG/scale-to-zero/Atlas/storage configs, derives HA settings, and composes Helm releases (CNPG with HA replicaCount injection, Atlas with HA topology spread), scale-to-zero plugin deployment, and Kubernetes StorageClass/VolumeSnapshotClass objects.
Core Composition Functions (Cluster) `functions/cluster/000-state-init.yaml.gotmpl`, `functions/cluster/010-state-status.yaml.gotmpl`, `functions/cluster/200-cnpg-cluster.yaml.gotmpl`, `functions/cluster/100-external-secret.yaml.gotmpl`	Cluster composition derives instance count (HA override), storage/postgres/app/superuser defaults, and renders CNPG Cluster manifest with bootstrap, storage, monitoring, optional credentials via ExternalSecrets, and scale-to-zero plugin configuration.
Core Composition Functions (Branch) `functions/branch/000-state-init.yaml.gotmpl`, `functions/branch/010-state-status.yaml.gotmpl`, `functions/branch/100-source-snapshot.yaml.gotmpl`, `functions/branch/110-branch-snapshot.yaml.gotmpl`, `functions/branch/200-cnpg-cluster.yaml.gotmpl`, `functions/branch/999-status.yaml.gotmpl`	Branch composition handles same-namespace and cross-namespace forking via VolumeSnapshots, derives snapshot content for bootstrap recovery, renders CNPG Cluster with bootstrap from snapshot, and propagates scale-to-zero/TTL annotations.
Composition Registrations `apis/psqlstacks/composition.yaml`, `apis/psqlclusters/composition.yaml`, `apis/psqlbranches/composition.yaml`	Register function pipelines for each XRD: stack/cluster/branch each run custom stack/cluster/branch function at step 1, then auto-ready at step 2.
Function-Based State Removal `functions/render/200-stackgres-operator.yaml.gotmpl` (removed)	Deletes StackGres operator Helm release template (superseded by CNPG operator composition).
State Initialization (Stack) `functions/render/000-state-init.yaml.gotmpl`	Replaces StackGres state initialization with CNPG state including namespace (`cnpg-system`), HA config, scale-to-zero/storage/snapshot defaults, and Kubernetes provider reference.
Examples (PSQLStack) `examples/psqlstacks/minimal.yaml`, `examples/psqlstacks/standard.yaml`, `examples/psqlstacks/local.yaml`	Updated/added examples demonstrating minimal CNPG-based stack, HA production posture, and local development disable of snapshots/scale-to-zero.
Examples (PSQLCluster & PSQLBranch) `examples/psqlclusters/minimal.yaml`, `examples/psqlclusters/standard.yaml`, `examples/psqlbranches/same-namespace.yaml`, `examples/psqlbranches/cross-namespace.yaml`, `examples/psqlbranches/preview-with-ttl.yaml`	New example manifests for cluster sizing/HA/credential modes and branch same-namespace/cross-namespace/TTL scenarios.
Unit Tests (Composition) `tests/test-stack/main.k`, `tests/test-cluster/main.k`, `tests/test-branch/main.k`, `tests/test-stack/kcl.mod`, `tests/test-cluster/kcl.mod`	Comprehensive KCL test suites validating stack/cluster/branch composition behavior (label merging, HA injection, storage/snapshot propagation, credential modes, CNPG override, provider defaulting).
E2E Test `tests/e2etest-psql/main.k`	Replaces Kubernetes-only Helm test with real AWS EKS Auto Mode provisioning, Crossplane provider setup (AWS + K8s), prerequisite stack installation (volume-snapshot, cert-manager), and live PSQLStack/Cluster/Branch XR deployment.
Build & CI/CD `Makefile`, `.github/workflows/on-pr.yaml`, `.github/workflows/on-push-main.yaml`	Adds multi-API example support; updates reusable workflow references from `unbounded-tech/workflows-crossplane@v2.20.0` to `hops-ops/workflows-crossplane@v3.0.0`, expands `validate` examples to JSON with per-example `api_path`, adds `e2e` timeout inputs, and updates Makefile bulk targets to derive per-example composition/definition paths.
Metadata & Documentation `upbound.yaml`, `README.md`, `.gitignore`	Updates project description and dependencies (`provider-kubernetes` added); rewrites README from StackGres management stack to CloudNativePG platform layer with HA/snapshot/scale-to-zero composition; adds ignores for configuration YAML and test environment directories.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant XRD as XRD (PSQLStack)
    participant Stack as Stack Composition<br/>(hops-ops-psql-stackstack)
    participant CNPG as CNPG Operator<br/>(Helm Release)
    participant K8s as Kubernetes<br/>(StorageClass,<br/>VolumeSnapshot)
    participant Scale as Scale-to-Zero<br/>Plugin
    participant Atlas as Atlas Operator<br/>(Helm Release)

    User->>XRD: Create PSQLStack with HA,<br/>storage, snapshot config
    XRD->>Stack: Execute composition pipeline
    Stack->>Stack: Initialize state from spec<br/>(namespace, labels, HA, provider refs)
    Stack->>CNPG: Compose Helm Release<br/>(inject HA replicaCount)
    Stack->>K8s: Compose StorageClass Object<br/>(provisioner, parameters)
    Stack->>K8s: Compose VolumeSnapshotClass Object<br/>(driver matching storage)
    alt scaleToZeroPlugin.enabled
        Stack->>Scale: Compose plugin deployment<br/>(ServiceAccount, RBAC, Certs, Service)
    end
    Stack->>Atlas: Compose Helm Release<br/>(inject HA topology spread)
    Note over Stack,Atlas: Auto-ready waits for all<br/>enabled components ready
    CNPG-->>XRD: Release reconciles, creates<br/>cloudnative-pg chart
    K8s-->>XRD: StorageClass & VolumeSnapshotClass<br/>objects created
    Scale-->>XRD: Plugin deployment ready
    Atlas-->>XRD: Release reconciles, creates<br/>atlas-operator chart
    XRD-->>User: status.ready = true when<br/>all enabled components ready

sequenceDiagram
    actor User
    participant XRD as XRD (PSQLCluster)
    participant Comp as Cluster Composition<br/>(hops-ops-psql-stackcluster)
    participant CNPG as CNPG Cluster<br/>(Kubernetes Object)
    participant ESO as External Secrets<br/>Operator (optional)
    participant K8s as Kubernetes Secrets
    participant Monitor as CNPG<br/>Monitoring

    User->>XRD: Create PSQLCluster with<br/>storage, app, HA settings
    XRD->>Comp: Execute composition pipeline
    Comp->>Comp: Initialize state<br/>(size, version, HA instances,<br/>credential modes)
    alt app.externalSecret configured
        Comp->>ESO: Compose ExternalSecret Object<br/>(fetch username/password)
    else app.secretName provided
        Comp->>K8s: Use provided secret
    else default
        Comp->>K8s: Let CNPG generate secret
    end
    Comp->>CNPG: Compose CNPG Cluster Object<br/>(instances, bootstrap, storage,<br/>monitoring, optional superuser)
    CNPG-->>Monitor: Enable PodMonitor for metrics
    alt ha.enabled
        Note over CNPG: instances override,<br/>anti-affinity topology
    end
    alt scaleToZero.enabled
        CNPG-->>CNPG: Inject scale-to-zero plugin config
    end
    ESO-->>K8s: Populate credentials if ESO used
    CNPG-->>XRD: Cluster reconciles, spins up<br/>PostgreSQL pods
    XRD-->>User: status.ready + connection metadata<br/>(host/port/database)

sequenceDiagram
    actor User
    participant XRD as XRD (PSQLBranch)
    participant Comp as Branch Composition<br/>(hops-ops-psql-stackbranch)
    participant SrcSnap as Source VolumeSnapshot<br/>(cross-ns only)
    participant BrSnap as Branch VolumeSnapshot
    participant CNPG as CNPG Cluster<br/>(bootstrapped from fork)

    User->>XRD: Create PSQLBranch<br/>(clusterName, source, storage config)
    XRD->>Comp: Execute composition pipeline
    Comp->>Comp: Initialize state<br/>(namespace, source defaults,<br/>snapshot class, scale-to-zero)
    alt crossNamespace
        Comp->>SrcSnap: Compose source VolumeSnapshot Object<br/>(target source PVC in source namespace)
    end
    Comp->>BrSnap: Compose branch VolumeSnapshot Object<br/>(reference source PVC or snapshot content)
    Comp->>Comp: Derive observed snapshot content<br/>(track boundVolumeSnapshotContentName)
    Comp->>CNPG: Compose CNPG Cluster Object<br/>(bootstrap.recovery.volumeSnapshots,<br/>instances, storage size, labels)
    alt scaleToZero.enabled
        CNPG-->>CNPG: Add idle-timeout annotation<br/>and plugin config
    end
    alt ttl.enabled
        CNPG-->>CNPG: Add TTL expiration annotation
    end
    SrcSnap-->>BrSnap: Content available for branch fork
    BrSnap-->>CNPG: VolumeSnapshot bound,<br/>content reference set
    CNPG-->>XRD: Cluster bootstraps from snapshot,<br/>fork ready
    XRD-->>User: status.ready + phase<br/>(bootstrap complete)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A stack once called Stackgres stood tall,
Now CloudNative blooms—no walls at all!
With snapshots for forks and scale-to-zero's grace,
HA spreads wide across every place.
Hop hop hooray—the platform's complete!
🌱✨

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/auto-mode-pivot

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

README.md (1)

77-89: ⚠️ Potential issue | 🟡 Minor

The Stage 3/4 examples override the wrong provider only.

helmProviderConfigRef.name: default fixes the Helm releases, but the VolumeSnapshotClass and scale-to-zero Objects still default kubernetesProviderConfigRef.name to clusterName in functions/render/000-state-init.yaml.gotmpl. As written, these examples will still look for Kubernetes ProviderConfigs named edge / local.

Suggested doc fix

 spec:
   clusterName: edge
   helmProviderConfigRef:
     name: default
+  kubernetesProviderConfigRef:
+    name: default
   snapshotClass:
     driver: driver.longhorn.io

 spec:
   clusterName: local
   helmProviderConfigRef:
     name: default
+  kubernetesProviderConfigRef:
+    name: default
   snapshotClass:
     enabled: false
   scaleToZeroPlugin:
     enabled: false

Also applies to: 95-109

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 77 - 89, The example PSQLStack YAML sets
helmProviderConfigRef.name: default but leaves kubernetesProviderConfigRef.name
to the clusterName (causing VolumeSnapshotClass and scale-to-zero Objects to
still use the wrong provider); update the example blocks (the PSQLStack examples
around the Stage 3/4 snippets) to explicitly set
kubernetesProviderConfigRef.name: default (or the same provider name used for
helmProviderConfigRef) so functions/render/000-state-init.yaml.gotmpl will
render the correct ProviderConfig for VolumeSnapshotClass and scale-to-zero
Objects; ensure both occurrences noted (the block at lines ~77-89 and the
similar block at ~95-109) are updated to keep provider names consistent.

🧹 Nitpick comments (1)

.gitignore (1)
14-14: Consider documenting/keeping an example config for devs/CI.

If apis/**/configuration.yaml is meant to be generated or environment-specific, it usually helps to add a committed example/template (e.g., configuration.yaml.example) and (optionally) a comment near the ignore line describing how to generate/copy it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 14, The ignore of apis/**/configuration.yaml hides
whether this is generated or required; add a committed example/template file
(e.g., configuration.yaml.example) alongside the real config for each API,
update README or CI docs to show how to copy/populate it (cp
configuration.yaml.example configuration.yaml or use env substitution), and add
a brief inline comment near the apis/**/configuration.yaml entry in .gitignore
indicating the template location and generation method so devs and CI know how
to produce the real file; reference the pattern apis/**/configuration.yaml and
ensure the new configuration.yaml.example is added to the repo for each service
that needs it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 190-191: Update the description for the ready field so it reflects
enabled components rather than implying all four are mandatory; change the text
for ready/status.ready to say something like "Overall readiness — true once all
enabled components (CNPG, Atlas, the scale-to-zero plugin if
scaleToZeroPlugin.enabled, and the VolumeSnapshotClass if snapshotClass.enabled)
are Ready" so the description references scaleToZeroPlugin.enabled and
snapshotClass.enabled and matches actual behavior.
- Around line 147-170: Update the snapshotClass description to match the schema
defaults: state that the stack composes exactly one VolumeSnapshotClass named
"psql" (not "named after the XR") and that the default CSI driver is
ebs.csi.eks.amazonaws.com (not ebs.csi.aws.com); keep a note that users can
override driver for other providers (e.g., ebs.csi.aws.com for self-managed EBS
or driver.longhorn.io) and ensure references to PSQLBranch and
VolumeSnapshotClass remain intact (look for snapshotClass, properties.name, and
properties.driver).

In `@functions/render/000-state-init.yaml.gotmpl`:
- Around line 62-68: Tests still set spec.stackgresOperator.values which this
initializer no longer reads; update the test to populate the new fields used
here ($cnpg via spec.cnpg, $s2z via spec.scaleToZeroPlugin and $s2z.enabled for
$s2zEnabled, and $atlas via spec.atlasOperator) or remove the obsolete
assignment. In the test file (tests/e2etest-psql/main.k) replace the
stackgresOperator.values block with equivalent entries under spec.cnpg,
spec.scaleToZeroPlugin (including an enabled flag if used), and
spec.atlasOperator, mapping any specific config keys from the old values into
the corresponding new sections, or delete the block if those settings are no
longer required.

In `@functions/render/170-volumesnapshotclass.yaml.gotmpl`:
- Around line 10-12: Update the comment in the volumesnapshotclass template to
reflect the correct default CSI driver: replace the stale value
"ebs.csi.aws.com" with "ebs.csi.eks.amazonaws.com" and mention that this default
originates from the state init template (which sets the default driver used by
$spec.snapshotClass.driver); ensure the comment and any example override
instructions reference "ebs.csi.eks.amazonaws.com" so readers are pointed to the
correct default.

In `@functions/render/200-cnpg-operator.yaml.gotmpl`:
- Around line 64-85: The Usage guard currently only renders when both
$state.observed.cnpg.ready and $state.observed.atlasOperator.ready are true,
which is too late; change the conditional to trigger when the CNPG and Atlas
operator are present/declared (e.g. use $state.observed.cnpg.exists or
$state.observed.cnpg.present and $state.observed.atlasOperator.exists/present or
the equivalent keys that indicate intended composition) so the Usage (name: {{
$state.name }}-delete-atlas-operator-before-cnpg, resourceRef names {{
$cnpg.name }} and {{ $state.atlasOperator.name }}) is created before readiness
is reached and prevents CNPG removal during Atlas teardown.

In `@tests/test-render/main.k`:
- Around line 282-311: The test only checks positive presence via
CompositionTest.assertResources but lacks negative checks for s2z-* and the
VolumeSnapshotClass; update the CompositionTest block for the
"scale-to-zero-plugin-can-be-disabled" (and the other similar case) to
explicitly assert absence by using the test framework's absence check (e.g., add
assertResourcesAbsent with entries for resources matching kind/name patterns
"s2z-*" and the VolumeSnapshotClass or add an explicit rendered resource
count/assertion like renderedResourceCount == expected) so the test fails if
those s2z-* objects or VolumeSnapshotClass are emitted; locate the
CompositionTest blocks (metadata.name = "scale-to-zero-plugin-can-be-disabled"
and the similar test) and add the negative assertions alongside assertResources.

In `@upbound.yaml`:
- Around line 19-21: Update the metadata "description" and any README text to
remove references to an optional Karpenter NodePool and instead state that this
stack deploys CloudNativePG, the cnpg-i-scale-to-zero plugin, and the Atlas
Operator via Helm releases plus applied manifests, and optionally configures a
VolumeSnapshotClass; ensure the wording matches the rendered template
functions/render/210-cnpg-scale-to-zero.yaml.gotmpl and the XR shape in
apis/psqlstacks/definition.yaml. Locate and replace the stale phrase in the
description field and the README block (also update the repeated text around
lines 27-35) so the file consistently mentions Helm + applied manifests +
optional VolumeSnapshotClass and omits any NodePool language.

---

Outside diff comments:
In `@README.md`:
- Around line 77-89: The example PSQLStack YAML sets helmProviderConfigRef.name:
default but leaves kubernetesProviderConfigRef.name to the clusterName (causing
VolumeSnapshotClass and scale-to-zero Objects to still use the wrong provider);
update the example blocks (the PSQLStack examples around the Stage 3/4 snippets)
to explicitly set kubernetesProviderConfigRef.name: default (or the same
provider name used for helmProviderConfigRef) so
functions/render/000-state-init.yaml.gotmpl will render the correct
ProviderConfig for VolumeSnapshotClass and scale-to-zero Objects; ensure both
occurrences noted (the block at lines ~77-89 and the similar block at ~95-109)
are updated to keep provider names consistent.

---

Nitpick comments:
In @.gitignore:
- Line 14: The ignore of apis/**/configuration.yaml hides whether this is
generated or required; add a committed example/template file (e.g.,
configuration.yaml.example) alongside the real config for each API, update
README or CI docs to show how to copy/populate it (cp configuration.yaml.example
configuration.yaml or use env substitution), and add a brief inline comment near
the apis/**/configuration.yaml entry in .gitignore indicating the template
location and generation method so devs and CI know how to produce the real file;
reference the pattern apis/**/configuration.yaml and ensure the new
configuration.yaml.example is added to the repo for each service that needs it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 087a343f-4577-4bb1-9082-5215a0bdc6fc

📥 Commits

Reviewing files that changed from the base of the PR and between 495fa03 and b7a522f.

📒 Files selected for processing (15)

.gitignore
README.md
apis/psqlstacks/definition.yaml
examples/psqlstacks/local.yaml
examples/psqlstacks/minimal.yaml
examples/psqlstacks/standard.yaml
functions/render/000-state-init.yaml.gotmpl
functions/render/010-state-status.yaml.gotmpl
functions/render/170-volumesnapshotclass.yaml.gotmpl
functions/render/200-cnpg-operator.yaml.gotmpl
functions/render/200-stackgres-operator.yaml.gotmpl
functions/render/210-cnpg-scale-to-zero.yaml.gotmpl
functions/render/220-atlas-operator.yaml.gotmpl
tests/test-render/main.k
upbound.yaml

💤 Files with no reviewable changes (1)

functions/render/200-stackgres-operator.yaml.gotmpl

Merges PSQLCluster and PSQLBranch XRDs (previously standalone repos under hops-ops/psql-cluster and hops-ops/psql-branch) into this package. One Configuration package, one release cadence, one e2e flow. Changes: - apis/{psqlclusters,psqlbranches}/ — XRDs and compositions copied in - examples/{psqlclusters,psqlbranches}/ — example manifests copied in - functions/render/ → functions/stack/ — renamed to make room for siblings - functions/{cluster,branch}/ — new function packages, gotmpls copied from the standalone repos. Composition functionRefs updated: psqlstacks → hops-ops-psql-stackstack psqlclusters → hops-ops-psql-stackcluster psqlbranches → hops-ops-psql-stackbranch - tests/test-{stack,cluster,branch}/ — render tests renamed (was test-render-*) - tests/e2etest-psql/main.k — unified e2e covering all three XRs at Synced; TODO upgrade to Ready integration after volume-snapshot-stack v0.1.0 - .github/workflows/on-pr.yaml + on-push-main.yaml — switched to multi-API workflow signature, pinned at @feat/multi-api-support for testing - Makefile — EXAMPLES list extended; render/validate logic still single-API (follow-up to make per-example api_path work locally) Workflow change being tested: unbounded-tech/workflows-crossplane@feat/multi-api-support (validate.yaml now resolves api_path per example with fallback to inputs.api_path)

volume-snapshot-stack v0.1.0 is now published, so we can install it as a dependency Configuration package and bring snapshot-controller into the test cluster. With that, the whole chain reconciles to Ready in-cluster: 1. VolumeSnapshotStack XR (in extraResources) → snapshot-controller live 2. PSQLStack manifest → Helm-installs CNPG + atlas-operator + the psql VolumeSnapshotClass 3. PSQLCluster manifest → CNPG bootstraps a real Postgres with a real PVC 4. PSQLBranch manifest → snapshots the source PVC, restores into a new CNPG cluster Pattern mirrors the aws-observe-stack e2e (initResources for dependency Configuration packages, extraResources for the dependent XRs). defaultConditions: Synced → Ready timeoutSeconds: 1800 → 5400 (90 min for the full chain) cleanupTimeoutSeconds: 900 → 1800

The unified e2e was carrying over `stackgresOperator.values` and `atlasOperator.values` from before the CNPG pivot. The current PSQLStack XRD uses `cnpg` (not `stackgresOperator`), `namespace` defaults to `cnpg-system` (not `stackgres`), and the kind-cluster defaults work without any operator-values overrides. Match `local/psqlstack.yaml`'s minimal shape: clusterName + labels + ProviderConfig refs only. Add `kubernetesProviderConfigRef` since PSQLStack now also applies the VolumeSnapshotClass via the kubernetes provider.

coderabbitai

Actionable comments posted: 14

♻️ Duplicate comments (2)

functions/stack/170-volumesnapshotclass.yaml.gotmpl (1)

10-12: ⚠️ Potential issue | 🟡 Minor

Update the stale default CSI driver comment.

Line 10 documents ebs.csi.aws.com, but the stack default is ebs.csi.eks.amazonaws.com. This will mislead operators about expected defaults.

Suggested fix

-# Default driver: ebs.csi.aws.com (correct for EKS Auto Mode out of the box).
+# Default driver: ebs.csi.eks.amazonaws.com (EKS Auto Mode default in state init).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@functions/stack/170-volumesnapshotclass.yaml.gotmpl` around lines 10 - 12,
Update the stale comment that names the default CSI driver: change the
documented driver string from "ebs.csi.aws.com" to the current stack default
"ebs.csi.eks.amazonaws.com" in the VolumeSnapshotClass template so it reflects
the actual default; locate the comment near the spec.snapshotClass.driver
reference in the VolumeSnapshotClass/gotmpl block and replace the old driver
name with the new one.

functions/stack/200-cnpg-operator.yaml.gotmpl (1)

64-85: ⚠️ Potential issue | 🟠 Major

Render deletion guard on presence, not readiness.

Line 64 delays Usage creation until both releases are Ready. If Atlas exists but is not Ready, teardown protection may never activate when needed.

Suggested fix

-{{- if and $state.observed.cnpg.ready $state.observed.atlasOperator.ready }}
+{{- $observed := $.observed.resources | default dict }}
+{{- if and (hasKey $observed "cnpg-operator") (hasKey $observed "atlas-operator") }}
 ---
 apiVersion: protection.crossplane.io/v1beta1
 kind: Usage

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@functions/stack/200-cnpg-operator.yaml.gotmpl` around lines 64 - 85, The
Usage resource guard is gated on readiness ($state.observed.cnpg.ready and
$state.observed.atlasOperator.ready) so teardown protection can be skipped if
Atlas exists but isn't Ready; change the template condition to check
presence/existence instead of readiness (e.g., replace checks of
$state.observed.cnpg.ready and $state.observed.atlasOperator.ready with their
existence/presence flags such as $state.observed.cnpg.exists and
$state.observed.atlasOperator.exists or equivalent) so the Usage ({{ $state.name
}}-delete-atlas-operator-before-cnpg) is rendered whenever both CNPG and the
Atlas operator are present regardless of Ready state.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/on-pr.yaml:
- Around line 30-53: The workflow is referencing mutable branch refs (uses:
unbounded-tech/workflows-crossplane/...@feat/multi-api-support) for the reusable
workflows validate, test, e2e and publish; replace those branch refs with an
immutable tag or commit SHA (e.g., `@vX.Y.Z` or a specific commit) for each uses
entry so the CI points to a fixed release—update the four uses lines
(validate.yaml, test.yaml, e2e.yaml, publish.yaml) to the chosen semantic
version tag or commit SHA consistent with the existing workflow-vnext-tag
pattern.

In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references mutable branch refs for
the reusable workflows in the validate, test, and e2e jobs (the lines using
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update
those refs to an immutable release tag (for example replace
`@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag)
so validate, test, and e2e always run the exact released workflow code.

In `@apis/psqlbranches/definition.yaml`:
- Around line 117-123: Remove the hard-coded default for branch Postgres version
by deleting the default: "17" entry under the postgresql -> properties ->
version schema in definition.yaml so an omitted version is not coerced to "17";
ensure the schema allows absence (do not add a fixed default) so callers can
distinguish "inherit from source" vs "explicitly set", and if needed mark the
version property nullable or optional instead of supplying a default.

In `@apis/psqlclusters/composition.yaml`:
- Around line 11-16: The composition references custom functions
hops-ops-psql-stackcluster, hops-ops-psql-stackstack, and
hops-ops-psql-stackbranch but they are not declared as dependencies in
upbound.yaml; update upbound.yaml's dependsOn section to include entries for
these three functions (using their package names and versions) alongside the
existing crossplane-contrib-function-auto-ready, or alternatively
document/ensure those three functions are preinstalled before deploying—make
sure to add exact package identifiers for hops-ops-psql-stackcluster,
hops-ops-psql-stackstack, and hops-ops-psql-stackbranch so the composition can
resolve the functionRef names.

In `@apis/psqlclusters/definition.yaml`:
- Around line 154-161: The OpenAPI schema exposes monitoring.enabled but the
cluster compose template is reading $monSpec.monitoring instead of the boolean
$monSpec.enabled, causing explicit false to be ignored; update the template in
functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring
section uses $monSpec.enabled (or conditionally sets monitoring.enabled to
$monSpec.enabled) rather than copying the whole $monSpec.monitoring object,
ensuring explicit false values disable PodMonitor creation as intended.

In `@apis/psqlstacks/composition.yaml`:
- Around line 11-16: The composition references a Function named
hops-ops-psql-stackstack but that Function is missing from the project
dependency manifest; add a Function resource entry named
hops-ops-psql-stackstack into your upbound.yaml (or the project’s Functions
list) so the name exactly matches the functionRef in the composition, providing
the required metadata and spec (image/source, runtime, and any config) that
Crossplane expects for that Function; ensure the Function resource kind/name is
hops-ops-psql-stackstack and then rebuild/update the package so the Function is
available at runtime.

In `@functions/branch/010-state-status.yaml.gotmpl`:
- Around line 19-27: The template currently stops fallback when a
"branch-snapshot" exists even if its boundVolumeSnapshotContentName is empty,
causing $snapContent to be cleared; update the selection logic so you prefer the
branch-snapshot only when it actually contains a non-empty
boundVolumeSnapshotContentName and otherwise fall back to source-snapshot.
Concretely, compute both get $observed "branch-snapshot" and get $observed
"source-snapshot" into $snapEntryBranch and $snapEntrySource, extract their
boundVolumeSnapshotContentName via the existing pipeline ($snapResource →
$snapAtProvider → $snapManifest → $snapStatus → boundVolumeSnapshotContentName),
and set $snapEntry/$snapContent to the branch one only if that
boundVolumeSnapshotContentName is non-empty; otherwise use the source-snapshot
values so $snapContent never gets overwritten with an empty string.

In `@functions/branch/100-source-snapshot.yaml.gotmpl`:
- Around line 16-43: The VolumeSnapshot metadata name currently uses "{{
$state.name }}-src" which can collide across namespaces; change the snapshot
name generation in the template so it includes a namespace-specific component
(for example incorporate {{ $source.namespace }} and/or {{ $source.pvcName }} or
a short hash of them) instead of just {{ $state.name }}, updating the
metadata.name line inside the forProvider.manifest block (the VolumeSnapshot
metadata) and any related references (e.g. resource name labels/annotations) so
the snapshot is unique per source namespace/PVC.

In `@functions/branch/200-cnpg-cluster.yaml.gotmpl`:
- Around line 78-85: The template always sets storage.size from
$state.branch.storage.size which causes branches to default to 10Gi even when
the source PVC is larger; change the assignment so size falls back to the source
PVC when branch size is empty (e.g. compute $size := default
$state.source.storage.size $state.branch.storage.size or equivalent) and use
that $size when building $storage, keeping the conditional storageClass logic
and assigning to $clusterSpec "storage".

In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 67-72: The template currently places the superuser secret into
bootstrap.initdb.secret (the block with "bootstrap" "initdb" "owner" "app" and
secret name $state.credentials.superuser.secretName); instead, wire the
superuser credential via spec.superuserSecret using
$state.credentials.superuser.secretName and make bootstrap.initdb.secret point
to the application-owner secret (a secret whose username matches owner: app).
Update the template to set spec.superuserSecret to
$state.credentials.superuser.secretName and ensure the "bootstrap" -> "initdb"
-> "secret" uses the app-owner secret (not the postgres superuser secret).

In `@functions/stack/000-state-init.yaml.gotmpl`:
- Around line 71-85: The comment above the VolumeSnapshotClass block is out of
sync with the template default driver; update the human-readable note that
currently says "ebs.csi.aws.com" to match the rendered default
"ebs.csi.eks.amazonaws.com" so docs match the actual fallback defined by
$snapshotSpec and the $snapshotClass dict (see variables $snapshotSpec,
$snapshotEnabled and $snapshotClass where driver defaults to
"ebs.csi.eks.amazonaws.com").

In `@Makefile`:
- Around line 25-31: The new examples in EXAMPLES (psqlclusters and
psqlbranches) are still being processed through the existing
$(DEFINITION)/$(COMPOSITION) paths for apis/psqlstacks, so the derived
XRD/composition will be generated against the wrong schema; update the Makefile
rules that consume $(EXAMPLES) and produce $(DEFINITION)/$(COMPOSITION) to apply
the api-dir macro (api-dir) when computing the api_path (e.g., use the api-dir
expansion or $(call api-dir,...) in the pattern-substitution used by the targets
that refer to DEFINITION and COMPOSITION) so each example maps to its
corresponding apis/<x>/... paths instead of always apis/psqlstacks.

In `@tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k`:
- Around line 95-120: The APIDependencies union schema
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 is inconsistent: it
exposes git, http, and k8s payload fields but the discriminator $type only
allows "k8s" | "crd" and there is no crd payload type defined. Fix by aligning
the discriminator with the payloads — either add a `crd` payload schema and
implement its field type if "crd" is intended, or expand/replace `$type` to
include the actual variants ("git", "http", "k8s") and ensure only the matching
payload field (git/http/k8s) is present for each discriminator value; update
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 accordingly.

In `@tests/test-stack/main.k`:
- Around line 298-311: The test's assertResources list is missing an assertion
for the Atlas Release despite the test description claiming "cnpg + atlas + VSC
still composed"; add an entry to the assertResources array similar to the
existing CNPG Release assertion that checks for the Atlas Release (apiVersion
"helm.m.crossplane.io/v1beta1", kind "Release", metadata.name set to the Atlas
release name used elsewhere in tests, e.g., "atlas"), so the test will fail if
Atlas is not composed.

---

Duplicate comments:
In `@functions/stack/170-volumesnapshotclass.yaml.gotmpl`:
- Around line 10-12: Update the stale comment that names the default CSI driver:
change the documented driver string from "ebs.csi.aws.com" to the current stack
default "ebs.csi.eks.amazonaws.com" in the VolumeSnapshotClass template so it
reflects the actual default; locate the comment near the
spec.snapshotClass.driver reference in the VolumeSnapshotClass/gotmpl block and
replace the old driver name with the new one.

In `@functions/stack/200-cnpg-operator.yaml.gotmpl`:
- Around line 64-85: The Usage resource guard is gated on readiness
($state.observed.cnpg.ready and $state.observed.atlasOperator.ready) so teardown
protection can be skipped if Atlas exists but isn't Ready; change the template
condition to check presence/existence instead of readiness (e.g., replace checks
of $state.observed.cnpg.ready and $state.observed.atlasOperator.ready with their
existence/presence flags such as $state.observed.cnpg.exists and
$state.observed.atlasOperator.exists or equivalent) so the Usage ({{ $state.name
}}-delete-atlas-operator-before-cnpg) is rendered whenever both CNPG and the
Atlas operator are present regardless of Ready state.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1fbd16ce-b73a-4e47-bd02-3efa17c8d181

📥 Commits

Reviewing files that changed from the base of the PR and between b7a522f and cdb8d5d.

📒 Files selected for processing (57)

.github/workflows/on-pr.yaml
.github/workflows/on-push-main.yaml
Makefile
apis/psqlbranches/composition.yaml
apis/psqlbranches/definition.yaml
apis/psqlclusters/composition.yaml
apis/psqlclusters/definition.yaml
apis/psqlstacks/composition.yaml
examples/psqlbranches/cross-namespace.yaml
examples/psqlbranches/preview-with-ttl.yaml
examples/psqlbranches/same-namespace.yaml
examples/psqlclusters/minimal.yaml
examples/psqlclusters/standard.yaml
functions/branch/000-state-init.yaml.gotmpl
functions/branch/010-state-status.yaml.gotmpl
functions/branch/100-source-snapshot.yaml.gotmpl
functions/branch/110-branch-snapshot.yaml.gotmpl
functions/branch/200-cnpg-cluster.yaml.gotmpl
functions/branch/999-status.yaml.gotmpl
functions/cluster/000-state-init.yaml.gotmpl
functions/cluster/010-state-status.yaml.gotmpl
functions/cluster/100-external-secret.yaml.gotmpl
functions/cluster/200-cnpg-cluster.yaml.gotmpl
functions/cluster/999-status.yaml.gotmpl
functions/stack/000-state-init.yaml.gotmpl
functions/stack/010-state-status.yaml.gotmpl
functions/stack/170-volumesnapshotclass.yaml.gotmpl
functions/stack/200-cnpg-operator.yaml.gotmpl
functions/stack/210-cnpg-scale-to-zero.yaml.gotmpl
functions/stack/220-atlas-operator.yaml.gotmpl
functions/stack/999-status.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-branch/kcl.mod
tests/test-branch/main.k
tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.k
tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k
tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
tests/test-branch/model/kcl.mod
tests/test-cluster/kcl.mod
tests/test-cluster/main.k
tests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.k
tests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.k
tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
tests/test-cluster/model/kcl.mod
tests/test-stack/kcl.mod
tests/test-stack/main.k
tests/test-stack/model

✅ Files skipped from review due to trivial changes (27)

tests/test-cluster/model/kcl.mod
tests/test-branch/model/kcl.mod
examples/psqlclusters/minimal.yaml
tests/test-stack/kcl.mod
tests/test-cluster/kcl.mod
tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
examples/psqlclusters/standard.yaml
functions/branch/999-status.yaml.gotmpl
examples/psqlbranches/preview-with-ttl.yaml
tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
apis/psqlbranches/composition.yaml
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
functions/cluster/999-status.yaml.gotmpl
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.k
tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
functions/branch/000-state-init.yaml.gotmpl
examples/psqlbranches/same-namespace.yaml
tests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.k
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.k
tests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.k
examples/psqlbranches/cross-namespace.yaml
tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.k
tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.k

coderabbitai · 2026-04-29T01:26:10Z

+    uses: unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support
    with:
      examples: |
        [
-          { "example": "examples/psqlstacks/minimal.yaml" },
-          { "example": "examples/psqlstacks/standard.yaml" }
+          { "example": "examples/psqlstacks/minimal.yaml",   "api_path": "apis/psqlstacks" },
+          { "example": "examples/psqlstacks/standard.yaml",  "api_path": "apis/psqlstacks" },
+          { "example": "examples/psqlclusters/minimal.yaml", "api_path": "apis/psqlclusters" },
+          { "example": "examples/psqlclusters/standard.yaml","api_path": "apis/psqlclusters" },
+          { "example": "examples/psqlbranches/same-namespace.yaml", "api_path": "apis/psqlbranches" }
        ]
-      api_path: apis/psqlstacks
      error_on_missing_schemas: true

  test:
-    uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@v2.20.0
+    uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@feat/multi-api-support

  e2e:
-    uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@v2.20.0
+    uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@feat/multi-api-support


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/on-push-main.yaml | head -50

Repository: hops-ops/psql-stack

Length of output: 1893

🏁 Script executed:

cat -n .github/workflows/on-push-main.yaml

Repository: hops-ops/psql-stack

Length of output: 2013

Pin reusable workflows to immutable refs before merging.

The validate, test, and e2e jobs reference @feat/multi-api-support, a mutable branch that can change without review. This breaks mainline CI reproducibility—different runs may execute different code. Pin these to immutable release tags (like the v1.21.0 used for version-and-tag).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow currently references mutable branch refs for the reusable workflows in the validate, test, and e2e jobs (the lines using unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support, test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update those refs to an immutable release tag (for example replace `@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag) so validate, test, and e2e always run the exact released workflow code.

coderabbitai · 2026-04-29T01:26:10Z

+              monitoring:
+                description: Add Prometheus scrape configuration. CNPG operator handles the actual PodMonitor creation when `monitoring.enablePodMonitor` is set on the Cluster CR.
+                type: object
+                properties:
+                  enabled:
+                    type: boolean
+                    default: true
+


⚠️ Potential issue | 🟠 Major

monitoring.enabled is exposed but not wired through.

The schema advertises a toggle here, but functions/cluster/000-state-init.yaml.gotmpl:72-76 currently reads $monSpec.monitoring instead of $monSpec.enabled, so explicit false values are ignored and the composed Cluster stays monitored.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apis/psqlclusters/definition.yaml` around lines 154 - 161, The OpenAPI schema exposes monitoring.enabled but the cluster compose template is reading $monSpec.monitoring instead of the boolean $monSpec.enabled, causing explicit false to be ignored; update the template in functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring section uses $monSpec.enabled (or conditionally sets monitoring.enabled to $monSpec.enabled) rather than copying the whole $monSpec.monitoring object, ensuring explicit false values disable PodMonitor creation as intended.

Three locations claimed the default driver was `ebs.csi.aws.com` and the VolumeSnapshotClass was "named after the XR" — both wrong relative to the actual schema (driver `ebs.csi.eks.amazonaws.com`, name defaults to `psql`). This text shows up in `kubectl explain`, generated docs, and template comments, so it needed to match. Also clarified `status.ready` description: components are toggleable, so readiness is "every enabled component is Ready" rather than implying all four are mandatory. From CodeRabbit review on PR #10.

The s2z-disabled test description claimed cnpg + atlas + VSC are still composed, but only cnpg and the VSC were asserted — a regression that broke the Atlas Release would silently pass. Added the Atlas assertion. From CodeRabbit review on PR #10.

… not readiness The Usage that protects cnpg-operator from premature deletion (so Atlas is torn down first and CNPG isn't yanked while Atlas's migration state is still live) was rendered only after both Releases reported Ready. That left a window: if Atlas was mid-progressing or in error and the user deleted the stack, no Usage existed yet, and CNPG could be deleted first — exactly the ordering this guard exists to prevent. Switched the gate from `$state.observed.{cnpg,atlasOperator}.ready` to `hasKey $.observed.resources "{cnpg,atlas}-operator"`. The Usage now appears as soon as both Releases are observed, regardless of their readiness state. Verified locally: render tests still pass; cluster reinstall via `hops config install --path` brings all three XRs back to Synced/Ready. From CodeRabbit review on PR #10.

…ed externalSecret The previous `credentials.superuser` shape was misleading: the secret named "superuser" was actually wired into CNPG's `bootstrap.initdb.secret`, which takes the *application user's* credentials. The actual postgres superuser secret was either auto-generated by CNPG or absent from the spec entirely. And it collided with CNPG's own `<cluster>-superuser` secret naming. New shape (no backwards compat — still alpha): spec.app: # always present; wires bootstrap.initdb role: app # Postgres role name database: app # Application database name secretName: "" # K8s Secret; default <cluster>-app externalSecret: # OPTIONAL — when set, ESO renders the Secret secretStore: kind: ClusterSecretStore | SecretStore name: hops-aws-secrets-manager namespace: "" # for SecretStore (defaults to XR ns) secretRef: path: my-cluster/app # remote location; JSON value with username+password spec.superuser: # OPTIONAL — omit to let CNPG auto-generate secretName: "" # default <cluster>-superuser externalSecret: { ... } # same shape as app.externalSecret When `superuser` is set, the composition renders `spec.superuserSecret` on the wrapped CNPG Cluster CR; otherwise CNPG auto-generates and stores the secret at `<cluster>-superuser` (its own convention). Field names mirror External Secrets Operator's CRD shape so anyone familiar with ESO can read it at a glance. Status now exposes connection details so dependent XRs can wire without hardcoding: status.app: { secretName, database, host, port } status.superuser: { secretName } Render template factored to a single `psqlcluster.externalSecret` definition so the app/superuser ExternalSecrets share one source of truth. Tests rewritten: - Test 1 ("minimal-renders-cluster-only") asserts default mode renders only the Cluster — no ExternalSecret, since externalSecret is now opt-in. - Test 7 ("external-secret-renders-when-opted-in") explicitly asserts the ESO ExternalSecret with the new shape. Migration for existing manifests: `credentials.superuser.managedBy: ""` → remove the block (BYO is now default). `credentials.superuser` with ESO fields → move under `spec.app.externalSecret` matching the new shape. From CodeRabbit review on PR #10.

… repos The render:all and validate:all targets were hardcoded to apis/psqlstacks via $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), so multi-API examples (psqlclusters/*, psqlbranches/*) were rendered against the wrong schema. Each example now resolves its own apis/<plural>/ dir from the example path (`examples/<plural>/<file>.yaml` → `apis/<plural>`) and uses that for both `up composition render --xrd=...` and `crossplane beta validate <api_dir>`. Verified locally: `make render` and `make validate` both clean across all five examples. From CodeRabbit review on PR #10.

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

.github/workflows/on-push-main.yaml (1)
26-42: ⚠️ Potential issue | 🟠 Major

Pin the reusable workflows to immutable refs.

validate, test, and e2e still pull workflow code from @feat/multi-api-support, so a later branch push can change main CI behavior without a repo change here. Please switch these to a tag or commit SHA before merging.
#!/bin/bash
set -euo pipefail
rg -n 'uses:\s+unbounded-tech/workflows-crossplane/.+@' .github/workflows/on-push-main.yaml .github/workflows/on-pr.yaml .github/workflows/on-version-tagged.yaml
Expected result: on-push-main.yaml and on-pr.yaml should reference immutable tags/SHAs rather than branch names.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow
currently references reusable workflows by branch refs
(validate.yaml@feat/multi-api-support, test.yaml@feat/multi-api-support,
e2e.yaml@feat/multi-api-support); update each "uses:
unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support"
to point to an immutable tag or commit SHA (e.g., replace
"@feat/multi-api-support" with a specific release tag or the exact commit SHA)
so validate, test, and e2e always use pinned workflow code.

🧹 Nitpick comments (1)

functions/cluster/000-state-init.yaml.gotmpl (1)
63-67: Add defensive defaulting for spec.storage to match the pattern used elsewhere in the codebase.

While spec.storage is schema-required for PSQLCluster, defensive defaulting aligns with the pattern already established in the PSQLBranch template (functions/branch/000-state-init.yaml.gotmpl line 65) and provides resilience against edge cases.
Suggested fix
-{{- $storageSpec := $spec.storage }}
+{{- $storageSpec := $spec.storage | default dict }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@functions/cluster/000-state-init.yaml.gotmpl` around lines 63 - 67, The
template accesses $spec.storage directly which can fail in edge cases; add
defensive defaulting before using it (mirror the approach in the PSQLBranch
template) by setting $storageSpec to a default empty dict when $spec.storage is
missing, then build $storage from $storageSpec (refer to the symbols
$spec.storage, $storageSpec and $storage in the diff and the PSQLBranch template
for the exact pattern).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@functions/cluster/100-external-secret.yaml.gotmpl`:
- Around line 64-91: The ExternalSecret resource names/annotations in the
psqlcluster.externalSecret template are using "external-secret-app" and
"external-secret-superuser" which don't match the status lookup expecting
"external-secret"; update the ResourceName and ResourceNameAnnotation entries
passed into template "psqlcluster.externalSecret" for both the app and superuser
blocks to use the singular "external-secret" (e.g., ResourceName: printf
"%s-external-secret" and ResourceNameAnnotation: setResourceNameAnnotation
"external-secret") so the status readiness check can find the synced resources;
alternatively, if you prefer multiple distinct names, update the status lookup
to match these specific keys (ensure changes are made for both app and superuser
usages).

In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 24-28: The current use of merge "$state.labels (dict)" aliases
$state.labels as the destination so later set calls on $clusterLabels mutate
$state.labels; change the merge to create a fresh map as the destination (e.g.,
use merge (dict) $state.labels) so $clusterLabels is a copy and subsequent set
calls (the $_ := set $clusterLabels "...") do not mutate $state.labels and
therefore prevent branching labels from leaking elsewhere.

In `@Makefile`:
- Around line 113-117: The validate:% recipe uses undefined shell variables
$$definition, $$composition, and $$api_dir; initialize them at the start of the
recipe using the pattern stem (same way example is set) so the render/validate
commands get real paths. For example, set shell vars like
definition="examples/psqlstacks/$*.yaml", composition="compositions/$*.yaml" and
api_dir="apis" (or your repo's actual api directory) before calling up
composition render in the validate:% target so $$definition, $$composition, and
$$api_dir are non-empty when used.

In `@tests/e2etest-psql/main.k`:
- Around line 188-190: Remove the stale assignment to app.managedBy (it no
longer exists on PSQLCluster.spec.app) — delete the line setting app.managedBy
and do not add any replacement; to skip ExternalSecret/BYO mode simply omit the
app.externalSecret block entirely (the schema defaults to skipping it). Update
any nearby comments to reflect that omission and ensure no other code references
app.managedBy.

In `@tests/test-stack/main.k`:
- Around line 14-16: Update the docstring comments that still mention the old
snapshot-class driver "ebs.csi.aws.com" to the current driver used by this stack
(replace all occurrences of "ebs.csi.aws.com" in the comments near the
VolumeSnapshotClass description), keeping the rest of the wording (e.g.,
"VolumeSnapshotClass", default name "psql") intact; also apply the same
replacement to the other comment occurrence referenced in the file so both
comment blocks reflect the current snapshot driver.

---

Duplicate comments:
In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references reusable workflows by
branch refs (validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, e2e.yaml@feat/multi-api-support); update each
"uses:
unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support"
to point to an immutable tag or commit SHA (e.g., replace
"@feat/multi-api-support" with a specific release tag or the exact commit SHA)
so validate, test, and e2e always use pinned workflow code.

---

Nitpick comments:
In `@functions/cluster/000-state-init.yaml.gotmpl`:
- Around line 63-67: The template accesses $spec.storage directly which can fail
in edge cases; add defensive defaulting before using it (mirror the approach in
the PSQLBranch template) by setting $storageSpec to a default empty dict when
$spec.storage is missing, then build $storage from $storageSpec (refer to the
symbols $spec.storage, $storageSpec and $storage in the diff and the PSQLBranch
template for the exact pattern).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf8788e5-52c9-443c-a955-4bcce34be39e

📥 Commits

Reviewing files that changed from the base of the PR and between cdb8d5d and 5d8a092.

📒 Files selected for processing (17)

.github/workflows/on-pr.yaml
.github/workflows/on-push-main.yaml
Makefile
apis/psqlclusters/definition.yaml
apis/psqlstacks/definition.yaml
functions/cluster/000-state-init.yaml.gotmpl
functions/cluster/100-external-secret.yaml.gotmpl
functions/cluster/200-cnpg-cluster.yaml.gotmpl
functions/cluster/999-status.yaml.gotmpl
functions/stack/000-state-init.yaml.gotmpl
functions/stack/170-volumesnapshotclass.yaml.gotmpl
functions/stack/200-cnpg-operator.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
tests/test-cluster/main.k
tests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
tests/test-stack/main.k

✅ Files skipped from review due to trivial changes (4)

functions/stack/170-volumesnapshotclass.yaml.gotmpl
tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
functions/cluster/999-status.yaml.gotmpl
functions/stack/000-state-init.yaml.gotmpl

🚧 Files skipped from review as they are similar to previous changes (2)

functions/stack/200-cnpg-operator.yaml.gotmpl
.github/workflows/on-pr.yaml

Adds a third credential mode: omit `app.externalSecret` AND `app.secretName` and the composition omits `bootstrap.initdb.secret` from the CNPG Cluster CR so CNPG auto-generates and owns the basic-auth Secret at `<cluster-name>-app`. The previous shape always set `bootstrap.initdb.secret.name = <cluster>-app`, which forced CNPG to read a Secret that — in the no-ESO/no-BYO case — never existed, blocking bootstrap. The unified e2e hit this exact case (kind harness, no ESO ClusterSecretStore) and a stale `app.managedBy = ""` field was masking the real failure mode. Three modes now documented on the XRD: 1. Omit `app` → CNPG auto-generates `<cluster-name>-app`. 2. Set `app.externalSecret` → ESO writes the Secret CNPG reads. 3. Set `app.secretName` (no externalSecret) → BYO; pre-create the Secret. Backwards compatible: existing manifests with externalSecret or explicit secretName render identically. CNPG will adopt a pre-existing `<cluster-name>-app` if one is present. Tests: - test-cluster: minimal asserts the secret line is omitted; external-secret asserts secret.name is wired; new BYO test covers explicit secretName. - e2etest-psql: drops the bogus `app.managedBy` field that broke KCL parse; relies on the new auto-gen path. Implements [[tasks/merge-psql-client-apis-into-stack]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 159-170: The XRD declares secretRef.path contains a JSON blob with
username and password, but the ExternalSecret template
(functions/cluster/100-external-secret.yaml.gotmpl) currently reads only
.RemoteRef.key and renders a single password entry, causing mismatch with CNPG
template (functions/cluster/200-cnpg-cluster.yaml.gotmpl) which expects both
credentials; update 100-external-secret.yaml.gotmpl to consume the remote JSON
from secretRef.path/.RemoteRef (not just .RemoteRef.key) and map both username
and password into the rendered Secret (e.g., create two remote refs or a single
remoteRef with extractors for "username" and "password") so app.externalSecret
and superuser.externalSecret produce the full Secret shape that
200-cnpg-cluster.yaml.gotmpl requires.

In `@tests/e2etest-psql/main.k`:
- Around line 56-71: The retained snapshot dependency resources use fixed names
which can be reused across runs; make the Configuration metadata.name
("hops-ops-volume-snapshot-stack" in initResources) and the corresponding
VolumeSnapshotStack XR name unique per test run (e.g., append a
testRunId/timestamp/UUID) so each test installs and references a distinct
package instance, and update any other references (the VolumeSnapshotStack XR
created later around lines 125-139) to use the same generated unique name so
retention still works against the exact resource created for that run.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 075163c9-3527-4d38-9ce4-9de815008e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8a092 and b789ac7.

📒 Files selected for processing (5)

apis/psqlclusters/definition.yaml
functions/cluster/000-state-init.yaml.gotmpl
functions/cluster/200-cnpg-cluster.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-cluster/main.k

coderabbitai · 2026-04-30T17:49:37Z

+                      secretRef:
+                        description: Where in the secrets backend the credentials live. The remote value must be a JSON blob with `username` and `password` properties — both are extracted into the resulting K8s Secret.
+                        type: object
+                        properties:
+                          path:
+                            description: Path/key in the secrets backend (e.g. AWS Secrets Manager `my-cluster/app`).
+                            type: string
+                        required:
+                        - path
+                    required:
+                    - secretStore
+                    - secretRef


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align the ExternalSecret schema with the rendered secret contract.

The XRD advertises secretRef.path holding a remote value with both username and password, but functions/cluster/100-external-secret.yaml.gotmpl currently consumes .RemoteRef.key and only renders a password entry. In the current shape, the advertised app.externalSecret / superuser.externalSecret mode cannot produce the Secret shape that functions/cluster/200-cnpg-cluster.yaml.gotmpl hands to CNPG.

Also applies to: 205-214

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apis/psqlclusters/definition.yaml` around lines 159 - 170, The XRD declares secretRef.path contains a JSON blob with username and password, but the ExternalSecret template (functions/cluster/100-external-secret.yaml.gotmpl) currently reads only .RemoteRef.key and renders a single password entry, causing mismatch with CNPG template (functions/cluster/200-cnpg-cluster.yaml.gotmpl) which expects both credentials; update 100-external-secret.yaml.gotmpl to consume the remote JSON from secretRef.path/.RemoteRef (not just .RemoteRef.key) and map both username and password into the rendered Secret (e.g., create two remote refs or a single remoteRef with extractors for "username" and "password") so app.externalSecret and superuser.externalSecret produce the full Secret shape that 200-cnpg-cluster.yaml.gotmpl requires.

PSQLStack now composes a `psql` StorageClass alongside its existing `psql` VolumeSnapshotClass — they share the same CSI driver (`ebs.csi.eks.amazonaws.com` by default) since snapshots only work when the snapshotter driver matches the source PVC's provisioner. PSQLCluster + PSQLBranch default `spec.storage.class` to "psql", so consumer manifests stop leaking driver-specific knowledge. Default StorageClass shape: gp3 + WaitForFirstConsumer (correct for zonal CSI drivers — late-binds the PVC to a node so EBS volumes land in the same AZ as the consuming pod) + allowVolumeExpansion=true (CNPG resizes via the same field on its Cluster CR). E2E pivots from kind to an ephemeral EKS Auto Mode cluster (mirror of aws-observe-stack): provisions an AutoEKSCluster per run, installs volume-snapshot-stack on it, then runs PSQLStack/Cluster/Branch against the real `ebs.csi.eks.amazonaws.com` driver — same code path that runs on pat-local. Kind has no snapshot-capable CSI driver natively, so the prior kind-only e2e couldn't exercise PSQLBranch's snapshot/fork chain. Verified end-to-end on pat-local: - PSQLStack composed `psql` SC + `psql` VSC (both with ebs.csi.eks.amazonaws.com) - PSQLCluster PVC bound on `psql` SC, CNPG primary running - PSQLBranch VolumeSnapshot reached readyToUse=true, restored PVC bound on `psql` SC, CNPG fork primary running Breaking: PSQLStack adds a new composed StorageClass by default. Sites that already have a `psql` SC will conflict — set `spec.storageClass.enabled: false` to opt out. Requires three GitHub Actions vars on the repo (synced via the new `hops vars sync github`): ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

.github/workflows/on-pr.yaml (1)

30-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin reusable workflow refs to immutable versions.

These four uses entries still target @feat/multi-api-support (mutable branch), which makes PR CI behavior drift over time. Please pin to an immutable tag or commit SHA for deterministic runs.

Also applies to: 43-43, 46-46, 93-93
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/on-pr.yaml at line 30, The workflow is referencing a
mutable branch ref ("uses:
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support")
which makes CI non-deterministic; replace each mutable branch ref (all
occurrences of the string
"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support"
and the other similar `uses:` entries flagged) with an immutable ref (a specific
tag or commit SHA), updating each `uses:` line so it points to the chosen tag or
SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic
workflow runs.

tests/e2etest-psql/main.k (1)

82-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the retained snapshot dependency resources unique per run.

These resources are intentionally retained, but both the package install and the VolumeSnapshotStack XR still use fixed names. A rerun against the same control plane can reuse stale snapshot-controller state instead of exercising a fresh dependency install for this run.

Suggested change

 _now = str(int(math.floor(datetime.ticks())))
 _test_name = "e2e-psql-" + _now
 _cluster_name = "e2e-psql-cluster-" + _now
 _branch_name = "e2e-psql-branch-" + _now
+_snapshot_package_name = "hops-ops-volume-snapshot-stack-" + _now
+_snapshot_stack_name = "snapshot-" + _now
 _namespace = "default"
@@
                 {
                     apiVersion = "pkg.crossplane.io/v1"
                     kind = "Configuration"
-                    metadata.name = "hops-ops-volume-snapshot-stack"
+                    metadata.name = _snapshot_package_name
                     spec.package = "ghcr.io/hops-ops/volume-snapshot-stack:v0.1.0"
                 }
@@
                 {
                     apiVersion = "hops.ops.com.ai/v1alpha1"
                     kind = "VolumeSnapshotStack"
                     metadata = {
-                        name = "snapshot"
+                        name = _snapshot_stack_name
                         namespace = _namespace
                     }

Also applies to: 250-255

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2etest-psql/main.k` around lines 82 - 90, The retained
VolumeSnapshotStack and its package install use fixed names ("metadata.name" =
"hops-ops-volume-snapshot-stack") so reruns can pick up stale state; make the
retained dependency resources unique per run by appending a run-specific suffix
(e.g., timestamp, CI_RUN_ID, or random ID) to the Configuration.metadata.name
and to the corresponding VolumeSnapshotStack XR name used elsewhere (the two
occurrences flagged around the current diff and at the other location), and
ensure any references/matchLabels that point to that name are updated to the
same generated suffix so the installer and the XR remain consistent per run.

🧹 Nitpick comments (1)

functions/stack/180-storageclass.yaml.gotmpl (1)
40-45: 💤 Low value

Parameter map keys are unquoted — may break for non-trivial key names.

{{ $k }} emits the key bare. If a StorageClass parameter key ever contains a colon, hash, or other YAML-special character, the rendered output will be invalid YAML. Quoting the key is zero-cost insurance.
♻️ Proposed fix
       {{- range $k, $v := $sc.parameters }}
-        {{ $k }}: {{ $v | quote }}
+        {{ $k | quote }}: {{ $v | quote }}
       {{- end }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@functions/stack/180-storageclass.yaml.gotmpl` around lines 40 - 45, The
StorageClass template emits parameter map keys unquoted (using {{ $k }}) which
can produce invalid YAML for keys with special characters; update the parameters
loop in the template that references $sc.parameters so each key is quoted (e.g.,
replace occurrences of {{ $k }} with a quoted form like {{ $k | quote }}) while
keeping values quoted as before, ensuring the keys are safely rendered for all
YAML-special characters.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 186-214: The superuser.externalSecret block is missing the
descriptive text present in app.externalSecret; update the
superuser.externalSecret schema to mirror app.externalSecret by documenting
secretStore.kind/name/namespace semantics (including default: ClusterSecretStore
and allowed enum values ClusterSecretStore/SecretStore and that namespace is
only required for SecretStore), and add a clear description for secretRef.path
stating it must point to a JSON blob containing "username" and "password" (same
contract as app.externalSecret) so kubectl explain/generated docs show
consistent expectations for superuser.externalSecret.

In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The OpenAPI schema for ha.replicas currently allows zero;
update the replicas property in the definition.yaml (property name: replicas
under ha) to add a minimum: 1 constraint so users cannot set replicas: 0; ensure
the type remains integer and keep the default: 3, optionally updating the
property's description to note the minimum of 1 for clarity.

---

Duplicate comments:
In @.github/workflows/on-pr.yaml:
- Line 30: The workflow is referencing a mutable branch ref ("uses:
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support")
which makes CI non-deterministic; replace each mutable branch ref (all
occurrences of the string
"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support"
and the other similar `uses:` entries flagged) with an immutable ref (a specific
tag or commit SHA), updating each `uses:` line so it points to the chosen tag or
SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic
workflow runs.

In `@tests/e2etest-psql/main.k`:
- Around line 82-90: The retained VolumeSnapshotStack and its package install
use fixed names ("metadata.name" = "hops-ops-volume-snapshot-stack") so reruns
can pick up stale state; make the retained dependency resources unique per run
by appending a run-specific suffix (e.g., timestamp, CI_RUN_ID, or random ID) to
the Configuration.metadata.name and to the corresponding VolumeSnapshotStack XR
name used elsewhere (the two occurrences flagged around the current diff and at
the other location), and ensure any references/matchLabels that point to that
name are updated to the same generated suffix so the installer and the XR remain
consistent per run.

---

Nitpick comments:
In `@functions/stack/180-storageclass.yaml.gotmpl`:
- Around line 40-45: The StorageClass template emits parameter map keys unquoted
(using {{ $k }}) which can produce invalid YAML for keys with special
characters; update the parameters loop in the template that references
$sc.parameters so each key is quoted (e.g., replace occurrences of {{ $k }} with
a quoted form like {{ $k | quote }}) while keeping values quoted as before,
ensuring the keys are safely rendered for all YAML-special characters.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 85ea3da6-87ef-41c5-b108-ccb63db2d812

📥 Commits

Reviewing files that changed from the base of the PR and between b789ac7 and 990ddfd.

📒 Files selected for processing (13)

.github/workflows/on-pr.yaml
.gitignore
README.md
apis/psqlbranches/definition.yaml
apis/psqlclusters/definition.yaml
apis/psqlstacks/definition.yaml
functions/branch/000-state-init.yaml.gotmpl
functions/cluster/000-state-init.yaml.gotmpl
functions/stack/000-state-init.yaml.gotmpl
functions/stack/010-state-status.yaml.gotmpl
functions/stack/180-storageclass.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-stack/main.k

✅ Files skipped from review due to trivial changes (3)

.gitignore
functions/branch/000-state-init.yaml.gotmpl
README.md

🚧 Files skipped from review as they are similar to previous changes (4)

functions/stack/010-state-status.yaml.gotmpl
apis/psqlbranches/definition.yaml
tests/test-stack/main.k
functions/stack/000-state-init.yaml.gotmpl

coderabbitai · 2026-05-07T22:35:55Z

+                  replicas:
+                    type: integer
+                    default: 3


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ha.replicas lacks a minimum constraint — zero replicas is accepted.

Without minimum: 1, a user can set replicas: 0, which would scale every HA-able operator Deployment to zero and silently break the stack.

🛡️ Proposed fix

replicas: type: integer + minimum: 1 default: 3

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The OpenAPI schema for ha.replicas currently allows zero; update the replicas property in the definition.yaml (property name: replicas under ha) to add a minimum: 1 constraint so users cannot set replicas: 0; ensure the type remains integer and keep the default: 3, optionally updating the property's description to note the minimum of 1 for clarity.

The full chain converged on the previous run except for PSQLStack's scaleToZeroPlugin objects (s2z-cert-client, s2z-cert-server, s2z-issuer) — the S2Z plugin uses cert-manager's Issuer + Certificate for its gRPC mTLS pair, and the ephemeral EKS cluster has no cert-manager. pat-local already has cert-manager from earlier setup, which masked this in local validation. cert-stack v0.1.0 mirrors volume-snapshot-stack's shape (single clusterName field, composes a Helm release on the target). Adding it as initResource + extraResource closes the gap.

github-actions · 2026-05-08T16:14:06Z

Published Crossplane Package

The following Crossplane package was published as part of this PR:

Package: ghcr.io/hops-ops/psql-stack:pr-10-55b4a02981f9e74b194fb9e818bc6df76089e407

View Package

patrickleet · 2026-05-08T17:38:28Z

@coderabbitai review

coderabbitai · 2026-05-08T17:38:34Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

apis/psqlstacks/definition.yaml (1)

90-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ha.replicas still lacks a minimum: 1 constraint.

A user can set replicas: 0, silently scaling all HA-able platform Deployments to zero.

🛡️ Proposed fix

                  replicas:
                    type: integer
+                   minimum: 1
                    default: 3

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The schema for the
replicas integer (ha.replicas) allows zero; update the replicas property in
apis/psqlstacks/definition.yaml to add a minimum: 1 constraint so users cannot
set replicas: 0; locate the replicas definition (type: integer, default: 3) and
add the minimum validator for that property to enforce at least one replica.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 195-204: The parameters object currently combines
additionalProperties: type: string with x-kubernetes-preserve-unknown-fields:
true which conflicts; decide which behavior you want and make both parameter
blocks consistent: if parameter map values must be plain strings (e.g. passed
verbatim to the CSI driver) remove x-kubernetes-preserve-unknown-fields and keep
additionalProperties: type: string; if parameters may contain nested/arbitrary
structures remove additionalProperties and keep
x-kubernetes-preserve-unknown-fields; apply the same change to
snapshotClass.parameters as well so both parameter definitions use the same,
non-conflicting approach.

---

Duplicate comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The schema for the replicas integer (ha.replicas) allows
zero; update the replicas property in apis/psqlstacks/definition.yaml to add a
minimum: 1 constraint so users cannot set replicas: 0; locate the replicas
definition (type: integer, default: 3) and add the minimum validator for that
property to enforce at least one replica.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c458ce55-33a3-418a-b8e5-6f7008816953

📥 Commits

Reviewing files that changed from the base of the PR and between b789ac7 and 877c9c1.

📒 Files selected for processing (13)

.github/workflows/on-pr.yaml
.gitignore
README.md
apis/psqlbranches/definition.yaml
apis/psqlclusters/definition.yaml
apis/psqlstacks/definition.yaml
functions/branch/000-state-init.yaml.gotmpl
functions/cluster/000-state-init.yaml.gotmpl
functions/stack/000-state-init.yaml.gotmpl
functions/stack/010-state-status.yaml.gotmpl
functions/stack/180-storageclass.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-stack/main.k

✅ Files skipped from review due to trivial changes (4)

functions/stack/010-state-status.yaml.gotmpl
.gitignore
functions/stack/180-storageclass.yaml.gotmpl
README.md

🚧 Files skipped from review as they are similar to previous changes (8)

functions/stack/000-state-init.yaml.gotmpl
functions/branch/000-state-init.yaml.gotmpl
tests/e2etest-psql/main.k
functions/cluster/000-state-init.yaml.gotmpl
apis/psqlbranches/definition.yaml
apis/psqlclusters/definition.yaml
tests/test-stack/main.k
.github/workflows/on-pr.yaml

coderabbitai · 2026-05-08T17:42:47Z

+                  parameters:
+                    description: |
+                      Provisioner-specific parameters. Defaults to `{type: gp3}` for
+                      the EBS provisioner. When overriding `provisioner`, set this
+                      to whatever the new provisioner expects (e.g. `{}` for
+                      hostpath.csi.k8s.io, driver-specific keys for Longhorn).
+                    type: object
+                    additionalProperties:
+                      type: string
+                    x-kubernetes-preserve-unknown-fields: true


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

parameters combines additionalProperties with x-kubernetes-preserve-unknown-fields: true — the constraints are contradictory.

additionalProperties: type: string tells Kubernetes to validate and prune map values that aren't strings, while x-kubernetes-preserve-unknown-fields: true instructs the API server to skip structural pruning entirely. The two directives conflict: in practice the pruning bypass wins, silently accepting non-string values that the schema claims to reject.

The same pattern repeats for snapshotClass.parameters (Lines 237–239).

Pick one approach:

If values must be strings (e.g. they are passed verbatim to the CSI driver), keep additionalProperties and drop x-kubernetes-preserve-unknown-fields.

If nested/arbitrary structures are needed, drop additionalProperties and keep x-kubernetes-preserve-unknown-fields.

♻️ Proposed fix (string-values-only)

parameters: type: object additionalProperties: type: string - x-kubernetes-preserve-unknown-fields: true

Apply the same change to snapshotClass.parameters at Lines 237–239.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

parameters:

description: |

Provisioner-specific parameters. Defaults to `{type: gp3}` for

the EBS provisioner. When overriding `provisioner`, set this

to whatever the new provisioner expects (e.g. `{}` for

hostpath.csi.k8s.io, driver-specific keys for Longhorn).

type: object

additionalProperties:

type: string

x-kubernetes-preserve-unknown-fields: true

parameters:

description: |

Provisioner-specific parameters. Defaults to `{type: gp3}` for

the EBS provisioner. When overriding `provisioner`, set this

to whatever the new provisioner expects (e.g. `{}` for

hostpath.csi.k8s.io, driver-specific keys for Longhorn).

type: object

additionalProperties:

type: string

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apis/psqlstacks/definition.yaml` around lines 195 - 204, The parameters object currently combines additionalProperties: type: string with x-kubernetes-preserve-unknown-fields: true which conflicts; decide which behavior you want and make both parameter blocks consistent: if parameter map values must be plain strings (e.g. passed verbatim to the CSI driver) remove x-kubernetes-preserve-unknown-fields and keep additionalProperties: type: string; if parameters may contain nested/arbitrary structures remove additionalProperties and keep x-kubernetes-preserve-unknown-fields; apply the same change to snapshotClass.parameters as well so both parameter definitions use the same, non-conflicting approach.

Replaces the four mutable `@feat/multi-api-support` branch refs with the immutable `@v3.0.0` tag now that the multi-api work is shipped. Also retargets the org from `unbounded-tech` to `hops-ops` — the canonical home for hops platform CI workflows.

…licit set The previous schema and gotmpl both defaulted branch postgres version to "17", silently coercing branches off PG 15/16 sources to a major mismatch. Volume snapshot recovery is binary-compatible only within a major version (Postgres won't open a data dir from a different major), so a hardcoded default produced silent failures at restore time. Now: omit imageName entirely when version is unset, letting CNPG fall back to its operator-default image (close-enough when source tracks the same chart). Setting spec.postgresql.version is now an explicit pin, typically used to fix a minor (e.g. "17.4") to match the source's reported version. Adds a render test for the explicit-pin path; existing default-path test asserts imageName absence.

Existing state-init logic correctly uses hasKey to distinguish "explicitly false" from "absent" before reading $monSpec.enabled, but no test exercised the explicit-false path. Added regression test asserting that spec.monitoring.enabled=false propagates to the composed Cluster CR's monitoring.enablePodMonitor=false. Default-on path covered by Test 1.

…tent In cross-namespace branching, the branch-ns VolumeSnapshot can only bind once it learns the source's VolumeSnapshotContent name — that name is propagated through $state.observed.snapshotContent into 110-branch-snapshot's render. The previous logic preferred branch-snapshot whenever it was present in observed, even when its boundVolumeSnapshotContentName was still empty — which it always is on first reconcile, before the source content has propagated. Result: empty content overwrites the populated source content, the branch-ns snapshot renders with `volumeSnapshotContentName: ""`, and the chain stalls. Now: read both branch-ns and source-ns content via the existing pipeline, prefer branch when non-empty, fall back to source otherwise. Fixes the chicken-and-egg that prevented cross-namespace branches from binding on first-pass reconcile. Same-namespace branching is unaffected: source-snapshot is gated on crossNamespace and isn't composed there, so $sourceContent is empty and the branch-ns content (always populated post-bind) wins.

Adds two CompositionTests that exercise 010-state-status's snapshot- content fallback against inline observedResources: 1. branch-snapshot bound content empty + source-snapshot bound: branch-ns VolumeSnapshot must render with the source's content name (proves the chicken-and-egg fix from 18d3115). 2. branch-snapshot bound + source-snapshot bound: steady-state — branch's content wins (proves the fallback doesn't overwrite a populated branch content with source's). Without the fix, test 1 would render volumeSnapshotContentName: "" and fail. With it, source's content propagates through state.observed.snapshotContent into 110-branch-snapshot's render. Locks in the cross-namespace bind behavior — the previous test surface only exercised render shape, not state propagation across reconciles.

…espace The source-ns VolumeSnapshot lives in a namespace shared by multiple branches (the source PSQLCluster's). Naming it just `<branchName>-src` collides when two PSQLBranch XRs have the same metadata.name in different branch namespaces — both would create `preview-pr-1-src` in `team-app`, racing on the same object. Now: `<branchNS>-<branchName>-src`. Encodes the branch XR's own namespace into the name so (sourceNS, branchNS, branchName) is unique. K8s names are bound by RFC 1123 subdomain (253 chars) — concatenation is safe at any reasonable namespace/branch length. Branch-ns snapshot (`<branchName>-snap`) is unchanged: the branch namespace is already implicit by where the resource lives, and branch XR names are unique within a single namespace. Adds a regression test that asserts the source-ns name varies with branch namespace.

Previously the gotmpl coerced unset `branch.storage.size` to a hardcoded "10Gi" — silently mis-sizing branches off any source PVC larger than that. CNPG/EBS can't shrink during recovery, so a 10Gi branch off a 100Gi source fails when CNPG tries to bind the restored PVC. New shape: - PSQLBranch XRD adds optional `spec.source.storage.size` so consumers can mirror the source PSQLCluster's known capacity. The branch composition has no automatic visibility into the source's PVC, so the user declares it once on the branch spec. - Size precedence in the cnpg-cluster render: branch.storage.size (explicit override, e.g. growing the branch) → source.storage.size (inherit from the source) → omitted (CNPG's webhook rejects with a clear error) - Drops the silent 10Gi gotmpl fallback in 000-state-init. Examples + e2e branch updated to declare source.storage.size mirroring the source PSQLCluster's spec.storage.size. Local pat-local manifests already set branch.storage.size explicitly so unaffected. Symlinks the per-test KCL `model/` directory to `.up/kcl/models` so schema changes from `up project build` propagate to tests automatically. Previously the bundled models drifted from the XRD on every change and needed manual regeneration. `.up/` is gitignored — fresh clones run `make build` (or `up project build`) to populate the symlink target. Adds two regression tests: branch.size overrides source.size, and source.size used when branch.size is empty.

…ce names The state-status template looked up `external-secret` in the observed map, but 100-external-secret renders two distinct resources with suffixed names (`external-secret-app`, `external-secret-superuser`) when their respective config blocks are set. The lookup never matched, so $state.observed.externalSecret.ready was always false. Now: aggregate over both names. Ready=true when every present ES Object reports Ready=true, and Ready=true when neither is present (BYO / CNPG-managed-secret paths — nothing to wait on). Note: the field is currently unused (999-status doesn't surface it, no composition gates on it). Fixing now to match the documented intent so future consumers don't inherit the bug.

The validate:% recipe was using $$definition, $$composition, $$api_dir shell variables that were never initialized — `make validate:minimal` silently fed empty strings to `up composition render` and `crossplane beta validate`, so the recipe never produced a real validation result. Aligns validate:% with render:%: both now use the top-level Makefile variables ($(DEFINITION), $(COMPOSITION), $(XRD_DIR)) which point at apis/psqlstacks. Single-target shorthand stays psqlstacks-only as documented in README. The :all targets keep their derive-per-example shell logic for the multi-API case.

Existing test 7 only asserted the ExternalSecret existed with the right secretStoreRef and target.name — it didn't lock in the data[] mappings or the basic-auth target template. The current gotmpl correctly maps both username and password as separate remoteRef entries (with `property: username` / `property: password`) extracted from the same JSON blob at secretRef.path, then synthesizes a kubernetes.io/basic-auth Secret. Asserting the full shape so a regression that drops one key fails the test instead of silently shipping a half-populated Secret.

ESO 0.10+ moved to v1 GA, and recent ESO charts (the version cert-stack/aws-secret-stack already use) drop the v1beta1 served-version: applies fail with `no matches for kind "ExternalSecret" in version "external-secrets.io/v1beta1"`. Schema shape (data[].remoteRef.{key,property} + target.template) is identical between v1beta1 and v1, so this is a pure apiVersion bump. Aligns with the other stacks that already use v1 (auth, gitops, cloudflare/dns, aws/secret).

The superuser.externalSecret schema had the same shape as app.externalSecret but no field descriptions, so kubectl explain and generated docs gave inconsistent guidance for what's the same contract. Adds the missing descriptions (secretStore.{kind,name,namespace} and secretRef.{path}) verbatim from app.externalSecret. Pure documentation change — no behavior, default, or required-fields shift.

patrickleet · 2026-05-08T21:22:54Z

@coderabbitai review

coderabbitai · 2026-05-08T21:23:01Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

Makefile (2)
13-14: 💤 Low value

api-dir macro is unused — bulk targets reimplement it inline.

The api-dir helper at line 14 is never invoked; both render:all and validate:all inline the same awk -F/ '{print "apis/" $2}' derivation. Either invoke $(call api-dir,$$example) from the recipes (note: shell-vs-Make expansion needs care here, since $$example is a shell var) or drop the macro and the comment block at lines 12-14 to avoid confusion.

Also applies to: 40-40, 73-73
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 13 - 14, The Makefile defines an unused helper macro
api-dir which duplicates logic also written inline in the render:all and
validate:all recipes; either replace those inline awk derivations with a proper
call to the macro using $(call api-dir,$$example) (taking care to pass the shell
variable example as $$example so Make expands correctly) in the render:all and
validate:all recipe bodies, or remove the api-dir macro and its comment block
entirely to avoid confusion; update all other occurrences flagged (lines near 40
and 73) to use the same approach so the helper is consistently used or
consistently removed.
109-117: ⚡ Quick win

Single-example aliases silently route every stem to apis/psqlstacks.

render:% and validate:% still hardcode examples/psqlstacks/$*.yaml and the legacy $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR) globals. Now that examples exist under psqlclusters/ and psqlbranches/, invoking e.g. make render:minimal or make validate:standard will render/validate the psqlstacks example against the psqlstacks XRD even when the user intended a multi-API one — and there's no error to indicate the mismatch.

Two reasonable options:

Make the stem encode the api plural (e.g. make render:psqlclusters/minimal) and derive api_dir from it the same way the bulk targets do.

Rename these targets to render:psqlstacks:% / validate:psqlstacks:% so the psqlstacks-only scope is explicit.
Sketch for option 1
 render\:%:
-	`@example`="examples/psqlstacks/$*.yaml"; \
-	up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example
+	`@example`="examples/$*.yaml"; \
+	api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \
+	up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example

 validate\:%:
-	`@example`="examples/psqlstacks/$*.yaml"; \
-	up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example \
-		--include-full-xr --quiet | \
-		crossplane beta validate $(XRD_DIR) --error-on-missing-schemas -
+	`@example`="examples/$*.yaml"; \
+	api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \
+	up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example \
+		--include-full-xr --quiet | \
+		crossplane beta validate $$api_dir --error-on-missing-schemas -
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 109 - 117, The render:% and validate:% phony targets
currently hardcode examples/psqlstacks/$*.yaml and global
$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), causing every stem to map to
psqlstacks; change them to derive the API plural from the stem (like the bulk
targets do) and build example and XRD/composition paths from that API: compute
api_dir from the stem (e.g. split $* on "/" to get the first field), set
example=examples/$(api_dir)/$*.yaml and set the corresponding
DEFINITION/COMPOSITION/XRD_DIR for that api, then call up composition render and
crossplane beta validate with those derived variables; alternatively, if you
prefer explicit scoping, rename the targets to render:psqlstacks:% and
validate:psqlstacks:% to make the psqlstacks-only behavior explicit.
.github/workflows/on-pr.yaml (1)
53-64: 💤 Low value

Move aws-account-id to a GitHub variable for consistency with sibling inputs.

ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, and PRIVATE_SUBNET_ID_B are sourced from vars.*, but aws-account-id is hardcoded. AWS account IDs aren't strictly secret, but treating them as configuration (a) keeps this workflow portable across environments, (b) avoids a repo-wide search/replace if the account ever changes, and (c) matches the convention already established in this same job.
Diff
       aws: true
       aws-use-oidc: true
-      aws-account-id: "034489662075"
+      aws-account-id: ${{ vars.AWS_ACCOUNT_ID }}
       aws-region: us-east-2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/on-pr.yaml around lines 53 - 64, The workflow hardcodes
aws-account-id while related values use vars.*, so change the aws-account-id
entry to read from a repo variable (e.g., replace aws-account-id: "034489662075"
with aws-account-id: ${{ vars.AWS_ACCOUNT_ID }}), keeping the rest of the job
(env-vars with ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B)
unchanged so the job consistently sources configuration from GitHub variables.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test-branch/main.k`:
- Around line 429-456: The test metav1alpha1.CompositionTest named
"scaletozero-disabled-strips-plugin" currently only asserts name/namespace;
update its assertResources for the entry with metadata.name
"br-no-s2z-cnpg-cluster" to explicitly assert that
spec.forProvider.manifest.metadata.annotations does NOT contain the
"cnpg-i-scale-to-zero.xata.io/idle-timeout" key (or assert annotations == {} /
absent) and that spec.forProvider.manifest.spec.plugins does NOT include the
scale-to-zero plugin entry; target the same xr stacksv1alpha1.PSQLBranch and the
assertResources object to add these negative assertions so the test fails if the
annotation or plugin are still emitted.

---

Nitpick comments:
In @.github/workflows/on-pr.yaml:
- Around line 53-64: The workflow hardcodes aws-account-id while related values
use vars.*, so change the aws-account-id entry to read from a repo variable
(e.g., replace aws-account-id: "034489662075" with aws-account-id: ${{
vars.AWS_ACCOUNT_ID }}), keeping the rest of the job (env-vars with
ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B) unchanged so the job
consistently sources configuration from GitHub variables.

In `@Makefile`:
- Around line 13-14: The Makefile defines an unused helper macro api-dir which
duplicates logic also written inline in the render:all and validate:all recipes;
either replace those inline awk derivations with a proper call to the macro
using $(call api-dir,$$example) (taking care to pass the shell variable example
as $$example so Make expands correctly) in the render:all and validate:all
recipe bodies, or remove the api-dir macro and its comment block entirely to
avoid confusion; update all other occurrences flagged (lines near 40 and 73) to
use the same approach so the helper is consistently used or consistently
removed.
- Around line 109-117: The render:% and validate:% phony targets currently
hardcode examples/psqlstacks/$*.yaml and global
$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), causing every stem to map to
psqlstacks; change them to derive the API plural from the stem (like the bulk
targets do) and build example and XRD/composition paths from that API: compute
api_dir from the stem (e.g. split $* on "/" to get the first field), set
example=examples/$(api_dir)/$*.yaml and set the corresponding
DEFINITION/COMPOSITION/XRD_DIR for that api, then call up composition render and
crossplane beta validate with those derived variables; alternatively, if you
prefer explicit scoping, rename the targets to render:psqlstacks:% and
validate:psqlstacks:% to make the psqlstacks-only behavior explicit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79f332a7-c175-4e49-b64c-1c1e2aae7d2f

📥 Commits

Reviewing files that changed from the base of the PR and between 877c9c1 and 6b8e83c.

📒 Files selected for processing (16)

.github/workflows/on-pr.yaml
Makefile
apis/psqlbranches/definition.yaml
apis/psqlclusters/definition.yaml
examples/psqlbranches/same-namespace.yaml
functions/branch/000-state-init.yaml.gotmpl
functions/branch/010-state-status.yaml.gotmpl
functions/branch/100-source-snapshot.yaml.gotmpl
functions/branch/200-cnpg-cluster.yaml.gotmpl
functions/cluster/010-state-status.yaml.gotmpl
functions/cluster/100-external-secret.yaml.gotmpl
tests/e2etest-psql/main.k
tests/test-branch/main.k
tests/test-branch/model
tests/test-cluster/main.k
tests/test-cluster/model

✅ Files skipped from review due to trivial changes (2)

tests/test-cluster/model
examples/psqlbranches/same-namespace.yaml

🚧 Files skipped from review as they are similar to previous changes (8)

functions/cluster/010-state-status.yaml.gotmpl
functions/cluster/100-external-secret.yaml.gotmpl
functions/branch/010-state-status.yaml.gotmpl
functions/branch/100-source-snapshot.yaml.gotmpl
tests/test-cluster/main.k
tests/e2etest-psql/main.k
functions/branch/200-cnpg-cluster.yaml.gotmpl
apis/psqlclusters/definition.yaml

coderabbitai · 2026-05-08T21:27:32Z

+    metav1alpha1.CompositionTest {
+        metadata.name = "scaletozero-disabled-strips-plugin"
+        spec = {
+            compositionPath = "apis/psqlbranches/composition.yaml"
+            xrdPath = "apis/psqlbranches/definition.yaml"
+            timeoutSeconds = 60
+            validate = False
+            xr = stacksv1alpha1.PSQLBranch {
+                metadata = {name = "br-no-s2z", namespace = "default"}
+                spec = {
+                    clusterName = "my-cluster"
+                    source = {name = "src"}
+                    scaleToZero = {enabled = False}
+                }
+            }
+            assertResources = [
+                {
+                    apiVersion = "kubernetes.m.crossplane.io/v1alpha1"
+                    kind = "Object"
+                    metadata.name = "br-no-s2z-cnpg-cluster"
+                    spec.forProvider.manifest.metadata = {
+                        name = "br-no-s2z"
+                        namespace = "default"
+                    }
+                }
+            ]
+        }
+    }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Confirm whether the KCL CompositionTest harness in this repo supports # negative/absence assertions, and how existing tests express "key absent". fd -t f 'main.k' tests | head -20 rg -nP --type-add 'kcl:*.k' --type=kcl -C3 '\b(assertResources|assertNotPresent|absent|notExpected)\b'

Repository: hops-ops/psql-stack

Length of output: 17809

🏁 Script executed:

#!/bin/bash # Examine the scaletozero-disabled and ttl-enabled tests in test-branch/main.k sed -n '429,488p' tests/test-branch/main.k

Repository: hops-ops/psql-stack

Length of output: 2453

🏁 Script executed:

#!/bin/bash # Look for comments or patterns showing how assertions work, especially for annotations rg -n 'metadata\.annotations|spec\.plugins' tests/ --type-add 'kcl:*.k' --type=kcl -B2 -A2

Repository: hops-ops/psql-stack

Length of output: 1810

🏁 Script executed:

#!/bin/bash # Search for any test documentation or comments about assertion semantics rg -n 'assertResources|assert' tests/test-branch/main.k -B1 -A5 | head -100

Repository: hops-ops/psql-stack

Length of output: 4268

scaletozero-disabled-strips-plugin doesn't assert the absence of the scale-to-zero annotation and plugin.

The test name promises that disabling scaleToZero removes both the cnpg-i-scale-to-zero.xata.io/idle-timeout annotation and the plugin entry. However, assertResources only checks metadata.name and namespace. Compare this to the scaletozero-enabled test (lines 407–422), which explicitly asserts spec.forProvider.manifest.metadata.annotations and spec.forProvider.manifest.spec.plugins—or the ttl-enabled-adds-annotation test (lines 476–485), which asserts the full annotation set. If the composition regressed and kept emitting the annotation and plugin when scaleToZero.enabled = False, this test would pass silently.

Add explicit assertions for the absent fields:

Suggested tightening

assertResources = [ { apiVersion = "kubernetes.m.crossplane.io/v1alpha1" kind = "Object" metadata.name = "br-no-s2z-cnpg-cluster" spec.forProvider.manifest.metadata = { name = "br-no-s2z" namespace = "default" } + spec.forProvider.manifest.metadata.annotations = {} + spec.forProvider.manifest.spec.plugins = [] } ]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

metav1alpha1.CompositionTest {

metadata.name = "scaletozero-disabled-strips-plugin"

spec = {

compositionPath = "apis/psqlbranches/composition.yaml"

xrdPath = "apis/psqlbranches/definition.yaml"

timeoutSeconds = 60

validate = False

xr = stacksv1alpha1.PSQLBranch {

metadata = {name = "br-no-s2z", namespace = "default"}

spec = {

clusterName = "my-cluster"

source = {name = "src"}

scaleToZero = {enabled = False}

}

}

assertResources = [

{

apiVersion = "kubernetes.m.crossplane.io/v1alpha1"

kind = "Object"

metadata.name = "br-no-s2z-cnpg-cluster"

spec.forProvider.manifest.metadata = {

name = "br-no-s2z"

namespace = "default"

}

}

]

}

}

metav1alpha1.CompositionTest {

metadata.name = "scaletozero-disabled-strips-plugin"

spec = {

compositionPath = "apis/psqlbranches/composition.yaml"

xrdPath = "apis/psqlbranches/definition.yaml"

timeoutSeconds = 60

validate = False

xr = stacksv1alpha1.PSQLBranch {

metadata = {name = "br-no-s2z", namespace = "default"}

spec = {

clusterName = "my-cluster"

source = {name = "src"}

scaleToZero = {enabled = False}

}

}

assertResources = [

{

apiVersion = "kubernetes.m.crossplane.io/v1alpha1"

kind = "Object"

metadata.name = "br-no-s2z-cnpg-cluster"

spec.forProvider.manifest.metadata = {

name = "br-no-s2z"

namespace = "default"

}

spec.forProvider.manifest.metadata.annotations = {}

spec.forProvider.manifest.spec.plugins = []

}

]

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test-branch/main.k` around lines 429 - 456, The test metav1alpha1.CompositionTest named "scaletozero-disabled-strips-plugin" currently only asserts name/namespace; update its assertResources for the entry with metadata.name "br-no-s2z-cnpg-cluster" to explicitly assert that spec.forProvider.manifest.metadata.annotations does NOT contain the "cnpg-i-scale-to-zero.xata.io/idle-timeout" key (or assert annotations == {} / absent) and that spec.forProvider.manifest.spec.plugins does NOT include the scale-to-zero plugin entry; target the same xr stacksv1alpha1.PSQLBranch and the assertResources object to add these negative assertions so the test fails if the annotation or plugin are still emitted.

patrickleet and others added 21 commits April 22, 2026 21:44

docs: point snapshot-controller prereq at volume-snapshot-stack

b7a522f

The dependency now exists as a real stack (xrs/stacks/k8s/volume-snapshot/, ghcr.io/hops-ops/volume-snapshot-stack). Replace the manual upstream-YAML install instructions with a pointer at the stack.

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

patrickleet added 3 commits April 28, 2026 15:24

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

patrickleet added 3 commits April 29, 2026 12:20

patrickleet added 2 commits April 29, 2026 13:28

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread functions/cluster/100-external-secret.yaml.gotmpl

Comment thread functions/cluster/200-cnpg-cluster.yaml.gotmpl

Comment thread Makefile Outdated

Comment thread tests/e2etest-psql/main.k Outdated

Comment thread tests/test-stack/main.k Outdated

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

patrickleet added 12 commits May 8, 2026 13:16

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

patrickleet merged commit ced7961 into main May 8, 2026
11 checks passed

Conversation

patrickleet commented Apr 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why the pivot

Schema diff

New dependency

Test plan

Suggested merge style

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Published Crossplane Package

Uh oh!

patrickleet commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

patrickleet commented Apr 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading

github-actions Bot commented May 8, 2026 •

edited

Loading