feat: pivot psql-stack to EKS Auto Mode defaults#10
Conversation
Add opt-in Karpenter NodePool composed resource. When spec.nodePool.enabled: true, renders a NodePool targeting arm64 spot on r7g.large/r7g.xlarge/m7g.large/m7g.xlarge (memory-optimized Graviton), tainted with psql=true:NoSchedule and labeled workload-type: psql. StackGres (operator/restapi/jobs) and Atlas get nodeSelector + tolerations injected into their Helm values when the NodePool is enabled. Usages pin both releases to be drained before the NodePool is deleted. Adds provider-kubernetes to upbound.yaml for the NodePool Object wrapper. Implements [[tasks/psql-stack-vela-simplyblock]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- storageClass (default on): creates a gp3 StorageClass backed by the EKS Auto Mode EBS CSI driver (ebs.csi.eks.amazonaws.com). The legacy gp2 in-tree provisioner does not work on EKS Auto Mode. - externalSecrets (opt-in): for each entry in externalSecrets.secrets[], composes a kubernetes.m.crossplane.io/Object wrapping an ESO ExternalSecret that syncs an AWS Secrets Manager value (published via hops secrets sync aws) into a Kubernetes Secret on the target cluster. Requires a ClusterSecretStore (e.g. from SecretStack); defaults to clusterSecretStoreName: hops-aws-secrets-manager. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors observe stack where StorageClasses are named loki/prometheus/tempo per-component. Keeps the name specific to the stack so it doesn't collide with cluster-provided defaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the generic externalSecrets.secrets[] passthrough with a Postgres-specific connections[] API. The user publishes just a password via hops secrets (single JSON key, default 'password'); the stack combines it with non-secret host/port/database/username/sslmode/namespace and emits a K8s Secret with a ready-to-use 'url' key plus discrete fields. Downstream consumers reference whichever key they need: AtlasSchema.devURLFrom → url SGCluster credentials.users.superuser.password → password applications → url (or discrete fields) Breaking change to the (locally-only) externalSecrets API; redeploy with the new shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without managing the SGCluster, every field in externalSecrets.connections[] (host/port/database/namespace) is passthrough — no abstraction. Users write ExternalSecret CRs directly (or as a Crossplane Object wrapper in local/) against the ClusterSecretStore provisioned by SecretStack. PSQLStack now = platform only: StackGres + Atlas operators + NodePool + StorageClass. If we later add an instances[] that composes SGClusters, ESO wiring can come back for free since the stack will then know host/port/database/namespace without the user restating them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites PSQLStack schema to remove stackgresOperator, add cnpg and scaleToZeroPlugin blocks. Default namespace: cnpg-system. CNPG 1.29 (chart 0.27.1) replaces StackGres 1.18 as the operator. Atlas operator renumbered 210 → 220 to make room for the scale-to-zero plugin install (added in a later phase). Storage (psql StorageClass on EBS gp3) and NodePool blocks preserved unchanged — phases 2 and 3 will rewrite them for the three-profile (mayastor / lvm / ebs) storage model and the branches/primary NodePool split with hugepages + nvme-tcp pre-configured. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single storageClass block with a storage block exposing three
independent profiles:
- storage.mayastor (replicated NVMe-oF via OpenEBS Mayastor) — enterprise
default for primary serving clusters; CoW + HA across N replicas. Default
enabled=false until phase 3 lands the NodePool with hugepages + nvme-tcp.
- storage.lvm (single-node CoW via OpenEBS LVM LocalPV) — branches and dev
clusters. Default enabled=false until phase 3 lands NodePool LVM volume
groups.
- storage.ebs (EBS gp3 via EKS Auto Mode CSI) — durable fallback, no CoW;
always-on default.
Render templates:
- 120-storageclass.yaml.gotmpl deleted (was the single 'psql' SC)
- 160-openebs-lvm.yaml.gotmpl: Helm release for OpenEBS LVM LocalPV
- 165-openebs-mayastor.yaml.gotmpl: Helm release for OpenEBS Mayastor
- 170-storageclass-mayastor.yaml.gotmpl: psql-mayastor SC + VolumeSnapshotClass
- 175-storageclass-lvm.yaml.gotmpl: psql-lvm SC + VolumeSnapshotClass
- 180-storageclass-ebs.yaml.gotmpl: psql-ebs SC (renamed from 'psql')
state-init defaults the three profile blocks; state-status observes the new
resource keys. Mayastor + LVM Helm releases + their StorageClasses are gated
on storage.{mayastor,lvm}.enabled — only EBS materializes by default until
phase 3 unblocks the others.
standard.yaml example patched to drop the removed storageClass and
stackgresOperator fields (full example rewrite is phase 5).
Implements [[tasks/psql-stack-cnpg]]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single Karpenter NodePool with two sub-pools targeting NVMe arm64 instance-store nodes (i4g.2xlarge / i4g.4xlarge / im4gn.2xlarge): - nodePool.branches: spot — for ephemeral PSQLBranch workloads. Spot is acceptable since branches are reproducible. - nodePool.primary: on-demand — for PSQLCluster primaries and operators (CNPG, Atlas, scale-to-zero). Spot preemption would lose a Mayastor replica, so on-demand is the right default for serving workloads. Each sub-pool has its own labels (sub-pool=branches | sub-pool=primary) and matching taints so workloads can target one specifically. Operators ride the primary sub-pool via nodeSelector + tolerations injected from state-init. Render templates: - 150-nodepool.yaml.gotmpl deleted - 140-nodepool-branches.yaml.gotmpl: spot sub-pool - 145-nodepool-primary.yaml.gotmpl: on-demand sub-pool + Usage protection for CNPG/Atlas Releases against premature NodePool deletion. state-init defaults the new sub-pool blocks; state-status observes both. nodePool.enabled stays default-false — existing claims unchanged. Out of scope for this commit: node-side prep for Mayastor + LVM (hugepages, nvme-tcp module, LVM volume group on instance-store NVMe). That's phase 3b (a separate concern that needs careful image / runtime choices) — without it, mayastor.enabled and lvm.enabled won't bind PVCs. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inlines the upstream cnpg-i-scale-to-zero v0.1.7 release manifest as 9 Crossplane Kubernetes Objects: - ServiceAccount cnpg-scale-to-zero-plugin - ClusterRole cnpg-scale-to-zero-sidecar-role + ClusterRoleBinding - Secret scale-to-zero-config (sidecar image reference, paired to plugin version via stringData — k8s base64-encodes on apply) - Self-signed cert-manager Issuer + 2 Certificates (server + client) for the gRPC TLS material the CNPG operator uses to reach the plugin - Service scale-to-zero (cnpg.io/pluginPort=9090, cnpg.io/pluginName annotations) - Deployment scale-to-zero (the plugin gRPC server) Plugin and sidecar images both pin to spec.scaleToZeroPlugin.version (default v0.1.7). Secret is renamed scale-to-zero-config (was scale-to-zero-config-c2c2544fbk in upstream — drops the kustomize hash suffix since we emit the resources as separate Objects). All resources are gated on spec.scaleToZeroPlugin.enabled (default true — the plugin is zero-cost when no PSQLCluster opts in). Source URL is annotated for renovate tracking: source: https://github.com/xataio/cnpg-i-scale-to-zero/releases/download/$VER/manifest.yaml renovate: datasource=github-releases depName=xataio/cnpg-i-scale-to-zero Prereq: cert-manager must be installed (provided by the dns-stack in hops-ops). Without it, the Issuer + Certificate resources won't reconcile and the plugin Deployment won't have its TLS volumes available. When PSQLClusters opt into scale-to-zero, they add: metadata.annotations: xata.io/scale-to-zero-enabled: "true" xata.io/scale-to-zero-inactivity-minutes: "10" spec.plugins: - name: cnpg-i-scale-to-zero.xata.io Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(phase 5)
- README rewritten: CNPG architecture, three-stage journey (EBS-only →
+LVM CoW → +Mayastor HA), full Spec Reference table, prereq notes
(cert-manager via cert-manager-stack; node prep deferred to phase 3b)
- examples refreshed:
- minimal: just clusterName, EBS-only baseline
- standard: full production posture (NodePool sub-pools, Mayastor +
LVM + EBS, S2Z plugin, Atlas)
- local: dev cluster with LVM CoW only (no Mayastor since replication
needs >1 node, no NodePool, default Helm provider config)
- tests/test-render/main.k: rewritten against the CNPG schema
- dropped stackgresOperator-specific tests (field removed)
- 11 tests covering: minimal renders platform operators; custom labels
propagate; cnpg.overrideAllValues replaces defaults; atlas values
merge; namespace propagation; per-component namespace override;
helmProviderConfigRef defaults; scaleToZeroPlugin can be disabled;
storage.{mayastor,lvm}.enabled compose Helm + StorageClass +
VolumeSnapshotClass; nodePool.enabled composes both sub-pools
All 11 tests pass; render + validate green on minimal + standard.
Implements [[tasks/psql-stack-cnpg]]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a privileged DaemonSet that runs on each NVMe NodePool node and configures the host-level state Mayastor + OpenEBS LVM LocalPV expect to find: - Hugepages (vm.nr_hugepages, default 1024 → 2GiB of 2MiB pages, required by Mayastor SPDK) - nvme-tcp kernel module loaded (Mayastor NVMe-oF transport) - LVM volume group on the first instance-store NVMe device (OpenEBS LVM LocalPV expects the VG to pre-exist) Auto-gated: composed only when nodePool.enabled AND (storage.mayastor.enabled OR storage.lvm.enabled). Inside the script, each step is conditional on the relevant storage backend. Schema additions: - spec.nodePrep.enabled (default true; auto-gated by storage backends) - spec.nodePrep.hugepages.count (default 1024) - spec.nodePrep.image (default alpine:3.20; apk-installs lvm2 + util-linux at startup) Render template 155-node-prep-daemonset.yaml.gotmpl: - DaemonSet with nodeSelector workload-type=psql + tolerations for both psql=true:NoSchedule and sub-pool=*:NoSchedule taints - hostPID + hostNetwork + privileged init container - Init script falls back gracefully on Bottlerocket / Auto Mode where modprobe inside containers is restricted (warns and continues) - Tiny pause container keeps the DS Ready state-init / state-status / 010-state-status updated for the new resource. Validate clean (22 resources on standard), 11/11 render tests still pass. Caveat: live verification on pat-local deferred until Mayastor or LVM is enabled in a PSQLStack claim there. Schema/composition is sound; e2e testing follows when storage backends are turned on. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-claim cost on Mayastor is controlled via replicationFactor (3 for
primaries, 1 for ephemeral branches). LVM was a redundant single-node
CoW backend; EBS was just wrapping the cluster's existing default SC
and contradicted the stack's CoW-by-default identity.
Schema changes:
- spec.storage flattened — no more {mayastor,lvm,ebs} profile selector.
Top-level fields now: chartVersion, namespace, storageClassName
(default "psql"), replicationFactor, thin, reclaimPolicy,
volumeBindingMode, allowVolumeExpansion, values, overrideAllValues.
- spec.nodePool.enabled defaults to TRUE now (was false). The stack's
whole point is dedicated NVMe nodes; the default should make it work.
- Mayastor + StorageClass + node-prep DaemonSet all gated on
nodePool.enabled. Disable nodePool to opt out (gets you only
CNPG + Atlas + S2Z plugin running on the cluster's default SC).
Render templates removed:
- 160-openebs-lvm.yaml.gotmpl
- 175-storageclass-lvm.yaml.gotmpl
- 180-storageclass-ebs.yaml.gotmpl
Render templates updated:
- 165-openebs-mayastor.yaml.gotmpl: gated on nodePool.enabled (was
storage.mayastor.enabled), uses flat $state.storage shape
- 170-storageclass-mayastor.yaml.gotmpl: same gating + shape
- 155-node-prep-daemonset.yaml.gotmpl: dropped LVM VG step, gated on
just nodePool.enabled
- 010-state-status.yaml.gotmpl: dropped LVM/EBS observed keys
PSQLClusters that don't want CoW can specify any other StorageClass
that exists on the target cluster (e.g., the EKS Auto Mode default
gp3 SC). The stack does NOT compose a non-CoW SC — that's outside
its identity.
Examples + README rewritten. KCL tests updated: 10/10 passing.
Validate clean: 18 resources on minimal, 18 on standard.
Live verification on pat-local pending — would need to delete the
existing claim (with ebs/lvm enabled) and reapply the simplified one.
Implements [[tasks/psql-stack-cnpg]]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iskPools
Three changes prompted by review:
1. Primary sub-pool default is now spot (was on-demand). Mayastor's
replicationFactor=3 absorbs preemption — losing one replica triggers
a rebuild on a fresh node, not data loss. Override to on-demand via
spec.nodePool.primary.requirements when needed.
2. Karpenter NodePool inner names lose the cluster prefix. Was
`<clusterName>-psql-{branches,primary}`, now `<XR.name>-{branches,primary}`
(e.g. `psql-branches`, `psql-primary`). Less repetition; uses the
stack's XR name for disambiguation when multiple PSQLStacks share a
cluster (which is rare). Wrapper Crossplane Object names unchanged.
3. node-prep DaemonSet now also registers the local NVMe instance-store
device with Mayastor by creating a per-node DiskPool CR. New ServiceAccount
+ ClusterRole/Binding granting get/create/list/watch on
diskpools.openebs.io. Init script: detects /dev/nvme[1-9]n1 by lsblk
model match, kubectl-applies a DiskPool named psql-pool-<NODE_NAME>
(idempotent — skip if already present). Closes the gap that left
Mayastor pools empty and PVCs stuck Pending.
This means no separate ObservedObjectCollection or custom function for
DiskPool registration — the same DS that handles host-level prereqs
(hugepages, nvme-tcp) also handles pool registration. One declarative
artifact per node-prep concern.
Also fixes a YAML colon issue in the primary sub-pool description.
Implements [[tasks/psql-stack-cnpg]]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds spec.ha block: a single toggle that enables production-style HA defaults across every HA-able platform component without users needing to know each chart's specific values keys. When spec.ha.enabled: true (default false): - CNPG operator: replicaCount=3 + topologySpreadConstraints by zone - Atlas operator: same - cnpg-i-scale-to-zero plugin Deployment (directly composed): same - OpenEBS Mayastor: agents.core / csi.controller / etcd replicaCount=3 Per-component values can still override via the existing values block — HA values land in chartDefaults, user values mergeOverwrite them. Schema: - spec.ha.enabled (bool, default false) - spec.ha.replicas (int, default 3) - spec.ha.topologySpreadByZone (bool, default true) Standard example now demonstrates HA. New KCL test asserts replicaCount flows through to CNPG + Atlas Releases when ha.enabled=true. README updated with the new fields + a Components table entry. Render + validate clean (21 resources). 11/11 KCL tests pass. Implements [[tasks/psql-stack-cnpg]] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoids Helm ownership conflicts when the cluster already has these CRDs from another source (e.g., a previous OpenEBS LVM install, the cluster's snapshot-controller, or another chart that bundles them). The CRDs (volumesnapshotclasses.snapshot.storage.k8s.io and friends) need to come from somewhere on the cluster — the assumption is that snapshot-controller is installed separately as a cluster-level concern, not bundled with each storage backend. Encountered live during pat-local install: leftover LVM CRD annotations blocked Mayastor's upgrade. Skip CRD install in our defaults to make this robust across re-installs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
csi-node DaemonSet was running on every cluster node, but workers without
the node-prep DS don't have nvme_tcp loaded, so csi-node crashloops there
("Failed to detect nvme_tcp kernel module").
Restrict csi-node via spec.csi.node.{nodeSelector,tolerations} to the
same psql NodePool nodes where node-prep loads nvme_tcp. PSQLCluster
workloads always schedule on workload-type=psql nodes via the existing
NodePool selectors, so this doesn't restrict anything that actually
consumes Mayastor PVCs.
Also fixes a chartDefaults merge bug introduced when HA mode landed:
both nodePool and HA were calling \`set chartDefaults "csi"\` which
clobbered each other. Now build the csi sub-dict incrementally before
adding it to chartDefaults.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- 165-longhorn.yaml.gotmpl: longhorn chart 1.10.0, V2 data engine + SPDK - 170-storageclass-longhorn.yaml.gotmpl: driver.longhorn.io SC + VSC, dataEngine=v2, diskSelector=psql - 155-node-prep-daemonset.yaml.gotmpl: modprobe nvme_tcp / vfio_pci / uio_pci_generic / ublk_drv; replace DiskPool registration with node.longhorn.io/default-disks-config annotation - definition.yaml + state-init: chart defaults bump (1.10.0, longhorn-system); doc strings updated - 010-state-status: observe `longhorn` + `storageclass-longhorn` - README + examples updated; KCL tests updated and all 11 pass Mayastor checkpoint preserved on feat/cnpg-pivot. Longhorn V2 is "Experimental" upstream as of 1.10 — drops bundled etcd/NATS/minio/ loki/alloy footprint and unblocks arm64 Graviton (Mayastor's chart hardcodes amd64 on io_engine).
Drops Mayastor/Longhorn experimentation entirely:
- Remove NodePool sub-pools (branches + primary)
- Remove node-prep DaemonSet
- Remove Mayastor / Longhorn Helm release templates
- Remove Mayastor/Longhorn StorageClass
Slim spec.storage block down to spec.snapshotClass {enabled, name,
driver, deletionPolicy, parameters}. Default driver ebs.csi.aws.com
(EKS Auto Mode default); override for non-AWS providers. The stack
no longer composes a StorageClass — PSQLClusters target whatever SC
the cluster already provides.
Stack now composes 4 things: CNPG operator, Atlas operator, S2Z
plugin (9 Objects), and one VolumeSnapshotClass. CNPG/Atlas/S2Z
templates lose nodeSelector/tolerations refs; they run wherever Auto
Mode schedules them.
XRD shrinks from ~290 lines to ~170. README rewritten — no more
NVMe/SPDK/iscsi noise. Examples collapsed to clean shapes.
Replicated CoW storage (Longhorn et al) is now a separate concern,
to be provided by aws-storage-stack (self-managed ASG nodes with
proper userData) when the multi-tenant CoW economics justify the
operational cost. Bottlerocket/Auto Mode is incompatible with iscsi-
based engines (longhorn-manager env-checks for iscsiadm even with
V2; Bottlerocket's immutable rootfs blocks installation).
Branch checkpoints preserved on:
feat/cnpg-pivot — Mayastor experiment
feat/longhorn-pivot — Longhorn V2 experiment
EKS Auto Mode uses the managed `ebs.csi.eks.amazonaws.com` CSI driver, not the upstream `ebs.csi.aws.com`. Fix the default in state-init, XRD, README, and the minimal example so out-of-the-box installs on Auto Mode produce a working VolumeSnapshotClass without override. Self-managed EBS users can still override `spec.snapshotClass.driver` to `ebs.csi.aws.com`; non-AWS users override to their CSI driver name. Verified live on pat-local: psql VSC reconciles to driver ebs.csi.eks.amazonaws.com, XR Synced=True Ready=True, CNPG + Atlas + S2Z plugin all running cleanly on Auto Mode without dedicated nodes.
EKS Auto Mode ships the snapshot.storage.k8s.io CRDs but does NOT
ship the snapshot-controller. Without a controller, our composed
VolumeSnapshotClass is inert — PSQLBranch snapshots will sit
forever without ever reaching ReadyToUse.
Document it as a prerequisite (like cert-manager). The stack itself
does not compose snapshot-controller — it's a foundational cluster
concern that belongs in a separate stack (TBD: extend aws-cert-stack
or create a focused snapshot-stack).
Verified end-to-end on pat-local with the upstream
kubernetes-csi/external-snapshotter v8.2.0 manifests:
- PVC bound on ebs.csi.eks.amazonaws.com
- VolumeSnapshot via the composed `psql` VSC reached
readyToUse=true with a real EBS snapshot backing it.
The dependency now exists as a real stack (xrs/stacks/k8s/volume-snapshot/, ghcr.io/hops-ops/volume-snapshot-stack). Replace the manual upstream-YAML install instructions with a pointer at the stack.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR migrates a PostgreSQL management stack from StackGres-based operator control to CloudNativePG-based cluster lifecycle management, introducing new ChangesCloudNativePG Platform Migration
Sequence Diagram(s)sequenceDiagram
actor User
participant XRD as XRD (PSQLStack)
participant Stack as Stack Composition<br/>(hops-ops-psql-stackstack)
participant CNPG as CNPG Operator<br/>(Helm Release)
participant K8s as Kubernetes<br/>(StorageClass,<br/>VolumeSnapshot)
participant Scale as Scale-to-Zero<br/>Plugin
participant Atlas as Atlas Operator<br/>(Helm Release)
User->>XRD: Create PSQLStack with HA,<br/>storage, snapshot config
XRD->>Stack: Execute composition pipeline
Stack->>Stack: Initialize state from spec<br/>(namespace, labels, HA, provider refs)
Stack->>CNPG: Compose Helm Release<br/>(inject HA replicaCount)
Stack->>K8s: Compose StorageClass Object<br/>(provisioner, parameters)
Stack->>K8s: Compose VolumeSnapshotClass Object<br/>(driver matching storage)
alt scaleToZeroPlugin.enabled
Stack->>Scale: Compose plugin deployment<br/>(ServiceAccount, RBAC, Certs, Service)
end
Stack->>Atlas: Compose Helm Release<br/>(inject HA topology spread)
Note over Stack,Atlas: Auto-ready waits for all<br/>enabled components ready
CNPG-->>XRD: Release reconciles, creates<br/>cloudnative-pg chart
K8s-->>XRD: StorageClass & VolumeSnapshotClass<br/>objects created
Scale-->>XRD: Plugin deployment ready
Atlas-->>XRD: Release reconciles, creates<br/>atlas-operator chart
XRD-->>User: status.ready = true when<br/>all enabled components ready
sequenceDiagram
actor User
participant XRD as XRD (PSQLCluster)
participant Comp as Cluster Composition<br/>(hops-ops-psql-stackcluster)
participant CNPG as CNPG Cluster<br/>(Kubernetes Object)
participant ESO as External Secrets<br/>Operator (optional)
participant K8s as Kubernetes Secrets
participant Monitor as CNPG<br/>Monitoring
User->>XRD: Create PSQLCluster with<br/>storage, app, HA settings
XRD->>Comp: Execute composition pipeline
Comp->>Comp: Initialize state<br/>(size, version, HA instances,<br/>credential modes)
alt app.externalSecret configured
Comp->>ESO: Compose ExternalSecret Object<br/>(fetch username/password)
else app.secretName provided
Comp->>K8s: Use provided secret
else default
Comp->>K8s: Let CNPG generate secret
end
Comp->>CNPG: Compose CNPG Cluster Object<br/>(instances, bootstrap, storage,<br/>monitoring, optional superuser)
CNPG-->>Monitor: Enable PodMonitor for metrics
alt ha.enabled
Note over CNPG: instances override,<br/>anti-affinity topology
end
alt scaleToZero.enabled
CNPG-->>CNPG: Inject scale-to-zero plugin config
end
ESO-->>K8s: Populate credentials if ESO used
CNPG-->>XRD: Cluster reconciles, spins up<br/>PostgreSQL pods
XRD-->>User: status.ready + connection metadata<br/>(host/port/database)
sequenceDiagram
actor User
participant XRD as XRD (PSQLBranch)
participant Comp as Branch Composition<br/>(hops-ops-psql-stackbranch)
participant SrcSnap as Source VolumeSnapshot<br/>(cross-ns only)
participant BrSnap as Branch VolumeSnapshot
participant CNPG as CNPG Cluster<br/>(bootstrapped from fork)
User->>XRD: Create PSQLBranch<br/>(clusterName, source, storage config)
XRD->>Comp: Execute composition pipeline
Comp->>Comp: Initialize state<br/>(namespace, source defaults,<br/>snapshot class, scale-to-zero)
alt crossNamespace
Comp->>SrcSnap: Compose source VolumeSnapshot Object<br/>(target source PVC in source namespace)
end
Comp->>BrSnap: Compose branch VolumeSnapshot Object<br/>(reference source PVC or snapshot content)
Comp->>Comp: Derive observed snapshot content<br/>(track boundVolumeSnapshotContentName)
Comp->>CNPG: Compose CNPG Cluster Object<br/>(bootstrap.recovery.volumeSnapshots,<br/>instances, storage size, labels)
alt scaleToZero.enabled
CNPG-->>CNPG: Add idle-timeout annotation<br/>and plugin config
end
alt ttl.enabled
CNPG-->>CNPG: Add TTL expiration annotation
end
SrcSnap-->>BrSnap: Content available for branch fork
BrSnap-->>CNPG: VolumeSnapshot bound,<br/>content reference set
CNPG-->>XRD: Cluster bootstraps from snapshot,<br/>fork ready
XRD-->>User: status.ready + phase<br/>(bootstrap complete)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
77-89:⚠️ Potential issue | 🟡 MinorThe Stage 3/4 examples override the wrong provider only.
helmProviderConfigRef.name: defaultfixes the Helm releases, but theVolumeSnapshotClassand scale-to-zeroObjects still defaultkubernetesProviderConfigRef.nametoclusterNameinfunctions/render/000-state-init.yaml.gotmpl. As written, these examples will still look for Kubernetes ProviderConfigs namededge/local.Suggested doc fix
spec: clusterName: edge helmProviderConfigRef: name: default + kubernetesProviderConfigRef: + name: default snapshotClass: driver: driver.longhorn.iospec: clusterName: local helmProviderConfigRef: name: default + kubernetesProviderConfigRef: + name: default snapshotClass: enabled: false scaleToZeroPlugin: enabled: falseAlso applies to: 95-109
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 77 - 89, The example PSQLStack YAML sets helmProviderConfigRef.name: default but leaves kubernetesProviderConfigRef.name to the clusterName (causing VolumeSnapshotClass and scale-to-zero Objects to still use the wrong provider); update the example blocks (the PSQLStack examples around the Stage 3/4 snippets) to explicitly set kubernetesProviderConfigRef.name: default (or the same provider name used for helmProviderConfigRef) so functions/render/000-state-init.yaml.gotmpl will render the correct ProviderConfig for VolumeSnapshotClass and scale-to-zero Objects; ensure both occurrences noted (the block at lines ~77-89 and the similar block at ~95-109) are updated to keep provider names consistent.
🧹 Nitpick comments (1)
.gitignore (1)
14-14: Consider documenting/keeping an example config for devs/CI.If
apis/**/configuration.yamlis meant to be generated or environment-specific, it usually helps to add a committed example/template (e.g.,configuration.yaml.example) and (optionally) a comment near the ignore line describing how to generate/copy it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore at line 14, The ignore of apis/**/configuration.yaml hides whether this is generated or required; add a committed example/template file (e.g., configuration.yaml.example) alongside the real config for each API, update README or CI docs to show how to copy/populate it (cp configuration.yaml.example configuration.yaml or use env substitution), and add a brief inline comment near the apis/**/configuration.yaml entry in .gitignore indicating the template location and generation method so devs and CI know how to produce the real file; reference the pattern apis/**/configuration.yaml and ensure the new configuration.yaml.example is added to the repo for each service that needs it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 190-191: Update the description for the ready field so it reflects
enabled components rather than implying all four are mandatory; change the text
for ready/status.ready to say something like "Overall readiness — true once all
enabled components (CNPG, Atlas, the scale-to-zero plugin if
scaleToZeroPlugin.enabled, and the VolumeSnapshotClass if snapshotClass.enabled)
are Ready" so the description references scaleToZeroPlugin.enabled and
snapshotClass.enabled and matches actual behavior.
- Around line 147-170: Update the snapshotClass description to match the schema
defaults: state that the stack composes exactly one VolumeSnapshotClass named
"psql" (not "named after the XR") and that the default CSI driver is
ebs.csi.eks.amazonaws.com (not ebs.csi.aws.com); keep a note that users can
override driver for other providers (e.g., ebs.csi.aws.com for self-managed EBS
or driver.longhorn.io) and ensure references to PSQLBranch and
VolumeSnapshotClass remain intact (look for snapshotClass, properties.name, and
properties.driver).
In `@functions/render/000-state-init.yaml.gotmpl`:
- Around line 62-68: Tests still set spec.stackgresOperator.values which this
initializer no longer reads; update the test to populate the new fields used
here ($cnpg via spec.cnpg, $s2z via spec.scaleToZeroPlugin and $s2z.enabled for
$s2zEnabled, and $atlas via spec.atlasOperator) or remove the obsolete
assignment. In the test file (tests/e2etest-psql/main.k) replace the
stackgresOperator.values block with equivalent entries under spec.cnpg,
spec.scaleToZeroPlugin (including an enabled flag if used), and
spec.atlasOperator, mapping any specific config keys from the old values into
the corresponding new sections, or delete the block if those settings are no
longer required.
In `@functions/render/170-volumesnapshotclass.yaml.gotmpl`:
- Around line 10-12: Update the comment in the volumesnapshotclass template to
reflect the correct default CSI driver: replace the stale value
"ebs.csi.aws.com" with "ebs.csi.eks.amazonaws.com" and mention that this default
originates from the state init template (which sets the default driver used by
$spec.snapshotClass.driver); ensure the comment and any example override
instructions reference "ebs.csi.eks.amazonaws.com" so readers are pointed to the
correct default.
In `@functions/render/200-cnpg-operator.yaml.gotmpl`:
- Around line 64-85: The Usage guard currently only renders when both
$state.observed.cnpg.ready and $state.observed.atlasOperator.ready are true,
which is too late; change the conditional to trigger when the CNPG and Atlas
operator are present/declared (e.g. use $state.observed.cnpg.exists or
$state.observed.cnpg.present and $state.observed.atlasOperator.exists/present or
the equivalent keys that indicate intended composition) so the Usage (name: {{
$state.name }}-delete-atlas-operator-before-cnpg, resourceRef names {{
$cnpg.name }} and {{ $state.atlasOperator.name }}) is created before readiness
is reached and prevents CNPG removal during Atlas teardown.
In `@tests/test-render/main.k`:
- Around line 282-311: The test only checks positive presence via
CompositionTest.assertResources but lacks negative checks for s2z-* and the
VolumeSnapshotClass; update the CompositionTest block for the
"scale-to-zero-plugin-can-be-disabled" (and the other similar case) to
explicitly assert absence by using the test framework's absence check (e.g., add
assertResourcesAbsent with entries for resources matching kind/name patterns
"s2z-*" and the VolumeSnapshotClass or add an explicit rendered resource
count/assertion like renderedResourceCount == expected) so the test fails if
those s2z-* objects or VolumeSnapshotClass are emitted; locate the
CompositionTest blocks (metadata.name = "scale-to-zero-plugin-can-be-disabled"
and the similar test) and add the negative assertions alongside assertResources.
In `@upbound.yaml`:
- Around line 19-21: Update the metadata "description" and any README text to
remove references to an optional Karpenter NodePool and instead state that this
stack deploys CloudNativePG, the cnpg-i-scale-to-zero plugin, and the Atlas
Operator via Helm releases plus applied manifests, and optionally configures a
VolumeSnapshotClass; ensure the wording matches the rendered template
functions/render/210-cnpg-scale-to-zero.yaml.gotmpl and the XR shape in
apis/psqlstacks/definition.yaml. Locate and replace the stale phrase in the
description field and the README block (also update the repeated text around
lines 27-35) so the file consistently mentions Helm + applied manifests +
optional VolumeSnapshotClass and omits any NodePool language.
---
Outside diff comments:
In `@README.md`:
- Around line 77-89: The example PSQLStack YAML sets helmProviderConfigRef.name:
default but leaves kubernetesProviderConfigRef.name to the clusterName (causing
VolumeSnapshotClass and scale-to-zero Objects to still use the wrong provider);
update the example blocks (the PSQLStack examples around the Stage 3/4 snippets)
to explicitly set kubernetesProviderConfigRef.name: default (or the same
provider name used for helmProviderConfigRef) so
functions/render/000-state-init.yaml.gotmpl will render the correct
ProviderConfig for VolumeSnapshotClass and scale-to-zero Objects; ensure both
occurrences noted (the block at lines ~77-89 and the similar block at ~95-109)
are updated to keep provider names consistent.
---
Nitpick comments:
In @.gitignore:
- Line 14: The ignore of apis/**/configuration.yaml hides whether this is
generated or required; add a committed example/template file (e.g.,
configuration.yaml.example) alongside the real config for each API, update
README or CI docs to show how to copy/populate it (cp configuration.yaml.example
configuration.yaml or use env substitution), and add a brief inline comment near
the apis/**/configuration.yaml entry in .gitignore indicating the template
location and generation method so devs and CI know how to produce the real file;
reference the pattern apis/**/configuration.yaml and ensure the new
configuration.yaml.example is added to the repo for each service that needs it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 087a343f-4577-4bb1-9082-5215a0bdc6fc
📒 Files selected for processing (15)
.gitignoreREADME.mdapis/psqlstacks/definition.yamlexamples/psqlstacks/local.yamlexamples/psqlstacks/minimal.yamlexamples/psqlstacks/standard.yamlfunctions/render/000-state-init.yaml.gotmplfunctions/render/010-state-status.yaml.gotmplfunctions/render/170-volumesnapshotclass.yaml.gotmplfunctions/render/200-cnpg-operator.yaml.gotmplfunctions/render/200-stackgres-operator.yaml.gotmplfunctions/render/210-cnpg-scale-to-zero.yaml.gotmplfunctions/render/220-atlas-operator.yaml.gotmpltests/test-render/main.kupbound.yaml
💤 Files with no reviewable changes (1)
- functions/render/200-stackgres-operator.yaml.gotmpl
Merges PSQLCluster and PSQLBranch XRDs (previously standalone repos under
hops-ops/psql-cluster and hops-ops/psql-branch) into this package. One
Configuration package, one release cadence, one e2e flow.
Changes:
- apis/{psqlclusters,psqlbranches}/ — XRDs and compositions copied in
- examples/{psqlclusters,psqlbranches}/ — example manifests copied in
- functions/render/ → functions/stack/ — renamed to make room for siblings
- functions/{cluster,branch}/ — new function packages, gotmpls copied from
the standalone repos. Composition functionRefs updated:
psqlstacks → hops-ops-psql-stackstack
psqlclusters → hops-ops-psql-stackcluster
psqlbranches → hops-ops-psql-stackbranch
- tests/test-{stack,cluster,branch}/ — render tests renamed (was test-render-*)
- tests/e2etest-psql/main.k — unified e2e covering all three XRs at Synced;
TODO upgrade to Ready integration after volume-snapshot-stack v0.1.0
- .github/workflows/on-pr.yaml + on-push-main.yaml — switched to multi-API
workflow signature, pinned at @feat/multi-api-support for testing
- Makefile — EXAMPLES list extended; render/validate logic still single-API
(follow-up to make per-example api_path work locally)
Workflow change being tested:
unbounded-tech/workflows-crossplane@feat/multi-api-support
(validate.yaml now resolves api_path per example with fallback to inputs.api_path)
volume-snapshot-stack v0.1.0 is now published, so we can install it as a
dependency Configuration package and bring snapshot-controller into the
test cluster. With that, the whole chain reconciles to Ready in-cluster:
1. VolumeSnapshotStack XR (in extraResources) → snapshot-controller live
2. PSQLStack manifest → Helm-installs CNPG + atlas-operator + the psql
VolumeSnapshotClass
3. PSQLCluster manifest → CNPG bootstraps a real Postgres with a real PVC
4. PSQLBranch manifest → snapshots the source PVC, restores into a new
CNPG cluster
Pattern mirrors the aws-observe-stack e2e (initResources for dependency
Configuration packages, extraResources for the dependent XRs).
defaultConditions: Synced → Ready
timeoutSeconds: 1800 → 5400 (90 min for the full chain)
cleanupTimeoutSeconds: 900 → 1800
The unified e2e was carrying over `stackgresOperator.values` and `atlasOperator.values` from before the CNPG pivot. The current PSQLStack XRD uses `cnpg` (not `stackgresOperator`), `namespace` defaults to `cnpg-system` (not `stackgres`), and the kind-cluster defaults work without any operator-values overrides. Match `local/psqlstack.yaml`'s minimal shape: clusterName + labels + ProviderConfig refs only. Add `kubernetesProviderConfigRef` since PSQLStack now also applies the VolumeSnapshotClass via the kubernetes provider.
There was a problem hiding this comment.
Actionable comments posted: 14
♻️ Duplicate comments (2)
functions/stack/170-volumesnapshotclass.yaml.gotmpl (1)
10-12:⚠️ Potential issue | 🟡 MinorUpdate the stale default CSI driver comment.
Line 10 documents
ebs.csi.aws.com, but the stack default isebs.csi.eks.amazonaws.com. This will mislead operators about expected defaults.Suggested fix
-# Default driver: ebs.csi.aws.com (correct for EKS Auto Mode out of the box). +# Default driver: ebs.csi.eks.amazonaws.com (EKS Auto Mode default in state init).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@functions/stack/170-volumesnapshotclass.yaml.gotmpl` around lines 10 - 12, Update the stale comment that names the default CSI driver: change the documented driver string from "ebs.csi.aws.com" to the current stack default "ebs.csi.eks.amazonaws.com" in the VolumeSnapshotClass template so it reflects the actual default; locate the comment near the spec.snapshotClass.driver reference in the VolumeSnapshotClass/gotmpl block and replace the old driver name with the new one.functions/stack/200-cnpg-operator.yaml.gotmpl (1)
64-85:⚠️ Potential issue | 🟠 MajorRender deletion guard on presence, not readiness.
Line 64 delays
Usagecreation until both releases are Ready. If Atlas exists but is not Ready, teardown protection may never activate when needed.Suggested fix
-{{- if and $state.observed.cnpg.ready $state.observed.atlasOperator.ready }} +{{- $observed := $.observed.resources | default dict }} +{{- if and (hasKey $observed "cnpg-operator") (hasKey $observed "atlas-operator") }} --- apiVersion: protection.crossplane.io/v1beta1 kind: Usage🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@functions/stack/200-cnpg-operator.yaml.gotmpl` around lines 64 - 85, The Usage resource guard is gated on readiness ($state.observed.cnpg.ready and $state.observed.atlasOperator.ready) so teardown protection can be skipped if Atlas exists but isn't Ready; change the template condition to check presence/existence instead of readiness (e.g., replace checks of $state.observed.cnpg.ready and $state.observed.atlasOperator.ready with their existence/presence flags such as $state.observed.cnpg.exists and $state.observed.atlasOperator.exists or equivalent) so the Usage ({{ $state.name }}-delete-atlas-operator-before-cnpg) is rendered whenever both CNPG and the Atlas operator are present regardless of Ready state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/on-pr.yaml:
- Around line 30-53: The workflow is referencing mutable branch refs (uses:
unbounded-tech/workflows-crossplane/...@feat/multi-api-support) for the reusable
workflows validate, test, e2e and publish; replace those branch refs with an
immutable tag or commit SHA (e.g., `@vX.Y.Z` or a specific commit) for each uses
entry so the CI points to a fixed release—update the four uses lines
(validate.yaml, test.yaml, e2e.yaml, publish.yaml) to the chosen semantic
version tag or commit SHA consistent with the existing workflow-vnext-tag
pattern.
In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references mutable branch refs for
the reusable workflows in the validate, test, and e2e jobs (the lines using
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update
those refs to an immutable release tag (for example replace
`@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag)
so validate, test, and e2e always run the exact released workflow code.
In `@apis/psqlbranches/definition.yaml`:
- Around line 117-123: Remove the hard-coded default for branch Postgres version
by deleting the default: "17" entry under the postgresql -> properties ->
version schema in definition.yaml so an omitted version is not coerced to "17";
ensure the schema allows absence (do not add a fixed default) so callers can
distinguish "inherit from source" vs "explicitly set", and if needed mark the
version property nullable or optional instead of supplying a default.
In `@apis/psqlclusters/composition.yaml`:
- Around line 11-16: The composition references custom functions
hops-ops-psql-stackcluster, hops-ops-psql-stackstack, and
hops-ops-psql-stackbranch but they are not declared as dependencies in
upbound.yaml; update upbound.yaml's dependsOn section to include entries for
these three functions (using their package names and versions) alongside the
existing crossplane-contrib-function-auto-ready, or alternatively
document/ensure those three functions are preinstalled before deploying—make
sure to add exact package identifiers for hops-ops-psql-stackcluster,
hops-ops-psql-stackstack, and hops-ops-psql-stackbranch so the composition can
resolve the functionRef names.
In `@apis/psqlclusters/definition.yaml`:
- Around line 154-161: The OpenAPI schema exposes monitoring.enabled but the
cluster compose template is reading $monSpec.monitoring instead of the boolean
$monSpec.enabled, causing explicit false to be ignored; update the template in
functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring
section uses $monSpec.enabled (or conditionally sets monitoring.enabled to
$monSpec.enabled) rather than copying the whole $monSpec.monitoring object,
ensuring explicit false values disable PodMonitor creation as intended.
In `@apis/psqlstacks/composition.yaml`:
- Around line 11-16: The composition references a Function named
hops-ops-psql-stackstack but that Function is missing from the project
dependency manifest; add a Function resource entry named
hops-ops-psql-stackstack into your upbound.yaml (or the project’s Functions
list) so the name exactly matches the functionRef in the composition, providing
the required metadata and spec (image/source, runtime, and any config) that
Crossplane expects for that Function; ensure the Function resource kind/name is
hops-ops-psql-stackstack and then rebuild/update the package so the Function is
available at runtime.
In `@functions/branch/010-state-status.yaml.gotmpl`:
- Around line 19-27: The template currently stops fallback when a
"branch-snapshot" exists even if its boundVolumeSnapshotContentName is empty,
causing $snapContent to be cleared; update the selection logic so you prefer the
branch-snapshot only when it actually contains a non-empty
boundVolumeSnapshotContentName and otherwise fall back to source-snapshot.
Concretely, compute both get $observed "branch-snapshot" and get $observed
"source-snapshot" into $snapEntryBranch and $snapEntrySource, extract their
boundVolumeSnapshotContentName via the existing pipeline ($snapResource →
$snapAtProvider → $snapManifest → $snapStatus → boundVolumeSnapshotContentName),
and set $snapEntry/$snapContent to the branch one only if that
boundVolumeSnapshotContentName is non-empty; otherwise use the source-snapshot
values so $snapContent never gets overwritten with an empty string.
In `@functions/branch/100-source-snapshot.yaml.gotmpl`:
- Around line 16-43: The VolumeSnapshot metadata name currently uses "{{
$state.name }}-src" which can collide across namespaces; change the snapshot
name generation in the template so it includes a namespace-specific component
(for example incorporate {{ $source.namespace }} and/or {{ $source.pvcName }} or
a short hash of them) instead of just {{ $state.name }}, updating the
metadata.name line inside the forProvider.manifest block (the VolumeSnapshot
metadata) and any related references (e.g. resource name labels/annotations) so
the snapshot is unique per source namespace/PVC.
In `@functions/branch/200-cnpg-cluster.yaml.gotmpl`:
- Around line 78-85: The template always sets storage.size from
$state.branch.storage.size which causes branches to default to 10Gi even when
the source PVC is larger; change the assignment so size falls back to the source
PVC when branch size is empty (e.g. compute $size := default
$state.source.storage.size $state.branch.storage.size or equivalent) and use
that $size when building $storage, keeping the conditional storageClass logic
and assigning to $clusterSpec "storage".
In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 67-72: The template currently places the superuser secret into
bootstrap.initdb.secret (the block with "bootstrap" "initdb" "owner" "app" and
secret name $state.credentials.superuser.secretName); instead, wire the
superuser credential via spec.superuserSecret using
$state.credentials.superuser.secretName and make bootstrap.initdb.secret point
to the application-owner secret (a secret whose username matches owner: app).
Update the template to set spec.superuserSecret to
$state.credentials.superuser.secretName and ensure the "bootstrap" -> "initdb"
-> "secret" uses the app-owner secret (not the postgres superuser secret).
In `@functions/stack/000-state-init.yaml.gotmpl`:
- Around line 71-85: The comment above the VolumeSnapshotClass block is out of
sync with the template default driver; update the human-readable note that
currently says "ebs.csi.aws.com" to match the rendered default
"ebs.csi.eks.amazonaws.com" so docs match the actual fallback defined by
$snapshotSpec and the $snapshotClass dict (see variables $snapshotSpec,
$snapshotEnabled and $snapshotClass where driver defaults to
"ebs.csi.eks.amazonaws.com").
In `@Makefile`:
- Around line 25-31: The new examples in EXAMPLES (psqlclusters and
psqlbranches) are still being processed through the existing
$(DEFINITION)/$(COMPOSITION) paths for apis/psqlstacks, so the derived
XRD/composition will be generated against the wrong schema; update the Makefile
rules that consume $(EXAMPLES) and produce $(DEFINITION)/$(COMPOSITION) to apply
the api-dir macro (api-dir) when computing the api_path (e.g., use the api-dir
expansion or $(call api-dir,...) in the pattern-substitution used by the targets
that refer to DEFINITION and COMPOSITION) so each example maps to its
corresponding apis/<x>/... paths instead of always apis/psqlstacks.
In `@tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k`:
- Around line 95-120: The APIDependencies union schema
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 is inconsistent: it
exposes git, http, and k8s payload fields but the discriminator $type only
allows "k8s" | "crd" and there is no crd payload type defined. Fix by aligning
the discriminator with the payloads — either add a `crd` payload schema and
implement its field type if "crd" is intended, or expand/replace `$type` to
include the actual variants ("git", "http", "k8s") and ensure only the matching
payload field (git/http/k8s) is present for each discriminator value; update
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 accordingly.
In `@tests/test-stack/main.k`:
- Around line 298-311: The test's assertResources list is missing an assertion
for the Atlas Release despite the test description claiming "cnpg + atlas + VSC
still composed"; add an entry to the assertResources array similar to the
existing CNPG Release assertion that checks for the Atlas Release (apiVersion
"helm.m.crossplane.io/v1beta1", kind "Release", metadata.name set to the Atlas
release name used elsewhere in tests, e.g., "atlas"), so the test will fail if
Atlas is not composed.
---
Duplicate comments:
In `@functions/stack/170-volumesnapshotclass.yaml.gotmpl`:
- Around line 10-12: Update the stale comment that names the default CSI driver:
change the documented driver string from "ebs.csi.aws.com" to the current stack
default "ebs.csi.eks.amazonaws.com" in the VolumeSnapshotClass template so it
reflects the actual default; locate the comment near the
spec.snapshotClass.driver reference in the VolumeSnapshotClass/gotmpl block and
replace the old driver name with the new one.
In `@functions/stack/200-cnpg-operator.yaml.gotmpl`:
- Around line 64-85: The Usage resource guard is gated on readiness
($state.observed.cnpg.ready and $state.observed.atlasOperator.ready) so teardown
protection can be skipped if Atlas exists but isn't Ready; change the template
condition to check presence/existence instead of readiness (e.g., replace checks
of $state.observed.cnpg.ready and $state.observed.atlasOperator.ready with their
existence/presence flags such as $state.observed.cnpg.exists and
$state.observed.atlasOperator.exists or equivalent) so the Usage ({{ $state.name
}}-delete-atlas-operator-before-cnpg) is rendered whenever both CNPG and the
Atlas operator are present regardless of Ready state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1fbd16ce-b73a-4e47-bd02-3efa17c8d181
📒 Files selected for processing (57)
.github/workflows/on-pr.yaml.github/workflows/on-push-main.yamlMakefileapis/psqlbranches/composition.yamlapis/psqlbranches/definition.yamlapis/psqlclusters/composition.yamlapis/psqlclusters/definition.yamlapis/psqlstacks/composition.yamlexamples/psqlbranches/cross-namespace.yamlexamples/psqlbranches/preview-with-ttl.yamlexamples/psqlbranches/same-namespace.yamlexamples/psqlclusters/minimal.yamlexamples/psqlclusters/standard.yamlfunctions/branch/000-state-init.yaml.gotmplfunctions/branch/010-state-status.yaml.gotmplfunctions/branch/100-source-snapshot.yaml.gotmplfunctions/branch/110-branch-snapshot.yaml.gotmplfunctions/branch/200-cnpg-cluster.yaml.gotmplfunctions/branch/999-status.yaml.gotmplfunctions/cluster/000-state-init.yaml.gotmplfunctions/cluster/010-state-status.yaml.gotmplfunctions/cluster/100-external-secret.yaml.gotmplfunctions/cluster/200-cnpg-cluster.yaml.gotmplfunctions/cluster/999-status.yaml.gotmplfunctions/stack/000-state-init.yaml.gotmplfunctions/stack/010-state-status.yaml.gotmplfunctions/stack/170-volumesnapshotclass.yaml.gotmplfunctions/stack/200-cnpg-operator.yaml.gotmplfunctions/stack/210-cnpg-scale-to-zero.yaml.gotmplfunctions/stack/220-atlas-operator.yaml.gotmplfunctions/stack/999-status.yaml.gotmpltests/e2etest-psql/main.ktests/test-branch/kcl.modtests/test-branch/main.ktests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.ktests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.ktests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.ktests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.ktests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.ktests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.ktests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.ktests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.ktests/test-branch/model/kcl.modtests/test-cluster/kcl.modtests/test-cluster/main.ktests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.ktests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.ktests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.ktests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.ktests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.ktests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.ktests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.ktests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.ktests/test-cluster/model/kcl.modtests/test-stack/kcl.modtests/test-stack/main.ktests/test-stack/model
✅ Files skipped from review due to trivial changes (27)
- tests/test-cluster/model/kcl.mod
- tests/test-branch/model/kcl.mod
- examples/psqlclusters/minimal.yaml
- tests/test-stack/kcl.mod
- tests/test-cluster/kcl.mod
- tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
- examples/psqlclusters/standard.yaml
- functions/branch/999-status.yaml.gotmpl
- examples/psqlbranches/preview-with-ttl.yaml
- tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
- apis/psqlbranches/composition.yaml
- tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
- functions/cluster/999-status.yaml.gotmpl
- tests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.k
- tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
- tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
- tests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
- functions/branch/000-state-init.yaml.gotmpl
- examples/psqlbranches/same-namespace.yaml
- tests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.k
- tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.k
- tests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.k
- tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.k
- tests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.k
- examples/psqlbranches/cross-namespace.yaml
- tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.k
- tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.k
| uses: unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support | ||
| with: | ||
| examples: | | ||
| [ | ||
| { "example": "examples/psqlstacks/minimal.yaml" }, | ||
| { "example": "examples/psqlstacks/standard.yaml" } | ||
| { "example": "examples/psqlstacks/minimal.yaml", "api_path": "apis/psqlstacks" }, | ||
| { "example": "examples/psqlstacks/standard.yaml", "api_path": "apis/psqlstacks" }, | ||
| { "example": "examples/psqlclusters/minimal.yaml", "api_path": "apis/psqlclusters" }, | ||
| { "example": "examples/psqlclusters/standard.yaml","api_path": "apis/psqlclusters" }, | ||
| { "example": "examples/psqlbranches/same-namespace.yaml", "api_path": "apis/psqlbranches" } | ||
| ] | ||
| api_path: apis/psqlstacks | ||
| error_on_missing_schemas: true | ||
|
|
||
| test: | ||
| uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@v2.20.0 | ||
| uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@feat/multi-api-support | ||
|
|
||
| e2e: | ||
| uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@v2.20.0 | ||
| uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@feat/multi-api-support |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n .github/workflows/on-push-main.yaml | head -50Repository: hops-ops/psql-stack
Length of output: 1893
🏁 Script executed:
cat -n .github/workflows/on-push-main.yamlRepository: hops-ops/psql-stack
Length of output: 2013
Pin reusable workflows to immutable refs before merging.
The validate, test, and e2e jobs reference @feat/multi-api-support, a mutable branch that can change without review. This breaks mainline CI reproducibility—different runs may execute different code. Pin these to immutable release tags (like the v1.21.0 used for version-and-tag).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow
currently references mutable branch refs for the reusable workflows in the
validate, test, and e2e jobs (the lines using
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update
those refs to an immutable release tag (for example replace
`@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag)
so validate, test, and e2e always run the exact released workflow code.
| monitoring: | ||
| description: Add Prometheus scrape configuration. CNPG operator handles the actual PodMonitor creation when `monitoring.enablePodMonitor` is set on the Cluster CR. | ||
| type: object | ||
| properties: | ||
| enabled: | ||
| type: boolean | ||
| default: true | ||
|
|
There was a problem hiding this comment.
monitoring.enabled is exposed but not wired through.
The schema advertises a toggle here, but functions/cluster/000-state-init.yaml.gotmpl:72-76 currently reads $monSpec.monitoring instead of $monSpec.enabled, so explicit false values are ignored and the composed Cluster stays monitored.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apis/psqlclusters/definition.yaml` around lines 154 - 161, The OpenAPI schema
exposes monitoring.enabled but the cluster compose template is reading
$monSpec.monitoring instead of the boolean $monSpec.enabled, causing explicit
false to be ignored; update the template in
functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring
section uses $monSpec.enabled (or conditionally sets monitoring.enabled to
$monSpec.enabled) rather than copying the whole $monSpec.monitoring object,
ensuring explicit false values disable PodMonitor creation as intended.
Three locations claimed the default driver was `ebs.csi.aws.com` and the VolumeSnapshotClass was "named after the XR" — both wrong relative to the actual schema (driver `ebs.csi.eks.amazonaws.com`, name defaults to `psql`). This text shows up in `kubectl explain`, generated docs, and template comments, so it needed to match. Also clarified `status.ready` description: components are toggleable, so readiness is "every enabled component is Ready" rather than implying all four are mandatory. From CodeRabbit review on PR #10.
The s2z-disabled test description claimed cnpg + atlas + VSC are still composed, but only cnpg and the VSC were asserted — a regression that broke the Atlas Release would silently pass. Added the Atlas assertion. From CodeRabbit review on PR #10.
… not readiness
The Usage that protects cnpg-operator from premature deletion (so Atlas is
torn down first and CNPG isn't yanked while Atlas's migration state is
still live) was rendered only after both Releases reported Ready. That
left a window: if Atlas was mid-progressing or in error and the user
deleted the stack, no Usage existed yet, and CNPG could be deleted first
— exactly the ordering this guard exists to prevent.
Switched the gate from `$state.observed.{cnpg,atlasOperator}.ready` to
`hasKey $.observed.resources "{cnpg,atlas}-operator"`. The Usage now
appears as soon as both Releases are observed, regardless of their
readiness state.
Verified locally: render tests still pass; cluster reinstall via
`hops config install --path` brings all three XRs back to Synced/Ready.
From CodeRabbit review on PR #10.
…ed externalSecret
The previous `credentials.superuser` shape was misleading: the secret named
"superuser" was actually wired into CNPG's `bootstrap.initdb.secret`, which
takes the *application user's* credentials. The actual postgres superuser
secret was either auto-generated by CNPG or absent from the spec entirely.
And it collided with CNPG's own `<cluster>-superuser` secret naming.
New shape (no backwards compat — still alpha):
spec.app: # always present; wires bootstrap.initdb
role: app # Postgres role name
database: app # Application database name
secretName: "" # K8s Secret; default <cluster>-app
externalSecret: # OPTIONAL — when set, ESO renders the Secret
secretStore:
kind: ClusterSecretStore | SecretStore
name: hops-aws-secrets-manager
namespace: "" # for SecretStore (defaults to XR ns)
secretRef:
path: my-cluster/app # remote location; JSON value with username+password
spec.superuser: # OPTIONAL — omit to let CNPG auto-generate
secretName: "" # default <cluster>-superuser
externalSecret: { ... } # same shape as app.externalSecret
When `superuser` is set, the composition renders `spec.superuserSecret` on
the wrapped CNPG Cluster CR; otherwise CNPG auto-generates and stores the
secret at `<cluster>-superuser` (its own convention).
Field names mirror External Secrets Operator's CRD shape so anyone familiar
with ESO can read it at a glance.
Status now exposes connection details so dependent XRs can wire without
hardcoding:
status.app: { secretName, database, host, port }
status.superuser: { secretName }
Render template factored to a single `psqlcluster.externalSecret` definition
so the app/superuser ExternalSecrets share one source of truth.
Tests rewritten:
- Test 1 ("minimal-renders-cluster-only") asserts default mode renders only
the Cluster — no ExternalSecret, since externalSecret is now opt-in.
- Test 7 ("external-secret-renders-when-opted-in") explicitly asserts the
ESO ExternalSecret with the new shape.
Migration for existing manifests: `credentials.superuser.managedBy: ""`
→ remove the block (BYO is now default). `credentials.superuser` with ESO
fields → move under `spec.app.externalSecret` matching the new shape.
From CodeRabbit review on PR #10.
… repos The render:all and validate:all targets were hardcoded to apis/psqlstacks via $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), so multi-API examples (psqlclusters/*, psqlbranches/*) were rendered against the wrong schema. Each example now resolves its own apis/<plural>/ dir from the example path (`examples/<plural>/<file>.yaml` → `apis/<plural>`) and uses that for both `up composition render --xrd=...` and `crossplane beta validate <api_dir>`. Verified locally: `make render` and `make validate` both clean across all five examples. From CodeRabbit review on PR #10.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
.github/workflows/on-push-main.yaml (1)
26-42:⚠️ Potential issue | 🟠 MajorPin the reusable workflows to immutable refs.
validate,test, ande2estill pull workflow code from@feat/multi-api-support, so a later branch push can changemainCI behavior without a repo change here. Please switch these to a tag or commit SHA before merging.#!/bin/bash set -euo pipefail rg -n 'uses:\s+unbounded-tech/workflows-crossplane/.+@' .github/workflows/on-push-main.yaml .github/workflows/on-pr.yaml .github/workflows/on-version-tagged.yamlExpected result:
on-push-main.yamlandon-pr.yamlshould reference immutable tags/SHAs rather than branch names.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow currently references reusable workflows by branch refs (validate.yaml@feat/multi-api-support, test.yaml@feat/multi-api-support, e2e.yaml@feat/multi-api-support); update each "uses: unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support" to point to an immutable tag or commit SHA (e.g., replace "@feat/multi-api-support" with a specific release tag or the exact commit SHA) so validate, test, and e2e always use pinned workflow code.
🧹 Nitpick comments (1)
functions/cluster/000-state-init.yaml.gotmpl (1)
63-67: Add defensive defaulting forspec.storageto match the pattern used elsewhere in the codebase.While
spec.storageis schema-required for PSQLCluster, defensive defaulting aligns with the pattern already established in the PSQLBranch template (functions/branch/000-state-init.yaml.gotmplline 65) and provides resilience against edge cases.Suggested fix
-{{- $storageSpec := $spec.storage }} +{{- $storageSpec := $spec.storage | default dict }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@functions/cluster/000-state-init.yaml.gotmpl` around lines 63 - 67, The template accesses $spec.storage directly which can fail in edge cases; add defensive defaulting before using it (mirror the approach in the PSQLBranch template) by setting $storageSpec to a default empty dict when $spec.storage is missing, then build $storage from $storageSpec (refer to the symbols $spec.storage, $storageSpec and $storage in the diff and the PSQLBranch template for the exact pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@functions/cluster/100-external-secret.yaml.gotmpl`:
- Around line 64-91: The ExternalSecret resource names/annotations in the
psqlcluster.externalSecret template are using "external-secret-app" and
"external-secret-superuser" which don't match the status lookup expecting
"external-secret"; update the ResourceName and ResourceNameAnnotation entries
passed into template "psqlcluster.externalSecret" for both the app and superuser
blocks to use the singular "external-secret" (e.g., ResourceName: printf
"%s-external-secret" and ResourceNameAnnotation: setResourceNameAnnotation
"external-secret") so the status readiness check can find the synced resources;
alternatively, if you prefer multiple distinct names, update the status lookup
to match these specific keys (ensure changes are made for both app and superuser
usages).
In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 24-28: The current use of merge "$state.labels (dict)" aliases
$state.labels as the destination so later set calls on $clusterLabels mutate
$state.labels; change the merge to create a fresh map as the destination (e.g.,
use merge (dict) $state.labels) so $clusterLabels is a copy and subsequent set
calls (the $_ := set $clusterLabels "...") do not mutate $state.labels and
therefore prevent branching labels from leaking elsewhere.
In `@Makefile`:
- Around line 113-117: The validate:% recipe uses undefined shell variables
$$definition, $$composition, and $$api_dir; initialize them at the start of the
recipe using the pattern stem (same way example is set) so the render/validate
commands get real paths. For example, set shell vars like
definition="examples/psqlstacks/$*.yaml", composition="compositions/$*.yaml" and
api_dir="apis" (or your repo's actual api directory) before calling up
composition render in the validate:% target so $$definition, $$composition, and
$$api_dir are non-empty when used.
In `@tests/e2etest-psql/main.k`:
- Around line 188-190: Remove the stale assignment to app.managedBy (it no
longer exists on PSQLCluster.spec.app) — delete the line setting app.managedBy
and do not add any replacement; to skip ExternalSecret/BYO mode simply omit the
app.externalSecret block entirely (the schema defaults to skipping it). Update
any nearby comments to reflect that omission and ensure no other code references
app.managedBy.
In `@tests/test-stack/main.k`:
- Around line 14-16: Update the docstring comments that still mention the old
snapshot-class driver "ebs.csi.aws.com" to the current driver used by this stack
(replace all occurrences of "ebs.csi.aws.com" in the comments near the
VolumeSnapshotClass description), keeping the rest of the wording (e.g.,
"VolumeSnapshotClass", default name "psql") intact; also apply the same
replacement to the other comment occurrence referenced in the file so both
comment blocks reflect the current snapshot driver.
---
Duplicate comments:
In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references reusable workflows by
branch refs (validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, e2e.yaml@feat/multi-api-support); update each
"uses:
unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support"
to point to an immutable tag or commit SHA (e.g., replace
"@feat/multi-api-support" with a specific release tag or the exact commit SHA)
so validate, test, and e2e always use pinned workflow code.
---
Nitpick comments:
In `@functions/cluster/000-state-init.yaml.gotmpl`:
- Around line 63-67: The template accesses $spec.storage directly which can fail
in edge cases; add defensive defaulting before using it (mirror the approach in
the PSQLBranch template) by setting $storageSpec to a default empty dict when
$spec.storage is missing, then build $storage from $storageSpec (refer to the
symbols $spec.storage, $storageSpec and $storage in the diff and the PSQLBranch
template for the exact pattern).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bf8788e5-52c9-443c-a955-4bcce34be39e
📒 Files selected for processing (17)
.github/workflows/on-pr.yaml.github/workflows/on-push-main.yamlMakefileapis/psqlclusters/definition.yamlapis/psqlstacks/definition.yamlfunctions/cluster/000-state-init.yaml.gotmplfunctions/cluster/100-external-secret.yaml.gotmplfunctions/cluster/200-cnpg-cluster.yaml.gotmplfunctions/cluster/999-status.yaml.gotmplfunctions/stack/000-state-init.yaml.gotmplfunctions/stack/170-volumesnapshotclass.yaml.gotmplfunctions/stack/200-cnpg-operator.yaml.gotmpltests/e2etest-psql/main.ktests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.ktests/test-cluster/main.ktests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.ktests/test-stack/main.k
✅ Files skipped from review due to trivial changes (4)
- functions/stack/170-volumesnapshotclass.yaml.gotmpl
- tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
- functions/cluster/999-status.yaml.gotmpl
- functions/stack/000-state-init.yaml.gotmpl
🚧 Files skipped from review as they are similar to previous changes (2)
- functions/stack/200-cnpg-operator.yaml.gotmpl
- .github/workflows/on-pr.yaml
Adds a third credential mode: omit `app.externalSecret` AND `app.secretName`
and the composition omits `bootstrap.initdb.secret` from the CNPG Cluster CR
so CNPG auto-generates and owns the basic-auth Secret at `<cluster-name>-app`.
The previous shape always set `bootstrap.initdb.secret.name = <cluster>-app`,
which forced CNPG to read a Secret that — in the no-ESO/no-BYO case — never
existed, blocking bootstrap. The unified e2e hit this exact case (kind
harness, no ESO ClusterSecretStore) and a stale `app.managedBy = ""` field
was masking the real failure mode.
Three modes now documented on the XRD:
1. Omit `app` → CNPG auto-generates `<cluster-name>-app`.
2. Set `app.externalSecret` → ESO writes the Secret CNPG reads.
3. Set `app.secretName` (no externalSecret) → BYO; pre-create the Secret.
Backwards compatible: existing manifests with externalSecret or explicit
secretName render identically. CNPG will adopt a pre-existing
`<cluster-name>-app` if one is present.
Tests:
- test-cluster: minimal asserts the secret line is omitted; external-secret
asserts secret.name is wired; new BYO test covers explicit secretName.
- e2etest-psql: drops the bogus `app.managedBy` field that broke KCL parse;
relies on the new auto-gen path.
Implements [[tasks/merge-psql-client-apis-into-stack]]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 159-170: The XRD declares secretRef.path contains a JSON blob with
username and password, but the ExternalSecret template
(functions/cluster/100-external-secret.yaml.gotmpl) currently reads only
.RemoteRef.key and renders a single password entry, causing mismatch with CNPG
template (functions/cluster/200-cnpg-cluster.yaml.gotmpl) which expects both
credentials; update 100-external-secret.yaml.gotmpl to consume the remote JSON
from secretRef.path/.RemoteRef (not just .RemoteRef.key) and map both username
and password into the rendered Secret (e.g., create two remote refs or a single
remoteRef with extractors for "username" and "password") so app.externalSecret
and superuser.externalSecret produce the full Secret shape that
200-cnpg-cluster.yaml.gotmpl requires.
In `@tests/e2etest-psql/main.k`:
- Around line 56-71: The retained snapshot dependency resources use fixed names
which can be reused across runs; make the Configuration metadata.name
("hops-ops-volume-snapshot-stack" in initResources) and the corresponding
VolumeSnapshotStack XR name unique per test run (e.g., append a
testRunId/timestamp/UUID) so each test installs and references a distinct
package instance, and update any other references (the VolumeSnapshotStack XR
created later around lines 125-139) to use the same generated unique name so
retention still works against the exact resource created for that run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 075163c9-3527-4d38-9ce4-9de815008e0a
📒 Files selected for processing (5)
apis/psqlclusters/definition.yamlfunctions/cluster/000-state-init.yaml.gotmplfunctions/cluster/200-cnpg-cluster.yaml.gotmpltests/e2etest-psql/main.ktests/test-cluster/main.k
| secretRef: | ||
| description: Where in the secrets backend the credentials live. The remote value must be a JSON blob with `username` and `password` properties — both are extracted into the resulting K8s Secret. | ||
| type: object | ||
| properties: | ||
| path: | ||
| description: Path/key in the secrets backend (e.g. AWS Secrets Manager `my-cluster/app`). | ||
| type: string | ||
| required: | ||
| - path | ||
| required: | ||
| - secretStore | ||
| - secretRef |
There was a problem hiding this comment.
Align the ExternalSecret schema with the rendered secret contract.
The XRD advertises secretRef.path holding a remote value with both username and password, but functions/cluster/100-external-secret.yaml.gotmpl currently consumes .RemoteRef.key and only renders a password entry. In the current shape, the advertised app.externalSecret / superuser.externalSecret mode cannot produce the Secret shape that functions/cluster/200-cnpg-cluster.yaml.gotmpl hands to CNPG.
Also applies to: 205-214
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apis/psqlclusters/definition.yaml` around lines 159 - 170, The XRD declares
secretRef.path contains a JSON blob with username and password, but the
ExternalSecret template (functions/cluster/100-external-secret.yaml.gotmpl)
currently reads only .RemoteRef.key and renders a single password entry, causing
mismatch with CNPG template (functions/cluster/200-cnpg-cluster.yaml.gotmpl)
which expects both credentials; update 100-external-secret.yaml.gotmpl to
consume the remote JSON from secretRef.path/.RemoteRef (not just .RemoteRef.key)
and map both username and password into the rendered Secret (e.g., create two
remote refs or a single remoteRef with extractors for "username" and "password")
so app.externalSecret and superuser.externalSecret produce the full Secret shape
that 200-cnpg-cluster.yaml.gotmpl requires.
PSQLStack now composes a `psql` StorageClass alongside its existing
`psql` VolumeSnapshotClass — they share the same CSI driver
(`ebs.csi.eks.amazonaws.com` by default) since snapshots only work
when the snapshotter driver matches the source PVC's provisioner.
PSQLCluster + PSQLBranch default `spec.storage.class` to "psql", so
consumer manifests stop leaking driver-specific knowledge.
Default StorageClass shape: gp3 + WaitForFirstConsumer (correct for
zonal CSI drivers — late-binds the PVC to a node so EBS volumes land
in the same AZ as the consuming pod) + allowVolumeExpansion=true
(CNPG resizes via the same field on its Cluster CR).
E2E pivots from kind to an ephemeral EKS Auto Mode cluster (mirror
of aws-observe-stack): provisions an AutoEKSCluster per run, installs
volume-snapshot-stack on it, then runs PSQLStack/Cluster/Branch
against the real `ebs.csi.eks.amazonaws.com` driver — same code path
that runs on pat-local. Kind has no snapshot-capable CSI driver
natively, so the prior kind-only e2e couldn't exercise PSQLBranch's
snapshot/fork chain.
Verified end-to-end on pat-local:
- PSQLStack composed `psql` SC + `psql` VSC (both with
ebs.csi.eks.amazonaws.com)
- PSQLCluster PVC bound on `psql` SC, CNPG primary running
- PSQLBranch VolumeSnapshot reached readyToUse=true, restored PVC
bound on `psql` SC, CNPG fork primary running
Breaking: PSQLStack adds a new composed StorageClass by default. Sites
that already have a `psql` SC will conflict — set
`spec.storageClass.enabled: false` to opt out.
Requires three GitHub Actions vars on the repo (synced via the new
`hops vars sync github`): ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A,
PRIVATE_SUBNET_ID_B.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
.github/workflows/on-pr.yaml (1)
30-30:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPin reusable workflow refs to immutable versions.
These four
usesentries still target@feat/multi-api-support(mutable branch), which makes PR CI behavior drift over time. Please pin to an immutable tag or commit SHA for deterministic runs.Also applies to: 43-43, 46-46, 93-93
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/on-pr.yaml at line 30, The workflow is referencing a mutable branch ref ("uses: unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support") which makes CI non-deterministic; replace each mutable branch ref (all occurrences of the string "unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support" and the other similar `uses:` entries flagged) with an immutable ref (a specific tag or commit SHA), updating each `uses:` line so it points to the chosen tag or SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic workflow runs.tests/e2etest-psql/main.k (1)
82-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the retained snapshot dependency resources unique per run.
These resources are intentionally retained, but both the package install and the
VolumeSnapshotStackXR still use fixed names. A rerun against the same control plane can reuse stale snapshot-controller state instead of exercising a fresh dependency install for this run.Suggested change
_now = str(int(math.floor(datetime.ticks()))) _test_name = "e2e-psql-" + _now _cluster_name = "e2e-psql-cluster-" + _now _branch_name = "e2e-psql-branch-" + _now +_snapshot_package_name = "hops-ops-volume-snapshot-stack-" + _now +_snapshot_stack_name = "snapshot-" + _now _namespace = "default" @@ { apiVersion = "pkg.crossplane.io/v1" kind = "Configuration" - metadata.name = "hops-ops-volume-snapshot-stack" + metadata.name = _snapshot_package_name spec.package = "ghcr.io/hops-ops/volume-snapshot-stack:v0.1.0" } @@ { apiVersion = "hops.ops.com.ai/v1alpha1" kind = "VolumeSnapshotStack" metadata = { - name = "snapshot" + name = _snapshot_stack_name namespace = _namespace }Also applies to: 250-255
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2etest-psql/main.k` around lines 82 - 90, The retained VolumeSnapshotStack and its package install use fixed names ("metadata.name" = "hops-ops-volume-snapshot-stack") so reruns can pick up stale state; make the retained dependency resources unique per run by appending a run-specific suffix (e.g., timestamp, CI_RUN_ID, or random ID) to the Configuration.metadata.name and to the corresponding VolumeSnapshotStack XR name used elsewhere (the two occurrences flagged around the current diff and at the other location), and ensure any references/matchLabels that point to that name are updated to the same generated suffix so the installer and the XR remain consistent per run.
🧹 Nitpick comments (1)
functions/stack/180-storageclass.yaml.gotmpl (1)
40-45: 💤 Low valueParameter map keys are unquoted — may break for non-trivial key names.
{{ $k }}emits the key bare. If a StorageClass parameter key ever contains a colon, hash, or other YAML-special character, the rendered output will be invalid YAML. Quoting the key is zero-cost insurance.♻️ Proposed fix
{{- range $k, $v := $sc.parameters }} - {{ $k }}: {{ $v | quote }} + {{ $k | quote }}: {{ $v | quote }} {{- end }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@functions/stack/180-storageclass.yaml.gotmpl` around lines 40 - 45, The StorageClass template emits parameter map keys unquoted (using {{ $k }}) which can produce invalid YAML for keys with special characters; update the parameters loop in the template that references $sc.parameters so each key is quoted (e.g., replace occurrences of {{ $k }} with a quoted form like {{ $k | quote }}) while keeping values quoted as before, ensuring the keys are safely rendered for all YAML-special characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 186-214: The superuser.externalSecret block is missing the
descriptive text present in app.externalSecret; update the
superuser.externalSecret schema to mirror app.externalSecret by documenting
secretStore.kind/name/namespace semantics (including default: ClusterSecretStore
and allowed enum values ClusterSecretStore/SecretStore and that namespace is
only required for SecretStore), and add a clear description for secretRef.path
stating it must point to a JSON blob containing "username" and "password" (same
contract as app.externalSecret) so kubectl explain/generated docs show
consistent expectations for superuser.externalSecret.
In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The OpenAPI schema for ha.replicas currently allows zero;
update the replicas property in the definition.yaml (property name: replicas
under ha) to add a minimum: 1 constraint so users cannot set replicas: 0; ensure
the type remains integer and keep the default: 3, optionally updating the
property's description to note the minimum of 1 for clarity.
---
Duplicate comments:
In @.github/workflows/on-pr.yaml:
- Line 30: The workflow is referencing a mutable branch ref ("uses:
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support")
which makes CI non-deterministic; replace each mutable branch ref (all
occurrences of the string
"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support"
and the other similar `uses:` entries flagged) with an immutable ref (a specific
tag or commit SHA), updating each `uses:` line so it points to the chosen tag or
SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic
workflow runs.
In `@tests/e2etest-psql/main.k`:
- Around line 82-90: The retained VolumeSnapshotStack and its package install
use fixed names ("metadata.name" = "hops-ops-volume-snapshot-stack") so reruns
can pick up stale state; make the retained dependency resources unique per run
by appending a run-specific suffix (e.g., timestamp, CI_RUN_ID, or random ID) to
the Configuration.metadata.name and to the corresponding VolumeSnapshotStack XR
name used elsewhere (the two occurrences flagged around the current diff and at
the other location), and ensure any references/matchLabels that point to that
name are updated to the same generated suffix so the installer and the XR remain
consistent per run.
---
Nitpick comments:
In `@functions/stack/180-storageclass.yaml.gotmpl`:
- Around line 40-45: The StorageClass template emits parameter map keys unquoted
(using {{ $k }}) which can produce invalid YAML for keys with special
characters; update the parameters loop in the template that references
$sc.parameters so each key is quoted (e.g., replace occurrences of {{ $k }} with
a quoted form like {{ $k | quote }}) while keeping values quoted as before,
ensuring the keys are safely rendered for all YAML-special characters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 85ea3da6-87ef-41c5-b108-ccb63db2d812
📒 Files selected for processing (13)
.github/workflows/on-pr.yaml.gitignoreREADME.mdapis/psqlbranches/definition.yamlapis/psqlclusters/definition.yamlapis/psqlstacks/definition.yamlfunctions/branch/000-state-init.yaml.gotmplfunctions/cluster/000-state-init.yaml.gotmplfunctions/stack/000-state-init.yaml.gotmplfunctions/stack/010-state-status.yaml.gotmplfunctions/stack/180-storageclass.yaml.gotmpltests/e2etest-psql/main.ktests/test-stack/main.k
✅ Files skipped from review due to trivial changes (3)
- .gitignore
- functions/branch/000-state-init.yaml.gotmpl
- README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- functions/stack/010-state-status.yaml.gotmpl
- apis/psqlbranches/definition.yaml
- tests/test-stack/main.k
- functions/stack/000-state-init.yaml.gotmpl
| replicas: | ||
| type: integer | ||
| default: 3 |
There was a problem hiding this comment.
ha.replicas lacks a minimum constraint — zero replicas is accepted.
Without minimum: 1, a user can set replicas: 0, which would scale every HA-able operator Deployment to zero and silently break the stack.
🛡️ Proposed fix
replicas:
type: integer
+ minimum: 1
default: 3🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The OpenAPI schema for
ha.replicas currently allows zero; update the replicas property in the
definition.yaml (property name: replicas under ha) to add a minimum: 1
constraint so users cannot set replicas: 0; ensure the type remains integer and
keep the default: 3, optionally updating the property's description to note the
minimum of 1 for clarity.
The full chain converged on the previous run except for PSQLStack's scaleToZeroPlugin objects (s2z-cert-client, s2z-cert-server, s2z-issuer) — the S2Z plugin uses cert-manager's Issuer + Certificate for its gRPC mTLS pair, and the ephemeral EKS cluster has no cert-manager. pat-local already has cert-manager from earlier setup, which masked this in local validation. cert-stack v0.1.0 mirrors volume-snapshot-stack's shape (single clusterName field, composes a Helm release on the target). Adding it as initResource + extraResource closes the gap.
Published Crossplane PackageThe following Crossplane package was published as part of this PR: Package: ghcr.io/hops-ops/psql-stack:pr-10-55b4a02981f9e74b194fb9e818bc6df76089e407 |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
apis/psqlstacks/definition.yaml (1)
90-92:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
ha.replicasstill lacks aminimum: 1constraint.A user can set
replicas: 0, silently scaling all HA-able platform Deployments to zero.🛡️ Proposed fix
replicas: type: integer + minimum: 1 default: 3🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The schema for the replicas integer (ha.replicas) allows zero; update the replicas property in apis/psqlstacks/definition.yaml to add a minimum: 1 constraint so users cannot set replicas: 0; locate the replicas definition (type: integer, default: 3) and add the minimum validator for that property to enforce at least one replica.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 195-204: The parameters object currently combines
additionalProperties: type: string with x-kubernetes-preserve-unknown-fields:
true which conflicts; decide which behavior you want and make both parameter
blocks consistent: if parameter map values must be plain strings (e.g. passed
verbatim to the CSI driver) remove x-kubernetes-preserve-unknown-fields and keep
additionalProperties: type: string; if parameters may contain nested/arbitrary
structures remove additionalProperties and keep
x-kubernetes-preserve-unknown-fields; apply the same change to
snapshotClass.parameters as well so both parameter definitions use the same,
non-conflicting approach.
---
Duplicate comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The schema for the replicas integer (ha.replicas) allows
zero; update the replicas property in apis/psqlstacks/definition.yaml to add a
minimum: 1 constraint so users cannot set replicas: 0; locate the replicas
definition (type: integer, default: 3) and add the minimum validator for that
property to enforce at least one replica.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c458ce55-33a3-418a-b8e5-6f7008816953
📒 Files selected for processing (13)
.github/workflows/on-pr.yaml.gitignoreREADME.mdapis/psqlbranches/definition.yamlapis/psqlclusters/definition.yamlapis/psqlstacks/definition.yamlfunctions/branch/000-state-init.yaml.gotmplfunctions/cluster/000-state-init.yaml.gotmplfunctions/stack/000-state-init.yaml.gotmplfunctions/stack/010-state-status.yaml.gotmplfunctions/stack/180-storageclass.yaml.gotmpltests/e2etest-psql/main.ktests/test-stack/main.k
✅ Files skipped from review due to trivial changes (4)
- functions/stack/010-state-status.yaml.gotmpl
- .gitignore
- functions/stack/180-storageclass.yaml.gotmpl
- README.md
🚧 Files skipped from review as they are similar to previous changes (8)
- functions/stack/000-state-init.yaml.gotmpl
- functions/branch/000-state-init.yaml.gotmpl
- tests/e2etest-psql/main.k
- functions/cluster/000-state-init.yaml.gotmpl
- apis/psqlbranches/definition.yaml
- apis/psqlclusters/definition.yaml
- tests/test-stack/main.k
- .github/workflows/on-pr.yaml
| parameters: | ||
| description: | | ||
| Provisioner-specific parameters. Defaults to `{type: gp3}` for | ||
| the EBS provisioner. When overriding `provisioner`, set this | ||
| to whatever the new provisioner expects (e.g. `{}` for | ||
| hostpath.csi.k8s.io, driver-specific keys for Longhorn). | ||
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| x-kubernetes-preserve-unknown-fields: true |
There was a problem hiding this comment.
parameters combines additionalProperties with x-kubernetes-preserve-unknown-fields: true — the constraints are contradictory.
additionalProperties: type: string tells Kubernetes to validate and prune map values that aren't strings, while x-kubernetes-preserve-unknown-fields: true instructs the API server to skip structural pruning entirely. The two directives conflict: in practice the pruning bypass wins, silently accepting non-string values that the schema claims to reject.
The same pattern repeats for snapshotClass.parameters (Lines 237–239).
Pick one approach:
- If values must be strings (e.g. they are passed verbatim to the CSI driver), keep
additionalPropertiesand dropx-kubernetes-preserve-unknown-fields. - If nested/arbitrary structures are needed, drop
additionalPropertiesand keepx-kubernetes-preserve-unknown-fields.
♻️ Proposed fix (string-values-only)
parameters:
type: object
additionalProperties:
type: string
- x-kubernetes-preserve-unknown-fields: trueApply the same change to snapshotClass.parameters at Lines 237–239.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| parameters: | |
| description: | | |
| Provisioner-specific parameters. Defaults to `{type: gp3}` for | |
| the EBS provisioner. When overriding `provisioner`, set this | |
| to whatever the new provisioner expects (e.g. `{}` for | |
| hostpath.csi.k8s.io, driver-specific keys for Longhorn). | |
| type: object | |
| additionalProperties: | |
| type: string | |
| x-kubernetes-preserve-unknown-fields: true | |
| parameters: | |
| description: | | |
| Provisioner-specific parameters. Defaults to `{type: gp3}` for | |
| the EBS provisioner. When overriding `provisioner`, set this | |
| to whatever the new provisioner expects (e.g. `{}` for | |
| hostpath.csi.k8s.io, driver-specific keys for Longhorn). | |
| type: object | |
| additionalProperties: | |
| type: string |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apis/psqlstacks/definition.yaml` around lines 195 - 204, The parameters
object currently combines additionalProperties: type: string with
x-kubernetes-preserve-unknown-fields: true which conflicts; decide which
behavior you want and make both parameter blocks consistent: if parameter map
values must be plain strings (e.g. passed verbatim to the CSI driver) remove
x-kubernetes-preserve-unknown-fields and keep additionalProperties: type:
string; if parameters may contain nested/arbitrary structures remove
additionalProperties and keep x-kubernetes-preserve-unknown-fields; apply the
same change to snapshotClass.parameters as well so both parameter definitions
use the same, non-conflicting approach.
Replaces the four mutable `@feat/multi-api-support` branch refs with the immutable `@v3.0.0` tag now that the multi-api work is shipped. Also retargets the org from `unbounded-tech` to `hops-ops` — the canonical home for hops platform CI workflows.
…licit set The previous schema and gotmpl both defaulted branch postgres version to "17", silently coercing branches off PG 15/16 sources to a major mismatch. Volume snapshot recovery is binary-compatible only within a major version (Postgres won't open a data dir from a different major), so a hardcoded default produced silent failures at restore time. Now: omit imageName entirely when version is unset, letting CNPG fall back to its operator-default image (close-enough when source tracks the same chart). Setting spec.postgresql.version is now an explicit pin, typically used to fix a minor (e.g. "17.4") to match the source's reported version. Adds a render test for the explicit-pin path; existing default-path test asserts imageName absence.
Existing state-init logic correctly uses hasKey to distinguish "explicitly false" from "absent" before reading $monSpec.enabled, but no test exercised the explicit-false path. Added regression test asserting that spec.monitoring.enabled=false propagates to the composed Cluster CR's monitoring.enablePodMonitor=false. Default-on path covered by Test 1.
…tent In cross-namespace branching, the branch-ns VolumeSnapshot can only bind once it learns the source's VolumeSnapshotContent name — that name is propagated through $state.observed.snapshotContent into 110-branch-snapshot's render. The previous logic preferred branch-snapshot whenever it was present in observed, even when its boundVolumeSnapshotContentName was still empty — which it always is on first reconcile, before the source content has propagated. Result: empty content overwrites the populated source content, the branch-ns snapshot renders with `volumeSnapshotContentName: ""`, and the chain stalls. Now: read both branch-ns and source-ns content via the existing pipeline, prefer branch when non-empty, fall back to source otherwise. Fixes the chicken-and-egg that prevented cross-namespace branches from binding on first-pass reconcile. Same-namespace branching is unaffected: source-snapshot is gated on crossNamespace and isn't composed there, so $sourceContent is empty and the branch-ns content (always populated post-bind) wins.
Adds two CompositionTests that exercise 010-state-status's snapshot- content fallback against inline observedResources: 1. branch-snapshot bound content empty + source-snapshot bound: branch-ns VolumeSnapshot must render with the source's content name (proves the chicken-and-egg fix from 18d3115). 2. branch-snapshot bound + source-snapshot bound: steady-state — branch's content wins (proves the fallback doesn't overwrite a populated branch content with source's). Without the fix, test 1 would render volumeSnapshotContentName: "" and fail. With it, source's content propagates through state.observed.snapshotContent into 110-branch-snapshot's render. Locks in the cross-namespace bind behavior — the previous test surface only exercised render shape, not state propagation across reconciles.
…espace The source-ns VolumeSnapshot lives in a namespace shared by multiple branches (the source PSQLCluster's). Naming it just `<branchName>-src` collides when two PSQLBranch XRs have the same metadata.name in different branch namespaces — both would create `preview-pr-1-src` in `team-app`, racing on the same object. Now: `<branchNS>-<branchName>-src`. Encodes the branch XR's own namespace into the name so (sourceNS, branchNS, branchName) is unique. K8s names are bound by RFC 1123 subdomain (253 chars) — concatenation is safe at any reasonable namespace/branch length. Branch-ns snapshot (`<branchName>-snap`) is unchanged: the branch namespace is already implicit by where the resource lives, and branch XR names are unique within a single namespace. Adds a regression test that asserts the source-ns name varies with branch namespace.
Previously the gotmpl coerced unset `branch.storage.size` to a hardcoded
"10Gi" — silently mis-sizing branches off any source PVC larger than that.
CNPG/EBS can't shrink during recovery, so a 10Gi branch off a 100Gi source
fails when CNPG tries to bind the restored PVC.
New shape:
- PSQLBranch XRD adds optional `spec.source.storage.size` so consumers
can mirror the source PSQLCluster's known capacity. The branch
composition has no automatic visibility into the source's PVC, so
the user declares it once on the branch spec.
- Size precedence in the cnpg-cluster render:
branch.storage.size (explicit override, e.g. growing the branch)
→ source.storage.size (inherit from the source)
→ omitted (CNPG's webhook rejects with a clear error)
- Drops the silent 10Gi gotmpl fallback in 000-state-init.
Examples + e2e branch updated to declare source.storage.size mirroring
the source PSQLCluster's spec.storage.size. Local pat-local manifests
already set branch.storage.size explicitly so unaffected.
Symlinks the per-test KCL `model/` directory to `.up/kcl/models` so
schema changes from `up project build` propagate to tests automatically.
Previously the bundled models drifted from the XRD on every change and
needed manual regeneration. `.up/` is gitignored — fresh clones run
`make build` (or `up project build`) to populate the symlink target.
Adds two regression tests: branch.size overrides source.size, and
source.size used when branch.size is empty.
…ce names The state-status template looked up `external-secret` in the observed map, but 100-external-secret renders two distinct resources with suffixed names (`external-secret-app`, `external-secret-superuser`) when their respective config blocks are set. The lookup never matched, so $state.observed.externalSecret.ready was always false. Now: aggregate over both names. Ready=true when every present ES Object reports Ready=true, and Ready=true when neither is present (BYO / CNPG-managed-secret paths — nothing to wait on). Note: the field is currently unused (999-status doesn't surface it, no composition gates on it). Fixing now to match the documented intent so future consumers don't inherit the bug.
The validate:% recipe was using $$definition, $$composition, $$api_dir shell variables that were never initialized — `make validate:minimal` silently fed empty strings to `up composition render` and `crossplane beta validate`, so the recipe never produced a real validation result. Aligns validate:% with render:%: both now use the top-level Makefile variables ($(DEFINITION), $(COMPOSITION), $(XRD_DIR)) which point at apis/psqlstacks. Single-target shorthand stays psqlstacks-only as documented in README. The :all targets keep their derive-per-example shell logic for the multi-API case.
Existing test 7 only asserted the ExternalSecret existed with the right secretStoreRef and target.name — it didn't lock in the data[] mappings or the basic-auth target template. The current gotmpl correctly maps both username and password as separate remoteRef entries (with `property: username` / `property: password`) extracted from the same JSON blob at secretRef.path, then synthesizes a kubernetes.io/basic-auth Secret. Asserting the full shape so a regression that drops one key fails the test instead of silently shipping a half-populated Secret.
ESO 0.10+ moved to v1 GA, and recent ESO charts (the version
cert-stack/aws-secret-stack already use) drop the v1beta1
served-version: applies fail with `no matches for kind
"ExternalSecret" in version "external-secrets.io/v1beta1"`.
Schema shape (data[].remoteRef.{key,property} + target.template) is
identical between v1beta1 and v1, so this is a pure apiVersion bump.
Aligns with the other stacks that already use v1 (auth, gitops,
cloudflare/dns, aws/secret).
The superuser.externalSecret schema had the same shape as
app.externalSecret but no field descriptions, so kubectl explain and
generated docs gave inconsistent guidance for what's the same contract.
Adds the missing descriptions (secretStore.{kind,name,namespace} and
secretRef.{path}) verbatim from app.externalSecret. Pure documentation
change — no behavior, default, or required-fields shift.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
Makefile (2)
13-14: 💤 Low value
api-dirmacro is unused — bulk targets reimplement it inline.The
api-dirhelper at line 14 is never invoked; bothrender:allandvalidate:allinline the sameawk -F/ '{print "apis/" $2}'derivation. Either invoke$(call api-dir,$$example)from the recipes (note: shell-vs-Make expansion needs care here, since$$exampleis a shell var) or drop the macro and the comment block at lines 12-14 to avoid confusion.Also applies to: 40-40, 73-73
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 13 - 14, The Makefile defines an unused helper macro api-dir which duplicates logic also written inline in the render:all and validate:all recipes; either replace those inline awk derivations with a proper call to the macro using $(call api-dir,$$example) (taking care to pass the shell variable example as $$example so Make expands correctly) in the render:all and validate:all recipe bodies, or remove the api-dir macro and its comment block entirely to avoid confusion; update all other occurrences flagged (lines near 40 and 73) to use the same approach so the helper is consistently used or consistently removed.
109-117: ⚡ Quick winSingle-example aliases silently route every stem to
apis/psqlstacks.
render:%andvalidate:%still hardcodeexamples/psqlstacks/$*.yamland the legacy$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR)globals. Now that examples exist underpsqlclusters/andpsqlbranches/, invoking e.g.make render:minimalormake validate:standardwill render/validate the psqlstacks example against the psqlstacks XRD even when the user intended a multi-API one — and there's no error to indicate the mismatch.Two reasonable options:
- Make the stem encode the api plural (e.g.
make render:psqlclusters/minimal) and deriveapi_dirfrom it the same way the bulk targets do.- Rename these targets to
render:psqlstacks:%/validate:psqlstacks:%so the psqlstacks-only scope is explicit.Sketch for option 1
render\:%: - `@example`="examples/psqlstacks/$*.yaml"; \ - up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example + `@example`="examples/$*.yaml"; \ + api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \ + up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example validate\:%: - `@example`="examples/psqlstacks/$*.yaml"; \ - up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example \ - --include-full-xr --quiet | \ - crossplane beta validate $(XRD_DIR) --error-on-missing-schemas - + `@example`="examples/$*.yaml"; \ + api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \ + up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example \ + --include-full-xr --quiet | \ + crossplane beta validate $$api_dir --error-on-missing-schemas -🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 109 - 117, The render:% and validate:% phony targets currently hardcode examples/psqlstacks/$*.yaml and global $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), causing every stem to map to psqlstacks; change them to derive the API plural from the stem (like the bulk targets do) and build example and XRD/composition paths from that API: compute api_dir from the stem (e.g. split $* on "/" to get the first field), set example=examples/$(api_dir)/$*.yaml and set the corresponding DEFINITION/COMPOSITION/XRD_DIR for that api, then call up composition render and crossplane beta validate with those derived variables; alternatively, if you prefer explicit scoping, rename the targets to render:psqlstacks:% and validate:psqlstacks:% to make the psqlstacks-only behavior explicit..github/workflows/on-pr.yaml (1)
53-64: 💤 Low valueMove
aws-account-idto a GitHub variable for consistency with sibling inputs.
ADMIN_ROLE_ARN,PRIVATE_SUBNET_ID_A, andPRIVATE_SUBNET_ID_Bare sourced fromvars.*, butaws-account-idis hardcoded. AWS account IDs aren't strictly secret, but treating them as configuration (a) keeps this workflow portable across environments, (b) avoids a repo-wide search/replace if the account ever changes, and (c) matches the convention already established in this same job.Diff
aws: true aws-use-oidc: true - aws-account-id: "034489662075" + aws-account-id: ${{ vars.AWS_ACCOUNT_ID }} aws-region: us-east-2🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/on-pr.yaml around lines 53 - 64, The workflow hardcodes aws-account-id while related values use vars.*, so change the aws-account-id entry to read from a repo variable (e.g., replace aws-account-id: "034489662075" with aws-account-id: ${{ vars.AWS_ACCOUNT_ID }}), keeping the rest of the job (env-vars with ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B) unchanged so the job consistently sources configuration from GitHub variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test-branch/main.k`:
- Around line 429-456: The test metav1alpha1.CompositionTest named
"scaletozero-disabled-strips-plugin" currently only asserts name/namespace;
update its assertResources for the entry with metadata.name
"br-no-s2z-cnpg-cluster" to explicitly assert that
spec.forProvider.manifest.metadata.annotations does NOT contain the
"cnpg-i-scale-to-zero.xata.io/idle-timeout" key (or assert annotations == {} /
absent) and that spec.forProvider.manifest.spec.plugins does NOT include the
scale-to-zero plugin entry; target the same xr stacksv1alpha1.PSQLBranch and the
assertResources object to add these negative assertions so the test fails if the
annotation or plugin are still emitted.
---
Nitpick comments:
In @.github/workflows/on-pr.yaml:
- Around line 53-64: The workflow hardcodes aws-account-id while related values
use vars.*, so change the aws-account-id entry to read from a repo variable
(e.g., replace aws-account-id: "034489662075" with aws-account-id: ${{
vars.AWS_ACCOUNT_ID }}), keeping the rest of the job (env-vars with
ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B) unchanged so the job
consistently sources configuration from GitHub variables.
In `@Makefile`:
- Around line 13-14: The Makefile defines an unused helper macro api-dir which
duplicates logic also written inline in the render:all and validate:all recipes;
either replace those inline awk derivations with a proper call to the macro
using $(call api-dir,$$example) (taking care to pass the shell variable example
as $$example so Make expands correctly) in the render:all and validate:all
recipe bodies, or remove the api-dir macro and its comment block entirely to
avoid confusion; update all other occurrences flagged (lines near 40 and 73) to
use the same approach so the helper is consistently used or consistently
removed.
- Around line 109-117: The render:% and validate:% phony targets currently
hardcode examples/psqlstacks/$*.yaml and global
$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), causing every stem to map to
psqlstacks; change them to derive the API plural from the stem (like the bulk
targets do) and build example and XRD/composition paths from that API: compute
api_dir from the stem (e.g. split $* on "/" to get the first field), set
example=examples/$(api_dir)/$*.yaml and set the corresponding
DEFINITION/COMPOSITION/XRD_DIR for that api, then call up composition render and
crossplane beta validate with those derived variables; alternatively, if you
prefer explicit scoping, rename the targets to render:psqlstacks:% and
validate:psqlstacks:% to make the psqlstacks-only behavior explicit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79f332a7-c175-4e49-b64c-1c1e2aae7d2f
📒 Files selected for processing (16)
.github/workflows/on-pr.yamlMakefileapis/psqlbranches/definition.yamlapis/psqlclusters/definition.yamlexamples/psqlbranches/same-namespace.yamlfunctions/branch/000-state-init.yaml.gotmplfunctions/branch/010-state-status.yaml.gotmplfunctions/branch/100-source-snapshot.yaml.gotmplfunctions/branch/200-cnpg-cluster.yaml.gotmplfunctions/cluster/010-state-status.yaml.gotmplfunctions/cluster/100-external-secret.yaml.gotmpltests/e2etest-psql/main.ktests/test-branch/main.ktests/test-branch/modeltests/test-cluster/main.ktests/test-cluster/model
✅ Files skipped from review due to trivial changes (2)
- tests/test-cluster/model
- examples/psqlbranches/same-namespace.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
- functions/cluster/010-state-status.yaml.gotmpl
- functions/cluster/100-external-secret.yaml.gotmpl
- functions/branch/010-state-status.yaml.gotmpl
- functions/branch/100-source-snapshot.yaml.gotmpl
- tests/test-cluster/main.k
- tests/e2etest-psql/main.k
- functions/branch/200-cnpg-cluster.yaml.gotmpl
- apis/psqlclusters/definition.yaml
| metav1alpha1.CompositionTest { | ||
| metadata.name = "scaletozero-disabled-strips-plugin" | ||
| spec = { | ||
| compositionPath = "apis/psqlbranches/composition.yaml" | ||
| xrdPath = "apis/psqlbranches/definition.yaml" | ||
| timeoutSeconds = 60 | ||
| validate = False | ||
| xr = stacksv1alpha1.PSQLBranch { | ||
| metadata = {name = "br-no-s2z", namespace = "default"} | ||
| spec = { | ||
| clusterName = "my-cluster" | ||
| source = {name = "src"} | ||
| scaleToZero = {enabled = False} | ||
| } | ||
| } | ||
| assertResources = [ | ||
| { | ||
| apiVersion = "kubernetes.m.crossplane.io/v1alpha1" | ||
| kind = "Object" | ||
| metadata.name = "br-no-s2z-cnpg-cluster" | ||
| spec.forProvider.manifest.metadata = { | ||
| name = "br-no-s2z" | ||
| namespace = "default" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether the KCL CompositionTest harness in this repo supports
# negative/absence assertions, and how existing tests express "key absent".
fd -t f 'main.k' tests | head -20
rg -nP --type-add 'kcl:*.k' --type=kcl -C3 '\b(assertResources|assertNotPresent|absent|notExpected)\b'Repository: hops-ops/psql-stack
Length of output: 17809
🏁 Script executed:
#!/bin/bash
# Examine the scaletozero-disabled and ttl-enabled tests in test-branch/main.k
sed -n '429,488p' tests/test-branch/main.kRepository: hops-ops/psql-stack
Length of output: 2453
🏁 Script executed:
#!/bin/bash
# Look for comments or patterns showing how assertions work, especially for annotations
rg -n 'metadata\.annotations|spec\.plugins' tests/ --type-add 'kcl:*.k' --type=kcl -B2 -A2Repository: hops-ops/psql-stack
Length of output: 1810
🏁 Script executed:
#!/bin/bash
# Search for any test documentation or comments about assertion semantics
rg -n 'assertResources|assert' tests/test-branch/main.k -B1 -A5 | head -100Repository: hops-ops/psql-stack
Length of output: 4268
scaletozero-disabled-strips-plugin doesn't assert the absence of the scale-to-zero annotation and plugin.
The test name promises that disabling scaleToZero removes both the cnpg-i-scale-to-zero.xata.io/idle-timeout annotation and the plugin entry. However, assertResources only checks metadata.name and namespace. Compare this to the scaletozero-enabled test (lines 407–422), which explicitly asserts spec.forProvider.manifest.metadata.annotations and spec.forProvider.manifest.spec.plugins—or the ttl-enabled-adds-annotation test (lines 476–485), which asserts the full annotation set. If the composition regressed and kept emitting the annotation and plugin when scaleToZero.enabled = False, this test would pass silently.
Add explicit assertions for the absent fields:
Suggested tightening
assertResources = [
{
apiVersion = "kubernetes.m.crossplane.io/v1alpha1"
kind = "Object"
metadata.name = "br-no-s2z-cnpg-cluster"
spec.forProvider.manifest.metadata = {
name = "br-no-s2z"
namespace = "default"
}
+ spec.forProvider.manifest.metadata.annotations = {}
+ spec.forProvider.manifest.spec.plugins = []
}
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| metav1alpha1.CompositionTest { | |
| metadata.name = "scaletozero-disabled-strips-plugin" | |
| spec = { | |
| compositionPath = "apis/psqlbranches/composition.yaml" | |
| xrdPath = "apis/psqlbranches/definition.yaml" | |
| timeoutSeconds = 60 | |
| validate = False | |
| xr = stacksv1alpha1.PSQLBranch { | |
| metadata = {name = "br-no-s2z", namespace = "default"} | |
| spec = { | |
| clusterName = "my-cluster" | |
| source = {name = "src"} | |
| scaleToZero = {enabled = False} | |
| } | |
| } | |
| assertResources = [ | |
| { | |
| apiVersion = "kubernetes.m.crossplane.io/v1alpha1" | |
| kind = "Object" | |
| metadata.name = "br-no-s2z-cnpg-cluster" | |
| spec.forProvider.manifest.metadata = { | |
| name = "br-no-s2z" | |
| namespace = "default" | |
| } | |
| } | |
| ] | |
| } | |
| } | |
| metav1alpha1.CompositionTest { | |
| metadata.name = "scaletozero-disabled-strips-plugin" | |
| spec = { | |
| compositionPath = "apis/psqlbranches/composition.yaml" | |
| xrdPath = "apis/psqlbranches/definition.yaml" | |
| timeoutSeconds = 60 | |
| validate = False | |
| xr = stacksv1alpha1.PSQLBranch { | |
| metadata = {name = "br-no-s2z", namespace = "default"} | |
| spec = { | |
| clusterName = "my-cluster" | |
| source = {name = "src"} | |
| scaleToZero = {enabled = False} | |
| } | |
| } | |
| assertResources = [ | |
| { | |
| apiVersion = "kubernetes.m.crossplane.io/v1alpha1" | |
| kind = "Object" | |
| metadata.name = "br-no-s2z-cnpg-cluster" | |
| spec.forProvider.manifest.metadata = { | |
| name = "br-no-s2z" | |
| namespace = "default" | |
| } | |
| spec.forProvider.manifest.metadata.annotations = {} | |
| spec.forProvider.manifest.spec.plugins = [] | |
| } | |
| ] | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test-branch/main.k` around lines 429 - 456, The test
metav1alpha1.CompositionTest named "scaletozero-disabled-strips-plugin"
currently only asserts name/namespace; update its assertResources for the entry
with metadata.name "br-no-s2z-cnpg-cluster" to explicitly assert that
spec.forProvider.manifest.metadata.annotations does NOT contain the
"cnpg-i-scale-to-zero.xata.io/idle-timeout" key (or assert annotations == {} /
absent) and that spec.forProvider.manifest.spec.plugins does NOT include the
scale-to-zero plugin entry; target the same xr stacksv1alpha1.PSQLBranch and the
assertResources object to add these negative assertions so the test fails if the
annotation or plugin are still emitted.
Summary
Pivots the stack from StackGres → CNPG → Mayastor → Longhorn → EKS Auto Mode primitives. The journey is preserved on
feat/cnpg-pivot(Mayastor) andfeat/longhorn-pivot(Longhorn V2) checkpoint branches.The end state is a dramatically slimmer stack that runs cleanly on EKS Auto Mode without dedicated NodePools, custom AMIs, or in-cluster storage clusters. Composes 4 things:
psqlVolumeSnapshotClass(driver:ebs.csi.eks.amazonaws.com) for PSQLBranch's CoW fork targetPSQLClusters target whatever
StorageClassthe cluster already provides (gp3on Auto Mode); the stack stops opining about SCs.Why the pivot
kubernetes.io/arch: amd64on the io_engine DaemonSet; broken on arm64 Graviton (which we use for cost-optimal NVMe instance types). Also brings ~4 vCPU + 6 GiB platform tax (etcd×3, NATS, minio, loki, alloy) that duplicates other stacks.longhorn-managerenv-checks foriscsiadmat startup unconditionally. Bottlerocket (EKS Auto Mode's OS) has an immutable rootfs — no path to install iscsiadm. Verified live on pat-local: manager CrashLoopBackOff with the env-check failure.Schema diff
Drops:
Adds:
XRD shrinks ~290 → ~170 lines.
New dependency
Test plan
Suggested merge style
Squash — the 16 commits on this branch include the StackGres→CNPG→Mayastor→Longhorn→Auto Mode journey; the experiment branches are pushed (`feat/cnpg-pivot`, `feat/longhorn-pivot`) for historical reference. Squash gives main a single clean pivot commit.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
PSQLClustercomposite for managed PostgreSQL cluster deployment with HA, scale-to-zero, and monitoring supportPSQLBranchcomposite for creating ephemeral database forks via volume snapshotsDocumentation