lin121291/PgBackup-Operator

PgBackup Operator

PgBackup Operator is a Kubernetes operator that automates PostgreSQL backups to AWS S3. You define a PgBackup custom resource, and the operator creates and manages a CronJob that runs pg_dump on a schedule and uploads the resulting backup to S3.

How It Works

  1. Create a Secret containing your database password and AWS credentials
  2. Apply a PgBackup CR specifying the schedule, database connection info, and S3 bucket
  3. The operator creates a CronJob in the same namespace
  4. Kubernetes runs the backup job on schedule, executing pg_dump and uploading to S3
  5. Deleting the PgBackup automatically cleans up the associated CronJob (via OwnerReference)
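The mapping from a PgBackup to its CronJob in steps 3 and 5 can be sketched roughly as follows. This is a simplified illustration using only the Go standard library, not the operator's actual code: the struct types are hypothetical stand-ins for the real Kubernetes API types, and the "pgbackup-" name prefix is an assumption inferred from the sample command under Local Development.

```go
package main

import "fmt"

// PgBackupSpec mirrors the fields shown in the CRD spec section.
type PgBackupSpec struct {
	Schedule, DBHost, DBUser, DBName, S3Bucket, SecretRef string
}

// PgBackup is a simplified, hypothetical stand-in for the custom resource.
type PgBackup struct {
	Name, Namespace, UID string
	Spec                 PgBackupSpec
}

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind, Name, UID string
	Controller      bool
}

// CronJob is a simplified stand-in for a batch/v1 CronJob.
type CronJob struct {
	Name, Namespace, Schedule string
	Owners                    []OwnerRef
}

// desiredCronJob derives the CronJob that should exist for a PgBackup.
// The owner reference is what lets Kubernetes garbage-collect the
// CronJob when the PgBackup is deleted.
func desiredCronJob(b PgBackup) CronJob {
	return CronJob{
		Name:      "pgbackup-" + b.Name, // assumed naming scheme
		Namespace: b.Namespace,          // same namespace as the PgBackup
		Schedule:  b.Spec.Schedule,
		Owners: []OwnerRef{
			{Kind: "PgBackup", Name: b.Name, UID: b.UID, Controller: true},
		},
	}
}

func main() {
	b := PgBackup{
		Name:      "my-backup",
		Namespace: "default",
		UID:       "example-uid",
		Spec:      PgBackupSpec{Schedule: "0 2 * * *"},
	}
	cj := desiredCronJob(b)
	fmt.Printf("%s/%s schedule=%q owner=%s\n",
		cj.Namespace, cj.Name, cj.Schedule, cj.Owners[0].Name)
}
```

In the real controller the same idea would be expressed with the Kubernetes client types, and the reconcile loop would create or update the CronJob whenever it differs from this desired state.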

CRD Spec

apiVersion: backup.my.domain/v1alpha1
kind: PgBackup
metadata:
  name: my-backup
spec:
  schedule: "0 2 * * *"                          # Cron expression
  dbHost: "postgres.default.svc.cluster.local"    # PostgreSQL host
  dbUser: "admin"                                 # Database username
  dbName: "mydb"                                  # Database name
  s3Bucket: "s3://my-backup-bucket/"              # S3 destination
  secretRef: "pg-backup-secret"                   # Secret with credentials

The Secret named in secretRef must contain the following keys:

apiVersion: v1
kind: Secret
metadata:
  name: pg-backup-secret
type: Opaque
stringData:
  DB_PASSWORD: "your-db-password"
  AWS_ACCESS_KEY_ID: "your-access-key"
  AWS_SECRET_ACCESS_KEY: "your-secret-key"
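A check for the three required keys might look like the sketch below. It is illustrative only: the function name missingSecretKeys is hypothetical, the Secret is modeled as a plain map rather than the Kubernetes Secret type, and treating an empty value as missing is an assumption rather than documented operator behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredKeys lists the keys the README says the Secret must contain.
var requiredKeys = []string{"DB_PASSWORD", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

// missingSecretKeys returns the required keys that are absent or empty
// in the given Secret data, sorted alphabetically.
func missingSecretKeys(data map[string]string) []string {
	var missing []string
	for _, k := range requiredKeys {
		if v, ok := data[k]; !ok || v == "" {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	data := map[string]string{"DB_PASSWORD": "your-db-password"}
	fmt.Println(missingSecretKeys(data))
}
```

Running a check like this before applying the PgBackup can surface a misnamed key early, instead of at the first scheduled backup.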

Monitoring

The operator exposes Prometheus metrics on the /metrics endpoint and includes pre-configured alert rules.

Custom Metrics

| Metric | Type | Description |
|---|---|---|
| pgbackup_jobs_succeeded_total | Gauge | Number of succeeded backup jobs per PgBackup |
| pgbackup_jobs_failed_total | Gauge | Number of failed backup jobs per PgBackup |
| pgbackup_last_successful_backup_timestamp | Gauge | Unix timestamp of the last successful backup |
| pgbackup_job_duration_seconds | Gauge | Duration of the most recent backup job |
| pgbackup_resources_total | Gauge | Total number of PgBackup resources managed |

In addition, controller-runtime automatically exposes reconcile metrics (error rates, queue depth, latency) and Go runtime metrics.

Alert Rules

The operator ships with a PrometheusRule resource (config/prometheus/rules.yaml) containing the following alerts:

| Alert | Severity | Condition |
|---|---|---|
| PgBackupNoRecentSuccess | critical | No successful backup in 48 hours |
| PgBackupJobsFailing | critical | Failed backup jobs for 30+ minutes |
| PgBackupDurationTooLong | warning | Backup duration exceeds 1 hour |
| PgBackupReconcileErrors | warning | >5 reconcile errors in 15 minutes |
| PgBackupResourceDrift | warning | PgBackup CR count != CronJob count |

Prerequisites

  • Prometheus Operator installed in the cluster (for ServiceMonitor and PrometheusRule CRDs)
  • Prometheus integration enabled in config/default/kustomization.yaml (it is enabled by default)

Getting Started

Prerequisites

  • Go 1.25+
  • Docker 17.03+
  • kubectl v1.11.3+
  • Access to a Kubernetes cluster

Local Development

# Install CRDs into the cluster
make install

# Run the controller locally
make run

# Apply a sample PgBackup
kubectl apply -f config/samples/backup_v1alpha1_pgbackup.yaml

# Verify the CronJob was created
kubectl get cronjob

# Manually trigger a backup job
kubectl create job --from=cronjob/pgbackup-pgbackup-sample test-backup

Deploy to Cluster

# Build and push the controller image
make docker-build docker-push IMG=<your-registry>/pg-backup-operator:tag

# Deploy the controller
make deploy IMG=<your-registry>/pg-backup-operator:tag

Uninstall

kubectl delete -k config/samples/   # Remove CRs
make uninstall                       # Remove CRDs
make undeploy                        # Remove controller

Development

make build          # Build manager binary
make test           # Run unit tests (envtest)
make test-e2e       # Run e2e tests (requires Kind)
make lint           # Run golangci-lint
make manifests      # Regenerate CRDs and RBAC after editing *_types.go
make generate       # Regenerate DeepCopy methods after editing *_types.go

Architecture

api/v1alpha1/pgbackup_types.go              # CRD spec/status definitions
internal/controller/pgbackup_controller.go   # Reconciliation loop + metrics
cmd/main.go                                  # Manager entry point
config/prometheus/monitor.yaml               # ServiceMonitor for Prometheus scraping
config/prometheus/rules.yaml                 # PrometheusRule alert definitions

The controller watches PgBackup resources and reconciles them into CronJob objects. Each CronJob runs a worker container with postgresql-client and aws-cli that executes pg_dump and uploads the result to S3. The controller also watches child Job resources to collect backup success/failure metrics for Prometheus.

License

Copyright 2026.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
