PostgreSQL is a powerful open-source object-relational database system with more than 30 years of active development, known for reliability, robustness, and performance. This service builds on PostgreSQL to deliver enterprise-grade, highly available, and high-performance database capabilities.
- High availability: Supports primary-standby replication and automatic failover
- Durable storage: Relies on Kubernetes persistent volumes
- Monitoring and alerting: Integrates Prometheus metrics and alert rules
- Backup and restore: Supports logical backups and WAL archiving
- Connection pooling: Optional PgBouncer integration
- Security and authentication: Multiple authentication methods plus SSL
- Auto scaling: Dynamically adjusts the replica count
- Resource management: Flexible CPU and memory sizing
- Network isolation: Works with host-network and Pod-network modes
- Time zone control: Defaults to the Asia/Shanghai time zone
- Logging: Supports file-based and stdout logging
- Auditing: Bundles the pgaudit extension
- 16.1 (latest)
- 14.13 (recommended)
- 14.7 (default)
- 14.2
- 13.8
- 13.6
- 13.5
- 12.10
- 11.15
- PostgreSQL Operator: v1.7.1-2.8.2
- Spilo image: v1.5.0-spilo
- PostgreSQL Exporter: v0.17.1-1.0.0-exporter
- PgBouncer: master-19
- Logical Backup: v1.7.1
- Use cases: Development, testing, and small workloads
- Traits: Single instance, minimal resources, quick to deploy
- Topology: 1 PostgreSQL instance
- Use cases: Production workloads that need read/write separation
- Traits: Primary-replica replication with automatic failover
- Topology: 1 primary + N replicas, count configurable
- Use cases: Mission-critical workloads with strict uptime targets
- Traits: Multi-instance deployment, automatic failover, strong consistency
- Topology: 3+ instances with synchronous replication
┌─────────────────────────────────────────────────────────────┐
│                     PostgreSQL Cluster                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Master    │  │   Replica   │  │   Replica   │          │
│  │  (Primary)  │  │  (Standby)  │  │  (Standby)  │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│                    Patroni (HA Manager)                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Service   │  │  ConfigMap  │  │   Secret    │          │
│  │ (Endpoints) │  │  (Config)   │  │ (Passwords) │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│                  Kubernetes Storage (PVC)                   │
└─────────────────────────────────────────────────────────────┘
- PostgreSQL: Core database engine
- Patroni: HA manager for failover orchestration
- Spilo: Container image bundling PostgreSQL, Patroni, and tooling
- PostgreSQL Exporter: Prometheus metrics collector
- PgBouncer: Optional connection pooler
- Logical Backup: Logical backup utility
- Recommended topology: Standalone
- Resources: CPU 1 core, memory 4 Gi, storage 20 Gi
- Suggested version: PostgreSQL 14.7
- Monitoring: Basic metrics only
- Recommended topology: Primary-standby or highly available
- Resources: CPU 2+ cores, memory 8+ Gi, storage 100+ Gi
- Suggested version: PostgreSQL 14.13 or 16.1
- Monitoring: Full metrics plus alerting
# Recommended production settings
resources:
  limits:
    cpu: "2"          # 2 cores
    memory: "8Gi"     # 8 GiB
  requests:
    cpu: "1"          # 1 core
    memory: "4Gi"     # 4 GiB

# Storage profile
volume:
  size: 100                  # 100 GB
  storageClass: "fast-ssd"   # SSD-backed class

# Primary-standby configuration
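patroni:
  ttl: 30                             # Leader lock TTL, in seconds
  loop_wait: 10                       # Interval between HA loop runs, in seconds
  retry_timeout: 10                   # Retry timeout, in seconds
  synchronous_mode: true              # Enable synchronous replication mode
  maximum_lag_on_failover: 33554432   # Max replica lag in bytes (32 MiB) to stay eligible for promotion

# Monitoring and alerting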
monitor:
  enableAlert: true      # Turn on alerts
  enableExporter: true   # Enable metrics exporter
  exporterResource:
    exporter_default_cpu_limit: 100m
    exporter_default_memory_limit: 128Mi

- Enforce strong passwords with mixed character classes
- Enable SSL/TLS for all connections
- Rotate database credentials periodically
- Apply least-privilege access controls
- Adjust `shared_buffers` and `work_mem` to match workload
- Enable the `pg_stat_statements` extension for query insights
- Tune checkpoint and WAL parameters for throughput
- Use a connection pool to limit connection churn
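
For the `shared_buffers` and `work_mem` item above, the sketch below derives starting values from a container memory limit. The 25% ratio for `shared_buffers` is a common rule of thumb rather than anything this package prescribes, and the function names, the `max_connections` default of 100, and the share of memory budgeted for `work_mem` are assumptions chosen for illustration.

```python
# Illustrative sizing helper. Ratios are rules of thumb, not package defaults.

_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes-style binary quantity such as "8Gi" to bytes.

    Only plain integers and the suffixes Ki, Mi, Gi, Ti are handled.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def suggest_settings(memory_limit: str, max_connections: int = 100) -> dict:
    """Return candidate shared_buffers and work_mem values, expressed in MB."""
    total_mb = parse_memory(memory_limit) // 1024**2
    shared_buffers_mb = total_mb // 4  # roughly 25% of available memory
    # Assumption: spread another 25% across all connections, never going
    # below 4 MB (the PostgreSQL default for work_mem).
    work_mem_mb = max(4, (total_mb // 4) // max_connections)
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "work_mem": f"{work_mem_mb}MB",
    }


if __name__ == "__main__":
    # Memory limit taken from the recommended production settings.
    print(suggest_settings("8Gi"))
```

Treat the result as a first guess only. A single query can use `work_mem` several times over (once per sort or hash step), so validate any value against real workload measurements.
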
- Enable WAL archiving for continuous protection
- Schedule recurring logical backups
- Run restore drills regularly to confirm that backups are usable
- Store backups in a secure off-site location
- Track connection count, query latency, and disk usage
- Define alert thresholds for critical KPIs
- Review slow-query logs routinely
- Watch replication lag across replicas
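
One simple way to turn the tracked metrics into alert decisions is a threshold table. The metric names and threshold values below are hypothetical placeholders, not the alert rules shipped with this package; adjust them to your own service-level targets.

```python
# Hypothetical thresholds as (warning, critical) pairs. Example values only.
THRESHOLDS = {
    "connection_usage_ratio": (0.80, 0.95),  # active connections / max_connections
    "query_latency_p95_ms": (200.0, 1000.0),
    "disk_usage_ratio": (0.75, 0.90),
    "replication_lag_bytes": (16 * 1024**2, 32 * 1024**2),
}


def classify(metric: str, value: float) -> str:
    """Return "ok", "warning", or "critical" for a single sample."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def evaluate(samples: dict) -> dict:
    """Classify every sample whose metric has a threshold; ignore the rest."""
    return {
        name: classify(name, value)
        for name, value in samples.items()
        if name in THRESHOLDS
    }


if __name__ == "__main__":
    print(evaluate({"connection_usage_ratio": 0.85, "disk_usage_ratio": 0.50}))
```
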
- Run maintenance tasks (VACUUM, ANALYZE) on a schedule
- Monitor database size and growth trends
- Establish capacity plans
- Document incident response procedures
- Replication lag: Validate network health and disk I/O
- Too many connections: Raise `max_connections` or rely on pooling
- Insufficient disk: Purge logs or expand persistent volumes
- Slow queries: Review execution plans, indexes, and SQL
- Primary failure: Patroni triggers automatic failover
- Data corruption: Restore from backups or rebuild replicas
- Network partition: Wait for recovery or intervene manually
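
How quickly the primary-failure case resolves depends on the Patroni timing settings. The Patroni documentation asks that `loop_wait + 2 * retry_timeout <= ttl` hold whenever these values are changed. The sketch below checks that relation and converts `maximum_lag_on_failover` from bytes to MiB. The function names are illustrative, and the sample values are the ones from the primary-standby configuration in this document.

```python
def timing_is_consistent(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Check the Patroni rule: loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


def lag_in_mib(lag_bytes: int) -> float:
    """Convert a lag threshold expressed in bytes to MiB."""
    return lag_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Values from the primary-standby configuration: ttl=30, loop_wait=10, retry_timeout=10
    print(timing_is_consistent(ttl=30, loop_wait=10, retry_timeout=10))  # True
    print(lag_in_mib(33554432))  # 32.0
```

With these values the rule holds with no slack (10 + 2 * 10 = 30), so raising `loop_wait` or `retry_timeout` would also require raising `ttl`.
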
- Rehearse the upgrade in a staging environment first
- Prepare detailed upgrade and rollback documentation
- Schedule changes during low-traffic windows
- Execute comprehensive post-upgrade tests
- Back up the existing database
- Upgrade the PostgreSQL Operator
- Update the PostgreSQL version configuration
- Perform a rolling restart or upgrade
- Validate that the upgrade succeeded
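
Before updating the version configuration, it can help to confirm that the target is a supported version and not a downgrade. The sketch below is a hypothetical pre-flight check: the version list is copied from this document, while the function names are made up for illustration and are not part of any package tooling.

```python
# Supported PostgreSQL versions as listed in this document.
SUPPORTED_VERSIONS = [
    "16.1", "14.13", "14.7", "14.2", "13.8", "13.6", "13.5", "12.10", "11.15",
]


def parse_version(version: str) -> tuple:
    """Split "14.13" into (14, 13) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def check_upgrade(current: str, target: str) -> str:
    """Return "none", "minor", or "major"; raise ValueError on a bad target."""
    if target not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported target version: {target}")
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        raise ValueError(f"downgrade from {current} to {target} is not allowed")
    if tgt == cur:
        return "none"
    return "major" if tgt[0] > cur[0] else "minor"


if __name__ == "__main__":
    print(check_upgrade("14.7", "14.13"))  # minor
    print(check_upgrade("13.8", "16.1"))   # major
```

Comparing parsed tuples rather than strings matters here: as strings, "14.13" sorts before "14.7", which would misreport a minor upgrade as a downgrade.
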
| Project | Description |
|---|---|
| OpenSaola Operator | Core Kubernetes operator for middleware lifecycle management |
| saola-cli | Command-line tool for middleware management |
| MySQL | MySQL database package |
| Kafka | Apache Kafka streaming platform package |
| Redis | Redis in-memory data store package |
| Elasticsearch | Elasticsearch search engine package |
| ZooKeeper | Apache ZooKeeper coordination service package |
| RabbitMQ | RabbitMQ message broker package |
Note: Always verify configuration and functionality thoroughly in a test environment before deploying to production.