Distributed streaming platform for high-throughput, low-latency data pipelines on Kubernetes.
Apache Kafka is a distributed streaming platform designed for high-throughput, low-latency stream processing. This package provides an enterprise-grade Kafka deployment on Kubernetes with cluster management, monitoring, high availability, authentication, and ZooKeeper integration for reliable message delivery and stream processing.
- Message queuing: High-throughput distributed messaging system
- Stream processing: Real-time data stream processing and transformation
- Event-driven architecture: Supports event-driven patterns and microservice communication
- Data integration: Acts as a data pipeline connecting heterogeneous systems
- Log aggregation: Centralized log collection and processing
- Cluster management: Multi-node Kafka cluster deployment
- Automatic failover: Automatic partition reassignment on node failure
- Durable storage: Persistent message storage with data recovery support
- Monitoring and alerting: Integrated Prometheus metrics and alert rules
- Authentication: SASL/SCRAM-SHA-256 support
- Log management: Structured log output and log collection
- Resource management: CPU and memory resource limits
- Node affinity: Pod anti-affinity and node affinity configuration
- Tolerations: Taint toleration settings
- Health checks: Built-in liveness and readiness probes
- Metrics export: Kafka metrics exporter for Prometheus
- Management UI: Integrated Kafka Manager administration tool
- Cruise Control: Automatic cluster rebalancing and optimization
- External access: NodePort and external IP access support
- Disaster recovery: Cross-cluster data replication
- Performance tuning: Built-in performance optimization configuration
- Scalability: Horizontal scaling and dynamic expansion
- 3.9.1 (latest)
- 3.1.0
- Kafka Operator: v1.12.0
- Kafka Manager: 1.1.0
- Cruise Control: 2.5.108
- Kafka Exporter: v1.3.1
- Use cases: Development, testing, and quick deployment
- Replicas: 1
- Traits: Minimal resource footprint, simple deployment
- Use cases: Production workloads
- Replicas: 3
- Traits: High availability with automatic failover
- Use cases: Production environments with high concurrency
- Replicas: Configurable (default 3)
- Traits: Full cluster capabilities, supports high-throughput workloads
+---------------------------------------------------------+
|                      Kafka Cluster                      |
+---------------------------------------------------------+
|    +-----------+     +-----------+     +-----------+    |
|    | Broker 0  |     | Broker 1  |     | Broker 2  |    |
|    | +-------+ |     | +-------+ |     | +-------+ |    |
|    | |Topic A| |     | |Topic B| |     | |Topic C| |    |
|    | |Part 0 | |     | |Part 1 | |     | |Part 2 | |    |
|    | +-------+ |     | +-------+ |     | +-------+ |    |
|    +-----------+     +-----------+     +-----------+    |
+---------------------------------------------------------+
|                     Kafka Operator                      |
|    +-----------+    +-----------+    +--------------+   |
|    |  Manager  |    | Exporter  |    |Cruise Control|   |
|    +-----------+    +-----------+    +--------------+   |
+---------------------------------------------------------+
|                    ZooKeeper Cluster                    |
|    +-----------+     +-----------+     +-----------+    |
|    |  ZK Node  |     |  ZK Node  |     |  ZK Node  |    |
|    +-----------+     +-----------+     +-----------+    |
+---------------------------------------------------------+
|                  Kubernetes Resources                   |
|  * StatefulSet (Kafka Broker instances)                 |
|  * Service (service discovery)                          |
|  * PersistentVolumeClaim (data persistence)             |
|  * ConfigMap (configuration management)                 |
|  * Secret (authentication credentials)                  |
+---------------------------------------------------------+
- Kafka Broker: Core message broker engine
- Kafka Operator: Cluster lifecycle management controller
- Kafka Manager: Web-based cluster administration tool
- Cruise Control: Automatic cluster rebalancing and optimization
- Kafka Exporter: Prometheus metrics collector
- ZooKeeper: Distributed coordination service (external dependency)
- CPU limit: 200m / CPU request: 100m
- Memory limit: 512Mi / Memory request: 256Mi
- CPU limit: 1 core / CPU request: 1 core
- Memory limit: 4Gi / Memory request: 4Gi
- Manager: CPU 500m, memory 512Mi
- Exporter: CPU 200m, memory 512Mi
- Cruise Control: CPU 500m, memory 1Gi
- Kubernetes 1.26+
- OpenSaola Operator deployed
- saola-cli installed
- A running ZooKeeper cluster
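
The Kubernetes version prerequisite can be checked before installing. The following is a minimal sketch: `k8s_version_ok` is a hypothetical helper, not part of saola-cli, and the version string is hard-coded for illustration rather than read from a live cluster.

```shell
#!/bin/sh
# Succeeds (exit 0) if a Kubernetes version string such as "v1.27.3"
# meets the 1.26 minimum listed in the prerequisites.
k8s_version_ok() {
  ver=${1#v}                 # strip a leading "v" if present
  major=${ver%%.*}           # text before the first dot
  rest=${ver#*.}             # text after the first dot
  minor=${rest%%.*}          # text before the next dot
  minor=${minor%%[!0-9]*}    # drop non-numeric suffixes such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 26 ]; }
}

# Example with a hard-coded version string (an assumption for this sketch).
if k8s_version_ok "v1.27.3"; then
  echo "Kubernetes version OK"
else
  echo "Kubernetes 1.26+ required"
fi
```

In practice the version string would come from the cluster itself, for instance from the output of `kubectl version`.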
# Publish the package
saola publish kafka/
# Install the operator
saola operator create kafka-operator --type Kafka --version 3.9.1
# Create an instance
saola middleware create my-kafka --type Kafka --version 3.9.1
# Check status
saola middleware get my-kafka

| Action | Description |
|---|---|
| restart | Restart the middleware instance |
| scale | Scale the number of broker replicas |
| datasecurity | Manage data security settings |
| disaster | Manage disaster recovery configuration |
| cluster-expose-external | Expose the cluster for external access |
| cluster-expose-manager | Expose the Kafka Manager UI |
Key parameters can be customized via the baseline configuration. See manifests/*parameters.yaml for the full parameter reference.
# Recommended production settings
resources:
  kafka:
    limits:
      cpu: "2"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "4Gi"
replicas: 3
volume:
  size: 100 # GB
  storageClass: "fast-ssd"

- Cluster health: kafka_brokers
- Consumer lag: kafka_consumergroup_lag
- Partition status: kafka_controller_kafkacontroller_offlinepartitionscount
- Resource usage: CPU, memory, and disk utilization
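
To illustrate how the consumer lag metric can be read, the sketch below sums `kafka_consumergroup_lag` per consumer group from Prometheus text-format output. It assumes the exporter labels each sample with `consumergroup`, `topic`, and `partition` and emits no trailing timestamp; `lag_by_group` and the sample data are hypothetical.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print
# "<consumer group> <total lag>" lines, sorted by group name.
lag_by_group() {
  awk '
    # Match only the per-partition lag series, not similarly named metrics.
    index($0, "kafka_consumergroup_lag{") == 1 {
      if (match($0, /consumergroup="[^"]*"/)) {
        # Skip the 15 characters of consumergroup=" and the closing quote.
        group = substr($0, RSTART + 15, RLENGTH - 16)
        total[group] += $NF    # the sample value is the last field
      }
    }
    END { for (g in total) print g, total[g] }
  ' | sort
}

# Illustrative sample data, not output captured from a real cluster.
cat <<'EOF' | lag_by_group
# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="billing",partition="0",topic="invoices"} 5
kafka_consumergroup_lag{consumergroup="orders",partition="0",topic="orders"} 12
kafka_consumergroup_lag{consumergroup="orders",partition="1",topic="orders"} 30
EOF
```

A per-group total like this is usually what an alert rule compares against a threshold, since lag on a single partition can spike briefly without indicating a problem.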
- Use the Standard baseline
- Single-node deployment with minimal resources
- Suitable for functional verification and development
- Use the Highly Available or Cluster baseline
- At least 3-node cluster deployment
- Configure Pod anti-affinity to spread broker Pods across Kubernetes nodes
- Enable monitoring and alerting
- Enable SASL/SCRAM-SHA-256 authentication
- Enforce strong password policies
- Rotate credentials periodically
- Apply least-privilege access controls
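
On the client side, SASL/SCRAM-SHA-256 is configured through standard Kafka client properties. The sketch below writes such a file; the helper name, file path, and credentials are placeholders, and `SASL_PLAINTEXT` is an assumption (use `SASL_SSL` if the listeners are TLS-enabled).

```shell
#!/bin/sh
# Write a Kafka client properties file for SASL/SCRAM-SHA-256.
# Arguments: output path, username, password.
# The body runs in a subshell so the umask change stays local.
write_scram_client_config() (
  out=$1
  user=$2
  pass=$3
  umask 077   # keep the credentials file private to the current user
  cat > "$out" <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="$user" password="$pass";
EOF
)

# Placeholder credentials for illustration only.
write_scram_client_config ./client.properties "app-user" "change-me"
```

The resulting file can be passed to Kafka command-line tools that accept a client configuration, for example `kafka-topics.sh --command-config client.properties`.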
- Set partition counts based on consumer parallelism
- Configure appropriate replication factors
- Tune JVM heap and GC parameters
- Monitor partition distribution to avoid hot spots
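
A common rule of thumb for sizing partitions (not something this package prescribes) is to take the larger of target throughput divided by per-partition producer throughput and target throughput divided by per-partition consumer throughput. A minimal sketch with hypothetical numbers, all in the same unit such as MB/s:

```shell
#!/bin/sh
# Integer ceiling division: ceil(a / b) for positive integers.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Estimate a partition count.
# Arguments: target throughput, per-partition producer throughput,
# per-partition consumer throughput.
partitions_for() {
  target=$1
  producer=$2
  consumer=$3
  by_producer=$(ceil_div "$target" "$producer")
  by_consumer=$(ceil_div "$target" "$consumer")
  if [ "$by_producer" -gt "$by_consumer" ]; then
    echo "$by_producer"
  else
    echo "$by_consumer"
  fi
}

# Target 100 MB/s, 20 MB/s per partition produced, 10 MB/s consumed.
partitions_for 100 20 10
```

The per-partition throughput figures have to be measured on the actual workload; the values above only show the arithmetic.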
- Use Kafka Manager for cluster administration
- Enable Cruise Control for automatic rebalancing
- Routinely check cluster health and performance metrics
- Define log retention policies and clean up expired data
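
Log retention is set per topic through the `retention.ms` topic configuration, which takes milliseconds. The sketch below converts days to milliseconds and prints (rather than runs) a matching `kafka-configs.sh` command; the topic name and bootstrap address are hypothetical.

```shell
#!/bin/sh
# Convert a number of days to milliseconds.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Print the kafka-configs.sh command that would set retention.ms.
# Arguments: topic name, retention in days, bootstrap server address.
# The command is only echoed so it can be reviewed before execution.
retention_command() {
  topic=$1
  days=$2
  bootstrap=$3
  echo "kafka-configs.sh --bootstrap-server $bootstrap --alter" \
       "--entity-type topics --entity-name $topic" \
       "--add-config retention.ms=$(days_to_ms "$days")"
}

# Hypothetical topic and address: keep app-logs data for 7 days.
retention_command app-logs 7 my-kafka:9092
```

On a cluster with SASL enabled, the real command would also need a client configuration file passed with `--command-config`.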
- ZooKeeper dependency: Ensure the ZooKeeper cluster is running stably
- Network latency: Consider the impact of network latency on performance
- Data consistency: Set the producer acks level to match durability needs (acks=all combined with min.insync.replicas gives the strongest guarantee)
- Version compatibility: Ensure client versions are compatible with the Kafka broker version
- Resource planning: Size resources according to actual traffic volume
- Backup strategy: Define a data backup and recovery plan
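
The data consistency note comes down to simple arithmetic: with `acks=all`, a write is accepted only while the number of in-sync replicas is at least `min.insync.replicas`. The helper functions below are hypothetical and only illustrate that relationship.

```shell
#!/bin/sh
# Number of replica failures a partition tolerates while remaining
# writable with acks=all. Arguments: replication factor, min.insync.replicas.
tolerated_failures() {
  rf=$1
  min_isr=$2
  echo $(( $rf - $min_isr ))
}

# Succeeds (exit 0) if acks=all writes are still accepted after the given
# number of the partition's replicas have dropped out of the in-sync set.
# Arguments: replication factor, min.insync.replicas, failed replicas.
can_write() {
  rf=$1
  min_isr=$2
  failed=$3
  [ $(( $rf - $failed )) -ge "$min_isr" ]
}

# Replication factor 3 with min.insync.replicas=2.
echo "Tolerated failures: $(tolerated_failures 3 2)"
if can_write 3 2 2; then
  echo "writable after 2 failures"
else
  echo "writes rejected after 2 failures"
fi
```

Replication factor 3 with `min.insync.replicas=2` is a common production pairing because it survives one broker failure without losing acknowledged writes or blocking producers.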
| Project | Description |
|---|---|
| OpenSaola Operator | Core Kubernetes operator for middleware lifecycle management |
| saola-cli | Command-line tool for middleware management |
| PostgreSQL | PostgreSQL database package |
| MySQL | MySQL database package |
| Redis | Redis in-memory data store package |
| Elasticsearch | Elasticsearch search engine package |
| ZooKeeper | Apache ZooKeeper coordination service package |
| RabbitMQ | RabbitMQ message broker package |
This project is licensed under the Apache License 2.0.