harmonycloud/kafka

Apache Kafka

Distributed streaming platform for high-throughput, low-latency data pipelines on Kubernetes.

Overview

Apache Kafka is a distributed streaming platform designed for high-throughput, low-latency data stream processing. This package provides an enterprise-grade Kafka deployment on Kubernetes with cluster management, monitoring, high availability, and SASL/SCRAM authentication. Brokers rely on an external ZooKeeper cluster for coordination.

Features

Core Capabilities

  • Message queuing: High-throughput distributed messaging system
  • Stream processing: Real-time data stream processing and transformation
  • Event-driven architecture: Supports event-driven patterns and microservice communication
  • Data integration: Acts as a data pipeline connecting heterogeneous systems
  • Log aggregation: Centralized log collection and processing

Enterprise Features

  • Cluster management: Multi-node Kafka cluster deployment
  • Automatic failover: Partition leadership moves to an in-sync replica when a broker fails
  • Durable storage: Persistent message storage with data recovery support
  • Monitoring and alerting: Integrated Prometheus metrics and alert rules
  • Authentication: SASL/SCRAM-SHA-256 support
  • Log management: Structured log output and log collection

Operations Features

  • Resource management: CPU and memory resource limits
  • Node affinity: Pod anti-affinity and node affinity configuration
  • Tolerations: Taint toleration settings
  • Health checks: Built-in liveness and readiness probes
  • Metrics export: Kafka metrics exporter for Prometheus
  • Management UI: Integrated Kafka Manager administration tool
  • Cruise Control: Automatic cluster rebalancing and optimization

Advanced Features

  • External access: NodePort and external IP access support
  • Disaster recovery: Cross-cluster data replication
  • Performance tuning: Built-in performance optimization configuration
  • Scalability: Horizontal scaling and dynamic expansion

Supported Versions

Kafka Releases

  • 3.9.1 (latest)
  • 3.1.0

Component Releases

  • Kafka Operator: v1.12.0
  • Kafka Manager: 1.1.0
  • Cruise Control: 2.5.108
  • Kafka Exporter: v1.3.1

Architecture

Deployment Modes

1. Standard (operator-standard)

  • Use cases: Development, testing, and quick deployment
  • Replicas: 1
  • Traits: Minimal resource footprint, simple deployment

2. Highly Available (operator-highly-available)

  • Use cases: Production workloads
  • Replicas: 3
  • Traits: High availability with automatic failover

3. Cluster (cluster)

  • Use cases: Production environments with high concurrency
  • Replicas: Configurable (default 3)
  • Traits: Full cluster capabilities, supports high-throughput workloads

Technical Architecture

+---------------------------------------------------------+
|                    Kafka Cluster                        |
+---------------------------------------------------------+
|  +-----------+  +-----------+  +-----------+            |
|  |  Broker 0 |  |  Broker 1 |  |  Broker 2 |            |
|  | +-------+ |  | +-------+ |  | +-------+ |            |
|  | |Topic A| |  | |Topic B| |  | |Topic C| |            |
|  | |Part 0 | |  | |Part 1 | |  | |Part 2 | |            |
|  | +-------+ |  | +-------+ |  | +-------+ |            |
|  +-----------+  +-----------+  +-----------+            |
+---------------------------------------------------------+
|                   Kafka Operator                        |
|  +-----------+  +-----------+  +-------------+          |
|  |  Manager  |  | Exporter  |  |Cruise Control|          |
|  +-----------+  +-----------+  +-------------+          |
+---------------------------------------------------------+
|                  ZooKeeper Cluster                      |
|  +-----------+  +-----------+  +-----------+            |
|  |  ZK Node  |  |  ZK Node  |  |  ZK Node  |            |
|  +-----------+  +-----------+  +-----------+            |
+---------------------------------------------------------+
|                 Kubernetes Resources                    |
|  * StatefulSet (Kafka Broker instances)                 |
|  * Service (service discovery)                          |
|  * PersistentVolumeClaim (data persistence)             |
|  * ConfigMap (configuration management)                 |
|  * Secret (authentication credentials)                  |
+---------------------------------------------------------+

Component Overview

  • Kafka Broker: Core message broker engine
  • Kafka Operator: Cluster lifecycle management controller
  • Kafka Manager: Web-based cluster administration tool
  • Cruise Control: Automatic cluster rebalancing and optimization
  • Kafka Exporter: Prometheus metrics collector
  • ZooKeeper: Distributed coordination service (external dependency)

Resource Requirements

Operator

  • CPU limit: 200m / CPU request: 100m
  • Memory limit: 512Mi / Memory request: 256Mi

Broker (default)

  • CPU limit: 1 core / CPU request: 1 core
  • Memory limit: 4Gi / Memory request: 4Gi

Management Components

  • Manager: CPU 500m, memory 512Mi
  • Exporter: CPU 200m, memory 512Mi
  • Cruise Control: CPU 500m, memory 1Gi
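
The figures above can be added up to estimate what a deployment asks of the Kubernetes cluster. The following Python sketch does that arithmetic. It assumes the management-component figures are requests (the list above does not say whether they are requests or limits) and that one operator, one Manager, one Exporter, and one Cruise Control instance run alongside the brokers. The function and constant names are illustrative, not part of this package.

```python
# Figures copied from the Resource Requirements section.
# CPU is in millicores, memory in Mi.
# Assumption: management-component figures are treated as requests.
BROKER = {"cpu_m": 1000, "mem_mi": 4096}  # default request per broker
FIXED_COMPONENTS = {
    "operator": {"cpu_m": 100, "mem_mi": 256},  # operator request
    "manager": {"cpu_m": 500, "mem_mi": 512},
    "exporter": {"cpu_m": 200, "mem_mi": 512},
    "cruise_control": {"cpu_m": 500, "mem_mi": 1024},
}


def total_requests(brokers: int) -> dict:
    """Sum CPU and memory for `brokers` brokers plus one of each other component."""
    if brokers < 1:
        raise ValueError("brokers must be at least 1")
    cpu = brokers * BROKER["cpu_m"]
    mem = brokers * BROKER["mem_mi"]
    for component in FIXED_COMPONENTS.values():
        cpu += component["cpu_m"]
        mem += component["mem_mi"]
    return {"cpu_m": cpu, "mem_mi": mem}


if __name__ == "__main__":
    totals = total_requests(3)
    print(f"CPU: {totals['cpu_m']}m, memory: {totals['mem_mi'] / 1024:.2f}Gi")
```

With the default broker sizing, three brokers plus the supporting components come to 4300m of CPU and 14.25Gi of memory, before any ZooKeeper resources are counted.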

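Prerequisites

  • A Kubernetes cluster to deploy into
  • The saola CLI (saola-cli), used by the Quick Start commands
  • A running ZooKeeper cluster (external dependency)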

Quick Start

# Publish the package
saola publish kafka/

# Install the operator
saola operator create kafka-operator --type Kafka --version 3.9.1

# Create an instance
saola middleware create my-kafka --type Kafka --version 3.9.1

# Check status
saola middleware get my-kafka

Available Actions

| Action | Description |
|---|---|
| restart | Restart the middleware instance |
| scale | Scale the number of broker replicas |
| datasecurity | Manage data security settings |
| disaster | Manage disaster recovery configuration |
| cluster-expose-external | Expose the cluster for external access |
| cluster-expose-manager | Expose the Kafka Manager UI |

Configuration

Key parameters can be customized via the baseline configuration. See manifests/*parameters.yaml for the full parameter reference.

Resource Planning

# Recommended production settings
resources:
  kafka:
    limits:
      cpu: "2"
      memory: "8Gi"
    requests:
      cpu: "1"
      memory: "4Gi"
    replicas: 3
    volume:
      size: 100  # GB
      storageClass: "fast-ssd"

Key Monitoring Metrics

  • Cluster health: kafka_brokers
  • Consumer lag: kafka_consumergroup_lag
  • Partition status: kafka_controller_kafkacontroller_offlinepartitionscount
  • Resource usage: CPU, memory, and disk utilization
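
As an illustration of how these metrics might be evaluated, the Python sketch below checks a scrape in Prometheus text format against three simple conditions: fewer brokers than expected, any offline partitions, and consumer lag above a threshold. The thresholds, function names, and sample text are hypothetical. The parser is deliberately minimal and assumes each sample line has the form `name{labels} value` with no trailing timestamp. In practice these conditions would more likely be expressed as Prometheus alert rules.

```python
def parse_samples(text: str) -> list[tuple[str, float]]:
    """Parse 'name{labels} value' lines; comment and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp follows the value.
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]
        samples.append((name, float(value)))
    return samples


def check_health(text: str, expected_brokers: int, max_lag: float) -> list[str]:
    """Return a list of human-readable problems found in the scrape."""
    problems = []
    for name, value in parse_samples(text):
        if name == "kafka_brokers" and value < expected_brokers:
            problems.append(f"only {value:.0f} of {expected_brokers} brokers are up")
        elif name == "kafka_controller_kafkacontroller_offlinepartitionscount" and value > 0:
            problems.append(f"{value:.0f} partitions are offline")
        elif name == "kafka_consumergroup_lag" and value > max_lag:
            problems.append(f"consumer lag {value:.0f} exceeds {max_lag:.0f}")
    return problems


if __name__ == "__main__":
    scrape = (
        "kafka_brokers 2\n"
        'kafka_consumergroup_lag{consumergroup="g1",topic="t",partition="0"} 1500\n'
        "kafka_controller_kafkacontroller_offlinepartitionscount 0\n"
    )
    for problem in check_health(scrape, expected_brokers=3, max_lag=1000):
        print(problem)
```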

Usage Guidance

Environment Selection

Development and Test

  • Use the Standard baseline
  • Single-node deployment with minimal resources
  • Suitable for functional verification and development

Production

  • Use the Highly Available or Cluster baseline
  • Deploy a cluster of at least 3 brokers
  • Configure Pod anti-affinity to spread brokers across Kubernetes nodes
  • Enable monitoring and alerting

Best Practices

Security

  • Enable SASL/SCRAM-SHA-256 authentication
  • Enforce strong password policies
  • Rotate credentials periodically
  • Apply least-privilege access controls
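
For reference, a client connecting to a SCRAM-SHA-256 listener typically needs three standard Kafka client properties: `security.protocol`, `sasl.mechanism`, and `sasl.jaas.config`. The Python sketch below renders them as a properties file. Whether this package exposes a `SASL_PLAINTEXT` or a `SASL_SSL` listener is not stated here, so the `use_tls` switch is an assumption, and the helper name is hypothetical.

```python
def scram_client_properties(username: str, password: str, use_tls: bool = False) -> str:
    """Render Kafka client properties for SASL/SCRAM-SHA-256 authentication."""
    for value in (username, password):
        # Keep the sketch simple: refuse values that would need JAAS escaping.
        if '"' in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported in this sketch")
    # Assumption: the listener protocol depends on how the cluster is exposed.
    protocol = "SASL_SSL" if use_tls else "SASL_PLAINTEXT"
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return "\n".join(
        [
            f"security.protocol={protocol}",
            "sasl.mechanism=SCRAM-SHA-256",
            f"sasl.jaas.config={jaas}",
        ]
    )


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(scram_client_properties("alice", "change-me"))
```

The output can be saved to a file and passed to the standard Kafka command-line tools, for example through the `--command-config` option of `kafka-topics.sh`.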

Performance Tuning

  • Set partition counts based on consumer parallelism (within a consumer group, each partition is read by at most one consumer)
  • Configure appropriate replication factors
  • Tune JVM heap and GC parameters
  • Monitor partition distribution to avoid hot spots
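
One common rule of thumb for the first point is to take the target throughput, divide it by the throughput a single partition sustains on the producer side and on the consumer side, and use the larger result, while never going below the number of consumers in the group. The Python sketch below implements that heuristic. The per-partition throughput figures are inputs you would have to measure in your own environment; the numbers in the usage example are made up.

```python
import math


def estimate_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    consumers: int = 1,
) -> int:
    """Estimate a partition count from throughput targets and consumer parallelism."""
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("throughput values must be positive")
    if consumers < 1:
        raise ValueError("consumers must be at least 1")
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # A partition is read by at most one consumer in a group, so fewer
    # partitions than consumers would leave some consumers idle.
    return max(by_producer, by_consumer, consumers)


if __name__ == "__main__":
    # Hypothetical measurements: 100 MB/s target, 10 MB/s per partition when
    # producing, 20 MB/s per partition when consuming, 6 consumers in the group.
    print(estimate_partitions(100, 10, 20, consumers=6))
```

Treat the result as a starting point. Partition counts can be increased later but not decreased, and more partitions add overhead on the brokers.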

Operations

  • Use Kafka Manager for cluster administration
  • Enable Cruise Control for automatic rebalancing
  • Routinely check cluster health and performance metrics
  • Define log retention policies and clean up expired data

Important Notes

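  1. ZooKeeper dependency: Ensure the ZooKeeper cluster is healthy and stable before deploying Kafka
  2. Network latency: Account for network latency between clients, brokers, and ZooKeeper; it affects replication and end-to-end latency
  3. Data consistency: Choose the producer acks setting (0, 1, or all) to match your durability requirements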
  4. Version compatibility: Ensure client versions are compatible with the Kafka broker version
  5. Resource planning: Size resources according to actual traffic volume
  6. Backup strategy: Define a data backup and recovery plan
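
To make note 3 concrete: with `acks=all`, a write succeeds only while at least `min.insync.replicas` replicas of the partition are in sync, so the difference between the replication factor and `min.insync.replicas` is the number of broker failures a partition can absorb before writes are rejected. The Python sketch below computes that number. The settings dictionary shows standard Kafka producer property names, but the values are only an example of a durability-oriented setup, not defaults shipped by this package.

```python
# Example of durability-oriented producer settings (standard Kafka property names).
# Assumption: these are illustrative values, not this package's defaults.
DURABLE_PRODUCER_SETTINGS = {
    "acks": "all",
    "enable.idempotence": "true",
}


def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Broker failures a partition can absorb while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    for rf, isr in [(3, 2), (3, 3), (1, 1)]:
        print(f"replication factor {rf}, min.insync.replicas {isr}: "
              f"tolerates {tolerated_broker_failures(rf, isr)} failure(s)")
```

A replication factor of 3 with `min.insync.replicas` set to 2 is a common pairing because it keeps writes available through a single broker failure.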

Related Projects

| Project | Description |
|---|---|
| OpenSaola Operator | Core Kubernetes operator for middleware lifecycle management |
| saola-cli | Command-line tool for middleware management |
| PostgreSQL | PostgreSQL database package |
| MySQL | MySQL database package |
| Redis | Redis in-memory data store package |
| Elasticsearch | Elasticsearch search engine package |
| ZooKeeper | Apache ZooKeeper coordination service package |
| RabbitMQ | RabbitMQ message broker package |

License

This project is licensed under the Apache License 2.0.
