diff --git a/hadoop-hdds/docs/content/start/ProductionDeployment.md b/hadoop-hdds/docs/content/start/ProductionDeployment.md
new file mode 100644
index 000000000000..ed24a7b26710
--- /dev/null
+++ b/hadoop-hdds/docs/content/start/ProductionDeployment.md
@@ -0,0 +1,89 @@
+---
+title: Production Deployment
+weight: 6
+menu:
+  main:
+    parent: Getting Started
+---
+
+
+This document provides guidance on the requirements and best practices for a production deployment of Apache Ozone.
+
+## Ozone Components
+
+A typical production Ozone cluster includes the following services:
+
+* **Ozone Manager (OM)**: Manages the namespace and metadata of the Ozone cluster. A production cluster requires 3 OM instances for high availability.
+* **Storage Container Manager (SCM)**: Manages the DataNodes, containers, and pipelines. A production cluster requires 3 SCM instances for high availability.
+* **DataNode**: Stores the actual data in containers. A production cluster requires at least 3 DataNodes.
+* **Recon**: A web-based UI for monitoring and managing the Ozone cluster. A Recon server is strongly recommended, though not required.
+* **S3 Gateway (S3G)**: An S3-compatible gateway for accessing Ozone. Running multiple S3 Gateway instances is strongly recommended so that S3 traffic can be load balanced across them.
+* **HttpFs**: An HTTP gateway that exposes an HDFS-compatible REST API for accessing Ozone. This is an optional component.
+
+## Requirements
+
+### System Requirements
+
+* **Hardware**: Bare-metal machines are recommended for optimal performance. Virtual machines and containers are not recommended for production deployments.
+* **Operating System**: Linux (recommended distributions: Red Hat 8/Rocky 8+, Ubuntu, SUSE; supported architectures: x86/ARM).
+* **Java Development Kit (JDK)**: Version 8 or higher.
+* **Time Synchronization**: A time synchronization service such as Chrony or ntpd must be enabled to prevent clock drift between hosts.
+
+### Memory Requirements
+
+* **Ozone Manager (OM), Storage Container Manager (SCM), and Recon**: The recommended heap size for large production clusters is 64 GB.
+* **DataNode, S3 Gateway, and HttpFs**: The recommended heap size is 31 GB. This keeps the heap just below the roughly 32 GB limit up to which the JVM can use compressed object pointers.
+
+### Storage Requirements
+
+* **Ozone Manager (OM), Storage Container Manager (SCM), and Recon metadata storage**: Use SAS SSD or NVMe SSD for metadata (RocksDB and Ratis) to ensure optimal performance. RAID 1 (disk mirroring) is recommended for the metadata disks to protect against disk failures.
+* **DataNode Storage**:
+  * **Ratis Log**: Use SAS SSD or NVMe SSD for the Ratis log directory to provide low-latency writes.
+  * **Container Data**: Hard disks are acceptable for container data storage.
+  * **Disk Configuration**: Use a JBOD (Just a Bunch Of Disks) configuration instead of RAID. Ozone is a replicated distributed storage system that handles data redundancy itself, so RAID can decrease performance without providing additional data protection.
+* **Storage Type**: Use direct-attached storage. Do not use Network Attached Storage (NAS) or a Storage Area Network (SAN).
+
+### Network Requirements
+
+* **Network Bandwidth**: A network card bandwidth of at least 25 Gbps is recommended.
+* **Network Topology**: A leaf-spine network topology with an oversubscription ratio below 3:1 is recommended for predictable performance.
+
+### Security Requirements (Optional but Recommended)
+
+* **Kerberos**: A Kerberos environment, including a Key Distribution Center (KDC), is recommended for enhanced security.
+
+## Recommended Configurations
+
+### Linux Kernel
+
+* **CPU Governor**: Set the CPU frequency scaling governor to `performance` to maximize performance.
+* **Transparent Huge Pages (THP)**: Disable Transparent Huge Pages to avoid performance issues.
+* **SELinux**: Disable SELinux.
+* **Swappiness**: Set `vm.swappiness=1` to minimize swapping.
+
+### Local File System
+
+* **LVM**: Do not use Logical Volume Manager (LVM) for data drives.
+* **File System**: Use `ext4` or `xfs` file systems.
+* **Mount Options**: Mount drives with the `noatime` option to avoid unnecessary disk writes for access-time updates. For SSDs, also add the `discard` option.
+
+### Ozone Configuration
+
+* **Monitoring**: Install Prometheus and Grafana to monitor the Ozone cluster. For audit logs, consider a log ingestion framework such as the ELK Stack (Elasticsearch, Logstash, and Kibana) with Filebeat, or a similar framework. Alternatively, use Apache Ranger to manage audit logs.
+* **Pipeline Limits**: Increase the number of allowed write pipelines to suit your workload by adjusting `ozone.scm.datanode.pipeline.limit` and `ozone.scm.ec.pipeline.minimum`.
+* **Heap Sizes**: Configure sufficient heap sizes for the Ozone Manager (OM), Storage Container Manager (SCM), Recon, DataNode, S3 Gateway (S3G), and HttpFs services to ensure stability.
diff --git a/hadoop-hdds/docs/content/start/ProductionDeployment.zh.md b/hadoop-hdds/docs/content/start/ProductionDeployment.zh.md
new file mode 100644
index 000000000000..4620ccf31bc8
--- /dev/null
+++ b/hadoop-hdds/docs/content/start/ProductionDeployment.zh.md
@@ -0,0 +1,69 @@
+---
+title: Production Deployment
+weight: 6
+menu:
+  main:
+    parent: 快速入门
+---
+
+
+This document provides guidance on the requirements and best practices for deploying Apache Ozone in a production environment.
+
+## Requirements
+
+### System Requirements
+
+* **Operating System**: Linux (recommended distributions: Red Hat 8/Rocky 8+, Ubuntu, SUSE; supported architectures: x86/ARM).
+* **Java Development Kit (JDK)**: Version 8 or higher.
+* **Time Synchronization**: A time synchronization service (such as Chrony or ntpd) must be enabled to prevent time drift.
+
+### Storage Requirements
+
+* **Metadata Storage**: To ensure optimal performance, use SAS SSD or NVMe SSD for metadata (RocksDB and Ratis).
+* **DataNode Storage**: Hard disks can be used for DataNode data storage.
+* **Storage Type**: Use direct-attached storage. Do not use Network Attached Storage (NAS) or a Storage Area Network (SAN).
+
+### Network Requirements
+
+* **Network Bandwidth**: A network card bandwidth of at least 25 Gbps is recommended.
+* **Network Topology**: For predictable performance, a leaf-spine network topology with an oversubscription ratio below 3:1 is recommended.
+
+### Security Requirements (Optional but Recommended)
+
+* **Kerberos**: For enhanced security, a Kerberos environment including a Key Distribution Center (KDC) is recommended.
+
+## Recommended Configurations
+
+### Linux Kernel
+
+* **CPU Governor**: Set the CPU frequency scaling governor to `performance` mode to maximize performance.
+* **Transparent Huge Pages**: Disable Transparent Huge Pages to avoid performance issues.
+* **SELinux**: Disable SELinux.
+* **Swappiness**: Set `vm.swappiness=1` to minimize swapping.
+
+### Local File System
+
+* **LVM**: Disable the Logical Volume Manager (LVM) for data drives.
+* **File System**: Use the `ext4` or `xfs` file system.
+* **Mount Options**: Mount drives with the `noatime` option to reduce unnecessary disk writes. For SSDs, also add the `discard` option.
+
+### Ozone Configuration
+
+* **Monitoring**: Install Prometheus and Grafana to monitor the Ozone cluster.
+* **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit` and `ozone.scm.ec.pipeline.minimum`.
+* **Heap Sizes**: Configure sufficient heap sizes for the Ozone Manager (OM), Storage Container Manager (SCM), Recon, DataNode, S3 Gateway (S3G), and HttpFs services to ensure stability.
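The page recommends several Linux Kernel and Local File System settings, and these can be spot-checked on each host. The Python sketch below is read-only. It inspects these standard procfs/sysfs files:

* `/sys/kernel/mm/transparent_hugepage/enabled`
* `/proc/sys/vm/swappiness`
* the per-CPU `cpufreq/scaling_governor` files
* `/sys/fs/selinux/enforce`
* `/proc/mounts`

It reports what it finds. The `/data` mount-point prefix (`DATA_PREFIX`) is a hypothetical assumption; adjust it to wherever the DataNode drives are actually mounted.

```python
"""Report a Linux host's settings against the kernel and mount recommendations.

A read-only sketch: it inspects standard procfs/sysfs files and changes nothing.
DATA_PREFIX is a hypothetical mount-point prefix for the DataNode drives.
"""
import glob
from pathlib import Path

DATA_PREFIX = "/data"  # hypothetical: adjust to where your data drives are mounted


def read(path: str) -> str:
    """Return a file's stripped contents, or "" if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return ""


def thp_mode(text: str) -> str:
    """Extract the active THP mode from text such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return "unknown"


def missing_mount_opts(proc_mounts: str, prefix: str, ssd: bool = False) -> dict:
    """Map each mount point starting with `prefix` to the recommended options it lacks."""
    wanted = {"noatime", "discard"} if ssd else {"noatime"}
    result = {}
    for line in proc_mounts.splitlines():
        fields = line.split()  # device, mount point, fs type, options, ...
        if len(fields) < 4 or not fields[1].startswith(prefix):
            continue
        lacking = sorted(wanted - set(fields[3].split(",")))
        if lacking:
            result[fields[1]] = lacking
    return result


if __name__ == "__main__":
    print("THP mode (want never):", thp_mode(read("/sys/kernel/mm/transparent_hugepage/enabled")))
    print("vm.swappiness (want 1):", read("/proc/sys/vm/swappiness") or "unknown")
    governors = {read(p) for p in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor")}
    print("CPU governors (want performance):", sorted(governors) or "unknown")
    # The enforce file exists only while SELinux is active (1 = enforcing, 0 = permissive).
    print("SELinux enforce flag:", read("/sys/fs/selinux/enforce") or "not active")
    print("Data mounts missing options:", missing_mount_opts(read("/proc/mounts"), DATA_PREFIX))
```

A value of `unknown` usually means the corresponding file does not exist on that kernel or platform. For example, many virtual machines expose no `cpufreq` directory. Pass `ssd=True` to `missing_mount_opts` for SSD-backed mounts so that `discard` is also checked.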
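Under Ozone Configuration, the page names two pipeline settings. These are ordinary Hadoop-style properties that live in `ozone-site.xml`. As a sketch, the snippet below uses Python's `xml.etree.ElementTree` to render them in the standard `<configuration>`/`<property>` layout. The values `4` and `10` are hypothetical placeholders, not tuning advice, because the page only says to increase these limits to suit the workload.

```python
"""Render ozone-site.xml <property> entries for the pipeline settings.

Sketch only: the numeric values are hypothetical placeholders, not recommendations.
"""
import xml.etree.ElementTree as ET

# Hypothetical example values; choose numbers that suit your own workload.
PIPELINE_SETTINGS = {
    "ozone.scm.datanode.pipeline.limit": "4",
    "ozone.scm.ec.pipeline.minimum": "10",
}


def to_hadoop_xml(settings: dict) -> str:
    """Build a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print in place; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(PIPELINE_SETTINGS))
```

The printed `<property>` elements can be merged into the existing `<configuration>` element of `ozone-site.xml` on the SCM hosts.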