Merged
101 changes: 21 additions & 80 deletions hadoop-hdds/docs/content/design/diskbalancer.md
@@ -119,81 +119,32 @@ and is not already being moved by another balancing operation. To optimize perfo
containers repeatedly, it caches the list of containers for each volume. A cached list expires automatically one hour
after it was last used, or when the container iterator for that volume is invalidated on full utilisation.

## Security Design
DiskBalancer follows the same security model as other services:

* **Authentication**: Clients communicate directly with datanodes via RPC. In secure clusters, RPC authentication is required (Kerberos).

* **Authorization**: After successful authentication, each datanode performs authorization checks using `OzoneAdmins` based on the `ozone.administrators` configuration:
  - **Admin operations** (start, stop, update): Require the authenticated user to be in `ozone.administrators` or belong to a group in `ozone.administrators.groups`
  - **Read-only operations** (status, report): Do not require admin privileges - any authenticated user can query status and reports

By default, if `ozone.administrators` is not configured, only the user who launched the datanode service has admin privileges. This ensures that DiskBalancer operations are restricted to authorized administrators while allowing read-only access for monitoring purposes.

## CLI Interface

The DiskBalancer CLI provides the following commands:

### Command Syntax

**Start DiskBalancer:**
```bash
ozone admin datanode diskbalancer start [<datanode-address> ...] [OPTIONS] [--in-service-datanodes]
```

**Stop DiskBalancer:**
```bash
ozone admin datanode diskbalancer stop [<datanode-address> ...] [--in-service-datanodes]
```

**Update Configuration:**
```bash
ozone admin datanode diskbalancer update [<datanode-address> ...] [OPTIONS] [--in-service-datanodes]
```

**Get Status:**
```bash
ozone admin datanode diskbalancer status [<datanode-address> ...] [--in-service-datanodes] [--json]
```

**Get Report:**
```bash
ozone admin datanode diskbalancer report [<datanode-address> ...] [--in-service-datanodes] [--json]
```

### Command Options

| Option | Description | Example |
|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|
| `<datanode-address>` | One or more datanode addresses as positional arguments. Addresses can be:<br>- Hostname (e.g., `DN-1`) - uses default CLIENT_RPC port (9858)<br>- Hostname with port (e.g., `DN-1:9858`)<br>- IP address (e.g., `192.168.1.10`)<br>- IP address with port (e.g., `192.168.1.10:9858`)<br>- Stdin (`-`) - reads datanode addresses from standard input, one per line | `DN-1`<br>`DN-1:9858`<br>`192.168.1.10`<br>`-` |
| `--in-service-datanodes` | It queries SCM for all IN_SERVICE datanodes and executes the command on all of them. | `--in-service-datanodes` |
| `--json` | Format output as JSON. | `--json` |
| `-t/--threshold` | Volume density threshold percentage (default: 10.0). Used with `start` and `update` commands. | `-t 5`<br>`--threshold 5.0` |
| `-b/--bandwidth-in-mb` | Maximum disk bandwidth in MB/s (default: 10). Used with `start` and `update` commands. | `-b 20`<br>`--bandwidth-in-mb 50` |
| `-p/--parallel-thread` | Number of parallel threads (default: 1). Used with `start` and `update` commands. | `-p 5`<br>`--parallel-thread 10` |
| `-s/--stop-after-disk-even` | Stop automatically after disks are balanced (default: false). Used with `start` and `update` commands. | `-s false`<br>`--stop-after-disk-even true` |

### Examples

```bash
# Start DiskBalancer on a specific datanode
ozone admin datanode diskbalancer start DN-1

# Start DiskBalancer on multiple datanodes
ozone admin datanode diskbalancer start DN-1 DN-2 DN-3

# Start DiskBalancer on all IN_SERVICE datanodes
ozone admin datanode diskbalancer start --in-service-datanodes

# Start DiskBalancer with configuration parameters
ozone admin datanode diskbalancer start DN-1 -t 5 -b 20 -p 5

# Read datanode addresses from stdin
echo -e "DN-1\nDN-2" | ozone admin datanode diskbalancer start -

# Get status as JSON
ozone admin datanode diskbalancer status --in-service-datanodes --json

# Update configuration on specific datanode (partial update - only specified parameters are updated)
ozone admin datanode diskbalancer update DN-1 -b 50
```

### Authentication and Authorization

* **Authentication**: RPC authentication is required (e.g., via `kinit` in secure clusters). The client's identity is verified by the datanode's RPC layer.

* **Authorization**: Each datanode performs authorization checks using `OzoneAdmins` based on the `ozone.administrators` configuration:
  - **Admin operations** (start, stop, update): Require the user to be in `ozone.administrators`
  - **Read-only operations** (status, report): Do not require admin privileges

## CLI Interface Design

The DiskBalancer CLI provides five main commands that communicate directly with datanodes:

1. **start** - Initiates DiskBalancer on `specified datanodes` or all `in-service-datanodes` with optional configuration parameters
2. **stop** - Stops DiskBalancer operations on specified datanodes.
3. **update** - Updates DiskBalancer configuration.
4. **status** - Retrieves current DiskBalancer status including running state, metrics, and configuration.
5. **report** - Retrieves volume density report showing imbalance analysis.

The CLI supports:
- **Direct datanode addressing**: Commands can target specific datanodes by hostname or IP address
- **Batch operations**: The `--in-service-datanodes` flag queries SCM for all IN_SERVICE and HEALTHY datanodes and executes commands on all of them
- **Flexible input**: Datanode addresses can be provided as positional arguments or read from stdin
- **Output formats**: Results can be displayed in human-readable format or JSON for programmatic access
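
The addressing rules described in this section (hostname or IP, optional port, addresses read line by line from stdin) can be illustrated with a small sketch. This is not the parser used by the Ozone CLI: the class `DatanodeAddressSketch` and its methods are hypothetical names, the default port 9858 is the CLIENT_RPC default quoted in the command option table of this document, and IPv6 literals are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of the Ozone code base. */
final class DatanodeAddressSketch {
  // Assumption: default CLIENT_RPC port as quoted in the option table.
  static final int DEFAULT_PORT = 9858;

  /** Returns "host:port", appending the default port when none is given. */
  static String normalize(String address) {
    String trimmed = address.trim();
    int colon = trimmed.lastIndexOf(':');
    if (colon < 0) {
      return trimmed + ":" + DEFAULT_PORT;
    }
    // Reject suffixes that are not usable port numbers.
    int port = Integer.parseInt(trimmed.substring(colon + 1));
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("Invalid port in: " + address);
    }
    return trimmed.substring(0, colon) + ":" + port;
  }

  /** Normalizes a batch of lines, skipping blank ones as a stdin reader might. */
  static List<String> normalizeAll(List<String> lines) {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      if (!line.trim().isEmpty()) {
        result.add(normalize(line));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // prints [DN-1:9858, DN-2:9858, 192.168.1.10:9858]
    System.out.println(normalizeAll(Arrays.asList("DN-1", "DN-2:9858", "", "192.168.1.10")));
  }
}
```

Normalizing every address to `host:port` up front means the code that opens RPC connections only has to deal with one form, whichever way the operator supplied the address.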

### Operational State Awareness

@@ -206,17 +157,7 @@ This ensures DiskBalancer respects datanode lifecycle management and does not in

## Feature Flag

The Disk Balancer feature is introduced with a feature flag. By default, this feature is disabled.

The feature can be enabled by setting the following property to `true` in the `ozone-site.xml` configuration file:
`hdds.datanode.disk.balancer.enabled = true`

Developers who wish to test or use the Disk Balancer must explicitly enable it. Once the feature is
considered stable, the default value may be changed to `true` in a future release.

**Note:** This command is hidden from the main help message (`ozone admin datanode --help`). This is because the feature
is currently considered experimental and is disabled by default. The command is, however, fully functional for those who
wish to enable and use the feature.
The DiskBalancer feature is gated behind a feature flag (`hdds.datanode.disk.balancer.enabled`) to allow controlled rollout. By default, the feature is disabled. When disabled, the DiskBalancer service is not initialized on datanodes, and the CLI commands are hidden from the main help output to prevent accidental usage.

## DiskBalancer Metrics

76 changes: 68 additions & 8 deletions hadoop-hdds/docs/content/feature/DiskBalancer.md
@@ -48,6 +48,74 @@ The Disk Balancer feature is introduced with a feature flag. By default, this fe
The feature can be **enabled** by setting the following property to `true` in the `ozone-site.xml` configuration file:
`hdds.datanode.disk.balancer.enabled = true`

### Authentication and Authorization

DiskBalancer commands communicate directly with datanodes via RPC, requiring proper authentication and authorization configuration.

#### Authentication Configuration

In secure clusters with Kerberos enabled, the datanode must have its Kerberos principal configured for RPC authentication in `ozone-site.xml`:

```xml
<property>
  <name>hdds.datanode.kerberos.principal</name>
  <value>dn/_HOST@REALM.TLD</value>
  <description>
    The Datanode service principal. This is typically set to
    dn/_HOST@REALM.TLD. Each Datanode will substitute _HOST with its
    own fully qualified hostname at startup. The _HOST placeholder
    allows using the same configuration setting on all Datanodes.
  </description>
</property>
```

**Note**: Without this configuration, DiskBalancer commands will fail with authentication errors in secure clusters.
The client uses this principal to verify the datanode's identity when establishing RPC connections.

#### Authorization Configuration

Each datanode performs authorization checks using `OzoneAdmins` based on the `ozone.administrators` configuration:
- **Admin operations** (start, stop, update): Require the user to be in `ozone.administrators` or belong to a group in `ozone.administrators.groups`
- **Read-only operations** (status, report): Do not require admin privileges - any authenticated user can query status and reports

#### Default Behavior

By default, if `ozone.administrators` is not configured, only the user who launched the datanode service can start, stop,
or update DiskBalancer. This means that in a typical deployment where the datanode runs as user `dn`, only that user has
admin privileges for DiskBalancer operations.
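
The rules above can be modelled in a few lines. The sketch below is a simplified illustration of the documented behaviour only; it is not the real `OzoneAdmins` class, and the class name `DiskBalancerAuthSketch`, its constructor, and its methods are hypothetical. It also assumes, following the text above literally, that the service user is the fallback admin only when no administrators are configured.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative model of the documented rules; not the real OzoneAdmins API. */
final class DiskBalancerAuthSketch {
  private final Set<String> adminUsers = new HashSet<>();
  private final Set<String> adminGroups = new HashSet<>();

  DiskBalancerAuthSketch(Set<String> configuredUsers, Set<String> configuredGroups,
      String serviceUser) {
    if (configuredUsers.isEmpty()) {
      // Documented default: only the user that launched the datanode is an admin.
      adminUsers.add(serviceUser);
    } else {
      adminUsers.addAll(configuredUsers);
    }
    adminGroups.addAll(configuredGroups);
  }

  /** A user is an admin if listed directly or through one of their groups. */
  boolean isAdmin(String user, Set<String> userGroups) {
    if (adminUsers.contains(user)) {
      return true;
    }
    for (String group : userGroups) {
      if (adminGroups.contains(group)) {
        return true;
      }
    }
    return false;
  }

  /** Read-only operations are open to any authenticated user. */
  boolean isAllowed(String operation, String user, Set<String> userGroups) {
    switch (operation) {
      case "status":
      case "report":
        return true;
      case "start":
      case "stop":
      case "update":
        return isAdmin(user, userGroups);
      default:
        throw new IllegalArgumentException("Unknown operation: " + operation);
    }
  }

  public static void main(String[] args) {
    Set<String> none = Collections.emptySet();
    DiskBalancerAuthSketch defaults = new DiskBalancerAuthSketch(none, none, "dn");
    System.out.println(defaults.isAllowed("start", "dn", none));     // true
    System.out.println(defaults.isAllowed("start", "alice", none));  // false
    System.out.println(defaults.isAllowed("status", "alice", none)); // true
  }
}
```

With the default configuration, `alice` can still run `status` and `report` for monitoring but cannot change the balancer's state.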

#### Enabling Authorization for Additional Users

To allow other users to perform DiskBalancer admin operations (start, stop, update), configure the `ozone.administrators` property in `ozone-site.xml`:

**Example 1: Single user**
```xml
<property>
  <name>ozone.administrators</name>
  <value>scm</value>
</property>
```

**Example 2: Multiple users**
```xml
<property>
  <name>ozone.administrators</name>
  <value>scm,hdfs</value>
</property>
```

**Example 3: Using groups**
```xml
<property>
  <name>ozone.administrators.groups</name>
  <value>ozone-admins,cluster-operators</value>
</property>
```

**Note**: `ozone-admins` and `cluster-operators` are example group names. Replace them with actual
group names from your environment. After updating the `ozone.administrators` configuration,
restart the datanode service for the changes to take effect.

## Command Line Usage
The DiskBalancer is managed through the `ozone admin datanode diskbalancer` command.

@@ -162,14 +230,6 @@ ozone admin datanode diskbalancer report --in-service-datanodes
ozone admin datanode diskbalancer report --in-service-datanodes --json
```

### Authentication and Authorization

* **Authentication**: RPC authentication is required (e.g., via `kinit` in secure clusters). The client's identity is verified by the datanode's RPC layer.

* **Authorization**: Each datanode performs authorization checks using `OzoneAdmins` based on the `ozone.administrators` configuration:
  - **Admin operations** (start, stop, update): Require the user to be in `ozone.administrators`
  - **Read-only operations** (status, report): Do not require admin privileges

## **DiskBalancer Configurations**

The DiskBalancer's behavior can be controlled using the following configuration properties in `ozone-site.xml`.
75 changes: 68 additions & 7 deletions hadoop-hdds/docs/content/feature/DiskBalancer.zh.md
@@ -44,6 +44,74 @@ summary: 数据节点的磁盘平衡器.
可以通过在“ozone-site.xml”配置文件中将以下属性设置为“true”来**启用**该功能:
`hdds.datanode.disk.balancer.enabled = true`

### 身份验证和授权

DiskBalancer 命令通过 RPC 直接与数据节点通信,因此需要进行正确的身份验证和授权配置。

#### 身份验证配置

在启用了 Kerberos 的安全集群中,必须在 `ozone-site.xml` 文件中配置数据节点的 Kerberos 主体以进行 RPC 身份验证:

```xml
<property>
  <name>hdds.datanode.kerberos.principal</name>
  <value>dn/_HOST@REALM.TLD</value>
  <description>
    The Datanode service principal. This is typically set to
    dn/_HOST@REALM.TLD. Each Datanode will substitute _HOST with its
    own fully qualified hostname at startup. The _HOST placeholder
    allows using the same configuration setting on all Datanodes.
  </description>
</property>
```

**注意**:如果没有此配置,DiskBalancer 命令在安全集群中将因身份验证错误而失败。客户端使用此主体在建立 RPC 连接时验证数据节点的身份。

#### 授权配置

每个数据节点都使用 `OzoneAdmins` 根据 `ozone.administrators` 配置执行授权检查:

- **管理员操作**(启动、停止、更新):要求用户位于 `ozone.administrators` 成员列表中,或属于 `ozone.administrators.groups` 中的某个组。

- **只读操作**(状态、报告):不需要管理员权限 - 任何已认证的用户都可以查询状态和报告。

#### 默认行为

默认情况下,如果未配置 `ozone.administrators`,则只有启动数据节点服务的用户才能启动、停止或更新 DiskBalancer。

这意味着在典型的部署中,如果数据节点以用户 `dn` 的身份运行,则只有该用户拥有 DiskBalancer 操作的管理员权限。

#### 为其他用户启用授权

要允许其他用户执行 DiskBalancer 管理操作(启动、停止、更新),请在 `ozone-site.xml` 文件中配置 `ozone.administrators` 属性:

**Example 1: Single user**
```xml
<property>
  <name>ozone.administrators</name>
  <value>scm</value>
</property>
```

**Example 2: Multiple users**
```xml
<property>
  <name>ozone.administrators</name>
  <value>scm,hdfs</value>
</property>
```

**Example 3: Using groups**
```xml
<property>
  <name>ozone.administrators.groups</name>
  <value>ozone-admins,cluster-operators</value>
</property>
```
**注意**:`ozone-admins` 和 `cluster-operators` 是示例组名称。请将其替换为您环境中的实际组名称。 更新 `ozone.administrators` 配置后,
请重启数据节点服务以使更改生效。

## 命令行用法
DiskBalancer 通过 `ozone admin datanode diskbalancer` 命令进行管理。

@@ -157,13 +225,6 @@ ozone admin datanode diskbalancer report --in-service-datanodes
# 以 JSON 格式获取报告
ozone admin datanode diskbalancer report --in-service-datanodes --json
```
### 身份验证和授权

* **身份验证**:需要 RPC 身份验证(例如,在安全集群中通过 `kinit`)。客户端的身份由数据节点的 RPC 层验证。

* **授权**:每个数据节点都使用 `OzoneAdmins` 根据 `ozone.administrators` 配置执行授权检查:
- **管理操作**(启动、停止、更新):要求用户位于 `ozone.administrators` 成员中
- **只读操作**(状态、报告):不需要管理员权限

## DiskBalancer Configurations

@@ -91,22 +91,23 @@ protected void displayResults(List<String> successNodes, List<String> failedNode

private String generateStatus(List<DatanodeDiskBalancerInfoProto> protos) {
StringBuilder formatBuilder = new StringBuilder("Status result:%n" +
"%-35s %-15s %-15s %-15s %-12s %-12s %-12s %-15s %-15s %-15s%n");
"%-35s %-15s %-15s %-15s %-12s %-20s %-12s %-12s %-15s %-18s %-20s%n");

List<String> contentList = new ArrayList<>();
contentList.add("Datanode");
contentList.add("Status");
contentList.add("Threshold(%)");
contentList.add("BandwidthInMB");
contentList.add("Threads");
contentList.add("StopAfterDiskEven");
contentList.add("SuccessMove");
contentList.add("FailureMove");
contentList.add("BytesMoved(MB)");
contentList.add("EstBytesToMove(MB)");
contentList.add("EstTimeLeft(min)");

for (HddsProtos.DatanodeDiskBalancerInfoProto proto : protos) {
formatBuilder.append("%-35s %-15s %-15s %-15s %-12s %-12s %-12s %-15s %-15s %-15s%n");
formatBuilder.append("%-35s %-15s %-15s %-15s %-12s %-20s %-12s %-12s %-15s %-18s %-20s%n");
long estimatedTimeLeft = calculateEstimatedTimeLeft(proto);
long bytesMovedMB = (long) Math.ceil(proto.getBytesMoved() / (1024.0 * 1024.0));
long bytesToMoveMB = (long) Math.ceil(proto.getBytesToMove() / (1024.0 * 1024.0));
@@ -119,6 +120,8 @@ private String generateStatus(List<DatanodeDiskBalancerInfoProto> protos) {
String.valueOf(proto.getDiskBalancerConf().getDiskBandwidthInMB()));
contentList.add(
String.valueOf(proto.getDiskBalancerConf().getParallelThread()));
contentList.add(
String.valueOf(proto.getDiskBalancerConf().getStopAfterDiskEven()));
contentList.add(String.valueOf(proto.getSuccessMoveCount()));
contentList.add(String.valueOf(proto.getFailureMoveCount()));
contentList.add(String.valueOf(bytesMovedMB));
@@ -153,6 +156,7 @@ private Map<String, Object> createStatusResult(DatanodeDiskBalancerInfoProto sta
result.put("threshold", status.getDiskBalancerConf().getThreshold());
result.put("bandwidthInMB", status.getDiskBalancerConf().getDiskBandwidthInMB());
result.put("threads", status.getDiskBalancerConf().getParallelThread());
result.put("stopAfterDiskEven", status.getDiskBalancerConf().getStopAfterDiskEven());
result.put("successMove", status.getSuccessMoveCount());
result.put("failureMove", status.getFailureMoveCount());
result.put("bytesMovedMB", (long) Math.ceil(status.getBytesMoved() / (1024.0 * 1024.0)));
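
The column-width changes in `generateStatus` are easier to follow with a standalone sketch. It shows why a header such as `EstBytesToMove(MB)` (18 characters) breaks alignment under `%-15s` and lines up under `%-18s`, and how the `Math.ceil` expression used in the hunk rounds byte counts up to whole megabytes. The class `StatusColumnSketch` and its helper names are hypothetical and are not part of this change.

```java
/** Illustrative only: mirrors the formatting idiom used in the hunk above. */
final class StatusColumnSketch {

  /** Same rounding expression as in the diff: bytes to MB, rounded up. */
  static long toMegabytesRoundedUp(long bytes) {
    return (long) Math.ceil(bytes / (1024.0 * 1024.0));
  }

  /** One left-aligned cell of at least the given width, plus a separator space. */
  static String cell(String text, int width) {
    return String.format("%-" + width + "s ", text);
  }

  public static void main(String[] args) {
    String header = "EstBytesToMove(MB)"; // 18 characters
    String value = String.valueOf(toMegabytesRoundedUp(1024L * 1024L * 1024L));

    // Width 15: the header overflows its column, so header and row cells differ in length.
    System.out.println(cell(header, 15).length() + " vs " + cell(value, 15).length()); // 19 vs 16

    // Width 18: header and value occupy the same number of characters.
    System.out.println(cell(header, 18).length() + " vs " + cell(value, 18).length()); // 19 vs 19
  }
}
```

`%-Ns` sets only a minimum width, so any header longer than `N` pushes every later column in that row to the right; sizing each column to at least its header length keeps the header and data rows aligned.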
@@ -442,6 +442,7 @@ public void testStatusDiskBalancerWithJson() throws Exception {
assertTrue(output.contains("\"threshold\""));
assertTrue(output.contains("\"bandwidthInMB\""));
assertTrue(output.contains("\"threads\""));
assertTrue(output.contains("\"stopAfterDiskEven\""));
}
}

@@ -648,6 +649,7 @@ private DatanodeDiskBalancerInfoProto createStatusProto(String hostname,
.setThreshold(threshold)
.setDiskBandwidthInMB(bandwidthInMB)
.setParallelThread(parallelThread)
.setStopAfterDiskEven(true)
.build();

return DatanodeDiskBalancerInfoProto.newBuilder()