Merged
3 changes: 2 additions & 1 deletion docs/sidebars.json
@@ -57,7 +57,8 @@
]
},
"features/pipeline_parallel",
"features/nvme_offload"
"features/nvme_offload",
"features/cluster_utils"
]
},
{
32 changes: 32 additions & 0 deletions docs/source/en/features/cluster_utils.md
@@ -0,0 +1,32 @@
# Cluster Utilities

Author: [Hongxin Liu](https://github.com/ver217)

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)

## Introduction

The utility class `colossalai.cluster.DistCoordinator` helps coordinate distributed training. It exposes information about the cluster, such as the number of nodes and the number of processes per node, and provides helpers for running or printing on specific processes only.
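
To make the idea of per-process queries concrete, here is a minimal sketch of the kind of checks a coordinator performs. `ToyCoordinator` is a hypothetical stand-in written for this page, not the `DistCoordinator` implementation. It assumes the common launcher convention (for example `torchrun`) of exposing `RANK`, `LOCAL_RANK` and `WORLD_SIZE` as environment variables, and it assumes that rank 0 is the master and local rank 0 is the node master.

```python
import os


class ToyCoordinator:
    """Hypothetical stand-in for illustration; NOT colossalai's DistCoordinator.

    Assumption: the launcher sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    """

    def __init__(self, env=None):
        env = os.environ if env is None else env
        self.rank = int(env.get("RANK", 0))
        self.local_rank = int(env.get("LOCAL_RANK", 0))
        self.world_size = int(env.get("WORLD_SIZE", 1))

    def is_master(self):
        # Assumed convention: the master is the process with global rank 0.
        return self.rank == 0

    def is_node_master(self):
        # Assumed convention: the node master has local rank 0 on its node.
        return self.local_rank == 0

    def is_last_process(self):
        return self.rank == self.world_size - 1

    def print_on_master(self, msg):
        # Only the master prints, so logs are not repeated once per process.
        if self.is_master():
            print(msg)


# Simulate global rank 5 of 8: local rank 1 on the second of two 4-process nodes.
worker = ToyCoordinator({"RANK": "5", "LOCAL_RANK": "1", "WORLD_SIZE": "8"})
# Simulate global rank 4 of 8: local rank 0 on the second node.
node_leader = ToyCoordinator({"RANK": "4", "LOCAL_RANK": "0", "WORLD_SIZE": "8"})

worker.print_on_master("this line is skipped because rank 5 is not the master")
```

In real training code the coordinator is created after the distributed environment has been initialized; the dictionaries above only simulate what a launcher would provide.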

## API Reference

{{ autodoc:colossalai.cluster.DistCoordinator }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}

{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}

{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}

{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}
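
As an illustration of the pattern that a method named `on_master_only` suggests, the sketch below gates a function so that it only runs on the master process. The decorator factory and the `make_saver` helper are hypothetical and written for this page. They take the rank as an explicit argument so the example runs without a distributed setup, and they assume that non-master processes skip the call and receive `None`.

```python
import functools


def on_master_only(rank):
    """Hypothetical decorator factory: run the wrapped function only when rank == 0."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rank == 0:
                return func(*args, **kwargs)
            # Assumption: non-master processes skip the call and get None.
            return None

        return wrapper

    return decorator


saved = []  # records which steps were actually "saved"


def make_saver(rank):
    """Build a checkpoint function as process `rank` would see it (hypothetical helper)."""

    @on_master_only(rank)
    def save_checkpoint(step):
        saved.append(step)
        return f"ckpt-{step}"

    return save_checkpoint


master_result = make_saver(0)(10)  # runs on the simulated master
worker_result = make_saver(1)(10)  # skipped on the simulated worker
```

Gating side effects such as checkpoint writes or logging this way avoids several processes writing the same file at once.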
32 changes: 32 additions & 0 deletions docs/source/zh-Hans/features/cluster_utils.md
@@ -0,0 +1,32 @@
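# Cluster Utilities

Author: [Hongxin Liu](https://github.com/ver217)

**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)

## Introduction

The utility class `colossalai.cluster.DistCoordinator` helps coordinate distributed training. It is useful for getting information about the cluster, such as the number of nodes and the number of processes per node.

## API Reference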

{{ autodoc:colossalai.cluster.DistCoordinator }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.is_last_process }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.print_on_node_master }}

{{ autodoc:colossalai.cluster.DistCoordinator.priority_execution }}

{{ autodoc:colossalai.cluster.DistCoordinator.destroy }}

{{ autodoc:colossalai.cluster.DistCoordinator.block_all }}

{{ autodoc:colossalai.cluster.DistCoordinator.on_master_only }}