From 7aa0c137394a771b9183154ecf4613d8534fdd37 Mon Sep 17 00:00:00 2001
From: Conglong Li
Date: Thu, 28 Oct 2021 16:29:45 -0700
Subject: [PATCH 1/2] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index f39f55128..d701a51d1 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ This repo contains example models that use [DeepSpeed](https://github.com/micros
 
 # Note on Megatron examples
 
+NOTE: We plan to deprecate/stop updating the following 3 Megatron-LM snapshots, and switch to use this [fork](https://github.com/microsoft/Megatron-DeepSpeed) as the only showcase of DeepSpeed features applied to Megatron-LM. However, currently the Megatron-DeepSpeed fork still has some limitations. For example, we do not have support for ZeRO-1/2/3, it's not integrated and tested in the fork. (We do have PP + ZeRO-1 but that's essentially it) We will keep you updated about this migration process.
+
 Megatron-LM : This is a fairly old snapshot of Megatron-LM , and we have been using it show case the earlier features of DeepSpeed. This does not contain ZeRO-3 or 3D parallelism.
 
 Megatron-LM-v1.1.5-3D_parallelism: This is a relatively new Megatron (Oct 2020), but before Megatron started supporting 3D parallelism. We ported this version to showcase how to use 3D parallelism inside DeepSpeed with Megatron.
From 7a500cb1ea94b4e65e14b513b17c15d0a8a4d165 Mon Sep 17 00:00:00 2001
From: Jeff Rasley
Date: Mon, 15 Nov 2021 20:53:02 -0800
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d701a51d1..1da997098 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This repo contains example models that use [DeepSpeed](https://github.com/micros
 
 # Note on Megatron examples
 
-NOTE: We plan to deprecate/stop updating the following 3 Megatron-LM snapshots, and switch to use this [fork](https://github.com/microsoft/Megatron-DeepSpeed) as the only showcase of DeepSpeed features applied to Megatron-LM. However, currently the Megatron-DeepSpeed fork still has some limitations. For example, we do not have support for ZeRO-1/2/3, it's not integrated and tested in the fork. (We do have PP + ZeRO-1 but that's essentially it) We will keep you updated about this migration process.
+NOTE: We are in the process of deprecating the 3 Megatron-LM snapshots in this repo. Our current and future features with Megatron-LM will use the [Megatron-DeepSpeed fork](https://github.com/microsoft/Megatron-DeepSpeed). Currently the Megatron-DeepSpeed fork supports 3D parallelism + ZeRO Stage 1 and Curriculum Learning. Please see this new fork for further updates in the process.
 
 Megatron-LM : This is a fairly old snapshot of Megatron-LM , and we have been using it show case the earlier features of DeepSpeed. This does not contain ZeRO-3 or 3D parallelism.
 