[Enhance] Improve Zero3 Implementation: Search Utility, Consolidation, and In-Place Dist Tensor Conversion #178

Merged
yhna940 merged 11 commits into EleutherAI:feature/zero3 from yhna940:feature/refact-zero3-rebase
Apr 19, 2023

Conversation

yhna940 (Contributor) commented Apr 19, 2023

Title

  • Improve Zero3 Implementation: Search Utility, Consolidation, and In-Place Dist Tensor Conversion

Description

This PR improves the ZeRO-3 implementation with the following major changes:

  1. Added a search utility for configuring chunk structures.
  2. Consolidated the ZeRO-related implementations into a single directory (motivated by this commit).
  3. Added in-place conversion to custom distributed tensors (motivated by this commit).
  4. Added unit tests.
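
The PR does not show how the chunk search utility works. As a rough illustration only, one common way to choose a chunk size is to try several candidate sizes and keep the one that wastes the least padding. The sketch below assumes parameters are packed in order, never split across chunks, and described only by their element counts; the names `pack_params` and `search_chunk_size` and the `search_range`/`step` parameters are hypothetical and are not OSLO's API.

```python
from typing import List, Tuple


def pack_params(param_numels: List[int], chunk_size: int) -> List[int]:
    """Greedily pack parameters, in order, into chunks of `chunk_size` elements.

    Returns the number of used elements in each chunk. A parameter is never
    split across chunks, so every parameter must fit inside a single chunk.
    """
    used: List[int] = []
    for numel in param_numels:
        if numel > chunk_size:
            raise ValueError(f"parameter of {numel} elements exceeds chunk size {chunk_size}")
        if used and used[-1] + numel <= chunk_size:
            used[-1] += numel  # fits in the currently open chunk
        else:
            used.append(numel)  # open a new chunk
    return used


def search_chunk_size(param_numels: List[int], search_range: int, step: int) -> Tuple[int, int]:
    """Try chunk sizes from max(param) to max(param) + search_range (inclusive)
    and return (best_chunk_size, wasted_elements) with the least padding."""
    if not param_numels:
        raise ValueError("no parameters to pack")
    min_size = max(param_numels)  # the largest parameter must fit in one chunk
    total = sum(param_numels)
    best = None
    for chunk_size in range(min_size, min_size + search_range + 1, step):
        used = pack_params(param_numels, chunk_size)
        waste = len(used) * chunk_size - total  # padding left in allocated chunks
        if best is None or waste < best[1]:  # ties keep the smaller chunk size
            best = (chunk_size, waste)
    return best
```

For parameters of sizes `[300, 200, 500, 100]`, `search_chunk_size([300, 200, 500, 100], 200, 100)` evaluates chunk sizes 500, 600, and 700 and picks 600 (two chunks holding 500 and 600 elements, 100 elements of padding). A real implementation would likely also weigh communication granularity and memory limits, not padding alone.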

Minor changes include:

  1. The chunk manager and hetero memory manager are now instantiated within FSDP.
  2. Several small bug fixes.

Linked Issues

  • N/A

yhna940 added the Design (Design related) and ZeRO (ZeroRedundancyOptimizer) labels Apr 19, 2023
yhna940 requested a review from KKIEEK April 19, 2023 00:02
yhna940 requested a review from hyunwoongko as a code owner April 19, 2023 00:02
yhna940 self-assigned this Apr 19, 2023
Review comment thread on oslo/torch/nn/parallel/data_parallel/zero/fully_sharded_data_parallel.py (outdated)
KKIEEK (Contributor) commented Apr 19, 2023

It looks good to me overall.

…arallel.py

Co-authored-by: Junhwa Song <ethan9867@gmail.com>
yhna940 merged commit 599ba0c into EleutherAI:feature/zero3 Apr 19, 2023
