Conversation
Force-pushed from f33edcd to 48e91d2.
Is there any difference between this and #2054?

This is the second MR; we need to merge #2054 first, and then this one (#2000). (The reason the second MR is numbered 2000 while the first is 2054 (>2000) is that they were migrated from GitLab at different times.)

Got it, thanks! Could you please update the title to reflect this?
Force-pushed from 983e5f3 to 11d9960.

/ok to test e0c90c5
Force-pushed from e0c90c5 to 501a5f6.

/ok to test d12ccf1

/ok to test 86581cd
```python
# during pipeline parallelism, it should not be set if sequence length
# is constant during training.
args.variable_seq_lengths = False
if args.sequence_packing:
```
Please move these validations into transformer_config.
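A minimal sketch of that suggestion, assuming hypothetical `sequence_packing` and `variable_seq_lengths` fields on the config (the real `TransformerConfig` layout may differ):

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Assumed fields mirroring the args above; names are illustrative.
    sequence_packing: bool = False
    variable_seq_lengths: bool = False

    def __post_init__(self):
        # During pipeline parallelism, variable_seq_lengths should not be
        # set if the sequence length is constant during training.
        if not self.sequence_packing:
            self.variable_seq_lengths = False
```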
```python
max_seqlen = torch.empty(
    1,
    dtype=torch.int32,
    device=torch.cuda.current_device(),
)
```
Could you please clarify why these were removed?
To support PP it would become more complex, so the thd-related logic was moved to a new separate function, get_batch_on_this_rank_for_sequence_packing.
The thd logic was added in the part-1 PR; this just reverts it to how it was before.
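For context, a hedged sketch of the kind of bookkeeping involved, assuming the usual `cu_seqlens` convention (cumulative sequence lengths) for thd-format batches; `compute_max_seqlen` is an illustrative name, not the actual helper:

```python
import torch

def compute_max_seqlen(cu_seqlens: torch.Tensor) -> torch.Tensor:
    # cu_seqlens holds cumulative lengths, e.g. [0, 3, 8, 12], so
    # adjacent differences recover the packed sequence lengths.
    seqlens = cu_seqlens[1:] - cu_seqlens[:-1]
    max_seqlen = torch.empty(
        1, dtype=torch.int32, device=torch.cuda.current_device()
    )
    max_seqlen[0] = seqlens.max()
    return max_seqlen
```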
```python
# Copyright (c) 2025 NVIDIA CORPORATION. All rights reserved.

from typing import Any, List, Optional
import enum
```
Could we put this big change in a separate file?
This data_schedule.py should be the separate file you want; it was included in the part-1 PR.
@asolergi-nv FYI
Signed-off-by: xiaoyao0115 <1804647152@qq.com>
Force-pushed from 3f9564f to ffe8f94.
```python
        {config.sequence_packing_scheduler}"
    )
    scheduler_type = scheduler_type_map[config.sequence_packing_scheduler]
    return wrap_dataloader(data_iterator, config, scheduler_type, pg_collection=None)
```
We should pass the pg_collection through instead of hardcoding it to None.
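A one-line sketch of the suggested fix, assuming the enclosing function receives a `pg_collection` argument it can forward:

```python
# Forward the caller's pg_collection instead of hardcoding None.
return wrap_dataloader(data_iterator, config, scheduler_type,
                       pg_collection=pg_collection)
```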
Signed-off-by: xiaoyao0115 <1804647152@qq.com>
Force-pushed from 61ba6e8 to 8797dbc.
This PR is the second part of hybrid-cp. The first part is: #2054
(PR for main branch: #2304)
Compared to part 1, this PR adds the following:

- `max_seqlen` is set to 12288, and `max_seqlen_per_dp_cp_rank` is set to 3072. In the figure below, 'bshd' refers to running with CP=4, where sequences are padded to `max_seqlen` and executed in the same bshd format as in pretraining; 'thd-packing' refers to using CP=4 while packing variable-length sequences; in 'hybrid-cp', the maximum CP group size is also 4.
- A `dataiterator_wrapper` to minimize code changes. Adding a new scheduling algorithm now only requires adding a new scheduler class, which keeps the logic clear and easier to maintain.

There are many improvements that we want to make in future releases.
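To illustrate that extension point, here is a hedged sketch of what a new scheduler class might look like; the base-class contract and the names `SequencePackingScheduler`, `FirstFitScheduler`, and `schedule` are assumptions for illustration, not the actual interface in data_schedule.py:

```python
from typing import List

class SequencePackingScheduler:
    """Assumed base contract (illustrative): decide how variable-length
    sequences are packed into per-rank bins of bounded total length."""

    def schedule(self, seqlens: List[int], capacity: int) -> List[List[int]]:
        raise NotImplementedError

class FirstFitScheduler(SequencePackingScheduler):
    """Greedy first-fit packing: each sequence goes into the first bin
    with enough remaining capacity; otherwise a new bin is opened."""

    def schedule(self, seqlens: List[int], capacity: int) -> List[List[int]]:
        bins: List[List[int]] = []
        free: List[int] = []  # remaining token capacity per bin
        for s in seqlens:
            for i, cap in enumerate(free):
                if s <= cap:
                    bins[i].append(s)
                    free[i] -= s
                    break
            else:
                bins.append([s])
                free.append(capacity - s)
        return bins
```

A class like this would then only need an entry in `scheduler_type_map` to become selectable through `config.sequence_packing_scheduler`.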
Contribution process
```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks
I want this PR in a versioned release and have added the appropriate Milestone (e.g. Core 0.8).

Code review
The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews

Add the `Expert Review` label when your PR is ready for review. `Final Review` might get declined if these requirements are not fulfilled.
(Step 3): Final Review
Add the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion. MRs are mergeable after one approval by either `eharper@nvidia.com` or `zijiey@nvidia.com`.

Merging your PR
Any member of `core-adlr` and `core-nemo` will be able to merge your PR.