Add option to normalize loss per target #326

Merged

Muennighoff merged 11 commits into t0loading from lossseq on Nov 3, 2022
Conversation

@Muennighoff (Collaborator)

No description provided.

@Muennighoff changed the title from "TMP: Lossseq" to "Add option to normalize loss per target" on Aug 15, 2022
@Muennighoff requested a review from thomasw21 on August 15, 2022 15:08
@Muennighoff requested a review from thomasw21 on August 17, 2022 10:32
Review thread on this diff hunk:

)

    if args.norm_target_loss:
        loss_mask = loss_mask.view(-1)
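For context, a minimal self-contained sketch (not code from this PR; the loss_mask values match the example used in the review thread below, everything else is illustrative) of what normalizing the loss per target means when several targets are packed into one sequence:

import torch

# Per-token losses and a loss mask with one run of 1s per packed target.
losses = torch.arange(13, dtype=torch.float)
loss_mask = torch.tensor([0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.])

# Default behaviour: every target token is weighted equally, so long targets
# dominate the average.
loss_per_token = (losses * loss_mask).sum() / loss_mask.sum()

# Normalized per target: each token's weight is divided by the length of its
# target span, so every target contributes equally regardless of its length.
_, inverse, counts = torch.unique_consecutive(
    loss_mask, return_inverse=True, return_counts=True
)
norm_mask = loss_mask / torch.gather(input=counts, dim=0, index=inverse)
loss_per_target = (losses * norm_mask).sum() / norm_mask.sum()

In other words, without the option a long target dominates the sequence loss; with it, each target span contributes equally.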
Member
There's a fun hack you can do: a view shares the same storage as the initial tensor, so you can probably write something like:

def fast_normalize(loss_mask: torch.Tensor):
    """
    Turn loss_mask from [0,0,0,1,1,0,0,1,0,0,1,1,1] > [0,0,0,0.5,0.5,0,0,1,0,0,0.3,0.3,0.3]
    """
    flatten_view = loss_mask.view(-1)
    _, inverse_indices, counts = torch.unique_consecutive(loss_mask, return_inverse=True, return_counts=True)
    counts = torch.gather(dim=0, index=inverse_indices, input=counts)
    flatten_view.div_(counts)
    return loss_mask

Member

You could also clone before doing this operation, so that fast_normalize isn't actually an in-place operation.
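A quick sketch of that variant (just illustrating the suggestion, not code from the PR; it also applies unique_consecutive to the flattened view):

def fast_normalize(loss_mask: torch.Tensor):
    """
    Same as above, but cloning first so the caller's loss_mask is left untouched.
    """
    loss_mask = loss_mask.clone()
    flatten_view = loss_mask.view(-1)
    _, inverse_indices, counts = torch.unique_consecutive(
        flatten_view, return_inverse=True, return_counts=True
    )
    counts = torch.gather(dim=0, index=inverse_indices, input=counts)
    flatten_view.div_(counts)
    return loss_mask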

@Muennighoff (Collaborator, Author)

Why is

def fast_normalize(loss_mask: torch.Tensor):
    """
    Turn loss_mask from [0,0,0,1,1,0,0,1,0,0,1,1,1] > [0,0,0,0.5,0.5,0,0,1,0,0,0.3,0.3,0.3]
    """
    flatten_view = loss_mask.view(-1)
    _, inverse_indices, counts = torch.unique_consecutive(loss_mask, return_inverse=True, return_counts=True)
    counts = torch.gather(dim=0, index=inverse_indices, input=counts)
    flatten_view.div_(counts)
    return loss_mask

better than

def fast_normalize(loss_mask: torch.Tensor):
    """
    Turn loss_mask from [0,0,0,1,1,0,0,1,0,0,1,1,1] > [0,0,0,0.5,0.5,0,0,1,0,0,0.3,0.3,0.3]
    """
    _, inverse_indices, counts = torch.unique_consecutive(loss_mask, return_inverse=True, return_counts=True)
    counts = torch.gather(dim=0, index=inverse_indices, input=counts)
    return loss_mask / counts

?

Member

Does the latter work if loss_mask is not 1D?
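For what it's worth, a standalone check (purely illustrative, not from the PR) that the two variants agree on the 1D example from the docstring:

import torch

loss_mask = torch.tensor([0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.])
_, inverse, counts = torch.unique_consecutive(
    loss_mask, return_inverse=True, return_counts=True
)
per_token_counts = torch.gather(input=counts, dim=0, index=inverse)

in_place = loss_mask.clone()
in_place.view(-1).div_(per_token_counts)      # first variant: divide the flat view in place
out_of_place = loss_mask / per_token_counts   # second variant: allocate a new tensor

assert torch.allclose(in_place, out_of_place)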

@Muennighoff merged commit 1e77844 into t0loading on Nov 3, 2022
@Muennighoff deleted the lossseq branch on November 3, 2022 17:38