Sync transformers and accelerate versions #562

Merged

michaelbenayoun merged 52 commits into main from sync_transformers_and_accelerate on May 16, 2024

Conversation

@michaelbenayoun (Member) commented Apr 10, 2024

What does this PR do?

This PR synchronizes optimum-neuron with more recent transformers and accelerate versions:

  • accelerate==0.29.2, the latest release at the time this PR was opened,
  • transformers==4.40.2, which will be the latest release when this PR is merged.

Related PR in transformers: huggingface/transformers#30259

On top of that:

  • The workflows for Trainium instances have been updated and now run on K8s (Kubernetes).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@michaelbenayoun michaelbenayoun mentioned this pull request Apr 12, 2024
@michaelbenayoun michaelbenayoun marked this pull request as ready for review May 15, 2024 12:23
@michaelbenayoun michaelbenayoun requested a review from dacorvo May 15, 2024 12:23
@dacorvo (Collaborator) left a comment

Awesome! Looks good to me, but I found a few nits!

Comment thread .github/workflows/test_trainium_common.yml Outdated
Comment thread optimum/neuron/distributed/decoder_models.py Outdated
tr_loss_div = tr_loss / dp_size

xm.mark_step()
Collaborator

I love when these lines pop magically to solve a sync issue ... 😜
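For background on why an added `xm.mark_step()` can resolve a sync issue: torch_xla records operations lazily and only executes the accumulated graph at a step marker, so inserting a marker forces pending work to run before the host reads a value. Here is a minimal, purely illustrative sketch of that lazy-execution model — `LazyTensor` and the local `mark_step` are hypothetical stand-ins, not the torch_xla API:

```python
class LazyTensor:
    """Toy lazy tensor: records pending ops; nothing runs until flushed."""

    def __init__(self, value):
        self.value = value
        self.pending = []  # queued (op, arg) pairs, not yet executed

    def div(self, d):
        self.pending.append(("div", d))  # queue the op instead of running it
        return self

    def materialize(self):
        # Flush: apply every queued op, mimicking a graph cut at a step marker.
        for op, arg in self.pending:
            if op == "div":
                self.value /= arg
        self.pending.clear()
        return self.value


def mark_step(tensors):
    """Stand-in for xm.mark_step(): force execution of all pending ops."""
    return [t.materialize() for t in tensors]


tr_loss = LazyTensor(8.0)
dp_size = 4
tr_loss.div(dp_size)           # queued, not yet executed
print(len(tr_loss.pending))    # 1 op still pending
print(mark_step([tr_loss]))    # [2.0] after the flush
```

Until `mark_step` runs, the host-side value is stale — which is exactly the kind of mismatch a well-placed step marker fixes in the real training loop.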

Comment thread optimum/neuron/trainers.py Outdated
ignore_keys_for_eval=ignore_keys_for_eval,
**kwargs,
)
# with hub_neuronx_cache("training", entry=self.model_cache_entry):
Collaborator

Are you sure you don't want to fetch from the cache here? If so, you should remove the commented line.

Member Author

Fixed; it was a leftover from a quick test.
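The commented-out line wrapped the training call in a cache context manager, so compiled artifacts could be fetched and stored around the run. As a hedged sketch of that general pattern only — `compilation_cache` below is a hypothetical stand-in, not the actual `hub_neuronx_cache` API:

```python
from contextlib import contextmanager


@contextmanager
def compilation_cache(mode, entry=None):
    """Hypothetical cache scope; not the optimum-neuron implementation."""
    cache = {"mode": mode, "entry": entry, "hits": 0}
    try:
        yield cache  # code inside the block can consult the cache
    finally:
        # A real implementation would flush/upload compiled artifacts here.
        pass


def train(cache=None):
    # Pretend we fetched a precompiled artifact from the cache.
    if cache is not None:
        cache["hits"] += 1
    return "done"


with compilation_cache("training", entry="my-model-entry") as cache:
    result = train(cache)

print(result, cache["hits"])  # done 1
```

The point of the pattern is that everything compiled inside the `with` block is attributed to one cache entry, and cleanup/upload happens in one place when the block exits.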

Comment thread tests/distributed/test_model_parallelization.py Outdated
Comment thread tests/test_trainers.py Outdated
@dacorvo (Collaborator) commented May 15, 2024

You should unpin the safetensors package here because there is now a conflict:
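On the unpinning point: an exact pin like `safetensors==X` conflicts with any other dependency that requires a newer release, while a lower bound leaves the resolver room to pick a mutually compatible version. A minimal illustration of the two specifier styles — the version numbers are invented for the example, not taken from this PR:

```python
# Illustrative dependency lists in setup.py style; versions are examples only.
pinned = [
    "safetensors==0.4.2",  # exact pin: breaks if another dep needs a newer release
]
unpinned = [
    "safetensors>=0.4.2",  # lower bound: the resolver can satisfy both packages
]
print(pinned[0], "->", unpinned[0])
```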

@dacorvo (Collaborator) left a comment

LGTM, thanks! If you can fix the seq2seq tracing issue before merging this pull request, that would be even better.

@michaelbenayoun (Member, Author) commented May 16, 2024

I fixed all but one test:

  • tests/generation/test_tnx_llama.py::test_decoder_generation_multiple_eos_token_ids

@michaelbenayoun michaelbenayoun merged commit d15c130 into main on May 16, 2024
@michaelbenayoun michaelbenayoun deleted the sync_transformers_and_accelerate branch May 16, 2024 13:22