Reduce memory usage in TF building by Rocketknight1 · Pull Request #24046 · huggingface/transformers

Rocketknight1 · 2023-06-06T13:59:20Z

This PR reduces the default shape of dummy inputs from (3, 3) to (2, 2). This slightly reduces the memory usage when building TF models, which should hopefully fix some of our pipeline tests.

We could replace the dummy inputs with symbolic tensors, which would mean we could build TF models with 0 memory usage, but this would make TF model building slower (~4X) because it would implicitly compile the model when building, which is probably not an acceptable tradeoff.

cc @ydshieh and @amyeroberts as core maintainer

HuggingFaceDocBuilderDev · 2023-06-06T14:13:48Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh

Thanks, the change itself is good for me.

ydshieh · 2023-06-06T14:36:54Z

Let me run it on CI and see.

amyeroberts

Change LGTM - thanks for updating!

Happy to merge once @ydshieh gives the 👍 from CI runs

amyeroberts · 2023-06-06T14:54:55Z

        for key, spec in sig.items():
-            # 3 is the most correct arbitrary size. I will not be taking questions
-            dummies[key] = tf.ones(shape=[dim if dim is not None else 3 for dim in spec.shape], dtype=spec.dtype)
+            # 2 is the most correct arbitrary size. I will not be taking questions


I wish to file this diff as evidence to the contrary #team3

Rocketknight1 · 2023-06-06T15:14:58Z

Sorry for the delay - there's an issue with Funnel that wasn't reproducing on my machine. I eventually figured out that the problem is the classic TF one: indices for tf.gather are not validated on GPU but are validated on CPU, and so the bug only becomes apparent on CPU. Will fix in just a sec!

ydshieh · 2023-06-06T15:22:04Z

I also tried to run the change in this PR, and got

FAILED tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf - tensorflow.python.framework.errors_impl.ResourceExhaustedError: {{function_node __wrapped__Transpose_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[768,768] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Transpose]
FAILED tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf_table_qa - tensorflow.python.framework.errors_impl.ResourceExhaustedError: Exception encountered when calling layer 'tapas' (type TFTapasMainLayer).

{{function_node __wrapped__StatelessTruncatedNormalV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[30522,768] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:StatelessTruncatedNormalV2]

Call arguments received by layer 'tapas' (type TFTapasMainLayer):
  • input_ids=tf.Tensor(shape=(2, 2), dtype=int32)
  • attention_mask=tf.Tensor(shape=(2, 2), dtype=float32)
  • token_type_ids=tf.Tensor(shape=(2, 2, 7), dtype=int32)
  • position_ids=None
  • head_mask=None
  • inputs_embeds=None
  • output_attentions=False
  • output_hidden_states=False
  • return_dict=True
  • training=False

and 5 other ones (probably due to the above one).

@Rocketknight1 I think we will have to reiterate (change->run->change->run) a bit more before we merge.

Rocketknight1 · 2023-06-06T15:29:54Z

Yep, working on it now!

ydshieh · 2023-06-06T15:41:43Z

The tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf run against a list of models, so it's kind normal it fails with other models even some fixes are done previously.

I am OK to trigger the run (a subset) whenever you feel it's time. Otherwise I can show you a modified workflow file for you to trigger manually.

Rocketknight1 · 2023-06-06T16:26:35Z

@ydshieh the issues with Funnel have been resolved, so this should be ready for a CI run now!

ydshieh · 2023-06-06T16:41:06Z

You can watch it live here. It will take 20-30 min to finish.

Rocketknight1 · 2023-06-06T17:10:07Z

Looks like they're still failing even with very small dummies. I'll investigate those models and try to figure out why - the new dummies should be smaller than the old ones!

Rocketknight1 · 2023-06-06T17:11:19Z

Maybe this is a sign that we should transition the dummies to symbolic tensors for those models, even if it's probably too slow for our tests to do it across the whole codebase.

* Make the default dummies (2, 2) instead of (3, 3) * Fix for Funnel * Actually fix Funnel

Make the default dummies (2, 2) instead of (3, 3)

1fdb055

Rocketknight1 requested review from amyeroberts and ydshieh June 6, 2023 13:59

ydshieh approved these changes Jun 6, 2023

View reviewed changes

amyeroberts approved these changes Jun 6, 2023

View reviewed changes

Fix for Funnel

aa71374

Actually fix Funnel

2b32578

Rocketknight1 merged commit 7203ea6 into main Jun 6, 2023

Rocketknight1 deleted the lower_dummy_memory_usage branch June 6, 2023 17:29

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

Reduce memory usage in TF building (huggingface#24046)

61e0124

* Make the default dummies (2, 2) instead of (3, 3) * Fix for Funnel * Actually fix Funnel

Conversation

Rocketknight1 commented Jun 6, 2023

Uh oh!

HuggingFaceDocBuilderDev commented Jun 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ydshieh left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ydshieh commented Jun 6, 2023

Uh oh!

amyeroberts left a comment

Choose a reason for hiding this comment

Uh oh!

amyeroberts Jun 6, 2023

Choose a reason for hiding this comment

Uh oh!

Rocketknight1 commented Jun 6, 2023

Uh oh!

ydshieh commented Jun 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Rocketknight1 commented Jun 6, 2023

Uh oh!

ydshieh commented Jun 6, 2023

Uh oh!

Rocketknight1 commented Jun 6, 2023

Uh oh!

ydshieh commented Jun 6, 2023

Uh oh!

Rocketknight1 commented Jun 6, 2023

Uh oh!

Rocketknight1 commented Jun 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HuggingFaceDocBuilderDev commented Jun 6, 2023 •

edited

Loading

ydshieh left a comment •

edited

Loading

ydshieh commented Jun 6, 2023 •

edited

Loading