Fix empty tensor shape issue in DynamicCache for torch.compile #42053
yashwantbezawada wants to merge 1 commit into huggingface:main
Conversation
Force-pushed 8bf24e6 to ffd4b63
Fixes huggingface#42027

This commit fixes a regression where torch.cat receives incorrectly shaped empty tensors during model tracing with torch.compile. The issue was introduced in commit dc11a3c, where empty cache tensors were initialized as 1D tensors with shape [0] using torch.tensor([]). When these are concatenated with 4D key/value tensors [batch, heads, seq, dim] along dim=-2, torch.compile's tracing fails.

Changes:
- Modified DynamicLayer.lazy_initialization() to create properly shaped 4D empty tensors [batch, heads, 0, dim] instead of 1D [0]
- Modified QuantizedLayer.update() to reset the cache with the proper 4D shape
- Used torch.zeros() with an explicit shape matching the key_states dimensions

This ensures torch.cat operations work correctly in both eager and compiled modes.
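To illustrate the fix described in the commit message, here is a minimal sketch (the shapes are arbitrary illustration values, not taken from the PR): an empty cache tensor shaped [batch, heads, 0, dim] concatenates cleanly with a 4D key tensor along dim=-2, producing the expected shape.

```python
import torch

# Hypothetical shapes for illustration: [batch, heads, seq, dim]
key_states = torch.randn(2, 4, 3, 8)

# Properly shaped 4D empty cache tensor, as in the fix:
# zero length along the sequence axis (dim=-2)
empty_cache = torch.zeros(2, 4, 0, 8)

# All non-concatenation dims match, so this traces and runs cleanly
out = torch.cat([empty_cache, key_states], dim=-2)
print(out.shape)  # torch.Size([2, 4, 3, 8])
```

A 1D torch.tensor([]) does not satisfy torch.cat's requirement that all non-concatenation dimensions match, which is why the 4D empty shape matters under tracing.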
Force-pushed ffd4b63 to 1375af8
I see that the CI tests are failing (tests_exotic_models, tests_generate, tests_torch), while the code quality checks pass. I'm unable to access the detailed CircleCI logs to understand the specific test failures. The changes I made:

This approach ensures torch.cat works correctly in torch.compile mode by providing properly shaped 4D tensors. Could someone help me understand which tests are failing and why? I'd be happy to adjust the approach if needed. I'm aware of PR #40328, which takes a more comprehensive approach to torch.compile + DynamicCache compatibility. cc @huggingface/transformers
This is an update to a PR from @Cyrilvallez, so I'll wait for him to approve it!
How are you using torch.compile? In general, it should not really be used with DynamicCache, as the shapes will keep changing at each iteration.
The original issue was about using a custom torch.compile backend. I noticed PR #40328 is doing the proper fix with symbolic shapes and mark_dynamic. That seems like the right long-term solution. Should I just close this one?
Hey @yashwantbezawada! Indeed, the PR you linked will provide much better support for compile options! I'll go back to it asap when things are calmer after the
Thanks for the update! Closing this one - looking forward to #40328. |
What does this PR do?
Fixes #42027
This PR fixes a regression where torch.cat receives incorrectly shaped empty tensors during GPT2 model tracing with torch.compile, causing compilation failures.
Background
The issue was introduced in commit dc11a3c (PR #39797), where empty cache tensors were initialized as 1D tensors with shape [0] using torch.tensor([]). When these are concatenated with 4D key/value tensors [batch_size, num_heads, seq_len, head_dim] along dim=-2, torch.compile's tracing fails with empty tensor errors.
Changes
- Modified DynamicLayer.lazy_initialization() to create properly shaped 4D empty tensors [batch, heads, 0, dim] instead of 1D [0]
- Modified QuantizedLayer.update() to reset the cache with the proper 4D shape
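The core of the change to both methods can be sketched as follows. This is a hedged illustration, not the actual transformers code: the function name and signature here are hypothetical, but the idea matches the PR description — derive batch, heads, and head dim from key_states and use a zero-length sequence axis.

```python
import torch

def init_empty_cache(key_states: torch.Tensor):
    """Hypothetical sketch of the fix: build empty 4D cache tensors whose
    batch/head/dim sizes match key_states, with 0 along the sequence axis,
    instead of the 1D torch.tensor([]) that broke torch.compile tracing."""
    batch, heads, _, dim = key_states.shape
    keys = torch.zeros(batch, heads, 0, dim,
                       dtype=key_states.dtype, device=key_states.device)
    values = torch.zeros(batch, heads, 0, dim,
                         dtype=key_states.dtype, device=key_states.device)
    return keys, values

# Usage: the empty tensors can then be concatenated with new states
k, v = init_empty_cache(torch.randn(1, 2, 5, 16))
print(k.shape)  # torch.Size([1, 2, 0, 16])
```

Matching dtype and device to key_states avoids a second class of tracing surprises when the cache later participates in torch.cat.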
Testing
The fix ensures torch.cat operations work correctly in both eager and compiled modes.
Impact