fix: improve weight offloading to handle plain tensor attrs and use to_empty() #952

Open
quic-rishinr wants to merge 1 commit into quic:main from quic-rishinr:mem_optim_v2

@quic-rishinr
Contributor

fix: improve weight offloading to handle plain tensor attrs and use to_empty()

Replace manual storage resizing with to_empty(device="meta") for
parameters/buffers and explicitly handle plain tensor attributes (e.g.
stacked expert weights in MoE models) that are not registered as
parameters or buffers. This ensures all tensors are properly moved to
the meta device, reducing memory usage after ONNX export.
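The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToyExperts`, `stacked_w`, and `offload_to_meta` are hypothetical names, and the sketch assumes plain tensor attributes live directly in each module's `__dict__`.

```python
import torch
from torch import nn


class ToyExperts(nn.Module):
    """Hypothetical MoE-style block that keeps one plain tensor attribute."""

    def __init__(self, num_experts: int = 4, dim: int = 8):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # registered parameters
        self.register_buffer("scale", torch.ones(dim))  # registered buffer
        # Plain attribute: neither a Parameter nor a buffer, so
        # Module.to_empty() does not touch it.
        self.stacked_w = torch.randn(num_experts, dim, dim)


def offload_to_meta(model: nn.Module) -> nn.Module:
    """Illustrative helper (an assumption, not the PR's implementation)."""
    # Step 1: parameters and buffers are re-created on the meta device
    # without copying any data.
    model.to_empty(device="meta")
    # Step 2: replace any remaining plain tensor attributes with
    # shape-preserving meta placeholders.
    for module in model.modules():
        for name, value in list(vars(module).items()):
            if isinstance(value, torch.Tensor) and not value.is_meta:
                setattr(module, name, torch.empty_like(value, device="meta"))
    return model


model = offload_to_meta(ToyExperts())
```

After the call, every tensor reachable from the module keeps its shape and dtype but no longer holds real storage, which is the property the unit tests for plain tensor attributes would presumably check.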

Add unit tests for plain tensor attribute clearing

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>