Collaborator

@rsshaik1 rsshaik1 commented May 5, 2025

This PR adds support for double dequantization on HPU.

jiqing-feng and others added 8 commits March 18, 2025 10:43
* fix 4bit XPU dequant 4bit

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix default value

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex linear set

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex linear set to false when calling state dict

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix Int8Param device patch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu to cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu cpu data device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix intel cpu/xpu warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix error log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lib

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm return None

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* error log only without ipex

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix import error

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable xpu 8bit optim

* add dequant_blockwise

* dequantize_blockwise

* add backend synchronize

* refine code

* ipex dep

* ipex dep too

* ipex version check

---------

Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Authored by: Chetan Kumar Verma <chetan.kumar.verma@intel.com>
Co-authored-by: Ruheena Suhani Shaik <ruheena.suhani.shaik@intel.com>
Co-authored-by: Bhargav Eede <bhargav.eede@intel.com>
Co-authored-by: Vivek Goel <vivek.goel@intel.com>

Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@rsshaik1 rsshaik1 requested review from bhargaveede and ckvermaAI May 5, 2025 12:08
@rsshaik1 rsshaik1 force-pushed the double_quant branch 2 times, most recently from b862ecc to a35e7e1 Compare May 5, 2025 12:44
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Optional[torch.Tensor]]:
assert_on_hpu([A, col_stats, row_stats, out_col, out_row])
return double_quant_impl(A, col_stats, row_stats, out_col, out_row, threshold)
assert_on_hpu([A, col_stats, row_stats, out_col, out_row])

I don't think this function is used anywhere on HPU. Please check, and if that is the case, let's add an assert.
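The suggestion above can be read as a fail-fast guard. A minimal sketch, assuming the HPU backend exposes a `double_quant` entry point that nothing calls; the class name `HPUBackendSketch`, the helper `probe`, and the message text are hypothetical and not taken from the PR:

```python
class HPUBackendSketch:
    """Hypothetical stand-in for an HPU backend class (not the PR's code)."""

    def double_quant(self, A, col_stats=None, row_stats=None,
                     out_col=None, out_row=None, threshold=0.0):
        # Fail loudly instead of running a path that is believed to be unused.
        # An explicit raise is used here because plain `assert` statements
        # are skipped when Python runs with -O.
        raise AssertionError("double_quant is not expected to be called on HPU")


def probe() -> str:
    """Return the failure message produced when the guarded method is called."""
    try:
        HPUBackendSketch().double_quant(A=None)
    except AssertionError as exc:
        return str(exc)
    return "no error"
```

If a caller does exist after all, the guard surfaces it immediately rather than letting an untested code path run silently.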

):
assert quant_state is not None
- if A.device.type in ("cpu", "xpu") and A.requires_grad == False:
+ if A.device.type in ("cpu", "xpu", "hpu") and A.requires_grad == False:

Let's do this differently. Remove the "hpu" check from here and add a check so that we do not enter the elif branch for "hpu"; HPU will then always go to the else case.

After making this change, if there is no path that reaches gemv_4bit on HPU, remove the body of that function and add an assert there.
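The branch ordering requested here could look roughly like the sketch below. It is an illustration only: the diff does not show the real elif condition, so `takes_gemv_branch` is a placeholder for it, and `InputSketch`, `select_path`, and the returned labels are hypothetical names that stand in for the real tensor and the real function calls.

```python
from dataclasses import dataclass


@dataclass
class InputSketch:
    """Stand-in for the input tensor: only the fields the dispatch inspects."""
    device_type: str
    requires_grad: bool
    takes_gemv_branch: bool  # placeholder for the real elif condition (not shown in the diff)


def select_path(a: InputSketch) -> str:
    """Return a label for the branch that would handle `a`."""
    if a.device_type in ("cpu", "xpu") and not a.requires_grad:
        # First branch left as it was before the PR, without "hpu".
        return "cpu_xpu"
    elif a.device_type != "hpu" and a.takes_gemv_branch:
        # HPU is explicitly kept out of the gemv_4bit branch.
        return "gemv_4bit"
    else:
        # HPU always ends up in the fallback branch.
        return "generic"
```

With this shape, `select_path(InputSketch("hpu", False, True))` falls through to the else case, so an HPU-specific gemv_4bit would have no caller and could be reduced to an assert.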

6 participants