
@jiqing-feng
Collaborator

@jiqing-feng jiqing-feng commented Dec 3, 2025

Add an HF Kernel for CPU. It gives a significant TTFT (time-to-first-token) speed-up compared to torch_fused.

This needs review once the kernels-community changes are ready.
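TTFT is the latency from submitting a prompt until the first generated token arrives, so it mostly reflects the prefill pass over the prompt. The following is a minimal sketch of how a TTFT comparison between backends could be timed. It assumes each backend is wrapped as a hypothetical zero-argument callable that streams tokens. These wrappers are stand-ins for torch_fused or the HF kernel path, not GPTQModel's API.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_token(generate: Callable[[], Iterable[T]]) -> Tuple[T, float]:
    """Return (first_token, seconds_elapsed) for one streaming generation call.

    `generate` is a hypothetical zero-argument callable that starts generation
    and yields tokens one at a time.
    """
    start = time.perf_counter()
    stream = iter(generate())
    first = next(stream)  # blocks until the first token has been produced
    return first, time.perf_counter() - start


def compare_ttft(
    backends: Dict[str, Callable[[], Iterable[T]]], repeats: int = 5
) -> Dict[str, float]:
    """Median TTFT in seconds per backend name, over `repeats` runs each."""
    return {
        name: statistics.median(time_to_first_token(gen)[1] for _ in range(repeats))
        for name, gen in backends.items()
    }


if __name__ == "__main__":
    def fake_backend(prefill_s: float):
        # Stand-in for a real model: sleep to mimic prefill, then stream tokens.
        def generate():
            time.sleep(prefill_s)
            yield "Hello"
            yield " world"
        return generate

    print(compare_ttft({"slow": fake_backend(0.05), "fast": fake_backend(0.01)}))
```

The median over several runs is used to damp warm-up and scheduling noise. In a real benchmark, the first run per backend would usually be discarded as warm-up.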

@jiqing-feng jiqing-feng marked this pull request as draft December 3, 2025 08:19
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@yao-matrix

The kernel PR huggingface/kernels-community#81 has been merged. @jiqing-feng, please mark this PR as ready for review once the kernel binary has propagated.

@Qubitium Qubitium self-assigned this Dec 4, 2025
@Qubitium Qubitium changed the title HF Kernel for CPU HF Kernel for CPU: AMX, AVX2, AVX512 optimized Dec 4, 2025
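The new title names three x86 ISA tiers. This thread does not describe how the kernel dispatches between them. The sketch below only illustrates the common pattern: it reads the CPU feature flags that Linux reports in /proc/cpuinfo and picks the most capable path available. The tier names and the exact flags each tier requires are assumptions made for illustration.

```python
from typing import Iterable, Set

# Most to least capable. The flag names are real Linux /proc/cpuinfo flags, but
# which ones each tier requires is an assumption made for this sketch.
TIERS = (
    ("amx", {"amx_tile", "amx_bf16"}),
    ("avx512", {"avx512f", "avx512bw"}),
    ("avx2", {"avx2", "fma"}),
)


def read_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Return the feature flags of the first CPU listed; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or cpuinfo unreadable
    return set()


def pick_cpu_path(flags: Iterable[str]) -> str:
    """Pick the most capable tier whose required flags are all present."""
    available = set(flags)
    for name, required in TIERS:
        if required <= available:
            return name
    return "fallback"  # e.g. a generic, non-specialized path


if __name__ == "__main__":
    print(pick_cpu_path(read_cpu_flags()))
```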
@jiqing-feng jiqing-feng marked this pull request as ready for review December 5, 2025 01:23
@jiqing-feng
Collaborator Author

Hi @Qubitium, this PR is ready for review.

@Qubitium
Collaborator

Qubitium commented Dec 5, 2025

@jiqing-feng Awesome. We are now approaching the point where we have more mature CPU kernels than GPU ones, thanks to Intel. =)

Please add hf_kernel to the CI kernel test: https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_kernel_output_torch_fused.py

Also consider renaming test_kernel_output_torch_fused.py to test_kernel_output_intel_cpu_xpu.py, since it now tests both XPU and CPU kernels for output regressions.
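An output-regression test of the kind requested here typically runs the same input through each backend and checks every result against a reference within a tolerance. The sketch below shows that shape using plain Python lists and `unittest.subTest`, so each backend reports separately. The backend names come from this thread. The lambdas, values, and tolerances are hypothetical placeholders, not the contents of the actual GPTQModel test file.

```python
import math
import unittest
from typing import Sequence

ATOL = 1e-2  # hypothetical tolerances; real kernels may need per-dtype values
RTOL = 1e-2


def outputs_match(reference: Sequence[float], output: Sequence[float],
                  atol: float = ATOL, rtol: float = RTOL) -> bool:
    """True if both sequences have the same length and every element is close."""
    return len(reference) == len(output) and all(
        math.isclose(o, r, rel_tol=rtol, abs_tol=atol)
        for o, r in zip(output, reference)
    )


class TestKernelOutput(unittest.TestCase):
    # Placeholder data: a real test would run the quantized model through each
    # backend on the same prompt and compare, e.g., logits.
    REFERENCE = [0.10, -0.25, 0.33]
    BACKENDS = {
        "torch_fused": lambda: [0.10, -0.25, 0.33],
        "hf_kernel": lambda: [0.101, -0.249, 0.331],
    }

    def test_backends_match_reference(self):
        for name, run in self.BACKENDS.items():
            with self.subTest(backend=name):
                self.assertTrue(outputs_match(self.REFERENCE, run()))


if __name__ == "__main__":
    unittest.main()
```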

@Qubitium
Collaborator

Qubitium commented Dec 5, 2025

@jiqing-feng One more thing: please add kernels as a hard dependency in both pyproject.toml and requirements.txt. I checked the package dependencies in https://github.com/huggingface/kernels/blob/main/pyproject.toml. kernels is very small and has very few dependencies, so I think it's safe for us to add this new hard dependency.
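A small check that both files declare the package is one way to keep the two dependency lists in sync. The sketch below parses requirement strings, which can be either pip requirements.txt lines or PEP 508 entries from pyproject.toml's `[project]` dependencies. It then reports which file is missing `kernels`. This assumes the dependencies are listed statically. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# A PEP 508 requirement starts with the distribution name; the match stops at
# the first version specifier, extras bracket, marker, or whitespace.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization, so 'Kernels' and 'kernels' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines: Iterable[str]) -> Set[str]:
    """Normalized package names declared by requirement strings."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, --index-url)
            continue
        match = _NAME.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


def files_missing(pkg: str, requirements_lines: Iterable[str],
                  pyproject_dependencies: Iterable[str]) -> List[str]:
    """List the files that do not declare `pkg` (empty list means both do)."""
    target = normalize(pkg)
    missing = []
    if target not in requirement_names(requirements_lines):
        missing.append("requirements.txt")
    if target not in requirement_names(pyproject_dependencies):
        missing.append("pyproject.toml")
    return missing


if __name__ == "__main__":
    reqs = ["torch>=2.0", "kernels  # inline comment"]
    deps = ["torch>=2.0"]
    print(files_missing("kernels", reqs, deps))  # ['pyproject.toml']
```

In a real check, the pyproject entries could be loaded with `tomllib.load(f)["project"]["dependencies"]`. This requires Python 3.11+, with the file opened in binary mode.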

@jiqing-feng
Collaborator Author

Hi @Qubitium, I have addressed your comments. Please verify the changes, because I cannot load the test model /monster/data/model/bloom-560m-gptqmodel-4bit

@Qubitium
Collaborator

Qubitium commented Dec 5, 2025


OK, thanks. I will run the unit tests and merge once they pass.

@Qubitium
Collaborator

Qubitium commented Dec 5, 2025

CI tests passed

@jiqing-feng
Collaborator Author

Hi @Qubitium, please let me know if anything needs to change before merging. Thanks.

@Qubitium Qubitium merged commit 5e66941 into ModelCloud:main Dec 8, 2025
1 check passed