
Fixed CUDA kernel loading unlocking graph compilation #27

Merged
BlackSamorez merged 3 commits into main from compile on Feb 22, 2024

Conversation

@BlackSamorez
Collaborator

Following the transformers PR "Fix static generation when compiling!", it's clear that CUDA graph compilation can greatly speed up generation.

This PR changes the way CUDA kernels from aqlm are compiled and loaded, allowing for their use with torch.compile and CUDA graphs.
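For context, here is a minimal sketch of what using `torch.compile` with CUDA graphs looks like. It is not code from this PR: `compile_for_decoding` is a hypothetical helper, and a plain `nn.Linear` stands in for an aqlm-quantized layer. It assumes that `mode="reduce-overhead"` is the setting of interest, since that mode asks the default inductor backend to use CUDA graphs on GPU.

```python
import torch
from torch import nn


def compile_for_decoding(module: nn.Module) -> nn.Module:
    """Hypothetical helper (not part of aqlm): wrap a module with torch.compile."""
    if torch.cuda.is_available():
        # "reduce-overhead" asks the default inductor backend to use CUDA graphs.
        # fullgraph=True raises an error on graph breaks instead of silently
        # falling back to eager execution for part of the module.
        return torch.compile(module, mode="reduce-overhead", fullgraph=True)
    # Without a GPU there are no CUDA graphs; the "eager" backend only traces
    # the module, which is still useful for spotting graph breaks.
    return torch.compile(module, backend="eager", fullgraph=True)


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = nn.Linear(16, 16).to(device)  # stand-in for a quantized layer (assumption)
compiled_layer = compile_for_decoding(layer)

x = torch.randn(1, 16, device=device)
with torch.no_grad():
    reference = layer(x)        # eager result
    output = compiled_layer(x)  # first call triggers tracing/compilation
```

Compilation happens lazily on the first call, so the first invocation is slow and later calls with the same input shapes reuse the compiled graph.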

Collaborator

@justheuristic left a comment

LGTM, thanks!

@BlackSamorez merged commit 3e93557 into main on Feb 22, 2024

2 participants