AMD port of TurboDiffusion - Working on gfx1151 on Windows #66
3. Install Dependencies: pip install -r requirements.txt
In the TurboDiffusion repo (on this PR branch)
Couldn't git clone this branch, so I modified the files by hand.
ah, for
The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.
Sounds promising 👀
Fixed, thanks!
Yes, it's just a matter of refactoring the rocWMMA code so it no longer assumes that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to Claude, actually.
- Add HIP kernels for GEMM, LayerNorm, RMSNorm, and quantization ops
- Integrate rocWMMA for matrix operations on AMD GPUs
- Update setup.py for Windows ROCm builds with clang-cl
- Add platform detection (CUDA/HIP) with common abstractions
- Optimize SLA kernel config for ROCm (BLKK=16)
- Update .gitignore to exclude build artifacts and IDE files
- Fix distributed utils and network files for ROCm compatibility
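For readers wondering what "platform detection (CUDA/HIP)" can look like on the Python side, here is a minimal sketch. It is not taken from this PR: the function name `detect_backend` is hypothetical, and the only thing it relies on is that PyTorch exposes `torch.version.hip` (a string on ROCm builds, `None` otherwise) and `torch.version.cuda` (a string on CUDA builds, `None` otherwise). The function takes the version object as an argument so it can be tried without a GPU build installed.

```python
from types import SimpleNamespace


def detect_backend(version_info):
    """Return "hip", "cuda", or "cpu" for a torch.version-like object.

    Hypothetical helper for illustration; the PR's actual detection
    logic may differ.
    """
    # ROCm builds of PyTorch set version.hip to a version string.
    if getattr(version_info, "hip", None):
        return "hip"
    # CUDA builds set version.cuda to a version string.
    if getattr(version_info, "cuda", None):
        return "cuda"
    # Neither attribute is set: CPU-only build.
    return "cpu"


# Stand-ins for torch.version on different builds (made-up version strings).
rocm_build = SimpleNamespace(hip="6.4", cuda=None)
cuda_build = SimpleNamespace(hip=None, cuda="12.4")
cpu_build = SimpleNamespace(hip=None, cuda=None)

print(detect_backend(rocm_build))
print(detect_backend(cuda_build))
print(detect_backend(cpu_build))
```

In a real environment the call would be `detect_backend(torch.version)` after `import torch`; the result could then be used to pick a ROCm-specific setting such as the BLKK=16 SLA config mentioned in the list above.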
I'll be the 1st tester though, when the time comes ^^
Use rocWMMA for GEMM kernels, and use triton-windows and SpargeAttn modified to support AMD on Windows.
See README_AMD_WINDOWS.md for setup steps.
Generated video using Wan2.1 1.4b 480p default command as per README.md:
generated_video.mp4
Limitations: