@jammm

jammm commented Dec 30, 2025

Use rocWMMA for the GEMM kernels, together with triton-windows and a SpargeAttn build modified to support AMD GPUs on Windows.
See README_AMD_WINDOWS.md for setup steps.

Generated video using Wan2.1 1.4b 480p default command as per README.md:

generated_video.mp4

Limitations:

  • Currently supports only RDNA3/3.5, though it may work on RDNA4 with minor modifications.
  • Multi-CTA/distributed execution has not been tested.

@bat3a

bat3a commented Jan 5, 2026

3 Install Dependencies

pip install -r requirements.txt
in which repo?

@jammm
Author

jammm commented Jan 5, 2026

> 3 Install Dependencies
>
> pip install -r requirements.txt in which repo?

In Turbodiffusion repo (on this PR branch)

@bat3a

bat3a commented Jan 5, 2026

> > 3 Install Dependencies
> > pip install -r requirements.txt in which repo?
>
> In Turbodiffusion repo (on this PR branch)

Couldn't git clone this branch, so I modified the files by hand:

git clone --branch jam/windows_amd https://github.com/woct0rdho/triton-windows.git triton-windows-patch
Cloning into 'triton-windows-patch'...
fatal: Remote branch jam/windows_amd not found in upstream origin

@jammm
Author

jammm commented Jan 5, 2026

> > > 3 Install Dependencies
> > > pip install -r requirements.txt in which repo?
> >
> > In Turbodiffusion repo (on this PR branch)
>
> Couldn't git clone this branch, so I modified the files by hand:
>
> git clone --branch jam/windows_amd https://github.com/woct0rdho/triton-windows.git triton-windows-patch
> Cloning into 'triton-windows-patch'...
> fatal: Remote branch jam/windows_amd not found in upstream origin

ah, for triton-windows you just need pip install triton-windows. I'll update the README to mention this. My bad

@githust66

The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.

@0xDELUXA

0xDELUXA commented Jan 6, 2026

> ...though can possibly work on RDNA4 with minor modifications.

Sounds promising 👀

@jammm
Author

jammm commented Jan 6, 2026

> The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.

Fixed, thanks!

> > ...though can possibly work on RDNA4 with minor modifications.
>
> Sounds promising 👀

Yes, it's just a matter of refactoring the rocWMMA code to not assume that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to claude actually.
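
The point about fragments being "replicated across the half-waves" can be pictured with a toy Python model. This is only an illustrative sketch, not rocWMMA code: the function names are hypothetical, and the exact element-to-lane mapping is a simplifying assumption chosen to show the difference between a layout where both 16-lane halves of a 32-lane wave hold the same operand data and one where they do not.

```python
# Toy model of how a 16x16 operand fragment might be spread over a wave.
# Assumption: wave32, one matrix row per lane index modulo 16. The real
# hardware layouts are more detailed than this sketch.
WAVE_SIZE = 32   # lanes per wave
HALF_WAVE = 16   # lanes per half-wave
K = 16           # depth of a 16x16x16 tile


def replicated_fragment():
    """Replicated layout: lane L and lane L+16 hold the same full row."""
    return {lane: [(lane % HALF_WAVE, k) for k in range(K)]
            for lane in range(WAVE_SIZE)}


def split_fragment():
    """Non-replicated layout: each half-wave holds half of the K range."""
    per_lane = K // 2
    return {lane: [(lane % HALF_WAVE, (lane // HALF_WAVE) * per_lane + k)
                   for k in range(per_lane)]
            for lane in range(WAVE_SIZE)}


rep, split = replicated_fragment(), split_fragment()
print(rep[3] == rep[19])           # True: both half-waves carry the same data
print(len(rep[3]), len(split[3]))  # 16 8: half as many elements per lane
```

Kernel code that indexes per-lane elements as if every lane held a full row would read the wrong data under the second layout, which is the kind of assumption the comment says needs refactoring.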

jammm and others added 4 commits January 6, 2026 17:05
- Add HIP kernels for GEMM, LayerNorm, RMSNorm, and quantization ops
- Integrate rocWMMA for matrix operations on AMD GPUs
- Update setup.py for Windows ROCm builds with clang-cl
- Add platform detection (CUDA/HIP) with common abstractions
- Optimize SLA kernel config for ROCm (BLKK=16)
- Update .gitignore to exclude build artifacts and IDE files
- Fix distributed utils and network files for ROCm compatibility
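
As a rough idea of what "platform detection (CUDA/HIP)" can look like on the Python side, here is a minimal sketch. It is not the PR's implementation: `detect_backend` and `sla_block_k` are hypothetical names, and the only value taken from the commit list is BLKK=16 for ROCm. It relies on the fact that PyTorch exposes `torch.version.hip` (a version string on ROCm builds, `None` otherwise) and `torch.version.cuda`; a stand-in object is used here so the snippet runs without PyTorch installed.

```python
from types import SimpleNamespace


def detect_backend(version_info):
    """Return "hip", "cuda", or "cpu" from a torch.version-like object."""
    if getattr(version_info, "hip", None):
        return "hip"
    if getattr(version_info, "cuda", None):
        return "cuda"
    return "cpu"


def sla_block_k(backend, default_blkk):
    """Pick the SLA K-block size; 16 on ROCm per the commit message.

    default_blkk is whatever the non-ROCm path already uses (not stated
    in this thread, so it is passed in rather than guessed).
    """
    return 16 if backend == "hip" else default_blkk


# Stand-ins for torch.version on a ROCm build and on a CUDA build.
rocm_build = SimpleNamespace(hip="6.4", cuda=None)
cuda_build = SimpleNamespace(hip=None, cuda="12.4")

print(detect_backend(rocm_build))  # hip
print(detect_backend(cuda_build))  # cuda
```

With PyTorch installed, the same function could be called as `detect_backend(torch.version)`.
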
@0xDELUXA

0xDELUXA commented Jan 7, 2026

> Yes, it's just a matter of refactoring the rocWMMA code to not assume that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to claude actually.

I'll be the 1st tester though, when the time comes ^^
