A C/CUDA implementation of Qwen3-0.6B inference.
First, clone the repo:
```sh
git clone https://github.com/asdf93074/qwen.c
```

Then put the model.safetensors file from this link into the root of your repo:

https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors
```sh
make release chat
```

Builds the implementation into a shared library, which is then used by `python chat.py` for chatting.
```sh
make run
```

Uses run.c as the entrypoint, which loads the model and prints the generated tokens. Not really of much use unless you want to hack on it yourself.
NOTE: It only supports CUDA as a backend.
I wanted to learn more about C, CUDA, and deep learning libraries in general. The goal was to build something you could actually talk to.
I tried it with Qwen3-0.6B. Since the other Qwen3 models share the same architecture, you could technically use this with any of them too (remember to update the hardcoded number of layers/heads in json.c); the relevant shape constants are sketched below.
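For reference, these are the Qwen3-0.6B shape constants as given in the Hugging Face config.json. The identifier names here are illustrative, not the ones json.c actually uses:

```c
/* Qwen3-0.6B shapes per the HF config.json; the names below are
 * illustrative stand-ins, not the identifiers used in json.c. */
#define N_LAYERS      28
#define N_HEADS       16   /* query heads            */
#define N_KV_HEADS     8   /* grouped-query KV heads */
#define HIDDEN_SIZE 1024
#define HEAD_DIM     128
```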
- Most of the kernels are naively written; optimized versions are still to be implemented.
- Decoding is greedy: it always picks the argmax token, which is known to cause repetitive output (though I didn't run into this in testing); see the sketch after this list.
- The C code can only load safetensors, but you can also load weights in Python and run them through it.
- The KV cache and RoPE matrices are generated for a max length of only 2048 to save memory; a sketch of the RoPE precompute also follows this list.
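Greedy decoding is just an argmax over the final-position logits. A minimal host-side sketch; the logits layout and the `vocab_size` parameter are assumptions, not the repo's actual interface:

```c
/* Greedy decoding sketch: return the highest-scoring token id.
 * Assumes `logits` holds one float per vocabulary entry for the
 * last position, already copied back to host memory. */
int argmax_token(const float *logits, int vocab_size) {
    int best = 0;
    for (int i = 1; i < vocab_size; i++) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}
```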
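And a sketch of precomputing the RoPE tables up to a fixed max length, which is why the 2048 cap is baked in at init. The `head_dim` and `theta` values are assumptions taken from the Qwen3-0.6B HF config (head_dim 128, rope_theta 1e6); verify against the config before reusing:

```c
#include <math.h>

/* Precompute RoPE cos/sin tables for positions [0, max_len).
 * cos_t and sin_t must each hold max_len * (head_dim / 2) floats.
 * head_dim and theta are assumed from the HF config (head_dim 128,
 * rope_theta 1e6 for Qwen3-0.6B); check the config before reuse. */
void build_rope_tables(float *cos_t, float *sin_t,
                       int max_len, int head_dim, float theta) {
    int half = head_dim / 2;
    for (int pos = 0; pos < max_len; pos++) {
        for (int i = 0; i < half; i++) {
            float freq = powf(theta, -2.0f * (float)i / (float)head_dim);
            float angle = (float)pos * freq;
            cos_t[pos * half + i] = cosf(angle);
            sin_t[pos * half + i] = sinf(angle);
        }
    }
}
```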
There are lots of possible extension points here if someone is looking to learn or contribute:
- better kernels (there's a lot of room for improvement here)
- dynamic KV caching (the size is fixed at init)
- KV cache offload to CPU
- remove the Python dependency by doing the byte-level BPE tokenization in C too
- better sampling techniques (temperature, top-p, top-k, etc.); a minimal temperature-sampling sketch follows this list
- better memory allocation (saves us from making lots of cudaMalloc/cudaFree calls); see the arena sketch after this list
- partial offload to CPU
- supporting quantized versions
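As a starting point for the sampling item above, here is a minimal host-side temperature-sampling sketch (no top-p/top-k). The logits buffer and `vocab_size` are assumptions, and it destructively overwrites `logits` with unnormalized probabilities:

```c
#include <math.h>
#include <stdlib.h>

/* Temperature sampling sketch: scale logits by 1/temperature,
 * softmax on the host, then draw from the resulting distribution.
 * Overwrites `logits` in place with unnormalized probabilities. */
int sample_temperature(float *logits, int vocab_size, float temperature) {
    /* Subtract the max logit for numerical stability. */
    float max_logit = logits[0];
    for (int i = 1; i < vocab_size; i++)
        if (logits[i] > max_logit) max_logit = logits[i];

    float sum = 0.0f;
    for (int i = 0; i < vocab_size; i++) {
        logits[i] = expf((logits[i] - max_logit) / temperature);
        sum += logits[i];
    }

    /* Inverse-CDF draw with a uniform sample in [0, sum). */
    float r = ((float)rand() / ((float)RAND_MAX + 1.0f)) * sum;
    float acc = 0.0f;
    for (int i = 0; i < vocab_size; i++) {
        acc += logits[i];
        if (r < acc) return i;
    }
    return vocab_size - 1; /* guard against float rounding */
}
```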
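And for the memory-allocation item, one common approach is a bump arena over a single large cudaMalloc done at startup. A sketch; the API shape and the 256-byte alignment are assumptions, not anything the repo currently provides:

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Bump-arena sketch: one big cudaMalloc up front, then cheap
 * pointer-bump sub-allocations instead of repeated
 * cudaMalloc/cudaFree calls during inference. */
typedef struct {
    char  *base;
    size_t used;
    size_t capacity;
} CudaArena;

int arena_init(CudaArena *a, size_t capacity) {
    a->used = 0;
    a->capacity = capacity;
    return cudaMalloc((void **)&a->base, capacity) == cudaSuccess ? 0 : -1;
}

void *arena_alloc(CudaArena *a, size_t size) {
    size_t aligned = (a->used + 255) & ~(size_t)255; /* 256B alignment (assumed) */
    if (aligned + size > a->capacity) return NULL;   /* arena exhausted */
    a->used = aligned + size;
    return a->base + aligned;
}

void arena_reset(CudaArena *a)   { a->used = 0; }    /* reuse per step */
void arena_destroy(CudaArena *a) { cudaFree(a->base); }
```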
License: MIT
