FYI, the LoRA target modules in https://github.com/dllm-reasoning/d1/blob/main/diffu-grpo/diffu_grpo_train.py (which seem to be taken from Llama 3) are incorrect. LLaDA uses a different naming convention for its blocks, so the correct modules to target are ["q_proj", "k_proj", "v_proj", "attn_out", "ff_proj", "up_proj"]. (For example, see the LLaDA HF source code.)
peft doesn't throw an error because q_proj, k_proj, and v_proj are valid targets, so it silently ignores the unmatched names. Switching to the correct module list yields roughly 50% more trainable parameters.
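For reference, a minimal sketch of what the corrected config could look like (the hyperparameters and the model repo name below are placeholders/assumptions, not the d1 repo's actual values):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                # placeholder rank, not the repo's value
    lora_alpha=32,       # placeholder scaling, not the repo's value
    lora_dropout=0.05,   # placeholder dropout, not the repo's value
    bias="none",
    task_type="CAUSAL_LM",
    # LLaDA's module names. Llama-3-style names like "o_proj", "gate_proj",
    # or "down_proj" match nothing in LLaDA, so peft just never wraps them.
    target_modules=["q_proj", "k_proj", "v_proj", "attn_out", "ff_proj", "up_proj"],
)

# Assuming the LLaDA checkpoint on the Hub (name may differ):
# from transformers import AutoModel
# model = AutoModel.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # compare against the Llama-3-style list to see the jump
```

Comparing `print_trainable_parameters()` before and after the change is an easy way to confirm which modules actually got LoRA adapters.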