Dear authors,
When I reproduce the defense results for the Qwen model using the provided steering matrix, the results are close to those reported in the paper. However, when I generate the steering matrix from scratch, the defense performance of Qwen is much worse. The performance of Llama, by contrast, is still close to that reported in the paper.
Could you check the reproducibility of the steering matrix generation for the Qwen model?
Best regards,
Quoc Nguyen