Reproduce results for Qwen #5

@minhquoc0712

Dear authors,

When I reproduce the defense results for the Qwen model using the provided steering matrix, the results are close to those reported in the paper. However, when I generate the steering matrix from scratch, Qwen's defense performance is much worse. Llama's performance, by contrast, remains close to the numbers reported in the paper.

Could you check the reproducibility of the results for the Qwen model?
Best regards,
Quoc Nguyen
