Replicating llama result, having lower accuracy than Figure 4 #1

@DeconstructionMechanics

Dear authors,

Thanks for your amazing work in this field.
I am trying to evaluate the connectivity task using Llama2-7b. The accuracy I get is about 40% to 50%, which is far lower than what Figure 4 in your paper shows. The version I am using is Llama2-7b-chat, with temperature = 0 and top_p = 0.7.
I am wondering whether we are using the same parameters, or whether you also fine-tuned Llama as described in Section 5.1?

Thank you!
DM
