Replicating llama result, having lower accuracy than Figure 4

Dear authors,

Thanks for your amazing work in this field.
I am trying to evaluate the connectivity task using Llama2-7b. The accuracy I get is about 40% to 50%, which is far lower than what Figure 4 in your paper shows. The model I am using is Llama2-7b-chat, with temperature = 0 and top_p = 0.7.
I am wondering whether we are using the same parameters, or whether you also fine-tuned Llama as described in Section 5.1?
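
To make the decoding question concrete, here is a minimal, library-free sketch of what the two parameters above do. It is an illustration under stated assumptions, not the code used for the paper or for my runs: the logits are made up, and `softmax`, `nucleus`, and `greedy_token` are hypothetical helper names. It shows that as the temperature approaches 0, the top-p (nucleus) set shrinks to the single most likely token, so top_p = 0.7 should not change a greedy result.

```python
import math


def softmax(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def greedy_token(logits):
    """Index of the highest-scoring token (the temperature -> 0 limit)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up logits for a four-token vocabulary (assumption for illustration only).
logits = [2.0, 1.5, 0.3, -1.0]

for temperature in (1.0, 0.1, 0.01):
    probs = softmax(logits, temperature)
    print(temperature, nucleus(probs, top_p=0.7), greedy_token(logits))
```

If this reasoning is right, the gap to Figure 4 is more likely to come from the prompt format, the answer parsing, or the model checkpoint than from top_p.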

Thank you!
DM
