Hello, thank you for providing the code.
I can reproduce the W1A1 results with bash scripts/run_glue.sh MNLI (around 77 accuracy on MNLI).
However, when I try to reproduce W1A1 with the multi-distillation approach (W32A32->W1A2->W1A1), I cannot reproduce the W1A2 results reported in the paper by simply changing abits=1 to abits=2 in scripts/run_glue.sh. The W1A2 result I get is 80.96/81.36.
Could you share the detailed settings of the multi-distillation approach?