Performance of the provided model on LIBERO task

Hi! I’m Junmo.

Thank you for sharing your work and providing the trained models — they have been really helpful for the experiments.

I’ve noticed a performance gap when evaluating the provided Seer model (`33.pth`) on the LIBERO tasks.
On the [website](https://drive.google.com/drive/folders/1zwqGvKKtjyuWdDaNSLVGJprJMPoSqAPk), the log file `evaluate_33.pth.log` reports, for example, a 75% success rate on KITCHEN_SCENE8_put_both_moka_pots_on_the_stove.
However, when I downloaded the same `33.pth` model from the website and evaluated it locally with the provided `eval.sh` script and the default hyperparameters, I obtained only 40% on the same task. Shouldn't the results be the same, or at least similar, given that the model and hyperparameters are identical?
Is there anything I'm missing when running the code? I’d appreciate it if you could help clarify where this difference comes from.
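
For a rough sense of whether this gap could be plain sampling noise, here is a minimal sketch. It assumes 20 independent episodes per task (each result list in the logs below holds 20 entries), and `gap_in_standard_errors` is a hypothetical helper, not part of the Seer or LIBERO code.

```python
import math


def gap_in_standard_errors(successes_a: int, successes_b: int, episodes: int) -> float:
    """Gap between two success rates, in units of its combined standard error."""
    rate_a = successes_a / episodes
    rate_b = successes_b / episodes
    # Binomial variance of each estimated rate, assuming independent episodes.
    # Note: this is zero (and the division fails) if both rates are exactly 0 or 1.
    variance = (rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) / episodes
    return abs(rate_a - rate_b) / math.sqrt(variance)


# 75% and 40% of 20 episodes correspond to 15 and 8 successes.
print(round(gap_in_standard_errors(15, 8, 20), 2))  # prints 2.39
```

A gap of more than two standard errors would be unusual for noise alone, although with only 20 episodes per task this is a coarse check.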

**Log from `evaluate_33.pth.log`, as provided on the [website](https://drive.google.com/drive/folders/1zwqGvKKtjyuWdDaNSLVGJprJMPoSqAPk)**
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
95.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (0, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
95.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (1, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (1, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (1, 97), (1, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
95.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (1, 114), (1, 115), (1, 116), (1, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
95.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (1, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (0, 129), (1, 130), (0, 131), (1, 132), (1, 133), (0, 134), (1, 135), (0, 136), (1, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (0, 143), (0, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (0, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
85.0%
this_result_list : [(1, 160), (0, 161), (1, 162), (1, 163), (0, 164), (1, 165), (1, 166), (1, 167), (0, 168), (1, 169), (0, 170), (1, 171), (0, 172), (1, 173), (1, 174), (1, 175), (1, 176), (1, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
75.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (0, 187), (0, 188), (1, 189), (1, 190), (1, 191), (1, 192), (1, 193), (1, 194), (1, 195), (1, 196), (1, 197), (0, 198), (1, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%

**Log from evaluating the provided Seer model (`33.pth`) on the LIBERO tasks on my local machine**
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
90.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (1, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
100.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (0, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (0, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (0, 97), (0, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
75.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (0, 114), (1, 115), (1, 116), (0, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
85.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (0, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (1, 129), (1, 130), (0, 131), (1, 132), (1, 133), (1, 134), (1, 135), (0, 136), (0, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (1, 143), (1, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (1, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
100.0%
this_result_list : [(1, 160), (0, 161), (0, 162), (1, 163), (0, 164), (0, 165), (1, 166), (1, 167), (1, 168), (0, 169), (0, 170), (0, 171), (0, 172), (0, 173), (0, 174), (0, 175), (1, 176), (0, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
40.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (1, 187), (1, 188), (1, 189), (1, 190), (0, 191), (1, 192), (1, 193), (1, 194), (1, 195), (0, 196), (1, 197), (1, 198), (0, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
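
To see which episodes actually differ between the two runs, the result lists can be compared entry by entry. The sketch below assumes each `this_result_list` entry is a `(success, episode_index)` pair; `parse_result_list` and `differing_episodes` are hypothetical helpers, not part of the provided scripts. Which task a list belongs to is also an assumption: in both logs above, each list appears to agree with the success rate printed after it rather than the one before it.

```python
import ast


def parse_result_list(line: str) -> dict:
    """Map episode index -> success flag from a 'this_result_list : [...]' line."""
    # Everything after the first colon is a Python-style list of tuples.
    _, _, payload = line.partition(":")
    pairs = ast.literal_eval(payload.strip())
    return {episode: success for success, episode in pairs}


def differing_episodes(reference_line: str, local_line: str) -> list:
    """Episode indices present in both lines whose success flags disagree."""
    reference = parse_result_list(reference_line)
    local = parse_result_list(local_line)
    shared = reference.keys() & local.keys()
    return sorted(ep for ep in shared if reference[ep] != local[ep])


# Shortened excerpts (first four entries) of the lists covering episodes 160-179.
website_line = "this_result_list : [(1, 160), (0, 161), (1, 162), (1, 163)]"
local_line = "this_result_list : [(1, 160), (0, 161), (0, 162), (1, 163)]"
print(differing_episodes(website_line, local_line))  # prints [162]
```

If the evaluation were fully deterministic, this list would be empty for every task, so a non-empty result points at some source of variation between the two environments.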


Performance of the provided model on LIBERO task #14
