Performance of the provided model on LIBERO task #14

@junmokane

Hi! I’m Junmo.

Thank you for sharing your work and providing the trained models. They have been really helpful for my experiments.

I’ve noticed a performance gap when evaluating the provided Seer model (33.pth) on the LIBERO tasks.
According to the log file evaluate_33.pth.log on the website, the model reaches, for example, a 75% success rate on KITCHEN_SCENE8_put_both_moka_pots_on_the_stove.
However, when I downloaded the same 33.pth model from the website and evaluated it locally with the provided eval.sh script and default hyperparameters, I obtained only 40% on the same task. Shouldn't the results be the same, or at least similar, given that the model and hyperparameters are identical?
Is there anything I'm missing when running the code? I’d appreciate it if you could clarify this difference in results.

Log from evaluate_33.pth.log, as provided on the website:
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
95.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (0, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
95.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (1, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (1, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (1, 97), (1, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
95.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (1, 114), (1, 115), (1, 116), (1, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
95.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (1, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (0, 129), (1, 130), (0, 131), (1, 132), (1, 133), (0, 134), (1, 135), (0, 136), (1, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (0, 143), (0, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (0, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
85.0%
this_result_list : [(1, 160), (0, 161), (1, 162), (1, 163), (0, 164), (1, 165), (1, 166), (1, 167), (0, 168), (1, 169), (0, 170), (1, 171), (0, 172), (1, 173), (1, 174), (1, 175), (1, 176), (1, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
75.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (0, 187), (0, 188), (1, 189), (1, 190), (1, 191), (1, 192), (1, 193), (1, 194), (1, 195), (1, 196), (1, 197), (0, 198), (1, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%

Log from running the provided Seer model (33.pth) on the LIBERO tasks on my local machine:
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
90.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (1, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
100.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (0, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (0, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (0, 97), (0, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
75.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (0, 114), (1, 115), (1, 116), (0, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
85.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (0, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (1, 129), (1, 130), (0, 131), (1, 132), (1, 133), (1, 134), (1, 135), (0, 136), (0, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (1, 143), (1, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (1, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
100.0%
this_result_list : [(1, 160), (0, 161), (0, 162), (1, 163), (0, 164), (0, 165), (1, 166), (1, 167), (1, 168), (0, 169), (0, 170), (0, 171), (0, 172), (0, 173), (0, 174), (0, 175), (1, 176), (0, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
40.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (1, 187), (1, 188), (1, 189), (1, 190), (0, 191), (1, 192), (1, 193), (1, 194), (1, 195), (0, 196), (1, 197), (1, 198), (0, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
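
To make the gap easier to see per task, here is a minimal Python sketch that puts the two logs side by side. It assumes only the format visible in the excerpts above: a "Success rates for task N NAME:" line followed by a percentage line. The helper names parse_success_rates and compare_logs are hypothetical and not part of the Seer repository, and the two short strings are copied from the logs above purely for illustration.

```python
import re

# Matches header lines such as "Success rates for task 8 KITCHEN_SCENE8_...:"
HEADER = re.compile(r"^Success rates for task (\d+) (\S+):$")


def parse_success_rates(log_text):
    """Return {task_id: (task_name, success_rate_percent)} from a log excerpt.

    Assumes each header line is immediately followed by a line like "75.0%".
    """
    rates = {}
    lines = [line.strip() for line in log_text.splitlines()]
    for header, value in zip(lines, lines[1:]):
        match = HEADER.match(header)
        if match and value.endswith("%"):
            rates[int(match.group(1))] = (match.group(2), float(value[:-1]))
    return rates


def compare_logs(reference, local):
    """Return (task_id, name, reference %, local %, local - reference) rows."""
    rows = []
    for task_id in sorted(reference.keys() & local.keys()):
        name, ref_rate = reference[task_id]
        _, local_rate = local[task_id]
        rows.append((task_id, name, ref_rate, local_rate, local_rate - ref_rate))
    return rows


# Two-task excerpts copied from the logs above (illustration only).
website_log = """\
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
75.0%
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
"""

local_log = """\
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
40.0%
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
"""

reference = parse_success_rates(website_log)
local = parse_success_rates(local_log)
rows = compare_logs(reference, local)

for task_id, name, ref_rate, local_rate, diff in rows:
    print(f"task {task_id}: website {ref_rate:.1f}%  local {local_rate:.1f}%  diff {diff:+.1f}  {name}")
```

Running the same helpers on the full log files, read with open(...).read(), would list every task's difference, which may help narrow down whether the gap is limited to a few tasks.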
