Description
Hi! I’m Junmo.
Thank you for sharing your work and providing the trained models — they have been really helpful for the experiments.
I’ve noticed a performance gap when evaluating the provided Seer model (33.pth) on the LIBERO tasks.
From the website, the log file evaluate_33.pth.log reports, for example, a 75% success rate on KITCHEN_SCENE8_put_both_moka_pots_on_the_stove.
However, when I downloaded the same 33.pth model from the website and ran the evaluation locally with the provided eval.sh script and default hyperparameters, I obtained only 40% on the same task. Shouldn't the results be the same, or at least similar, given that the model and the hyperparameters are identical?
Is there anything I'm missing when running the code? I’d appreciate it if you could clarify this difference in results.
Log from evaluate_33.pth.log, as provided on the website:
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
95.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (0, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
95.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (1, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (1, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (1, 97), (1, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
95.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (1, 114), (1, 115), (1, 116), (1, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
95.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (1, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (0, 129), (1, 130), (0, 131), (1, 132), (1, 133), (0, 134), (1, 135), (0, 136), (1, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (0, 143), (0, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (0, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
85.0%
this_result_list : [(1, 160), (0, 161), (1, 162), (1, 163), (0, 164), (1, 165), (1, 166), (1, 167), (0, 168), (1, 169), (0, 170), (1, 171), (0, 172), (1, 173), (1, 174), (1, 175), (1, 176), (1, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
75.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (0, 187), (0, 188), (1, 189), (1, 190), (1, 191), (1, 192), (1, 193), (1, 194), (1, 195), (1, 196), (1, 197), (0, 198), (1, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
Log from running the provided Seer model (33.pth) on the LIBERO tasks on my local machine:
Success rates for task 0 LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket:
90.0%
this_result_list : [(1, 20), (0, 21), (1, 22), (1, 23), (0, 24), (1, 25), (1, 26), (1, 27), (1, 28), (1, 29), (1, 30), (1, 31), (1, 32), (1, 33), (1, 34), (1, 35), (1, 36), (1, 37), (1, 38), (1, 39)]
Success rates for task 1 LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket:
90.0%
this_result_list : [(1, 40), (1, 41), (1, 42), (1, 43), (1, 44), (1, 45), (1, 46), (1, 47), (1, 48), (1, 49), (1, 50), (1, 51), (1, 52), (1, 53), (1, 54), (1, 55), (1, 56), (1, 57), (1, 58), (1, 59)]
Success rates for task 2 KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it:
100.0%
this_result_list : [(1, 60), (1, 61), (1, 62), (1, 63), (1, 64), (1, 65), (1, 66), (1, 67), (1, 68), (1, 69), (1, 70), (1, 71), (1, 72), (1, 73), (1, 74), (1, 75), (1, 76), (1, 77), (1, 78), (1, 79)]
Success rates for task 3 KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it:
100.0%
this_result_list : [(1, 80), (0, 81), (1, 82), (1, 83), (1, 84), (1, 85), (1, 86), (0, 87), (1, 88), (1, 89), (1, 90), (0, 91), (1, 92), (1, 93), (1, 94), (1, 95), (1, 96), (0, 97), (0, 98), (1, 99)]
Success rates for task 4 LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate:
75.0%
this_result_list : [(1, 100), (1, 101), (1, 102), (1, 103), (1, 104), (1, 105), (1, 106), (1, 107), (1, 108), (1, 109), (1, 110), (1, 111), (1, 112), (1, 113), (0, 114), (1, 115), (1, 116), (0, 117), (1, 118), (0, 119)]
Success rates for task 5 STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy:
85.0%
this_result_list : [(1, 120), (1, 121), (1, 122), (0, 123), (1, 124), (1, 125), (1, 126), (1, 127), (1, 128), (1, 129), (1, 130), (0, 131), (1, 132), (1, 133), (1, 134), (1, 135), (0, 136), (0, 137), (1, 138), (1, 139)]
Success rates for task 6 LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate:
80.0%
this_result_list : [(1, 140), (1, 141), (1, 142), (1, 143), (1, 144), (1, 145), (1, 146), (1, 147), (1, 148), (1, 149), (1, 150), (1, 151), (1, 152), (1, 153), (1, 154), (1, 155), (1, 156), (1, 157), (1, 158), (1, 159)]
Success rates for task 7 LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket:
100.0%
this_result_list : [(1, 160), (0, 161), (0, 162), (1, 163), (0, 164), (0, 165), (1, 166), (1, 167), (1, 168), (0, 169), (0, 170), (0, 171), (0, 172), (0, 173), (0, 174), (0, 175), (1, 176), (0, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
40.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (1, 187), (1, 188), (1, 189), (1, 190), (0, 191), (1, 192), (1, 193), (1, 194), (1, 195), (0, 196), (1, 197), (1, 198), (0, 199)]
Success rates for task 9 KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it:
75.0%
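As a sanity check on how I am reading these logs, the success rates can be recomputed directly from the `this_result_list` lines. The sketch below is my own helper, not part of the Seer code: the function name `rates_from_log` is hypothetical, and I am assuming each tuple is `(success_flag, episode_index)`. The sample is the last two list lines of my local log. Recomputed this way, each list in both logs seems to match the percentage printed after it rather than the one before it, so the 40% for task 8 would correspond to the list with indices 160 to 179.

```python
import ast
import re

# Matches lines such as: this_result_list : [(1, 20), (0, 21), ...]
LIST_LINE = re.compile(r"\s*this_result_list\s*:\s*(\[.*\])\s*$")


def rates_from_log(log_text):
    """Return (first_index, success_rate_percent) for every this_result_list line.

    Assumption: each tuple is (success_flag, episode_index), with flag 1 = success.
    """
    rates = []
    for line in log_text.splitlines():
        match = LIST_LINE.match(line)
        if match is None:
            continue  # skip task headers and percentage lines
        episodes = ast.literal_eval(match.group(1))
        successes = sum(flag for flag, _ in episodes)
        rates.append((episodes[0][1], 100.0 * successes / len(episodes)))
    return rates


# Last two list lines of the local log, copied verbatim from above.
sample = """this_result_list : [(1, 160), (0, 161), (0, 162), (1, 163), (0, 164), (0, 165), (1, 166), (1, 167), (1, 168), (0, 169), (0, 170), (0, 171), (0, 172), (0, 173), (0, 174), (0, 175), (1, 176), (0, 177), (1, 178), (1, 179)]
Success rates for task 8 KITCHEN_SCENE8_put_both_moka_pots_on_the_stove:
40.0%
this_result_list : [(0, 180), (1, 181), (0, 182), (1, 183), (1, 184), (1, 185), (1, 186), (1, 187), (1, 188), (1, 189), (1, 190), (0, 191), (1, 192), (1, 193), (1, 194), (1, 195), (0, 196), (1, 197), (1, 198), (0, 199)]"""

for first_index, rate in rates_from_log(sample):
    print(f"list starting at index {first_index}: {rate:.1f}%")
```

With 20 episodes per task, a single episode changes the rate by 5 percentage points, so the website's 75% versus my 40% on task 8 is a difference of 7 episodes.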