It seems the results provided by asv are unreliable. Here are a few of our latest StartupBench runs, compared to runs on my computer:

Both runs at the bottom of the image are from my computer: one uses the same number of runs as master (100), the other uses 800.
The maximum discrepancy between those two runs is ~4%.
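
A minimal sketch of how such a figure could be computed, assuming "discrepancy" means the relative difference of each benchmark against the 100-run result. The function name, benchmark names, and timings are made up for illustration and are not the real StartupBench data.

```python
def max_relative_discrepancy(reference, other):
    """Return the largest |other - reference| / reference over shared benchmarks."""
    shared = reference.keys() & other.keys()
    if not shared:
        raise ValueError("the two runs share no benchmarks")
    return max(
        abs(other[name] - reference[name]) / reference[name] for name in shared
    )


# Hypothetical timings in seconds (illustrative only, not measured results).
run_100 = {"bench_a": 1.00, "bench_b": 2.50}
run_800 = {"bench_a": 1.04, "bench_b": 2.45}

print(f"{max_relative_discrepancy(run_100, run_800):.1%}")
```

Taking the 100-run result as the reference is a choice; dividing by the mean of the two runs instead would give a slightly different number.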
Working theory so far: I set the number of processes to 2, which matches the number of processes of the runner, so we may be competing for resources with the GHA runner and other background processes.
I will try reducing the number of processes to 1. If that doesn't help, it may be another argument for using our own machines.