We have two issues with the regression tests.
The first is that they are very sensitive to very small changes. There are at least two families of workers on GitHub (I'm not sure what the differences are), and some of these tests have two possible results depending on which family they land on.
As far as I can see this is the result of numerical differences that grow during the fit (a fit means many numbers being updated for many epochs), and testing it is pretty much a nightmare because it depends on hitting one of the "bad" workers during the test.
I'm afraid I don't have a good idea for this right now.
The second is adding a theory covmat. We haven't done this because it is extremely expensive on the worker and we cannot afford it (we hit the 6GB limit; if you look at the workflows, I'm already removing files inside the docker image to free space...).
We will need to 1) probably remove even more files and 2) upload the result of vp-setupfit for every regression test to the nnpdf server, so that it can be downloaded directly without running it in the worker.
This would've caught bugs like the one fixed by #2463, and now that the theory covmat is our default, this is important.
It would also be made much better once #2461-#2462 are done, since that ties the vp-setupfit with the n3fit run through the md5 file.
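The download step in point 2 could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_setupfit_result`, the URL and the way the expected md5 is obtained are all hypothetical, and the md5 comparison here is a generic integrity check, not the mechanism from #2461-#2462.

```python
import hashlib
import urllib.request
from pathlib import Path


def md5sum(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_setupfit_result(url, destination, expected_md5):
    """Download a precomputed result unless a valid copy already exists.

    Hypothetical helper: the URL layout and md5 bookkeeping are assumptions.
    """
    destination = Path(destination)
    if destination.exists() and md5sum(destination) == expected_md5:
        return destination  # cached copy is valid, nothing to download
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        destination.write_bytes(response.read())
    if md5sum(destination) != expected_md5:
        destination.unlink()  # do not leave a stale or corrupt file behind
        raise ValueError(f"md5 mismatch for {url}")
    return destination
```

With something like this, a regression test would fail loudly if the stored vp-setupfit output no longer matches what the test expects, rather than silently fitting against outdated inputs.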
I've put this at the top of my to-do list, but I cannot commit any time to it until about Thursday next week, so if someone takes it before that I would be extremely happy.