First pass, add a PDF class #5

Merged: scarlehoff merged 19 commits into main from first_pass_jcm on Jan 14, 2026

Conversation

@scarlehoff

This is a first pass; my goal here is to try to call the vp API directly whenever possible. This should simplify several parts considerably, like the generation of the covmat, replicas, etc. (it just requires some care with the input).

For now this has required only one change on the NNPDF side: NNPDF/nnpdf#2268, to accept arbitrary objects that are not necessarily LHAPDF names (I put you as a reviewer there @tgiani). I think this is a nice addition, and I'm happy with the current version since it allows using the functions that @LucianHL already wrote with no changes.

I've also spotted a few things I could speed up in nnpdf; they are only used for plots there (since the fit code uses tensorflow), but here they are used for the fit.

@LucianHL for now I'm not removing anything, just duplicating and checking that things are numerically equal. I'll eventually need some help preparing some config files, because the combination of possible paths / options is not 100% clear to me yet.

For now the only change in behaviour is that the fake PDFs are saved to a temporary folder instead of the LHAPDF data path, since I don't always have write permissions to that folder.

I'll continue making changes in this branch. My short-term goal is to reduce the computation of the $\chi^2$ to just a few calls, which I can then optimize on the nnpdf side while keeping everything flexible.

@LucianHL

LucianHL commented Feb 3, 2025

Thank you @scarlehoff! That sounds very good - I will have a go at running this to check it works for me, and yes, no problem, just let me know about config files etc.

@scarlehoff

At the moment you need that specific branch (which needs newer versions of Python). I'll make it work for 3.9 too.

@LucianHL

LucianHL commented Feb 3, 2025

Thanks - the only reason 3.9 is used by default is, I think, to get the LHAPDF Python bindings to work. I will hold off for now then.

@scarlehoff

lhapdf, at least from conda-forge, should (since sometime last year) work for all Python versions 3.9-3.13.

In any case, I've modified the branch in nnpdf so it should now work with older versions too.

@LucianHL

LucianHL commented Feb 4, 2025

Thanks - I have had a quick go, and `reportengine.compat` (yaml is imported from there in the main file) seems to be missing from the reportengine installation as per your README? I might be doing something wrong...

If you want me to hold off running for now while you are still making changes, no problem, just let me know.

@scarlehoff

Fixed. There was a recent update to reportengine that I apparently forgot about.

@LucianHL

LucianHL commented Feb 5, 2025

Just to say that it is now running for me - the outputs are definitely not right and the runtime is actually pretty similar to before, but that may well be because things are still work in progress. Let me know if I can help.

@scarlehoff

> Just to say that it is now running for me

Perfect!

For now the calls are correct; the content not so much, because I'm still going through the different parameters to understand when/what things should change. I'm able to reproduce the results of the main branch by setting some things by hand*, but my goal is to understand the code well enough to have it happen automatically.

*the first call must ignore the parin1 variable and instead use the PDF of PDFlabel_lhin as input, but I'm not sure what parin1 is at that point

@scarlehoff

Btw, one thing that is not clear to me: the config file that you left creates 3 PDFs. Is the last one (test_3 or something in that case) what would be the final PDF in the optimization?

@LucianHL

LucianHL commented Feb 5, 2025

> *the first call must ignore the parin1 variable and instead use the PDF of PDFlabel_lhin as input, but I'm not sure what parin1 is at that point

Indeed, that is not very clear - if lhin is set to True in the config file, then the LHAPDF grid named by PDFlabel_lhin is used, so this is just an option to output a $\chi^2$ for an arbitrary user-given grid. If you set it to False, then `parin1` (which gives the PDF parameters) is used, as read in from the input file in the config; if lhin is True, parin1 is not used.
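In pseudo-Python, the switch works roughly like this (a sketch only: `input_file` and `pdfs_from_parameters` are illustrative names, not the actual code):

```python
import numpy as np
import lhapdf

def input_pdf_values(config, xarr, q0):
    """Return xf(x, Q0) for all flavours, following the lhin switch described above."""
    if config["lhin"]:
        # Arbitrary user-given grid: evaluate the named LHAPDF set; parin1 is ignored.
        pdf = lhapdf.mkPDF(config["PDFlabel_lhin"], 0)
        return np.array([[pdf.xfxQ(fl, x, q0) for fl in pdf.flavors()] for x in xarr])
    # Otherwise the PDF is built from the parameters read from the input file.
    parin1 = np.loadtxt(config["input_file"])
    return pdfs_from_parameters(parin1, xarr, q0)  # hypothetical parametrisation call
```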

@LucianHL

LucianHL commented Feb 5, 2025

> Btw, one thing that is not clear to me: the config file that you left creates 3 PDFs. Is the last one (test_3 or something in that case) what would be the final PDF in the optimization?

The test_XX grids (where 'test' is whatever label is set in the config file) are the temporary ones created from the input file (if lhin is set to True this still happens, i.e. the LHAPDF output is regridded, though that is not really necessary). As the default is not running a fit, only 3 are created - there are 3 runs to give the $\chi^2$ in the $t_0$ and experimental definitions, and with/without positivity penalties. These should actually be deleted by the end of the code. If you ran a fit there would be many more (though again deleted as you go) due to the minimisation.

The actual fit output is given by the code in src/outputs.py - in outputs/evgrids the grid at $Q=Q_0$ is given (which then needs to be evolved separately with evolven3fit etc); in outputs/pars, the PDF parameters etc.

@scarlehoff

ok, great. That helps a lot!

And I have a related question, why do you create two points in Q when you create the grid? Is that just because LHAPDF complains otherwise or is there a deeper reason?

Could you send me a runcard for a fit (perhaps the monte carlo version since that's what I'm more used to so it will be easier for me to follow, but both are fine)? To make sure I'm able to reproduce both the computation of the chi2 and the subsequent fit.

@LucianHL

LucianHL commented Feb 6, 2025

> And I have a related question, why do you create two points in Q when you create the grid? Is that just because LHAPDF complains otherwise or is there a deeper reason?

Indeed there is nothing deep there - the LHAPDF routine complains if you don't have at least 2 Q points. I did check that the second, higher one is completely redundant though, i.e. by setting the PDF values to zero there etc. It is not used if you are exactly on the first $Q_0$ point (which you are). I'm sure there is a workaround within LHAPDF, but that seemed like more trouble than it was worth...
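As an illustration of the workaround, here is a minimal sketch of writing an `lhagrid1` subgrid block with the $Q_0$ values duplicated at a second, dummy scale (the block layout follows the public LHAPDF grid-file format; everything else, including the function name, is illustrative):

```python
def write_member_block(fh, xarr, q0, xfvals,
                       flavours=(-5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 21)):
    """xfvals: array of shape (len(xarr), len(flavours)) at the single scale Q0.
    LHAPDF insists on >= 2 Q knots, so the Q0 slice is simply repeated at 2*Q0;
    that second knot is never used when interpolating exactly at Q0."""
    qarr = [q0, 2.0 * q0]
    fh.write(" ".join(f"{x:.8e}" for x in xarr) + "\n")   # x knots
    fh.write(" ".join(f"{q:.8e}" for q in qarr) + "\n")   # Q knots (2 required)
    fh.write(" ".join(str(f) for f in flavours) + "\n")   # flavour PIDs
    for ix in range(len(xarr)):
        for _ in qarr:  # identical values at both Q knots
            fh.write(" ".join(f"{v:.8e}" for v in xfvals[ix]) + "\n")
    fh.write("---\n")
```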

> Could you send me a runcard for a fit (perhaps the monte carlo version since that's what I'm more used to so it will be easier for me to follow, but both are fine)? To make sure I'm able to reproduce both the computation of the chi2 and the subsequent fit.

Sure. In terms of a Monte Carlo version, I'm not completely sure of the distinction you mean here? The code performs a fit to a given set of data - or pseudodata. A MC fit would involve running multiple fits to multiple sets of pseudodata (as you know of course!). At the moment that is the kind of thing (to the extent that I have done MC fits, which is not much, just for a few checks in the paper closure tests) that would be done via an external script.

So I guess we really just need an input card with a $\chi^2$ and a fit output to the unfluctuated real data as the simplest thing? To speed things up, probably with just a couple of PDF parameters free as a first step?
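For what it's worth, generating one set of pseudodata per fit in such an external script is a single multivariate-normal draw from the covariance matrix; a minimal numpy sketch with placeholder inputs:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data_central = np.array([1.0, 2.0, 3.0])    # placeholder central data values
covmat = np.diag([0.1, 0.2, 0.1]) ** 2      # placeholder covariance matrix

# One pseudodata set per replica fit: d_k = d + eta_k, with eta_k ~ N(0, C)
pseudodata = rng.multivariate_normal(data_central, covmat, size=100)
# each row of pseudodata would then be fitted independently
```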

@scarlehoff

Ah, ok. I thought it might do the replicas automagically. Yes, something simple is enough, so that we can check the result is exactly the same up to numerics.

@LucianHL

LucianHL commented Feb 6, 2025

No problem - I have created the branch lhl_outs, which contains the outputs and the inputs needed to produce them (in the main directory the 'scout...' files are the outputs to screen, and they are in outputs/buffer/ as well, though in the latter case only the result from a fit is usefully filled). The configs are config/chi2out_comp_jcm.yaml and config/fit_comp_jcm.yaml, and the parameter input files for these are in inputs/.

@scarlehoff

I have another question: in write_lha https://github.com/LucianHL/fixpar_nnpdf/blob/63be9966df7645713028e5939b17b366feba516f/src/lhapdf_funs.py#L24 there's the input parin, which is used when calling pdfs_msht. It is not used when calling pdfs_diff; in that case, inside pdfs_diff, the value of pdf_pars.parinarr is used instead.

Could the array parin be used in both cases? I can see that at least in some of the calls there's an explicit pdf_pars.parinarr[0,:]=parin1, which would suggest yes, but I'm not sure whether they stay the same afterwards?

It would be great if we could remove the side effects of at least the PDF functions (and ideally have them with the same signature so that we can abstract things out; see the sketch below).

(and ideally ideally have global_pars.py be a fixed input/header and less commonblock-like 😅)
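For concreteness, the kind of side-effect-free, uniform signature meant here would look something like this (illustrative only; these functions do not exist in the code):

```python
from typing import Callable
import numpy as np

# Both evaluators receive the parameters and the x grid explicitly, instead of
# one of them reading pdf_pars.parinarr from module-level state:
PDFFunction = Callable[[np.ndarray, np.ndarray], np.ndarray]

def pdfs_msht_pure(parin: np.ndarray, xarr: np.ndarray) -> np.ndarray:
    """Would return xf values of shape (len(xarr), nflavours)."""
    ...

def pdfs_diff_pure(parin: np.ndarray, xarr: np.ndarray) -> np.ndarray:
    """Same signature, so callers can treat either as a PDFFunction."""
    ...
```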

@LucianHL

Hi, yes, there is no particularly deep reason that things are done like that - the minimisation with pdfs_diff was written later than pdfs_msht and involves a few more inputs, so I somewhat lazily read those in via global variables. But there is nothing stopping it being done via input to writelha. This would just involve reading parinarr into writelha and suitably defining all parameter inputs in terms of it. I can have a go at that if you would like?

> (and ideally ideally have global_pars.py be a fixed input/header and less commonblock-like 😅)

Sure, there is definitely room for improvement!

@scarlehoff

scarlehoff commented Feb 10, 2025

I see.
No, my goal here is removing writelha, which means understanding what goes into each of the functions that might get called*. So for now I'll work under the assumption that the parin parameters are the same in both and see when that fails.

*(I decided to start with that goal because it is clear that nothing should change, so it is easy to check the final products)

@LucianHL

Ok, thanks - in fact I was not completely clear in my last message, I think. Just to clarify: the argument parin1 of writelha is used to pass the central PDF parameters, and is used for the $\chi^2$ output, whether in a fit or not. If a fit is done, then parinarr is used. Only the zeroth set of elements, parinarr[0,:], is calculated externally to writelha (it is just set to parin1, as you have identified). The rest - which correspond to finite differences, a given PDF parameter $+ n \epsilon$ - are calculated directly in func_pdfs_diff, which is called by pdf_diffs. What might confuse you is this if statement:

https://github.com/LucianHL/fixpar_nnpdf/blob/fce87be6c2cbe813ce2e2f66521d6815031d5d93/src/chi2s.py#L142

which is actually not needed at all, and is just a leftover from old code.
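Put schematically, the construction described above is the following (sizes and the step `eps` are illustrative, not the real values):

```python
import numpy as np

npars, nsteps, eps = 30, 2, 1e-4   # illustrative sizes and finite-difference step
parin1 = np.zeros(npars)           # central PDF parameters (placeholder)

# Row 0 is the central point, set externally to parin1; the remaining rows
# are the finite-difference points p_j + n*eps, built inside func_pdfs_diff.
parinarr = np.tile(parin1, (1 + npars * nsteps, 1))
for j in range(npars):
    for n in range(1, nsteps + 1):
        parinarr[1 + j * nsteps + (n - 1), j] += n * eps
```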

Commit: … still needed to finish the fit; added the fit example runcards by Lucian

@LucianHL left a comment


Thank you! One immediate quick comment to save you time - you can assume that newmin is True from now on; if it is False, this gives the older derivative (a brute-force difference rather than using the form of the FK convolution), and there is no need to keep this. Similarly, anything with diff_2 or diff_4 (which are options that only make sense for newmin being False) can just be deleted, and there is certainly no need to update it.
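For context, the newmin derivative rests on the FK convolution being linear in the PDF grid: if $T(p) = \mathrm{FK}\, f(p)$, then $\partial T/\partial p_j = \mathrm{FK}\, \partial f/\partial p_j$, so only the PDF (not the full prediction) needs finite differencing. A schematic numpy version of the resulting $\chi^2$ gradient (all names illustrative):

```python
import numpy as np

def chi2_gradient(fk, dfdp, theory, data, cinv):
    """
    fk:     (ndata, ngrid)   FK table
    dfdp:   (ngrid, npars)   PDF derivatives on the x grid (from finite diffs)
    theory: (ndata,)         current predictions, theory = fk @ f
    cinv:   (ndata, ndata)   inverse covariance matrix
    """
    jac = fk @ dfdp                    # dT/dp, exact given dfdp (linearity of FK)
    resid = theory - data
    return 2.0 * jac.T @ cinv @ resid  # d(chi2)/dp
```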

@tgiani

tgiani commented Feb 13, 2025

Hi @LucianHL, I have a question. In writelha the grid xarr is built by the function xgrid_calc(), by taking the union of all the 76 grids contained in /input/xarr. This grid is then used to produce the tmp LHAPDF files used later in the code, both to compute the $\chi^2$ and during the fit, when calling the function pdfs_diff. What are the grids in /input/xarr, how are they built, and why do we use them? I'm wondering because, if I change these grids, as happens in this branch, we get nonsense when performing the fit.
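(For readers following along: the union in xgrid_calc is essentially the following, sketched here with illustrative paths.)

```python
import glob
import numpy as np

# Schematic reconstruction: union of all x grids stored under input/xarr/
grids = [np.loadtxt(path) for path in sorted(glob.glob("input/xarr/*"))]
xarr = np.unique(np.concatenate(grids))  # sorted union of all grid points
```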

@scarlehoff

Thanks!

> I get (when running with this version)

Does that mean that the results are very different from main? Or just a bit?

RE: speed and derivatives

I think part of the difference might be that some of the stuff we are using under the hood is trying to use multiple cores, so it is penalized when running on a single core. But there's still a lot that can be improved; in particular, the derivative could be compiled with something like numba, and the part that calls the NNPDF code directly is using the routines that we use for plotting, which are very much not optimal for being called continuously.
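To illustrate the numba suggestion (a toy example, not the actual derivative routine):

```python
import numpy as np
from numba import njit

@njit(cache=True)
def forward_diff(fvals, eps):
    """Forward finite differences along the first axis, compiled to machine code."""
    npts, npars = fvals.shape
    out = np.empty((npts - 1, npars))
    for i in range(npts - 1):
        for j in range(npars):
            out[i, j] = (fvals[i + 1, j] - fvals[i, j]) / eps
    return out
```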

@LucianHL

> Does that mean that the results are very different from main? Or just a bit?

Just a bit, i.e. the sort of thing you would expect given that the codes are not identical. For the longmin case, e.g., the old code gives a final $\chi^2$ of 5747.045 and the new one 5747.062. I would actually be happy if they were even a bit further apart, and suspect they might be in some cases with different starting points.

> I think part of the difference might be that some of the stuff we are using under the hood is trying to use multiple cores, so it is penalized when running on a single core. But there's still a lot that can be improved; in particular, the derivative could be compiled with something like numba, and the part that calls the NNPDF code directly is using the routines that we use for plotting, which are very much not optimal for being called continuously.

Ok, understood. I am not familiar with numba but will have a look. As I say, I think procedurally it makes sense long term to have the derivatives done purely at the level of the PDFs, but this clearly depends on exactly how quick you can make the calls to the theory calculation itself. If that is no longer a bottleneck, then for sure there is no need to write things that way.

@scarlehoff

> 5747.045 and the new one 5747.062

That's even too close. I would've expected a bigger change.
Great in any case ^^

@LucianHL

Indeed! Yes, as I say, I am pretty sure that for other examples the differences may end up being larger than that!

@scarlehoff

Interestingly, running the code on my computer I get an even better chi2i = 5710.46724429442,
but it might be that whatever numerical differences exist between our installations are enough to push levmar a bit further and out of a local minimum.

@LucianHL

LucianHL commented Apr 28, 2025

Hmm, that is a bit surprising! Have you tried running the resulting fixed parameters (from outputs/pars/) back through to see what that gives? Without refitting, that is...

@tgiani

tgiani commented Apr 28, 2025

I'm going to try to run it as well on another machine, so we'll have a third data point.

@LucianHL

Thanks - in the meantime, @scarlehoff, if you can pass on the final output parameter file, I will have a quick look with that.

@LucianHL

p.s. is this with the short or the long min? Or both?

@scarlehoff

With long, short gives me something very close to yours:

chi2o = 5745.0417803549535
chi2i = 5745.134332615947
chi2o < chi2i - next iteration
ntries = 6
chi2i - chi2o < tol : exit
diff2=False
input_mshtfit_publication_shortmin.dat

(I'm reading the output files from output/buffer)
mshtfit_publication_longmin_out.txt

@LucianHL

Thanks, yes, that should give you the right answer. As I say, if you can pass on the file from outputs/pars, then I can do a quick sanity check that it is giving the same $\chi^2$ for me.

@tgiani

tgiani commented Apr 28, 2025

with short I get something similar to Juan

chi2o = 5745.042293945888
chi2i = 5745.134586023858
chi2o < chi2i - next iteration 
ntries = 6
chi2i - chi2o < tol : exit 
diff2=False
input_mshtfit_publication_shortmin.dat

@tgiani

tgiani commented Apr 28, 2025

with long it is still running, but if I look at the temporary buffer I have

ntries = 7
chi2i = 5733.103451296752
step = 1
lam = 0.0001
chi2o = 5726.5334813429445
chi2i = 5733.103451296752
chi2o < chi2i - next iteration 

so it seems like I will also end up with a lower chi2 than the one reported in publication_tests

@LucianHL

Thanks, ok - I have set a fit going starting directly from the latest version of this branch, just to be 100% sure my previously quoted $\chi^2$ isn't from an older version, and I will have a look at the parameter set from one of your fits if you pass it on. Also, having the final buffer will be useful.

@tgiani

tgiani commented Apr 28, 2025

I get

chi2o = 5709.604782221212
chi2i = 5709.689674788425
chi2o < chi2i - next iteration 
ntries = 13
chi2i - chi2o < tol : exit 
diff2=False
input_mshtfit_publication_longmin.dat

These should be the parameters
mshtfit_publication_longmin_out.txt

@tgiani

tgiani commented Apr 29, 2025

Hi @LucianHL, after running the code using the runcards you provided, we get the central value of the PDF, saved for example in grids/mshtfit_publication_longmin_out, correct? I was wondering how I can produce the final PDF set including the eigenvectors for the error computation.

@LucianHL

LucianHL commented Apr 29, 2025

Hi both, something is not quite working with the newest code - hence why you are apparently seeing lower minima. I also reproduce that with the latest git version. I am pretty sure it is related to the experimental/$t_0$ covariance matrix question. I am looking into it now.

In particular, that output parameter set you have provided does not agree at all with the $\chi^2$ values you are quoting, @tgiani. Anyway, more to follow soon - sorry about that.

@LucianHL

That should now be fixed. What was happening here was that the addition I had mentioned in a comment above was not included in this branch, and so the $t_0$ covariance matrix from the first input set was being used throughout the minimisation. The lower $\chi^2$ value was therefore due to this frozen $t_0$ covariance matrix being used, i.e. one far from matching the final PDF parameters. You will see this if you pass the output from your fits back through the code (at which point the final PDF parameters will be used for the $t_0$ covariance matrix) - i.e. the $\chi^2$ will not match.

This was fixed in the code I had been using, and I had wrongly assumed we were matching. Anyway, I will set a final longmin check fit going now, just to see if it is starting to match what it should (though I have checked that with simpler fits), so if you like please wait a little for me to confirm things are looking good. I have also added a bit more info about running scans to the README.
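Schematically, the fix amounts to rebuilding the $t_0$ covariance matrix from the current parameters at each outer iteration, instead of reusing the one from the first input set (all callables below are stand-ins, not the real routines):

```python
import numpy as np

def run_fit(parin1, data, max_iters, predictions, build_t0_covmat, levmar_step):
    """Illustrative outer loop only. The bug was equivalent to hoisting the
    build_t0_covmat call out of the loop, freezing the iteration-0 covmat."""
    pars = np.asarray(parin1, dtype=float)
    for _ in range(max_iters):
        covmat_t0 = build_t0_covmat(predictions(pars))  # rebuilt every iteration
        pars = levmar_step(pars, covmat_t0, data)
    return pars
```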

@LucianHL

Hi both, that is working now for me, and the longmin run is giving a $\chi^2$ around 5747.06.

@tgiani

tgiani commented May 5, 2025

Hello, thank you, now I'm also getting

chi2o = 5747.062471223495
chi2i = 5747.062471279271
chi2o < chi2i - next iteration 
ntries = 21
chi2i - chi2o < tol : exit 
diff2=False
input_mshtfit_publication_longmin.dat

@tgiani

tgiani commented May 5, 2025

Hi @LucianHL, in order to run a scan to compute the PDF error using, for example, $T^2=1$, is it enough to use the same runcard used to produce the fit but setting
readcov : True,
cov input file: mshtfit_publication_longmin_out.dat, and
t2_err: 1. ?
Should I also set the input file to the output of the fit, or is that not necessary? Thanks

@LucianHL

LucianHL commented May 5, 2025

Excellent! Thanks, and sorry for the noise there.

> readcov : True,
> cov input file: mshtfit_publication_longmin_out.dat, and
> t2_err: 1. ?
> Should I also set the input file to the output of the fit, or is that not necessary? Thanks

The input file will not be used, as readcov is set to True - the covariance matrix file also includes the central PDF parameter values, which are what are used. Otherwise this is all right, but you will also want to change the label to something different/sensible here, to avoid the buffer file from the original fit being overwritten.
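Concretely, starting from the fit runcard, something like the following would produce the scan card (key names are taken as quoted in this thread; check the real runcards for the exact spelling, and the label value is illustrative):

```python
import yaml

# Load the fit runcard and adapt it for a T^2 = 1 error scan.
with open("config/fit_comp_jcm.yaml") as f:
    card = yaml.safe_load(f)

card["readcov"] = True                          # read pars + covmat from file
card["cov input file"] = "mshtfit_publication_longmin_out.dat"
card["t2_err"] = 1.0                            # T^2 = 1 tolerance
card["label"] = "publication_scan"              # new label: don't overwrite the
                                                # buffer file from the original fit

with open("config/scan_comp_jcm.yaml", "w") as f:
    yaml.safe_dump(card, f)
```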

@scarlehoff

I think at this stage we can move to the next step, which is isolating the parts of nnpdf that are used and improving those.

I'll do that in a separate PR, to leave this one in its current state as a benchmark.

I'll remove the global pars file so that most of the info is instead read from runcards like the nnpdf ones (the format of global pars is similar anyway; I just want to avoid having the stateful configuration).

@scarlehoff

One question @LucianHL, were you ever using the t0 covmat for sampling or only the experimental covmat?

@LucianHL
Copy link
Copy Markdown
Collaborator

LucianHL commented May 14, 2025

Thanks @scarlehoff, sounds good.

> One question @LucianHL, were you ever using the t0 covmat for sampling or only the experimental covmat?

I don't follow - the t0 covmat is used in the minimisation so I'm not sure what you mean.

p.s. I'm on holiday this week so may not be super quick to respond but no problem to answer quick questions like that.

@scarlehoff
Copy link
Copy Markdown
Collaborator Author

By sampling I mean the creation of the replicas (when doing make_replica). It doesn't really make a difference, but I was wondering what the case was here, since the t0 covmat is updated at every step.

> p.s. I'm on holiday this week so may not be super quick to respond but no problem to answer quick questions like that.

enjoy the holidays! And don't worry, I'm not managing to be quick to respond even when not on holidays...

@LucianHL
Copy link
Copy Markdown
Collaborator

Ah ok, so make_replica isn't, I think, ever used - the import in the code is, I think, a historical thing (I guess at some point I considered using it?). When I have generated replicas, rather than doing a Hessian fit, it is not done in a completely streamlined way. Basically, the pseudodata are first generated by setting pdout to true in the runcard, along with a few other flags. After that, a regular fit is done to each of these, so using the t0 covmat (see the sketch below). I should add that to the README so everything is clear - are you looking to do this soon?
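So, schematically, the external-script workflow is the following two-step loop (all callables are stand-ins for the real code paths, following the description above):

```python
def replica_fits(runcard, n_replicas, generate_pseudodata, run_fit, save):
    """Illustrative only: pseudodata first (the pdout=True run), then a
    regular t0 fit to each set, as described above."""
    for k in range(n_replicas):
        pseudo = generate_pseudodata(runcard, seed=k)   # the pdout=True step
        save(run_fit(runcard, data=pseudo), f"outputs/replica_{k}")
```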

@scarlehoff
Copy link
Copy Markdown
Collaborator Author

Ah, ok. That's clear then.
Indeed, I saw it there, but it was not clear to me when it was actually being called.

> are you looking to do this soon?

No, but I'm isolating the calls to vp to avoid computing the same things many times, and I wasn't sure about that part. For now I'll leave it be.

scarlehoff merged commit cb8b758 into main on Jan 14, 2026