Skip to content

Adding action to bundle PDFs#1340

Merged
Zaharid merged 13 commits into
masterfrom
bundle_pdfs
Aug 13, 2021
Merged

Adding action to bundle PDFs#1340
Zaharid merged 13 commits into
masterfrom
bundle_pdfs

Conversation

@siranipour
Copy link
Copy Markdown
Contributor

@siranipour siranipour commented Jul 27, 2021

An action that closes #1331. You specify the base pdf as pdf and the alphas pdfs for which you want to add the replica 0s from with pdfs.

Example runcard:

pdf: NNPDF31_nnlo_as_0118_DISonly

pdfs:
  - NNPDF31_nnlo_as_0118
  - NNPDF40_nnlo_as_0118


actions_:
  - bundle_pdfs

Edit: you can find the bundled PDF set inside the output folder. Note it won't overwrite the bundled PDF if it already exists

@siranipour
Copy link
Copy Markdown
Contributor Author

Will add docstrings if we're happy with the design

@juanrojochacon
Copy link
Copy Markdown

Not sure I understand. What is "bundle" here doing exactly? And the PDF sets should be

pdf: NNPDF40_nnlo_as_0118

pdfs:

  • NNPDF40_nnlo_as_0117
  • NNPDF40_nnlo_as_0119

actions_:

  • bundle_pdfs

if I understand?

@siranipour
Copy link
Copy Markdown
Contributor Author

Yeah, the runcard I posted above is just an example to get it working because I didn't know what sets to exactly work with.

The bundle_pdf actions will add the replica 0s and fixup the header files according to #1331 (comment). You can then find the output PDF inside the output folder

@juanrojochacon
Copy link
Copy Markdown

ok, the proof is in the pudding, if someone can produce the bundled sets we can check whether or not they behave as they ought to

@enocera
Copy link
Copy Markdown
Contributor

enocera commented Jul 27, 2021

ok, the proof is in the pudding, if someone can produce the bundled sets we can check whether or not they behave as they ought to

@juanrojochacon With the runcard provided by @siranipour it's really trivial to produce a bundled set. I've just produced a bundled set for our NNLO baseline+alphas=0.117&alphas=0.119, you can find it on the server (its name is 210713-n3fit-001_pdfas).

@juanrojochacon
Copy link
Copy Markdown

thanks let me check it

@juanrojochacon
Copy link
Copy Markdown

If I compute the means and errors by hand from the bundled sets and from the individual sets I get perfect agreement, I think we are happy and should close this PR

x, Q = 1e-05 1.65
reldiff_mean, reldiff_err = 0.0 3.744390438763254e-10

x, Q = 1e-05 10
reldiff_mean, reldiff_err = 0.0 -2.1480599298919797e-08

x, Q = 1e-05 100
reldiff_mean, reldiff_err = 0.0 -1.0969699694816895e-08

x, Q = 1e-05 100000.0
reldiff_mean, reldiff_err = 0.0 -3.490101419494708e-08

x, Q = 0.0001 1.65
reldiff_mean, reldiff_err = 0.0 -1.040580521271848e-08

x, Q = 0.0001 10
reldiff_mean, reldiff_err = 0.0 8.97924623698616e-11

x, Q = 0.0001 100
reldiff_mean, reldiff_err = 0.0 4.4605778137683814e-08

x, Q = 0.0001 100000.0
reldiff_mean, reldiff_err = 0.0 -1.6756881208543032e-06

x, Q = 0.001 1.65
reldiff_mean, reldiff_err = 0.0 2.384716156251005e-08

x, Q = 0.001 10
reldiff_mean, reldiff_err = 0.0 -3.2114232767679965e-07

x, Q = 0.001 100
reldiff_mean, reldiff_err = 0.0 -2.0604607370530934e-07

x, Q = 0.001 100000.0
reldiff_mean, reldiff_err = 0.0 5.425307567213913e-11

x, Q = 0.01 1.65
reldiff_mean, reldiff_err = 0.0 -7.949864162363118e-08

x, Q = 0.01 10
reldiff_mean, reldiff_err = 0.0 -1.8741042399943065e-07

x, Q = 0.01 100
reldiff_mean, reldiff_err = 0.0 -8.092536452211966e-09

x, Q = 0.01 100000.0
reldiff_mean, reldiff_err = 0.0 2.508360570129925e-07

x, Q = 0.1 1.65
reldiff_mean, reldiff_err = 0.0 1.664341933905202e-08

x, Q = 0.1 10
reldiff_mean, reldiff_err = 0.0 1.6308781515132326e-07

x, Q = 0.1 100
reldiff_mean, reldiff_err = 0.0 -1.0234433209722509e-08

x, Q = 0.1 100000.0
reldiff_mean, reldiff_err = 0.0 -8.73322159363136e-08

x, Q = 0.3 1.65
reldiff_mean, reldiff_err = 0.0 -1.554429124628098e-09

@juanrojochacon
Copy link
Copy Markdown

Also, can you remind me what is the python command to compute PDF errors using the native LHAPDF routines, rather than our own ones?

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Jul 27, 2021

@siranipour Seems to me the choice of module where this is places is a bit obscure. paramfits is really for the alpha_s analysis.
True that there is no obvious place at the moment, but replica_selector.py would be better perhaps. Or a new module somewhere.

@Zaharid Zaharid reopened this Jul 27, 2021
@siranipour siranipour reopened this Jul 27, 2021
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Jul 27, 2021

@juanrojochacon A successful PR is "merged", not "closed" ;)


# Fixup the info file
info_file = (temp_pdf/temp_pdf.name).with_suffix('.info')
os.system(f"sed -i -e 's/NumMembers.*/NumMembers: {new_nrep}/g' {info_file}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use YAML at this point instead of the funny external commands.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so would I but in place file manipulation is a true nightmare in python. Also, is it fair to assume the .info file is in yaml format? The suffix didn't imply it would always be yaml

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is always YAML and LHAPDF uses that internally. No need to do anything "in place" AFAICT. It is load structure from file, manipulate structure, write structure to file (which may or may not be the same as before seeing as it is closed by now).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you fine with using sed to prepend the alphas_MZ and alphas_Vals to the replica files?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All things equal I'd rather not. It is one more external thing, with annoying differences between linux and mac. And we should have plenty of tools to do it the right way.

@enocera
Copy link
Copy Markdown
Contributor

enocera commented Jul 27, 2021

@juanrojochacon A successful PR is "merged", not "closed" ;)

@juanrojochacon Which does not mean that we encourage you to merge this PR. Review is still pending.

Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated
Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated
Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated
Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Jul 27, 2021

As an aside, I am thinking that if we ever want to support this ourselves, we would need things like

def _find_as_variation_theories(main_theoryid: int) -> List[int]
    ...

def find_as_variation_theory_for(main_thoeryid: int, alpha_s: Real): -> int
    ...

Could be done with a bit of sql magic, or my hardcoding these things in python. Similar for scale variations incidentally.

@juanrojochacon
Copy link
Copy Markdown

Sorry ;)

@siranipour
Copy link
Copy Markdown
Contributor Author

@Zaharid, I've removed our reliance on sed now and use open to handle the file I/O.

Comment thread validphys2/src/validphys/replica_selector.py Outdated
Co-authored-by: Zaharid <zk261@cam.ac.uk>
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Aug 3, 2021

I still thing the action should be named differently.

"""
info_file = pathlib.Path(alphas_pdf.infopath)

with open(info_file, 'r') as stream:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it matters, but these things should operate on bytes rather than text.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah didn't know that, will fix

info_file = pathlib.Path(alphas_pdf.infopath)

with open(info_file, 'r') as stream:
with open(info_file, 'rb') as stream:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the only one that mattered a bit was new_replica_file (in and out). Reason being to avoid somewhat expensive utf 8 conversions on the relatively many big files.

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Aug 3, 2021

@siranipour please delete the grid named NNPDF40_nnlo_as_0118. Would tell you to also destroy your computer but computers are hard to replace these days... See #1070 (comment)

Add type checks to target name.
For all we know the current path may not be writable or accessible.
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Aug 10, 2021

@siranipour Please add some docs and merge.

Add comment to description on alpha_s variations.
Also use the newer interface to make sure it round trips.
@Zaharid Zaharid merged commit 816f6aa into master Aug 13, 2021
@Zaharid Zaharid deleted the bundle_pdfs branch August 13, 2021 18:03
@Zaharid Zaharid added the enhancement New feature or request label Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Script to produce bundled PDF set

4 participants