Adding action to bundle PDFs by siranipour · Pull Request #1340 · NNPDF/nnpdf

siranipour · 2021-07-27T11:52:23Z

An action that closes #1331. You specify the base PDF with pdf, and with pdfs the alpha_s PDFs whose replica 0s you want to add.

Example runcard:

pdf: NNPDF31_nnlo_as_0118_DISonly

pdfs:
  - NNPDF31_nnlo_as_0118
  - NNPDF40_nnlo_as_0118


actions_:
  - bundle_pdfs

Edit: you can find the bundled PDF set inside the output folder. Note that it won't overwrite the bundled PDF if it already exists.

siranipour · 2021-07-27T11:53:17Z

Will add docstrings if we're happy with the design

juanrojochacon · 2021-07-27T11:55:37Z

Not sure I understand. What is "bundle" here doing exactly? And the PDF sets should be

pdf: NNPDF40_nnlo_as_0118

pdfs:
  - NNPDF40_nnlo_as_0117
  - NNPDF40_nnlo_as_0119

actions_:
  - bundle_pdfs

if I understand correctly?

siranipour · 2021-07-27T11:59:00Z

Yeah, the runcard I posted above is just an example to get it working, because I didn't know exactly which sets to work with.

The bundle_pdfs action will add the replica 0s and fix up the header files according to #1331 (comment). You can then find the output PDF inside the output folder.
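A minimal sketch of the member-file part of the bundling described above. It assumes the standard LHAPDF layout (`<name>/<name>.info` plus member files `<name>/<name>_NNNN.dat` with four-digit member numbers) and assumes the alpha_s members are simply appended after the last member of the base set. The function name `bundle_member_files` is hypothetical, this is not the code merged in this PR, and the header fixups (NumMembers, per-member alpha_s values) are left out.

```python
import pathlib
import shutil


def bundle_member_files(base_dir, alphas_dirs, target_dir):
    """Hypothetical sketch: copy an LHAPDF set to target_dir and append the
    replica 0 (file <name>_0000.dat) of each alpha_s set as extra members.

    Returns the new total number of members. Header fixups are not done here.
    """
    base_dir = pathlib.Path(base_dir)
    target_dir = pathlib.Path(target_dir)
    base_name, target_name = base_dir.name, target_dir.name

    # copytree raises FileExistsError if target_dir already exists,
    # so an existing bundle is never overwritten.
    shutil.copytree(base_dir, target_dir)

    # Rename every copied file from the base set name to the target name.
    # sorted() takes a snapshot first, so renamed files are not revisited.
    for path in sorted(target_dir.iterdir()):
        if path.name.startswith(base_name):
            path.rename(target_dir / (target_name + path.name[len(base_name):]))

    # Members of the base set are numbered 0 .. nmem - 1.
    nmem = len(list(target_dir.glob(f"{target_name}_*.dat")))

    # Append each alpha_s replica 0 as member nmem, nmem + 1, ...
    for i, as_dir in enumerate(alphas_dirs):
        as_dir = pathlib.Path(as_dir)
        src = as_dir / f"{as_dir.name}_0000.dat"
        dst = target_dir / f"{target_name}_{nmem + i:04d}.dat"
        shutil.copyfile(src, dst)

    return nmem + len(alphas_dirs)
```

Because `shutil.copytree` refuses to write into an existing directory, calling this twice with the same target raises `FileExistsError` rather than overwriting the earlier bundle.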

juanrojochacon · 2021-07-27T12:01:05Z

OK, the proof is in the pudding: if someone can produce the bundled sets, we can check whether or not they behave as they ought to.

enocera · 2021-07-27T13:13:07Z

> ok, the proof is in the pudding, if someone can produce the bundled sets we can check whether or not they behave as they ought to

@juanrojochacon With the runcard provided by @siranipour, it's really trivial to produce a bundled set. I've just produced one for our NNLO baseline plus the alpha_s = 0.117 and alpha_s = 0.119 variations; you can find it on the server (its name is 210713-n3fit-001_pdfas).

juanrojochacon · 2021-07-27T13:17:20Z

Thanks, let me check it.

juanrojochacon · 2021-07-27T13:47:32Z

If I compute the means and errors by hand from the bundled sets and from the individual sets, I get perfect agreement. I think we are happy and should close this PR.

x, Q = 1e-05 1.65
reldiff_mean, reldiff_err = 0.0 3.744390438763254e-10

x, Q = 1e-05 10
reldiff_mean, reldiff_err = 0.0 -2.1480599298919797e-08

x, Q = 1e-05 100
reldiff_mean, reldiff_err = 0.0 -1.0969699694816895e-08

x, Q = 1e-05 100000.0
reldiff_mean, reldiff_err = 0.0 -3.490101419494708e-08

x, Q = 0.0001 1.65
reldiff_mean, reldiff_err = 0.0 -1.040580521271848e-08

x, Q = 0.0001 10
reldiff_mean, reldiff_err = 0.0 8.97924623698616e-11

x, Q = 0.0001 100
reldiff_mean, reldiff_err = 0.0 4.4605778137683814e-08

x, Q = 0.0001 100000.0
reldiff_mean, reldiff_err = 0.0 -1.6756881208543032e-06

x, Q = 0.001 1.65
reldiff_mean, reldiff_err = 0.0 2.384716156251005e-08

x, Q = 0.001 10
reldiff_mean, reldiff_err = 0.0 -3.2114232767679965e-07

x, Q = 0.001 100
reldiff_mean, reldiff_err = 0.0 -2.0604607370530934e-07

x, Q = 0.001 100000.0
reldiff_mean, reldiff_err = 0.0 5.425307567213913e-11

x, Q = 0.01 1.65
reldiff_mean, reldiff_err = 0.0 -7.949864162363118e-08

x, Q = 0.01 10
reldiff_mean, reldiff_err = 0.0 -1.8741042399943065e-07

x, Q = 0.01 100
reldiff_mean, reldiff_err = 0.0 -8.092536452211966e-09

x, Q = 0.01 100000.0
reldiff_mean, reldiff_err = 0.0 2.508360570129925e-07

x, Q = 0.1 1.65
reldiff_mean, reldiff_err = 0.0 1.664341933905202e-08

x, Q = 0.1 10
reldiff_mean, reldiff_err = 0.0 1.6308781515132326e-07

x, Q = 0.1 100
reldiff_mean, reldiff_err = 0.0 -1.0234433209722509e-08

x, Q = 0.1 100000.0
reldiff_mean, reldiff_err = 0.0 -8.73322159363136e-08

x, Q = 0.3 1.65
reldiff_mean, reldiff_err = 0.0 -1.554429124628098e-09
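A sketch of what such a by-hand check might look like, assuming a Monte Carlo set in which member 0 is the central value and the error is the sample standard deviation over replicas 1..nrep. Here `member_values` stands for the values of some x f(x, Q) at one (x, Q) point for every member; how they are obtained (LHAPDF or validphys) is left out, and all function names are hypothetical.

```python
import statistics


def replica_stats(member_values, nrep):
    """Mean and sample standard deviation over replicas 1..nrep.

    member_values[0] is the central member; anything past index nrep
    (e.g. the appended alpha_s members of a bundled set) is ignored.
    """
    replicas = member_values[1 : nrep + 1]
    return statistics.mean(replicas), statistics.stdev(replicas)


def reldiff(a, b):
    """Relative difference (a - b) / b, with 0.0 when both are zero."""
    if b == 0:
        return 0.0 if a == 0 else float("inf")
    return (a - b) / b


def compare(bundled_values, individual_values, nrep):
    """Return (reldiff_mean, reldiff_err) between bundled and individual sets."""
    mean_b, err_b = replica_stats(bundled_values, nrep)
    mean_i, err_i = replica_stats(individual_values, nrep)
    return reldiff(mean_b, mean_i), reldiff(err_b, err_i)
```

Since the first nrep + 1 members of the bundle are copies of the base set, the replica statistics computed this way are identical, and any residual difference in the printed errors would come from how the values were evaluated rather than from the bundling itself.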

juanrojochacon · 2021-07-27T13:48:03Z

Also, can you remind me what the Python command is to compute PDF errors using the native LHAPDF routines, rather than our own?
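One way to get these numbers from LHAPDF's own routines is through its Python bindings: `lhapdf.getPDFSet` loads the set metadata, `PDFSet.mkPDFs()` loads every member (including member 0), and `PDFSet.uncertainty` applies the error prescription declared in the set's ErrorType. A minimal sketch; the wrapper name `lhapdf_uncertainty` is hypothetical.

```python
def lhapdf_uncertainty(setname, pid, x, q):
    """Central value and symmetric uncertainty of x f(x, Q) for parton pid,
    computed with LHAPDF's PDFSet.uncertainty rather than our own code."""
    import lhapdf  # LHAPDF Python bindings, imported lazily

    pset = lhapdf.getPDFSet(setname)
    members = pset.mkPDFs()  # one PDF object per member, member 0 first
    values = [member.xfxQ(pid, x, q) for member in members]
    unc = pset.uncertainty(values)  # uses the set's ErrorType
    return unc.central, unc.errsymm
```

For instance, `lhapdf_uncertainty(setname, 21, 1e-3, 100.0)` would give the gluon (PDG ID 21) at x = 10^-3 and Q = 100 GeV. The object returned by `uncertainty` also carries `errplus` and `errminus` for asymmetric errors.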

Zaharid · 2021-07-27T13:50:09Z

@siranipour It seems to me the choice of module where this is placed is a bit obscure; paramfits is really for the alpha_s analysis.
It's true that there is no obvious place at the moment, but replica_selector.py would perhaps be better. Or a new module somewhere.

Zaharid · 2021-07-27T13:51:30Z

@juanrojochacon A successful PR is "merged", not "closed" ;)

Zaharid · 2021-07-27T13:52:38Z

+
+        # Fixup the info file
+        info_file = (temp_pdf/temp_pdf.name).with_suffix('.info')
+        os.system(f"sed -i -e 's/NumMembers.*/NumMembers: {new_nrep}/g' {info_file}")


I'd rather use YAML at this point instead of the funny external commands.

Yeah, so would I, but in-place file manipulation is a true nightmare in Python. Also, is it fair to assume the .info file is in YAML format? The suffix didn't imply it would always be YAML.

It is always YAML, and LHAPDF uses that internally. There is no need to do anything "in place" AFAICT: load the structure from the file, manipulate the structure, and write the structure to a file (which may or may not be the same one as before, seeing as it is closed by now).

Are you fine with using sed to prepend the alphas_MZ and alphas_Vals to the replica files?

All things being equal, I'd rather not. It is one more external tool, with annoying differences between Linux and Mac. And we should have plenty of tools to do it the right way.

enocera · 2021-07-27T13:53:10Z

> @juanrojochacon A successful PR is "merged", not "closed" ;)

@juanrojochacon Which does not mean that we encourage you to merge this PR. Review is still pending.

Zaharid · 2021-07-27T14:10:04Z

As an aside, I am thinking that if we ever want to support this ourselves, we would need things like

from numbers import Real
from typing import List

def _find_as_variation_theories(main_theoryid: int) -> List[int]:
    ...

def find_as_variation_theory_for(main_theoryid: int, alpha_s: Real) -> int:
    ...

This could be done with a bit of SQL magic, or by hardcoding these things in Python. The same goes for scale variations, incidentally.

juanrojochacon · 2021-07-27T14:27:41Z

Sorry ;)

siranipour · 2021-08-03T10:22:10Z

@Zaharid, I've removed our reliance on sed now and use open to handle the file I/O.
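A stdlib-only stand-in for the sed call in the diff above (`sed -i -e 's/NumMembers.*/NumMembers: {new_nrep}/g'`): read the whole .info file, edit it in memory and write it back. The function name is hypothetical, and the regex is anchored to the start of a line, which is slightly stricter than the sed pattern. Parsing the file with a YAML library, as suggested above, would be the more robust route; this sketch only removes the external command.

```python
import pathlib
import re


def set_num_members(info_path, new_nrep):
    """Rewrite the NumMembers entry of an LHAPDF .info file without sed.

    Loads the file, substitutes the entry in memory and writes the result
    back, so nothing is edited "in place" while the file is open.
    """
    info_path = pathlib.Path(info_path)
    text = info_path.read_text()
    new_text, count = re.subn(
        r"^NumMembers:.*$", f"NumMembers: {new_nrep}", text, flags=re.MULTILINE
    )
    if count != 1:
        raise ValueError(f"expected one NumMembers entry in {info_path}, found {count}")
    info_path.write_text(new_text)
```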

Zaharid · 2021-08-03T11:01:27Z

I still think the action should be named differently.

Zaharid · 2021-08-03T12:20:30Z

+    """
+    info_file = pathlib.Path(alphas_pdf.infopath)
+
+    with open(info_file, 'r') as stream:


Not that it matters, but these things should operate on bytes rather than text.

Ah, didn't know that, will fix.

Zaharid · 2021-08-03T16:17:18Z

    info_file = pathlib.Path(alphas_pdf.infopath)

-    with open(info_file, 'r') as stream:
+    with open(info_file, 'rb') as stream:


Actually, the only one where it mattered a bit was new_replica_file (in and out). The reason is to avoid somewhat expensive UTF-8 conversions on the relatively many big files.
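A sketch of prepending per-member alpha_s entries to a replica file while working on bytes, so the large grid body is never decoded to text. It relies on LHAPDF's cascading metadata, in which keys in a member file's header take precedence over the set-level .info file; the function name and the exact formatting of the entries are assumptions, not the code merged here.

```python
import pathlib


def prepend_member_metadata(src, dst, alphas_mz, alphas_vals):
    """Copy an LHAPDF member file from src to dst, prepending AlphaS_MZ and
    AlphaS_Vals entries to the top of its YAML header.

    Only the short new header is encoded; the grid body is copied as raw
    bytes with no UTF-8 decoding or re-encoding.
    """
    header = (
        f"AlphaS_MZ: {alphas_mz}\n"
        f"AlphaS_Vals: [{', '.join(repr(v) for v in alphas_vals)}]\n"
    ).encode("ascii")
    data = pathlib.Path(src).read_bytes()
    pathlib.Path(dst).write_bytes(header + data)
```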

Zaharid · 2021-08-03T16:23:44Z

@siranipour please delete the grid named NNPDF40_nnlo_as_0118. Would tell you to also destroy your computer but computers are hard to replace these days... See #1070 (comment)

Zaharid · 2021-08-10T17:56:18Z

@siranipour Please add some docs and merge.

Adding action to bundle PDFs

525255a

siranipour requested review from Zaharid, enocera and juanrojochacon July 27, 2021 11:52

juanrojochacon closed this Jul 27, 2021

Zaharid reopened this Jul 27, 2021

siranipour reopened this Jul 27, 2021

Zaharid reviewed Jul 27, 2021

Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated

Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated

Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated

Zaharid reviewed Jul 27, 2021

Comment thread validphys2/src/validphys/paramfits/dataops.py Outdated

Moving function

72aafe8

siranipour force-pushed the bundle_pdfs branch from a5fb217 to 72aafe8 July 27, 2021 14:25

Removing reliance on sed

6a70a7b

Adding doc strings

01b034c

Zaharid reviewed Aug 3, 2021

Comment thread validphys2/src/validphys/replica_selector.py Outdated

Update validphys2/src/validphys/replica_selector.py

0f65e5d

Co-authored-by: Zaharid <zk261@cam.ac.uk>

Zaharid reviewed Aug 3, 2021

Renaming function and operate on bytes

116fe17

siranipour force-pushed the bundle_pdfs branch from ff4d4f5 to 116fe17 August 3, 2021 14:52

Zaharid reviewed Aug 3, 2021

Zaharid added 3 commits August 10, 2021 17:41

Simplify logic in replica fixup

e754e85

Add type checks to target name.

Store temporary folder in output_path

fa419b3

For all we know the current path may not be writable or accessible.

Add target name check

c911a8e

Zaharid approved these changes Aug 10, 2021

Zaharid added 3 commits August 10, 2021 18:59

Delete old contents if they exist

17075a0

Improve info writing

0b856ee

Add comment to description on alpha_s variations. Also use the newer interface to make sure it round trips.

Use bytes as a minor optimization

927ac08

Zaharid force-pushed the bundle_pdfs branch from a16d27c to 927ac08 August 10, 2021 17:59

Adding docs entry

a5ab318

Zaharid merged commit 816f6aa into master Aug 13, 2021

Zaharid deleted the bundle_pdfs branch August 13, 2021 18:03

Zaharid added the enhancement New feature or request label Oct 28, 2021


4 participants
