
New ht thcovmat #2126

Open
achiefa wants to merge 41 commits into master from new_ht_thcovmat

Conversation

@achiefa
Contributor

@achiefa achiefa commented Jul 15, 2024

This PR allows for the inclusion of theory uncertainties due to power corrections. The theory covariance matrix is constructed by computing shifts for the theoretical predictions, as is done for the MHOUs. The shift is computed at the level of the structure functions; the structure-function shifts are then combined to reconstruct the shift for the cross section. For this reason, the calculation of the shift depends on the dataset. Currently, only 1-JET and DIS (NC and CC) are supported.

At the level of the runcard, power corrections are specified as follows:

theorycovmatconfig:
  point_prescriptions: ["power corrections"]
  pc_parameters:
  - {ht: H2p, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: H2d, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: HLp, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: HLd, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: H3p, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: H3d, yshift: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0], nodes: [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]}
  - {ht: Hj, yshift: [2.0, 2.0, 2.0, 2.0, 2.0, 2.0], nodes: [0.25, 0.75, 1.25, 1.75, 2.25, 2.75]}
  pc_included_procs: ["JETS", "DIS NC", "DIS CC"]
  pc_excluded_exps: [HERA_NC_318GEV_EAVG_CHARM-SIGMARED,
                     HERA_NC_318GEV_EAVG_BOTTOM-SIGMARED]
  pdf: 240701-02-rs-nnpdf40-baseline
  use_thcovmat_in_fitting: true
  use_thcovmat_in_sampling: true

For each process/structure function implemented, the following information must be specified:

  1. The name of the power correction, under the key ht. This identifies the type of power correction, in particular when computing the posterior.
  2. The values in yshift are the magnitudes of the prior at each node.
  3. nodes contains the points at which the prior is shifted.
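To make the yshift/nodes pairing concrete, here is a minimal sketch of how a prior magnitude could be evaluated at an arbitrary point from those two arrays. Linear interpolation between nodes is an assumption for illustration; the actual functional form lives in higher_twist_functions.py and is still under discussion in this PR.

```python
import numpy as np

def prior_shift(nodes, yshift):
    """Hypothetical sketch: build a callable returning the prior magnitude
    at any x, given the `nodes` and `yshift` arrays from pc_parameters.
    Linear interpolation is an assumption here, not the PR's actual choice."""
    nodes = np.asarray(nodes, dtype=float)
    yshift = np.asarray(yshift, dtype=float)

    def evaluate(x):
        return np.interp(x, nodes, yshift)

    return evaluate

# H2p prior from the runcard above: 0.2 everywhere, vanishing at x = 1
h2p = prior_shift(
    [0.0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0],
)
print(h2p(0.05))  # between the 0.01 and 0.1 nodes -> 0.2
print(h2p(1.0))   # vanishes at x = 1 -> 0.0
```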

The array pc_included_procs specifies the processes for which the shifts are computed. I've also implemented the possibility to exclude particular datasets within the selected processes, by specifying their names in pc_excluded_exps.

The key func_type is temporary and will be deleted once we decide which function to use to construct the prior.

All the relevant details for this PR are contained in the module higher_twist_functions.py. For each observable for which the shift has to be computed, I implemented a factory that constructs the function that will then compute the shift. This was necessary to make the shift depend on the shift parameters (yshift and nodes) and on the prescription according to which we vary the parameters (which for now is fixed), and not on the kinematics (I think this is known as currying in computer science).
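A toy version of that factory pattern (all names and the power-correction shape are illustrative, not the actual API in higher_twist_functions.py): the factory binds the prior parameters once and returns a callable of the kinematics only.

```python
import numpy as np

def make_shift_function(yshift, nodes):
    """Illustrative factory mirroring the currying idea described above:
    bind the prior parameters (yshift, nodes) once, and return a callable
    that only takes the kinematics. The real factories in
    higher_twist_functions.py are observable-specific."""
    yshift = np.asarray(yshift, dtype=float)
    nodes = np.asarray(nodes, dtype=float)

    def shift(x, q2):
        # Toy power-correction shape, prior(x) / Q^2, purely for illustration
        return np.interp(x, nodes, yshift) / q2

    return shift

# Build once per parameter set, then reuse across all kinematic points
delta = make_shift_function([0.2, 0.0], [0.0, 1.0])
values = [delta(x, q2) for x, q2 in [(0.1, 4.0), (0.5, 10.0)]]
```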

TO DO

  • [ ] Add documentation
  • [x] Remove func_type
  • [ ] What happens if power_corrections is specified but the parameters are not given?
  • [ ] Change the name ht to name (?)
  • [ ] Something else?

@achiefa achiefa added the run-fit-bot Starts fit bot from a PR. label Jul 15, 2024
@achiefa achiefa requested a review from RoyStegeman July 15, 2024 10:11
@achiefa achiefa self-assigned this Jul 15, 2024
@github-actions

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@scarlehoff scarlehoff marked this pull request as draft July 17, 2024 14:26
Comment thread n3fit/src/n3fit/layers/DIS.py Outdated
Comment thread n3fit/src/n3fit/layers/DIS.py Outdated
Comment thread n3fit/src/n3fit/layers/DIS.py Outdated
Comment thread n3fit/src/n3fit/scripts/vp_setupfit.py Outdated
Comment thread n3fit/src/n3fit/scripts/n3fit_exec.py Outdated
Comment thread validphys2/src/validphys/results.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
@achiefa achiefa force-pushed the new_ht_thcovmat branch 2 times, most recently from 28662e8 to 6b71fae Compare October 11, 2024 22:16
@achiefa achiefa force-pushed the new_ht_thcovmat branch 2 times, most recently from 348be41 to 9ac473c Compare January 14, 2025 14:06
@achiefa
Contributor Author

achiefa commented Jan 14, 2025

Hi @roy, I've updated the code so that the covmat for power corrections can now be constructed as in the case of scale variations. Please have a look and let me know if something is unclear. I'd appreciate it if you could double-check the functions that compute the shifts, in particular where normalisation and conversion factors are used.

There are a few things here and there that I still don't like, but feel free to propose modifications.

@achiefa achiefa added run-fit-bot Starts fit bot from a PR. and removed run-fit-bot Starts fit bot from a PR. labels Jan 14, 2025
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
@achiefa achiefa removed the run-fit-bot Starts fit bot from a PR. label Jan 14, 2025
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
@github-actions

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@achiefa achiefa marked this pull request as ready for review January 30, 2025 14:13
@achiefa achiefa requested a review from scarlehoff February 21, 2025 15:37
Member

@scarlehoff scarlehoff left a comment


Hi @achiefa thanks for this.

I've left a few comments on the code. I haven't looked into the 1000-line higher_twist_functions in much detail, but one thing I'd like to mention: please avoid using commondata_table. This is basically an object containing all the commondata information, like in the old days; it was necessary to add it for some of the validphys functions, but the idea is to instead get only the information you want.

For instance, the process type is now part of the metadata (no need to load the data and uncertainties!).
Same for the kinematics (although by the time you need the kinematics you have often already read the whole commondata). You can get the kinematics directly with cd.metadata.load_kinematics() and then you already have x, Q, and y for DIS, as god intended!

Comment thread n3fit/runcards/examples/Basic_runcard_pc_covmat.yml
Comment thread validphys2/examples/theory_covariance/chi2table_ht.yaml Outdated
Comment thread validphys2/src/validphys/commondata.py Outdated
Comment thread validphys2/src/validphys/config.py Outdated
Comment thread validphys2/src/validphys/config.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/higher_twist_functions.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/higher_twist_functions.py Outdated
Comment thread validphys2/src/validphys/theorycovariance/higher_twist_functions.py Outdated
achiefa added 16 commits March 9, 2026 14:09
Excluded HERACOMB

Hacking NMC dataset

Grouping kinematics

Reimplementing thcovmat

Added comment in HT for DIS

Corrected normalisation for SIGMARED DIS NC data sets

Removing unused code

Added HT for F2C data (EMC) - removed deprecated function

Corrected EMC data iron target

Removed deprecated code

Refactoring + DIS CC

Corrected bug - ready for cc test

Corrected bug - ready

Removing unnecessary code

Corrected bug after rebase

Add normalisation in CC x-secs

Correct normalisation

Restore n3fit files from master

remove _PB suffix from process type

format a bit

Correct typo + example runcard

Update for new thcovmat construction + refactor + docstrings

Correct bug

Correct nuclear factors for nuclear targets

First implementation of jet data

Change pc jet dependence from pT to eta

Allowing step-function for the prior

Vectorize step_function + docstring

Correct bug in step function

Correct bug in step function

Produce covs_pt_prescrip

Allow different funcs for posterior

Adjusting linear triangular function

Correct bug in linear function

Implement multiplicative PC for jet

Correct docstring

Remove unused vp runcard

Dijet + clean-up + checks for pc dict

Remove translation layer for pc parameters

Remove copy of the same collection procs_data -> groups_data_by_process + clean-up

Remove unused collect

Update basic runcard

Allow generation of L1 data

Jets with single parameters

Adjust format

Restoring nodes for jets and dijets
Member

@scarlehoff scarlehoff left a comment


Left a few comments but haven't run the code yet. I've also looked at higher_twist_functions only very superficially for now; I just got curious about the cuts.

I'm a bit worried about all the if conditions that this one prescription adds, and whether it breaks some assumptions in the code, in particular the one where theoryid(s) is a one-item list...

process_type: DIJET
# Plotting
- kinematic_coverage: [ydiff, m_jj, sqrts]
+ kinematic_coverage: [ymax, m_jj, sqrts]
Member


Are these changes necessary / should they be in this PR?

Contributor Author


I checked this with Emanuele, and we concluded that ydiff is equivalent to ymax. Since the other dijet datasets adopt ymax, and the variable names are processed in the construction of the theory covmat, I changed it to ymax.

Comment thread validphys2/src/validphys/theorycovariance/construction.py Outdated
size2 = deltas2[0].size
s = np.zeros(shape=(size1, size2))
for shift1, shift2 in zip(deltas1, deltas2):
    s += np.outer(shift1, shift2)
Member


It says that it is using the 5 point prescription, but this is different from the covmat_5pt function. Why?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, I don't know what to call this. Richard called it generalised 5-pt. The point is that we have a shift for each ht parameter. Moreover, since each shift is linear in the parameter, the outer products of positive and negative shifts are equivalent. That's why there is no 0.5 factor. In a few words, it is a generalised 5-pt prescription where positive and negative shifts are equivalent. Happy to change the name.
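A toy numerical check of that statement (not the actual covmat_5pt code, and with made-up shift vectors): for shifts linear in the parameter, the negative shift is just the negated positive one, so averaging the outer products over +/- shifts equals using the positive shift alone, and the 0.5 factor drops out.

```python
import numpy as np

rng = np.random.default_rng(0)
deltas = [rng.normal(size=4) for _ in range(3)]  # one shift per ht parameter

# Generalised prescription discussed above: sum of outer products, no 0.5 factor
s = np.zeros((4, 4))
for d in deltas:
    s += np.outer(d, d)

# Explicit "average over +/- shifts" form: since the shift is linear in the
# parameter, the negative shift is -d and both outer products agree
s_pm = np.zeros((4, 4))
for d in deltas:
    s_pm += 0.5 * (np.outer(d, d) + np.outer(-d, -d))

assert np.allclose(s, s_pm)  # the two constructions coincide
```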

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem with the name; with a mention of the difference in the docstring I'd be happy.

def covs_pt_prescrip_mhou(combine_by_type, point_prescription):
"""Produces the sub-matrices of the theory covariance matrix according
to a point prescription which matches the number of input theories.
If 5 theories are provided, a scheme 'bar' or 'nobar' must be
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this single line fallen?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what happened here. I'll restore this line.

start_proc_by_exp = defaultdict(list)
for exp_name, data_spec in datagroup_spec.items():
    start_proc_by_exp[exp_name] = running_index
    running_index += data_spec.load_commondata().ndata
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried here: you are loading the commondata and taking the number of data points.
But then in the loop below, inside compute_deltas_pc, you are cutting the data...

Are you sure this is being done consistently? (In other words, are you sure that load_commondata here happens after cuts?)

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this gets me off the hook, right?

def load_commondata(self):
    """Strips the commondata loading from `load`"""
    cd = self.commondata.load()
    if self.cuts is not None:
        loaded_cuts = self.cuts.load()
        if not (hasattr(loaded_cuts, '_full') and loaded_cuts._full):
            intmask = [int(ele) for ele in loaded_cuts]
            cd = cd.with_cuts(intmask)
    return cd

Member

@scarlehoff scarlehoff May 13, 2026


I see. I guess we never load the data without cuts now. That's good then :)

Experience is a prison sometimes.


pc_type = get_pc_type(exp_name, process_type, experiment=experiment, pc_dict=pc_dict)

if process_type.startswith('DIS'):
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counting the days until someone calls a process DISASTER for some reason 🙈

I would perhaps also guard against processes that have no x or no Q2.

This hard codes the theories needed for each prescription to avoid user error."""
th = t0id.id
if point_prescription == 'power corrections':
    return NSList([t0id], nskey="theoryid")
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many things across the code will assume that a theory covmat needs more than one theory...

Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
