Merged
114 changes: 113 additions & 1 deletion doc/sphinx/source/vp/pydataobjs.rst
@@ -15,7 +15,7 @@ computation and storage strategies.
Loading FKTables
----------------

-Currently only FKTables can be directly loaded without C++ code. This is implemented
+This is implemented
in the :py:mod:`validphys.fkparser` module. For example::

from validphys.fkparser import load_fktable
@@ -143,3 +143,115 @@ central replica is the same as the mean of the replica predictions::
# Compute the size of the differences between approximate and true predictions
# over the PDF uncertainty. Take the maximum over the three ttbar data points.
print(((p - lp).std() / p.std()).max())
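This property can be checked in miniature with plain numpy: for a linear convolution, averaging over replicas commutes with applying the table. The arrays below are synthetic stand-ins, not a real FKTable or PDF set::

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 10 PDF replicas sampled on a 5-point grid, and a linear
# "FKTable" mapping the grid to 3 predictions (all numbers synthetic).
replicas = rng.normal(size=(10, 5))
fk = rng.normal(size=(5, 3))

# Mean of per-replica predictions vs prediction of the mean replica
mean_of_predictions = (replicas @ fk).mean(axis=0)
prediction_of_mean = replicas.mean(axis=0) @ fk

# Linearity of the convolution makes the two coincide
assert np.allclose(mean_of_predictions, prediction_of_mean)
```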

Loading CommonData
------------------

The underlying functions for loading CommonData can be found in
:py:mod:`validphys.commondataparser`. The data is loaded as a
:py:class:`validphys.coredata.CommonData` object, which is built on the
`dataclasses <https://docs.python.org/3/library/dataclasses.html>`_ module
and so has some special methods generated automatically. The
underlying data is stored in DataFrames, and so can be used
with the standard pandas machinery::

import pandas as pd

from validphys.api import API
from validphys.commondataparser import load_commondata
# define dataset settings
ds_input={'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10}
# first get the CommonDataSpec
cd = API.commondata(dataset_input=ds_input)
lcd = load_commondata(cd)
assert isinstance(lcd.central_values, pd.Series)
assert isinstance(lcd.systematics_table, pd.DataFrame)
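As an illustration of the kind of pandas operations this enables, the toy snippet below combines a small systematics table in quadrature with statistical errors. The numbers are invented for the example and do not come from CMSZDIFF12::

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the central values and systematics table of a
# CommonData object; one row per data point, one column per source.
central_values = pd.Series([1.20, 0.85, 0.42])
systematics_table = pd.DataFrame(
    {"sys_1": [0.02, 0.01, 0.03], "sys_2": [0.05, 0.04, 0.02]}
)
stat_error = pd.Series([0.03, 0.02, 0.04])

# Add all uncertainties in quadrature: one total error per data point
total_error = np.sqrt((systematics_table ** 2).sum(axis=1) + stat_error ** 2)
assert len(total_error) == len(central_values)
```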

The :py:class:`validphys.coredata.CommonData` class has a method which returns
a new instance of the class with cuts applied::

from validphys.api import API
from validphys.commondataparser import load_commondata
# define dataset and additional settings
ds_input={'dataset': 'CMSZDIFF12', 'cfac':('QCD', 'NRM'), 'sys':10}
inp = {
"dataset_input": ds_input,
"use_cuts": "internal",
"theoryid": 162
}
# first get the CommonDataSpec
cd = API.commondata(**inp)
lcd = load_commondata(cd)
# the ndata attribute of a CommonDataSpec is always the total number of uncut data points
assert lcd.ndata == cd.ndata
cuts = API.cuts(**inp)
lcd_cut = lcd.with_cuts(cuts)
# data has been cut, ndata should have changed.
assert lcd_cut.ndata != cd.ndata

An action already exists which returns the loaded and cut commondata, which is
more convenient than calling the underlying functions::

api_lcd_cut = API.loaded_commondata_with_cuts(**inp)
assert api_lcd_cut.ndata == lcd_cut.ndata
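Conceptually, applying cuts keeps only the rows of the underlying tables whose indices survive. The sketch below mimics this with plain pandas on invented data; it is an illustration of the idea, not the actual validphys implementation::

```python
import pandas as pd

# Toy central values indexed by data point number (invented values);
# cuts are, in essence, the list of surviving data point indices.
central_values = pd.Series([1.2, 0.8, 0.4, 0.9, 1.1])
cut_indices = [0, 2, 4]

# Selecting the surviving rows mirrors what with_cuts does to each table
cut_values = central_values.loc[cut_indices]
assert len(cut_values) < len(central_values)
```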

Loading Covariance Matrices
---------------------------

Functions which take :py:class:`validphys.coredata.CommonData` objects and
return covariance matrices can be found in
:py:mod:`validphys.covmats`. As with the commondata loaders,
the functions can be called in scripts directly::

import numpy as np
from validphys.api import API
from validphys.covmats import covmat_from_systematics

inp = {
"dataset_input": {"dataset":"NMC"},
"use_cuts": "internal",
"theoryid": 162
}
lcd = API.loaded_commondata_with_cuts(**inp)
cov = covmat_from_systematics(lcd)
assert isinstance(cov, np.ndarray)
assert cov.shape == (lcd.ndata, lcd.ndata)
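Schematically (glossing over how the real implementation treats multiplicative uncertainties), such a covariance matrix combines statistical variances on the diagonal with outer products of the additive systematic shift vectors. A minimal numpy sketch with made-up numbers::

```python
import numpy as np

# Toy uncertainties for 3 data points (invented numbers)
stat = np.array([0.03, 0.02, 0.04])          # statistical errors
beta = np.array([[0.05, 0.01],               # additive systematics:
                 [0.04, 0.02],               # one row per data point,
                 [0.02, 0.03]])              # one column per source

# C_ij = delta_ij * stat_i^2 + sum_k beta_ik * beta_jk
cov = np.diag(stat ** 2) + beta @ beta.T

assert np.allclose(cov, cov.T)               # symmetric by construction
assert np.all(np.linalg.eigvalsh(cov) > 0)   # positive definite here
```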

A similar function acts upon a list of commondata objects and takes into
account correlations between datasets::

import numpy as np

from validphys.api import API
from validphys.covmats import dataset_inputs_covmat_from_systematics

inp = {
"dataset_inputs": [
{"dataset":"NMC"},
{"dataset":"NMCPD"},
],
"use_cuts": "internal",
"theoryid": 162
}
lcds = API.dataset_inputs_loaded_cd_with_cuts(**inp)
total_ndata = np.sum([lcd.ndata for lcd in lcds])
total_cov = dataset_inputs_covmat_from_systematics(lcds)
assert total_cov.shape == (total_ndata, total_ndata)
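The effect of a systematic source shared between datasets can be sketched by stacking the datasets and building the covariance from the combined shift vector; the shared source then populates the off-diagonal blocks. Toy numbers below, not the actual validphys algorithm::

```python
import numpy as np

# Two toy datasets with 2 and 3 points; a single systematic source
# (e.g. a shared normalisation) affects both -- values are invented.
stat = np.array([0.02, 0.03, 0.01, 0.02, 0.04])
shared_sys = np.array([0.05, 0.04, 0.03, 0.05, 0.02])

cov = np.diag(stat ** 2) + np.outer(shared_sys, shared_sys)

# The block coupling dataset 1 (rows 0-1) to dataset 2 (rows 2-4) is
# non-zero precisely because the systematic is shared between them
cross_block = cov[:2, 2:]
assert np.all(cross_block != 0)
```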

These functions are also actions, which can be accessed directly
from the API::

from validphys.api import API

inp = {
"dataset_input": {"dataset":"NMC"},
"use_cuts": "internal",
"theoryid": 162
}
# single dataset covmat
cov = API.covmat_from_systematics(**inp)
inp = {
"dataset_inputs": [
{"dataset":"NMC"},
{"dataset":"NMCPD"},
],
"use_cuts": "internal",
"theoryid": 162
}
total_cov = API.dataset_inputs_covmat_from_systematics(**inp)
2 changes: 1 addition & 1 deletion validphys2/src/validphys/app.py
@@ -24,7 +24,7 @@

providers = [
'validphys.results',
-'validphys.results_providers',
+'validphys.commondata',
'validphys.pdfgrids',
'validphys.pdfplots',
'validphys.dataplots',
32 changes: 32 additions & 0 deletions validphys2/src/validphys/commondata.py
@@ -0,0 +1,32 @@
"""
commondata.py

Module containing actions which return loaded commondata. It leverages
utilities found in :py:mod:`validphys.commondataparser` and returns objects
from :py:mod:`validphys.coredata`.

"""
from reportengine import collect

from validphys.commondataparser import load_commondata

def loaded_commondata_with_cuts(commondata, cuts):
"""Load the commondata and apply cuts.

Parameters
----------
commondata: validphys.core.CommonDataSpec
commondata to load and cut.
cuts: validphys.core.Cuts, None
valid cuts, used to cut the loaded commondata.

Returns
-------
loaded_cut_commondata: validphys.coredata.CommonData

"""
lcd = load_commondata(commondata)
return lcd.with_cuts(cuts)

dataset_inputs_loaded_cd_with_cuts = collect(
"loaded_commondata_with_cuts", ("data_input",))