diff --git a/doc/sphinx/source/data/plotting_format.md b/doc/sphinx/source/data/plotting_format.md
index 1a92d123b5..069691b3b5 100644
--- a/doc/sphinx/source/data/plotting_format.md
+++ b/doc/sphinx/source/data/plotting_format.md
@@ -85,9 +85,8 @@ a substring). Currently they are:
     'SIA': ('$z$', '$Q^2 (GeV^2)$', '$y$')
 ```
 
-This mapping is declared as `CommonData.kinLabel_latex` in the C++
-code (and accessible as `validphys.plotoptions.core.kinlabels_latex`
-in the Python code).
+This mapping is declared as `validphys.commondataparser.KINLABEL_LATEX`
+in the Python code.
 
 The three kinematic variables are referred to as `k1`, `k2` and `k3` in
 the plotting files. For example, for DIS processes, `k1` refers to `x`,
diff --git a/doc/sphinx/source/tutorials/addspecialgrouping.rst b/doc/sphinx/source/tutorials/addspecialgrouping.rst
index adf9ef5028..8745e58c4a 100644
--- a/doc/sphinx/source/tutorials/addspecialgrouping.rst
+++ b/doc/sphinx/source/tutorials/addspecialgrouping.rst
@@ -82,11 +82,6 @@ which tells the code to look for an ``nnpdf40_process`` key within the metadata
 to attempt to parse it as a string. We do not attribute a default value to this
 new key, which implies that it must be provided within the metadata file.
 
-.. note::
-    The reason the group name should be a string is because it is sometimes
-    passed to the C++ code through the SWIG interface, which is very strict
-    about the typing you use.
-
 In addition to this, you must add the new grouping to
 :py:class:`validphys.plotoptions.core.PlotInfo` as a keyword arguments of the
 ``__init__`` function and subsequently as an attribute of the class
diff --git a/doc/sphinx/source/vp/custom_pipelines.rst b/doc/sphinx/source/vp/custom_pipelines.rst
index 6c358d21b9..be61541375 100644
--- a/doc/sphinx/source/vp/custom_pipelines.rst
+++ b/doc/sphinx/source/vp/custom_pipelines.rst
@@ -138,9 +138,7 @@ a missing resource.
 The functions of type `check_` should take the information processed by the
 Config class and verify that a given resource is correct. If so, they should
 return a "Resource specification" (something typically containing metadata information
-such as paths, and a `load()` method to get the C++ object from
-`libnnpdf`). We also define a `get` method that returns the C++ object
-directly.
+such as paths, which are necessary to load the final commondata or fktable).
 
 In the case of the positivity set, this is entirely given in terms of
 existing check functions:
@@ -160,31 +158,28 @@ existing check functions:
 A more complicated example should raise the appropriate loader errors (see
 the other examples in the class).
 
-The `PositivitySetSpec` could be defined roughly like:
+In the code, `PositivitySetSpec` inherits from `DataSetSpec`,
+but one could roughly define it as:
 
 .. code:: python
 
-    class PositivitySetSpec():
-        def __init__(self, commondataspec, fkspec, poslambda, thspec):
-            self.commondataspec = commondataspec
-            self.fkspec = fkspec
-            self.poslambda = poslambda
-            self.thspec = thspec
+    class PositivitySetSpec():
+        def __init__(self, commondataspec, fkspec, poslambda, thspec):
+            self.commondataspec = commondataspec
+            self.fkspec = fkspec
+            self.poslambda = poslambda
+            self.thspec = thspec
 
-        @property
-        def name(self):
-            return self.commondataspec.name
+        @property
+        def name(self):
+            return self.commondataspec.name
 
-        def __str__(self):
-            return self.name
+        def __str__(self):
+            return self.name
 
-        @functools.lru_cache()
-        def load(self):
-            cd = self.commondataspec.load()
-            fk = self.fkspec.load()
-            return PositivitySet(cd, fk, self.poslambda)
 
-Here `PositivitySet` is the `libnnpdf` object. It is generally better
+This contains all necessary information for `validphys` to be able to load
+the relevant `fktable`. It is generally better
 to pass around the spec objects because they are lighter and have more
 information (e.g. the theory in the above example).
@@ -227,9 +222,7 @@ Computing PDF-dependent quantities
 ----------------------------------
 
 Now that we can receive positivity sets as input, let's do something
-with them. The SWIG wrappers allow us to call the C++ methods of
-`libnnpdf` from Python. These things go in the `validphys.results`
-module. We can start by defining a class to produce and hold the
+with them. We can start by defining a class to produce and hold the
 results:
 
 .. code:: python
@@ -255,7 +248,7 @@ way it allows to abstract away the different error types. One
 constructs an object inheriting from `validphys.core.Stats` that is
 appropriate for a given error type by calling `pdf.stats_class(data)`,
 where data is an array where the entries along the first dimension are
-the results from each member computed from `libnnpdf` (and the other
+the results from each member (and the other
 dimensions are arbitrary). `Stats` has methods that appropriately
 collapse along the first axis. For example, `central_value` computes
 the mean along the first axis for Monte Carlo PDFs and yields the
diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst
index 8a7c4a7802..014947c3ae 100644
--- a/doc/sphinx/source/vp/dataspecification.rst
+++ b/doc/sphinx/source/vp/dataspecification.rst
@@ -68,8 +68,8 @@ are ``dataset_input``, ``cuts`` and ``theoryid``.
 
    It seems odd to require theory settings such as a ``theoryid`` in the
    ``dataset_input`` in order to load data. However, this is a relic of the
-   underlying C++ code that performs the loading of data, which intrinsically
-   groups together the commondata (CSVs containing data central values and
+   legacy C++ code that performs the loading of data, which intrinsically
+   grouped together the commondata (CSVs containing data central values and
    uncertainties) and :ref:`fktables`.
 
 Clearly there is a big margin for error when manually entering
@@ -86,15 +86,14 @@
 The ``DataSetSpec`` contains all of the information used to construct it, e.g.
 
     >>> ds_spec.name
    'CMSZDIFF12'
 
-but also importantly has a ``load`` method, which returns an instance of the
-``DataSet`` that is generated from the C++ code using SWIG. This new object
-contains numpy arrays of data central values and experimental covariance
+but also importantly has a ``load_commondata`` method, which returns an instance of
+``CommonData``. This new object contains numpy arrays of data central values and experimental covariance
 matrices, e.g:
 
 .. code:: python
 
-    >>> ds_libnnpdf = ds_spec.load()
-    >>> ds_libnnpdf.get_cv() # get central values of dataset
+    >>> cd = ds_spec.load_commondata()
+    >>> cd.get_cv() # get central values of dataset
     array([2917.  , 1074.  ,  460.5 ,  222.6 ,  109.8 ,   61.84,   30.19,
            2863.  , 1047.  ,  446.1 ,  214.5 ,  110.  ,   58.13,   29.85,
            2588.  ,  935.5 ,  416.3 ,  199.  ,  103.1 ,   54.06,   28.45,
diff --git a/doc/sphinx/source/vp/datthcomp.md b/doc/sphinx/source/vp/datthcomp.md
index 4ba5324bc5..004d8ca89f 100644
--- a/doc/sphinx/source/vp/datthcomp.md
+++ b/doc/sphinx/source/vp/datthcomp.md
@@ -27,8 +27,8 @@ such they are assumed to be correct, so in principle they have no
 guarantee of failing early with a good error message. However, you can
 set `check_plotting: True` in the input configurations to cause the
 PLOTTING files to be processed as soon as the dataset is loaded. This
-can be useful while debugging the plotting files, but will cause
-a noticeable delay to the startup (because the C++ DataSet objects
-need to be loaded in memory). This will warn the user of missing plotting files
+can be useful while debugging the plotting files, but might cause
+a noticeable delay at startup (due to loading datasets and fktables).
+This will warn the user of missing plotting files
 and produce nice early error messages if the configuration is not
 processed correctly.
diff --git a/doc/sphinx/source/vp/developer.rst b/doc/sphinx/source/vp/developer.rst
index 9ab9f2db0a..9e77b97832 100644
--- a/doc/sphinx/source/vp/developer.rst
+++ b/doc/sphinx/source/vp/developer.rst
@@ -22,9 +22,7 @@ Some of the most important modules are
 - `validphys.core`
 
 Core data structures that represent objects such as PDFs and data
-sets. Several of them map to `libnnpdf` objects. In that case they
-have a `.load()` method that produces the corresponding `C++`
-object.
+sets.
 
 - `validphys.loader`
 Tools to obtain NNPDF resources locally or remotely. See :ref:`upload`
@@ -40,8 +38,8 @@ theory predictions.
 
 - `validphys.gridvalues`, `validphys.bases`, `validphys.pdfgrids`
 These contain tools to evaluate PDFs over grids of points.
-`validphys.gridvalues` contains low level functionality that uses
-`libnnpdf`, `validphys.pdfbases` contain several different bases
+`validphys.gridvalues` contains low level functionality that might use
+`lhapdf`, `validphys.pdfbases` contains several different bases
 over PDF flavour space and functionality to manipulate them, and
 `validphys.pdfgrids` contains high level providers suitable for
 using for plotting and as an input to other computations.
diff --git a/doc/sphinx/source/vp/index.rst b/doc/sphinx/source/vp/index.rst
index 76973933fc..4b59b5397c 100644
--- a/doc/sphinx/source/vp/index.rst
+++ b/doc/sphinx/source/vp/index.rst
@@ -27,9 +27,6 @@ Introduction to ``validphys 2``
   ``validphys`` can be found in the :ref:`Design ` section.
 
-* Some parts of ``validphys`` use the ``libnnpdf`` library in C++, through SWIG
-  wrappers.
-
 * The ideas behind the design of the code are explained in the
   :ref:`Design ` section.
diff --git a/doc/sphinx/source/vp/pydataobjs.rst b/doc/sphinx/source/vp/pydataobjs.rst
index 329cd92be8..ba8d5013d6 100644
--- a/doc/sphinx/source/vp/pydataobjs.rst
+++ b/doc/sphinx/source/vp/pydataobjs.rst
@@ -3,14 +3,9 @@
 Python based data objects
 =========================
 
-Internal data formats such as PDF sets, CommonData, or :ref:`FKTables
-` files are currently accessed through the `libnnpdf` C++ code
-(interfaced trough the SWIG wrappers). However there is a :ref:`project
-` underway
-to make these resources available in terms of standard Python containers
-(particularly numpy arrays and pandas dataframes). The objectives include
-simplifying the codebase, increasing the ease of use and enabling more advanced
-computation and storage strategies.
+Internal data formats such as CommonData or :ref:`FKTables
+` are always backed by numpy arrays or pandas dataframes.
+PDF sets are a bit more complicated since they use ``lhapdf``.
 
 Loading FKTables
 ----------------
@@ -271,4 +266,4 @@ the gluon and the d-quark, at three values of ``x`` at ``Q=91.2``.
     pdf = API.pdf(pdf="NNPDF40_nnlo_as_01180")
     l_pdf = pdf.load()
     alpha_s = l_pdf.central_member.alphasQ(91.2)
-    results = l_pdf.grid_values([21,1], [0.1, 0.2, 0.3], [91.2])
\ No newline at end of file
+    results = l_pdf.grid_values([21,1], [0.1, 0.2, 0.3], [91.2])
diff --git a/validphys2/examples/mc_gen_report.md b/validphys2/examples/mc_gen_report.md
index 9ecac362ae..636d1f9097 100644
--- a/validphys2/examples/mc_gen_report.md
+++ b/validphys2/examples/mc_gen_report.md
@@ -1,4 +1,7 @@
 %CHORUSNB 100 replicas
+Mean table
+----------
+{@art_data_mean_table@}
 Data replica histograms
 -----------------------
 {@art_data_comparison@}
@@ -9,6 +12,3 @@ Residuals
 ---------
 {@art_data_residuals@}
 {@one_art_data_residuals@}
-Mean table
-----------
-{@art_data_mean_table@}
diff --git a/validphys2/src/validphys/app.py b/validphys2/src/validphys/app.py
index 9e9fe6da7d..88a45d51a0 100644
--- a/validphys2/src/validphys/app.py
+++ b/validphys2/src/validphys/app.py
@@ -131,9 +131,6 @@ def init(self):
         if self.args["loglevel"] <= logging.DEBUG:
             cout = True
         if not cout:
-            import NNPDF
-
-            NNPDF.SetVerbosity(0)
             lhapdf.setVerbosity(0)
 
     @staticmethod
@@ -147,11 +144,11 @@ def upload_context(do_upload, output):
         return contextlib.ExitStack()
 
     def run(self):
-        if sys.version_info < (3, 6):
+        if sys.version_info < (3, 9):
             log.warning(
-                "validphys 2 is discontinued on Python<3.6 and will "
+                "validphys 2 is discontinued on Python<3.9 and will "
                 "not be longer updated. Please run\n"
-                "conda install python=3.6\n\n"
+                "conda install python=3.9\n\n"
                 "If you have any problems, please open an issue "
                 "on https://github.com/NNPDF/nnpdf/issues."
             )
diff --git a/validphys2/src/validphys/calcutils.py b/validphys2/src/validphys/calcutils.py
index f020623b0b..22c6123f9a 100644
--- a/validphys2/src/validphys/calcutils.py
+++ b/validphys2/src/validphys/calcutils.py
@@ -59,8 +59,7 @@ def calc_chi2(sqrtcov, diffs):
     """
     #Note la.cho_solve doesn't really improve things here
     #NOTE: Do not enable check_finite. The upper triangular part is not
-    #guaranteed to make any sense. If this causes a problem, it is a bug in
-    #libnnpdf.
+    #guaranteed to make any sense.
     vec = la.solve_triangular(sqrtcov, diffs, lower=True, check_finite=False)
     #This sums up the result for the chi² for any input shape.
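+    #e.g. diffs of shape (ndata, nreplicas) yield one chi² per replica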
     #Sum the squares over the first dimension and leave the others alone
diff --git a/validphys2/src/validphys/closuretest/multiclosure_pseudodata.py b/validphys2/src/validphys/closuretest/multiclosure_pseudodata.py
index c3382d3f1c..7ab2f53c84 100644
--- a/validphys2/src/validphys/closuretest/multiclosure_pseudodata.py
+++ b/validphys2/src/validphys/closuretest/multiclosure_pseudodata.py
@@ -22,10 +22,8 @@
 @check_use_fitcommondata
 def fits_dataset_cvs(fits_dataset):
     """Internal function for loading the level one data for all fits
-    for a single dataset. This function avoids using the c++ loading of
-    commondata which is very slow and also avoids the stringent metadata
+    for a single dataset. This function avoids the stringent metadata
     checks of the newer python commondata parser.
-
     """
     fits_cv = []
     for ds in fits_dataset:
diff --git a/validphys2/src/validphys/commondataparser.py b/validphys2/src/validphys/commondataparser.py
index 80d55c0390..10e84674a3 100644
--- a/validphys2/src/validphys/commondataparser.py
+++ b/validphys2/src/validphys/commondataparser.py
@@ -1,8 +1,6 @@
 """
 This module implements parsers for commondata and systype files into useful
-datastructures, contained in the :py:mod:`validphys.coredata` module, which are
-not backed by C++ managed memory, and so they can be easily pickled and
-interfaced with common Python libraries.
+datastructures, contained in the :py:mod:`validphys.coredata` module.
 
 The validphys commondata structure is an instance of :py:class:`validphys.coredata.CommonData`
 """
@@ -158,7 +156,7 @@ def get_plot_kinlabels(commondata):
 
 def get_kinlabel_key(process_label):
     """
-    Since there is no 1:1 correspondence between latex keys and GetProc,
+    Since there is no 1:1 correspondence between latex keys and the old libNNPDF names,
     we match the longest key such that the proc label starts with it.
     """
     l = process_label
diff --git a/validphys2/src/validphys/config.py b/validphys2/src/validphys/config.py
index 806a85afb8..f82781536c 100644
--- a/validphys2/src/validphys/config.py
+++ b/validphys2/src/validphys/config.py
@@ -433,7 +433,6 @@ def parse_dataset_input(self, dataset: Mapping):
             raise ConfigError(f"'weight' must be a number, not '{weight}'")
         if weight < 0:
             raise ConfigError(f"'weight' must be greater than zero not '{weight}'")
-        # Value needs to be string to not break libnnpdf Experiment
         custom_group = str(dataset.get("custom_group", "unset"))
         kdiff = dataset.keys() - known_keys
         for k in kdiff:
diff --git a/validphys2/src/validphys/core.py b/validphys2/src/validphys/core.py
index 944056a6d7..17ce1b1563 100644
--- a/validphys2/src/validphys/core.py
+++ b/validphys2/src/validphys/core.py
@@ -1,7 +1,6 @@
 # -*- coding: utf-8 -*-
 """
-Core datastructures used in the validphys data model. Some of these are inmutable
-specifications representing C++ objects.
+Core datastructures used in the validphys data model.
 
 Created on Wed Mar 9 15:19:52 2016
 @author: Zahari Kassabov
 """
@@ -22,14 +21,6 @@
 from reportengine.baseexceptions import AsInputError
 from reportengine.compat import yaml
 
-from NNPDF import (LHAPDFSet as libNNPDF_LHAPDFSet,
-                   CommonData as LegacyCommonData,
-                   FKTable,
-                   FKSet,
-                   DataSet,
-                   Experiment,
-                   PositivitySet,)
-
 #TODO: There is a bit of a circular dependency between filters.py and this.
 #Maybe move the cuts logic to its own module?
from validphys import lhaindex, filters @@ -39,7 +30,6 @@ from validphys.utils import experiments_to_dataset_inputs from validphys.lhapdfset import LHAPDFSet from validphys.fkparser import load_fktable -from validphys.pineparser import pineappl_reader from validphys.commondataparser import (peek_commondata_metadata, get_plot_kinlabels, parse_commondata,) @@ -202,34 +192,6 @@ def __str__(self): def __len__(self): return self.info["NumMembers"] - def legacy_load(self): - """Returns an libNNPDF LHAPDFSet object - Deprecated function used only in the `filter.py` module - """ - error = self.error_type - cl = self.error_conf_level - et = None - if error == "replicas": - et = libNNPDF_LHAPDFSet.erType_ER_MC - elif error == "hessian": - if cl == 90: - et = libNNPDF_LHAPDFSet.erType_ER_EIG90 - elif cl == 68: - et = libNNPDF_LHAPDFSet.erType_ER_EIG - else: - raise NotImplementedError(f"No hessian errors with confidence interval {cl}") - elif error == "symmhessian": - if cl == 68: - et = libNNPDF_LHAPDFSet.erType_ER_SYMEIG - else: - raise NotImplementedError( - f"No symmetric hessian errors with confidence interval {cl}" - ) - else: - raise NotImplementedError(f"Error type for {self}: '{error}' is not implemented") - - return libNNPDF_LHAPDFSet(self.name, et) - def get_members(self): """Return the number of members selected in ``pdf.load().grid_values`` """ @@ -440,37 +402,6 @@ def __init__(self, *, name, commondata, fkspecs, thspec, cuts, frac, op, weight) @functools.lru_cache() - def load(self): - """Load the libNNPDF version of the dataset""" - cd = LegacyCommonData.ReadFile(str(self.commondata.datafile), str(self.commondata.sysfile)) - - fktables = [] - for p in self.fkspecs: - fktable = p.load() - #IMPORTANT: We need to tell the python garbage collector to NOT free the - #memory owned by the FKTable on garbage collection. - #TODO: Do this automatically - fktable.thisown = 0 - fktables.append(fktable) - - fkset = FKSet(FKSet.parseOperator(self.op), fktables) - - data = DataSet(cd, fkset, self.weight) - - - if self.cuts is not None: - #ugly need to convert from numpy.int64 to int, so we can pass - #it happily to the vector to the SWIG wrapper. 
-            #Do not do this (or find how to enable in SWIG):
-            #data = DataSet(data, list(dataset.cuts))
-            loaded_cuts = self.cuts.load()
-            #This is an optimization to avoid recomputing the dataset if
-            #nothing is discarded
-            if not (hasattr(loaded_cuts, '_full') and loaded_cuts._full):
-                intmask = [int(ele) for ele in loaded_cuts]
-                data = DataSet(data, intmask)
-        return data
-
     def load_commondata(self):
         """Strips the commondata loading from `load`"""
         cd = self.commondata.load()
@@ -514,8 +445,11 @@ def __init__(self, fkpath, cfactors, metadata=None):
         self.cfactors = cfactors if cfactors is not None else []
 
         self.legacy = False
-        # NOTE: the legacy interface is expected to be removed by future releases of NNPDF
-        # so please don't write code that relies on it
+
+        # NOTE: The legacy interface is currently used by fkparser to decide
+        # whether to read an FKTable using the old parser or the pineappl parser.
+        # This attribute (and the difference between the two) might be removed in
+        # future releases of NNPDF, so please don't write code that relies on it.
         if not isinstance(fkpath, (tuple, list)):
             self.legacy = True
         else:
@@ -533,27 +467,14 @@
         else:
             super().__init__(fkpath, cfactors)
 
-    def _load_legacy(self):
-        return FKTable(str(self.fkpath), [str(factor) for factor in self.cfactors])
-
-    def _load_pineappl(self):
-        log.info("Reading: %s", self.fkpath)
-        return pineappl_reader(self)
-
     def load_with_cuts(self, cuts):
         """Load the fktable and apply cuts inmediately. Returns a FKTableData"""
         return load_fktable(self).with_cuts(cuts)
 
-    def load(self):
-        if self.legacy:
-            return self._load_legacy()
-        return self._load_pineappl()
-
 class LagrangeSetSpec(DataSetSpec):
     """Extends DataSetSpec to work around the particularities of the positivity,
     integrability and other Lagrange Multiplier datasets.
-    Internally (for libNNPDF) they are always PositivitySets
     """
 
     def __init__(self, name, commondataspec, fkspec, maxlambda, thspec):
@@ -579,12 +500,6 @@ def to_unweighted(self):
 
     def load_commondata(self):
         return self.commondata.load()
 
-    @functools.lru_cache()
-    def load(self):
-        cd = self.commondata.load()
-        fk = self.fkspecs[0].load()
-        return PositivitySet(cd, fk, self.maxlambda)
-
 class PositivitySetSpec(LagrangeSetSpec):
     pass
@@ -614,19 +529,10 @@ def __init__(self, name, datasets, dsinputs=None):
         #TODO: Can we do better cooperative inherece trick than this?
         namespaces.NSList.__init__(self, dsinputs, nskey='dataset_input')
 
-    @functools.lru_cache(maxsize=32)
-    def load(self):
-        sets = []
-        for dataset in self.datasets:
-            loaded_data = dataset.load()
-            sets.append(loaded_data)
-        return Experiment(sets, self.name)
-
     @functools.lru_cache(maxsize=32)
     def load_commondata(self):
         return [d.load_commondata() for d in self.datasets]
 
-
     def load_commondata_instance(self):
         """
         Given Experiment load list of validphys.coredata.CommonData
@@ -853,7 +759,7 @@ def errorbarstd(self):
 class MCStats(Stats):
     """Result obtained from a Monte Carlo sample"""
 
     def std_error(self):
-        # ddof == 1 to match libNNPDF behaviour
+        # ddof == 1 to match legacy libNNPDF behaviour
         return np.std(self.error_members(), ddof=1, axis=0)
 
     def moment(self, order):
diff --git a/validphys2/src/validphys/coredata.py b/validphys2/src/validphys/coredata.py
index ce0a580cc4..bcdd939ffb 100644
--- a/validphys2/src/validphys/coredata.py
+++ b/validphys2/src/validphys/coredata.py
@@ -1,8 +1,6 @@
 """
 Data containers backed by Python managed memory (Numpy arrays and Pandas
-dataframes). 
This module is intended to substitute large parts of the C++ -wrappers. - +dataframes). """ import dataclasses import numpy as np diff --git a/validphys2/src/validphys/dataplots.py b/validphys2/src/validphys/dataplots.py index 7644756497..2af9f72ad4 100644 --- a/validphys2/src/validphys/dataplots.py +++ b/validphys2/src/validphys/dataplots.py @@ -196,8 +196,8 @@ def check_normalize_to(ns, **kwargs): raise RuntimeError("Should not be here") -#TODO: This interface is horrible. We need to think how to adapt libnnpdf -#to make this use case easier +#TODO: This interface is horrible. +# We need to think how to adapt it to make this use case easier def _plot_fancy_impl(results, commondata, cutlist, normalize_to:(int,type(None)) = None, labellist=None): @@ -787,31 +787,6 @@ def plot_trainvaliddist(fit, replica_data): ax.legend() return fig -@figure -def plot_covmat_eigs(data): - """Plot the eigenvalues of the covariance matrix for a given group of datasets.""" - eigs = la.eigvalsh(data.load().get_covmat()) - fig,ax = plt.subplots() - x = np.arange(1,len(eigs) + 1) - ax.plot(x, eigs, 'o', markersize=10) - ax.set_yscale('log') - ax.yaxis.grid(False) - plt.title("Covmat eigenvalues for %s" % data.name) - plt.xlabel("# Eigenvector") - return fig - -@figure -def plot_corrmat_eigs(data): - """Plot the eigenvalues of the correlation matrix for a given group of datasets.""" - covmat = data.load().get_covmat() - stds = np.sqrt(np.diag(covmat)) - corrmat = covmat/np.outer(stds,stds) - eigs = la.eigvalsh(corrmat) - fig,ax = plt.subplots() - ax.plot(eigs, 'o') - ax.set_yscale('log') - return fig - @figure def plot_chi2_eigs(pdf,dataset,chi2_per_eig): diff --git a/validphys2/src/validphys/fkparser.py b/validphys2/src/validphys/fkparser.py index 8213a994f0..9efceb6a7a 100644 --- a/validphys2/src/validphys/fkparser.py +++ b/validphys2/src/validphys/fkparser.py @@ -1,10 +1,7 @@ """ This module implements parsers for FKtable and CFactor files into useful -datastructures, contained in the :py:mod:`validphys.coredata` module, which are -not backed by C++ managed memory, and so they can be easily pickled and -interfaces with common Python libraries. The integration of these objects into -the codebase is currently work in progress, and at the moment this module -serves as a proof of concept. +datastructures, contained in the :py:mod:`validphys.coredata` module, which can +be easily pickled and interfaced with common Python libraries. Most users will be interested in using the high level interface :py:func:`load_fktable`. 
Given a :py:class:`validphys.core.FKTableSpec` @@ -29,8 +26,7 @@ import pandas as pd from validphys.coredata import FKTableData, CFactorData - - +from validphys.pineparser import pineappl_reader class BadCFactorError(Exception): @@ -60,7 +56,7 @@ def load_fktable(spec): with open_fkpath(spec.fkpath) as handle: tabledata = parse_fktable(handle) else: - tabledata = spec.load() + tabledata = pineappl_reader(spec) if not spec.cfactors: return tabledata diff --git a/validphys2/src/validphys/kinematics.py b/validphys2/src/validphys/kinematics.py index 62e02b3452..316cf8d48e 100644 --- a/validphys2/src/validphys/kinematics.py +++ b/validphys2/src/validphys/kinematics.py @@ -31,7 +31,7 @@ def describe_kinematics(commondata, titlelevel:int=1): import inspect cd = commondata info = plotoptions.get_info(cd) - proc = cd.load().GetProc(0) + proc = cd.load_commondata().commondataproc src = inspect.getsource(info.kinematics_override.xq2map) titlespec = '#'*titlelevel return (f""" diff --git a/validphys2/src/validphys/lhapdfset.py b/validphys2/src/validphys/lhapdfset.py index 0ab1f0f866..7c777df893 100644 --- a/validphys2/src/validphys/lhapdfset.py +++ b/validphys2/src/validphys/lhapdfset.py @@ -2,9 +2,6 @@ Module containing an LHAPDF class compatible with validphys using the official lhapdf python interface. - It exposes an interface mostly compatible with libNNPDF::LHAPDFSet - so it can be used as a drop-in replacement. - The ``.members`` and ``.central_member`` of the ``LHAPDFSet`` are LHAPDF objects (the typical output from ``mkPDFs``) and can be used normally. diff --git a/validphys2/src/validphys/loader.py b/validphys2/src/validphys/loader.py index d9a86bb092..3807549b56 100644 --- a/validphys2/src/validphys/loader.py +++ b/validphys2/src/validphys/loader.py @@ -202,7 +202,7 @@ def rebuild_commondata_without_cuts( newfile.write(line[:m.end()]) #And value, stat, *sys that we want to drop #Do not use string join to keep up with the ugly format - #This should really be nan's, but the c++ streams that read this + #This should really be nan's, but the c++ streams that could read this #do not have the right interface. #https://stackoverflow.com/questions/11420263/is-it-possible-to-read-infinity-or-nan-values-using-input-streams zeros = '-0\t'*(2 + 2*nsys) diff --git a/validphys2/src/validphys/mc_gen.py b/validphys2/src/validphys/mc_gen.py index bf94b5b478..aa44e89902 100644 --- a/validphys2/src/validphys/mc_gen.py +++ b/validphys2/src/validphys/mc_gen.py @@ -5,9 +5,7 @@ Tools to check the pseudo-data MC generation. 
""" # The functions in this module have been ported to not use libNNPDF -# but is still using it under the hood -# it has been a direct port of the libnnpdf dependent structure -# so they should not be used as an example +# but they should not be used as an example as they follow the libNNPDF logic import logging import matplotlib.patches as mpatches import matplotlib.pyplot as plt @@ -26,8 +24,9 @@ def art_rep_generation(groups_data, make_replicas): real_data_list = [] for group in groups_data: - real_group = group.load() - real_data = real_group.get_cv() + # Load all the commondata + real_group = group.load_commondata() + real_data = np.concatenate([i.get_cv() for i in real_group]) real_data_list.append(real_data) real_data = np.concatenate(real_data_list) @@ -165,9 +164,8 @@ def one_art_data_residuals(groups_data, indexed_make_replicas): all_normresidual = [] for group in groups_data: - - real_group = group.load() - real_data = real_group.get_cv() + real_group = group.load_commondata() + real_data = np.concatenate([i.get_cv() for i in real_group]) one_art_data = all_replicas[group_level == group.name].iloc[one_data_index] residual = one_art_data - real_data[one_data_index] @@ -192,8 +190,7 @@ def art_data_mean_table(art_rep_generation, groups_data): data = [] for group in groups_data: for dataset in group.datasets: - ds = dataset.load() - Ndata = ds.GetNData() + Ndata = dataset.load_commondata().ndata for i in range(Ndata): line = [ dataset.name, diff --git a/validphys2/src/validphys/n3fit_data.py b/validphys2/src/validphys/n3fit_data.py index e057a36a5b..75e5efdc12 100644 --- a/validphys2/src/validphys/n3fit_data.py +++ b/validphys2/src/validphys/n3fit_data.py @@ -2,9 +2,7 @@ n3fit_data.py Providers which prepare the data ready for -:py:func:`n3fit.performfit.performfit`. Returns python objects but the underlying -functions make calls to libnnpdf C++ library. - +:py:func:`n3fit.performfit.performfit`. """ import functools from collections import defaultdict diff --git a/validphys2/src/validphys/n3fit_data_utils.py b/validphys2/src/validphys/n3fit_data_utils.py index 3b54e29bd1..5743439946 100644 --- a/validphys2/src/validphys/n3fit_data_utils.py +++ b/validphys2/src/validphys/n3fit_data_utils.py @@ -15,8 +15,7 @@ @dataclasses.dataclass class FittableDataSet: """ - Python version of the libNNPDF dataset - to be merged with the product of the new CommonData dataset + Representation of the DataSet information necessary to run a fit Parameters ---------- @@ -32,12 +31,6 @@ class FittableDataSet: training_mask: bool training mask to apply to the fktable """ - - # NOTE: - # This class tries to be compatible with the libNNPDF dataset class - # after commondata is also in python, FittableDataSet can inherit from the vp dataset - # which knows how to generate its "fittable" version. 
- name: str fktables_data: list # of validphys.coredata.FKTableData objects diff --git a/validphys2/src/validphys/plotoptions/core.py b/validphys2/src/validphys/plotoptions/core.py index f97fee6fac..45caa9fb18 100644 --- a/validphys2/src/validphys/plotoptions/core.py +++ b/validphys2/src/validphys/plotoptions/core.py @@ -19,7 +19,6 @@ from reportengine.compat import yaml from reportengine.utils import get_functions, ChainMap -from NNPDF import DataSet from validphys.core import CommonDataSpec, DataSetSpec, Cuts, InternalCutsWrapper from validphys.coredata import CommonData from validphys.plotoptions.utils import apply_to_all_columns, get_subclasses @@ -306,10 +305,14 @@ def kitable(data, info, *, cuts=None): table: pd.DataFrame A DataFrame containing the kinematics for all points after cuts. """ - if isinstance(data, (DataSet, DataSetSpec)) and cuts is not None: + if isinstance(data, (DataSetSpec)) and cuts is not None: raise TypeError("Cuts must be None when a dataset is given") - if isinstance(data, (DataSetSpec, CommonDataSpec)): - data = data.load() + + if isinstance(data, DataSetSpec): + data = data.load_commondata() + elif isinstance(data, CommonDataSpec): + data = data.load() + table = pd.DataFrame(data.get_kintable(), columns=default_labels[1:]) if isinstance(data, CommonData) and cuts is not None: table = table.loc[cuts.load()] diff --git a/validphys2/src/validphys/results.py b/validphys2/src/validphys/results.py index ad76c043ad..4bc74c5639 100644 --- a/validphys2/src/validphys/results.py +++ b/validphys2/src/validphys/results.py @@ -14,7 +14,6 @@ import pandas as pd import scipy.linalg as la -from NNPDF import CommonData from reportengine.checks import require_one, remove_outer, check_not_empty from reportengine.table import table from reportengine import collect @@ -78,7 +77,6 @@ class DataResult(StatsResult): """Holds the relevant information from a given dataset""" def __init__(self, dataset, covmat, sqrtcovmat): - # The commondata is currently a libNNPDF object loaded_cd = dataset.load_commondata() if isinstance(loaded_cd, list): cv = np.concatenate([cd.get_cv() for cd in loaded_cd]) @@ -465,9 +463,9 @@ def results(dataset: (DataSetSpec), pdf: PDF, covariance_matrix, sqrt_covmat): is constructed from scale variation. The inclusion of this covariance matrix by default is used where available, however this behaviour can be modified with the flag `use_theorycovmat`. - The theory is specified as part of the dataset. + The theory is specified as part of the dataset (a remnant of the old C++ layout) A group of datasets is also allowed. - (as a result of the C++ code layout).""" + """ return ( DataResult(dataset, covariance_matrix, sqrt_covmat), ThPredictionsResult.from_convolution(pdf, dataset), @@ -673,36 +671,6 @@ def procs_chi2_table( groups_each_dataset_chi2_by_process, ) -#procs_chi2_table = collect("groups_chi2_table", ("group_dataset_inputs_by_process",)) - -@check_cuts_considered -@table -def closure_shifts(experiments_index, fit, use_cuts, experiments): - """Save the differenve between the fitted data and the real commondata - values. - - Actually shifts is what should be saved in the first place, rather than - thi confusing fiddling with Commondata, but until we can implement this at - the C++ level, we just dave it here. 
- """ - name, fitpath = fit - result = np.zeros(len(experiments_index)) - for experiment in experiments: - for dataset in experiment: - dspath = fitpath / "filter" / dataset.name - cdpath = dspath / ("DATA_" + dataset.name + ".dat") - try: - syspath = next((dspath / "systypes").glob("*.dat")) - except StopIteration as e: - raise FileNotFoundError( - "No systype " - "file found in filter folder %s" % (dspath / "systypes") - ) from e - cd = CommonData.ReadFile(str(cdpath), str(syspath)) - loc = experiments_index.get_loc((experiment.name, dataset.name)) - result[loc] = cd.get_cv() - dataset.load().get_cv() - return pd.DataFrame(result, index=experiments_index) - def positivity_predictions_data_result(pdf, posdataset): """Return an object containing the values of the positivuty observable.""" diff --git a/validphys2/src/validphys/sumrules.py b/validphys2/src/validphys/sumrules.py index 4e5ee3ee5e..c8f15ffb4a 100644 --- a/validphys2/src/validphys/sumrules.py +++ b/validphys2/src/validphys/sumrules.py @@ -145,9 +145,9 @@ def _sum_rules(rules_dict, lpdf, Q): @check_positive('Q') def sum_rules(pdf:PDF, Q:numbers.Real): """Compute the momentum, uvalence, dvalence, svalence and cvalence sum rules for - each member (as defined by libnnpdf), at the energy scale ``Q``. Return a - SumRulesGrid object with the list of values for each sum rule. The - integration is performed with absolute and relative tolerance of 1e-4.""" + each member, at the energy scale ``Q``. + Return a SumRulesGrid object with the list of values for each sum rule. + The integration is performed with absolute and relative tolerance of 1e-4.""" lpdf = pdf.load() return _sum_rules(KNOWN_SUM_RULES, lpdf, Q) diff --git a/validphys2/src/validphys/tests/test_covmats.py b/validphys2/src/validphys/tests/test_covmats.py index d7ccae5777..14c6487728 100644 --- a/validphys2/src/validphys/tests/test_covmats.py +++ b/validphys2/src/validphys/tests/test_covmats.py @@ -49,68 +49,37 @@ def test_self_consistent_covmat_from_systematics(data_internal_cuts_config): @pytest.mark.parametrize("dataset_inputs", [DATA, CORR_DATA]) def test_covmat_from_systematics(data_config, use_cuts, dataset_inputs): """Test which checks the python computation of the covmat relating to a - collection of datasets matches that of the C++ computation. + collection of datasets from dataset_inputs matches the direct call to groups_covmat Tests all combinations of internal/no cuts and correlated/uncorrelated data. - """ config = dict(data_config) config["use_cuts"] = use_cuts config["dataset_inputs"] = dataset_inputs covmat = API.dataset_inputs_covmat_from_systematics(**config) - cpp_covmat = API.groups_covmat(**config) + another_covmat = API.groups_covmat(**config) - np.testing.assert_allclose(cpp_covmat, covmat) + np.testing.assert_allclose(another_covmat, covmat) def test_covmat_with_one_systematic(): - """Test that a dataset with 1 systematic successfully builds covmat, and - that it agrees with cpp code. This special case can break the covmat - construction in python because of pandas indexing. - + """Test that a dataset with 1 systematic successfully builds covmat. + This special case can break the covmat construction in python because of pandas indexing. """ dsinput = {"dataset": "D0ZRAP", "frac": 1.0, "cfac": ["QCD"]} config = dict(dataset_input=dsinput, theoryid=THEORYID, use_cuts="nocuts") - covmat = API.covmat_from_systematics(**config) ds = API.dataset(**config) # double check that the dataset does indeed only have 1 systematic. 
assert ds.commondata.nsys == 1 - cpp_covmat = ds.load().get_covmat() - - np.testing.assert_allclose(cpp_covmat, covmat) - - -def test_cpp_sqrtcovmat(): - """Test that the sqrt of the covariance matrix is computed correctly for a - random sample of 10 datasets. This uses the get methods of a loaded dataset - which currently call the C++ code. This therefore currently tests the - computation of the sqrt of the covariance matrix in the C++ code. In time - the get_sqrtcovmat method should call the python code, in which case this - test can be merged with :py:func:`test_sqrt_covmat`. - """ - l = Loader() - # Only test 10 datasets to avoid test taking too long - datasets = random.sample(l.available_datasets, 10) - cuts = (None, "internal") - - for ds_name in datasets: - try: - for cut in cuts: - ds = l.check_dataset(ds_name, theoryid=THEORYID, cuts=cut) - ds_ld = ds.load() - sqrt_cov = ds_ld.get_sqrtcovmat() - assert np.allclose(sqrt_cov @ sqrt_cov.T, ds_ld.get_covmat()) - except FileNotFoundError: - continue + # Test the covmat can be constructed + _ = API.covmat_from_systematics(**config) def test_sqrt_covmat(data_config): - """In contrast to :py:func:`test_cpp_sqrtcovmat` this tests the python - implementation of the sqrt of the covariance matrix, namely + """Tests the python implementation of the sqrt of the covariance matrix, namely :py:func:`validphys.covmats.sqrt_covmat`. - """ rectangular_covmat = np.random.randint(10, size=(4, 5)) @@ -119,24 +88,25 @@ def test_sqrt_covmat(data_config): # rectangular covmat matrix sqrt_covmat(rectangular_covmat) + with pytest.raises(ValueError): # Check whether an empty covmat input raises # a ValueError sqrt_covmat(np.array([])) - exps = API.experiments_data(**data_config) + config = dict(data_config) + config["dataset_inputs"] = CORR_DATA + covmat = API.dataset_inputs_covmat_from_systematics(**config) + + cholesky_cov = sqrt_covmat(covmat) + np.testing.assert_allclose(cholesky_cov @ cholesky_cov.T, covmat) - for exp in exps: - ld_exp = exp.load() - covmat = ld_exp.get_covmat() - cholesky_cov = sqrt_covmat(covmat) - np.testing.assert_allclose(cholesky_cov @ cholesky_cov.T, covmat) @pytest.mark.parametrize("t0pdfset", [PDF, HESSIAN_PDF]) @pytest.mark.parametrize("dataset_inputs", [DATA, CORR_DATA]) -def test_python_t0_covmat_matches_cpp( +def test_python_t0_covmat_matches_variations( data_internal_cuts_config, t0pdfset, dataset_inputs): """Test which checks the python computation of the t0 covmat relating to a - collection of datasets matches that of the C++ computation. + collection of datasets Tests all combinations of hessian/MC t0pdfset and correlated/uncorrelated data. @@ -147,9 +117,9 @@ def test_python_t0_covmat_matches_cpp( config["t0pdfset"] = t0pdfset config["use_t0"] = True covmat = API.dataset_inputs_t0_covmat_from_systematics(**config) - cpp_covmat = API.groups_covmat(**config) + another_covmat = API.groups_covmat(**config) # use allclose defaults or it fails - np.testing.assert_allclose(cpp_covmat, covmat, rtol=1e-05, atol=1e-08) + np.testing.assert_allclose(another_covmat, covmat, rtol=1e-05, atol=1e-08) with pytest.raises(AssertionError): np.testing.assert_allclose( covmat, API.dataset_inputs_covmat_from_systematics(**config) @@ -161,7 +131,7 @@ def test_python_t0_covmat_matches_cpp( def test_systematic_matrix( data_config, use_cuts, dataset_input): """Test which checks the python computation of the t0 covmat relating to a - collection of datasets matches that of the C++ computation. 
+ collection of datasets is equivalent using different functions Tests all combinations of hessian/MC t0pdfset and correlated/uncorrelated data. diff --git a/validphys2/src/validphys/tests/test_weights.py b/validphys2/src/validphys/tests/test_weights.py index 134cf47e2f..2b6b041fcd 100644 --- a/validphys2/src/validphys/tests/test_weights.py +++ b/validphys2/src/validphys/tests/test_weights.py @@ -40,10 +40,8 @@ def test_disable_weights(weighted_data_witht0_internal_cuts_config): assert np.allclose(weighted / unweighted, (100 + 1) / (1 + 1)) def test_python_weights(weighted_data_witht0_config): - """Test python implementation of weighted covmats is constent with - libnnpdf and that ``use_weights_in_covmat`` is working correctly in - python interface. - + """Test python implementation of weighted covmats + and that ``use_weights_in_covmat`` is working """ weighted_data_witht0_config = dict(weighted_data_witht0_config) weighted_data_witht0_config["use_cuts"] = "internal" @@ -56,7 +54,7 @@ def test_python_weights(weighted_data_witht0_config): np.testing.assert_allclose(cov, py_cov, rtol=1e-05, atol=1e-08) - # now test without weights - assumes that libnnpdf tests pass. + # now test without weights unweighted = API.dataset_inputs_covariance_matrix( **weighted_data_witht0_config, use_weights_in_covmat=False, )