add experiment_result_table with confidence levels#447

Merged
Zaharid merged 5 commits into master from exp_res_tab on May 21, 2019

Conversation

@wilsonmr
Contributor

@wilsonmr wilsonmr commented May 3, 2019

I have added a table which takes the experiment_result_table but gives confidence levels instead of the full replica data.

It's a bit more concise and hopefully closes #446

@mariaubiali I tested with the runcard you posted on the issue and I think this does what you want, but please give this a test. The new function is experiment_result_table_68cl and should work otherwise with your current runcard:

fit: pheno
use_cuts: "fromfit"

experiments:
    - experiment: ATLAS
      datasets:
          - dataset: ATLASWZTOT13TEV81PB

dataspecs:
    - theoryid: 163
      pdf: 181126-si-nlo-central_DISonly

    - theoryid: 163
      pdf: 190326-ern-nlo-DIS

template_text: |
    % (exp) vs (exp+th) comparison for
    {@dataspecs experiment_result_table_68cl@}

actions_:
  - report(main=True)

Once again, I'm not very good at choosing a name for the function, and perhaps the column headers could be improved. Feedback welcome.

Example: https://vp.nnpdf.science/MRHIuorMQu6cpoDa4_RE6A==/

@wilsonmr wilsonmr requested review from Zaharid and mariaubiali May 3, 2019 13:20
@Zaharid
Contributor

Zaharid commented May 3, 2019

@siranipour Please have a look at how this is done. Should help out with some of the questions you had.

@Zaharid
Contributor

Zaharid commented May 3, 2019

@wilsonmr I think this should use results objects, which also work for PDFs that are not Monte Carlo.

@Zaharid
Contributor

Zaharid commented May 3, 2019

i.e. we have something that gives you the 68 percent intervals no matter what your replicas mean.
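
As a rough illustration of that idea (a minimal sketch, not the validphys implementation): for Monte Carlo replicas a 68% interval can be read off roughly the 16th and 84th percentiles, while for a symmetric Hessian set the error is the quadrature sum of the eigenvector shifts from member 0. The function names below are hypothetical, and the exact percentiles and any rescaling used by the real stats classes are assumptions.

```python
import math


def mc_interval68(replicas):
    """Central 68% interval of Monte Carlo replicas via the 16th/84th percentiles."""
    xs = sorted(replicas)

    def percentile(q):
        # Linear interpolation between the closest ranks.
        pos = q / 100 * (len(xs) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return percentile(16), percentile(84)


def symmhessian_interval68(members):
    """Symmetric Hessian: member 0 is central, the rest are eigenvector shifts."""
    central = members[0]
    err = math.sqrt(sum((m - central) ** 2 for m in members[1:]))
    return central - err, central + err


mc_low, mc_high = mc_interval68([float(i) for i in range(101)])
h_low, h_high = symmhessian_interval68([10.0, 10.3, 9.6])
```

By construction the Hessian interval is symmetric about the central member, whereas the percentile-based one need not be.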

@Zaharid
Contributor

Zaharid commented May 3, 2019

Also I wouldn't refer to the other table in the documentation. This is useful enough that the doc should explain what it does on its own.

@Zaharid
Contributor

Zaharid commented May 3, 2019

Btw, I am looking at the various *Results classes and they now look pretty poorly thought out to me. I think the original motivation was that I wanted to use the C++ code as much as possible (so it could be tested), and that is why there are other crazy-looking things apart from StatsResult, which was done later for sanity reasons. I am saying all this because I am trying to decide whether constructing a stats object from the data in the table is guaranteed to work (I think it is). I was going to suggest starting from the inputs of the other table rather than the table itself, but that is a bit crazy as well. In particular it correctly says:

#TODO: Use collect to calculate results outside this
def experiment_result_table_no_table(experiments, pdf, experiments_index):

which was probably written before collect existed. So there are many things to improve in general. But I am inclined to merge this as is. Can you please see if this works reasonably for some Hessian sets?

@Zaharid
Contributor

Zaharid commented May 7, 2019

@mariaubiali could you please see if this looks good to you?

@wilsonmr please test this for Hessian sets (e.g. compare the Hessian version of 3.1 with the replica-based one).

@mariaubiali
Contributor

Hi @Zaharid @wilsonmr thank you for this!
What do I have to do to try it? Is it enough to update nnpdf in my conda environment, or do I have to check out a particular branch?
Thanks

@mariaubiali
Contributor

Do I have to merge the pull request?

@siranipour
Contributor

Hi @mariaubiali you need to git checkout this branch, it should then work straight away

@mariaubiali
Contributor

@wilsonmr sorry but I am not familiar with all this.
At the moment I am working on a conda environment with libnnpdf installed (not a developer one).
Does it mean that I have to install a developer environment as documented here
https://data.nnpdf.science/validphys-docs/guide.html#development-installs
and then go into the nnpdf directory and do git checkout name_branch?
Or is there a simpler way? I guess that if someone approves the pull request then I can just update nnpdf in conda and use the function that you have added straight away, right?
Sorry about all these questions...

@wilsonmr
Contributor Author

wilsonmr commented May 8, 2019

Hi Maria,

Both you and @siranipour are correct:

You will need to use this branch to have access to this function. To use this branch you will have to set up a development environment as per the instructions. Fortunately it should be relatively easy and not take too long. However, if you are using a newer version of OSX, there is a weird dependency of the conda compilers. For info see #323.

You need to download this SDK, unpack it to some location like ~/local/macosxsdk/MacOSX10.9.sdk, and then make sure conda is pointing at it with the environment variable CONDA_BUILD_SYSROOT. So for the location above: export CONDA_BUILD_SYSROOT=/Users/michael/local/macosxsdk/MacOSX10.9.sdk (which one can place in their bash profile or otherwise).

Since I will be producing an example of this working for a Hessian set, perhaps between that and the example I posted above you can say whether this has the format you had in mind, or if you want the information presented differently. Provided you're happy, we can merge the PR and then you'll be able to use it with the standard conda installation of the package, if you don't want to mess around with the SDK.

Probably when @siranipour and I look at #443 we should add explicit instructions for how to install the SDK. Even if the conda package doesn't work, perhaps a download script and keeping the file on the server would be a slightly better solution, @Zaharid?

@Zaharid
Contributor

Zaharid commented May 8, 2019

Yeah, ideally we should have a painless way to set up a dev environment for NNPDF by now, also on mac.

@wilsonmr
Contributor Author

wilsonmr commented May 9, 2019

Sorry for the delay, here is another example, this time using Hessian sets:

https://vp.nnpdf.science/mLVCDXuNSNmvfeW2CdPaGw==/

Provided you are fine with the table, @mariaubiali, I think this is good to merge.

Contributor

@Zaharid Zaharid left a comment

Actually I am not sure this is working as it should. AFAICT the nnpdf hessian results should be exactly symmetric wrt the central values (but not the cteq ones). There may be an off by one error somewhere.

@Zaharid
Contributor

Zaharid commented May 9, 2019

E.g. see the following:

In [1]: %matplotlib agg                                                                                               

In [2]: from validphys.app import API                                                                                 

In [3]: API.results(dataset_input={'dataset': 'ATLASWZTOT13TEV81PB'}, pdf='NNPDF31_nnlo_as_0118_hessian', use_cuts='nocuts', theoryid=53)

-- Reading COMMONDATA for Dataset: ATLASWZTOT13TEV81PB
nData: 3 nSys: 2
-- COMMONDATA Files for ATLASWZTOT13TEV81PB successfully read.

_ATLASWZTOT13TEV81PB________________________________________
--------------------------------------
ATLAS TOTAL W and Z cross sections 13 Tev
--------------------------------------
3 Data Points 20 X points 81 active flavours
PDF: NNPDF31_nnlo_as_0118_hessian  ErrorType: Symmetric eigenvector booked
LHAPDF 6.2.1 loading all 101 PDFs in set NNPDF31_nnlo_as_0118_hessian
NNPDF31_nnlo_as_0118_hessian, version 1; 101 PDF members
NNPDF31_nnlo_as_0118_hessian Initialised with 101 members and errorType symmhessian
Out[3]: 
(<validphys.results.DataResult at 0x7f5172d4fa58>,
 <validphys.results.ThPredictionsResult at 0x7f51513bbef0>)

In [4]: dt, th = _                                                                                                    

In [6]: th.central_value                                                                                              
Out[6]: array([360864.88, 469922.5 ,  79119.61], dtype=float32)

In [7]: th._rawdata                                                                                                   
Out[7]: 
array([[360864.88 , 360686.7  , 360701.2  , 360498.   , 360852.03 ,
        361041.8  , 360866.44 , 361385.34 , 361134.66 , 360413.25 ,
        360545.72 , 360685.56 , 361181.78 , 360592.44 , 359985.22 ,
        361244.53 , 360934.03 , 360353.88 , 360696.2  , 360589.44 ,
        360834.38 , 361574.88 , 360954.4  , 360723.03 , 362184.4  ,
        361203.25 , 359886.03 , 360850.4  , 361158.6  , 359827.12 ,
        360924.5  , 360927.66 , 360634.7  , 360976.4  , 360676.03 ,
        360625.88 , 361209.25 , 360679.7  , 360901.66 , 360774.   ,
        360968.3  , 360864.12 , 360871.06 , 360761.4  , 360994.5  ,
        360901.75 , 360981.38 , 360828.06 , 361088.2  , 360938.84 ,
        360694.97 , 360699.8  , 360863.   , 360824.12 , 360811.47 ,
        361004.   , 360948.7  , 360873.6  , 360849.3  , 360689.5  ,
        360852.3  , 360950.88 , 360854.97 , 360978.88 , 360741.94 ,
        360900.2  , 360671.8  , 360925.5  , 360873.47 , 360764.2  ,
        361072.03 , 360899.7  , 360836.38 , 360947.1  , 360939.22 ,
        360929.2  , 361008.1  , 360950.38 , 360764.25 , 360831.47 ,
        360845.38 , 360903.8  , 360824.12 , 360907.78 , 360859.38 ,
        360733.5  , 360830.8  , 360843.38 , 360766.94 , 360918.75 ,
        360965.34 , 360732.03 , 360834.53 , 360915.5  , 360894.9  ,
        360827.56 , 360968.97 , 360687.88 , 360886.25 , 360945.62 ,
        360784.72 ],
       [469922.5  , 469675.44 , 469445.47 , 469348.62 , 470025.2  ,
        470130.75 , 469680.3  , 470574.2  , 470000.9  , 469239.8  ,
        469500.16 , 469536.2  , 470429.3  , 469607.4  , 468478.44 ,
        470658.56 , 469851.6  , 469090.7  , 469755.2  , 469471.1  ,
        469897.38 , 470834.94 , 469999.06 , 469916.8  , 471705.97 ,
        470265.6  , 468753.9  , 469753.   , 470329.1  , 468763.56 ,
        469913.88 , 470085.38 , 469907.7  , 469810.88 , 469589.75 ,
        469908.25 , 470463.2  , 469940.7  , 470011.2  , 469932.28 ,
        470002.2  , 470000.   , 470073.84 , 469824.7  , 469971.3  ,
        470098.8  , 470138.2  , 469898.7  , 470241.88 , 470030.44 ,
        469740.66 , 469793.75 , 469941.12 , 469888.94 , 469845.2  ,
        470044.3  , 470008.75 , 469922.4  , 469942.62 , 469836.62 ,
        469839.56 , 469911.38 , 469882.06 , 469984.38 , 469690.25 ,
        469934.03 , 469704.34 , 469970.7  , 469903.47 , 469860.53 ,
        470141.97 , 470012.44 , 469932.8  , 470018.44 , 470011.8  ,
        469959.12 , 470096.75 , 470009.34 , 469833.22 , 469869.25 ,
        469895.56 , 469957.62 , 469821.72 , 469928.7  , 469968.97 ,
        469736.12 , 469958.9  , 469920.75 , 469845.62 , 470041.88 ,
        469953.3  , 469781.2  , 469977.38 , 469923.56 , 469966.   ,
        469942.44 , 470118.2  , 469679.06 , 470026.5  , 470042.9  ,
        469872.66 ],
       [ 79119.61 ,  79080.81 ,  79139.95 ,  79072.06 ,  78979.56 ,
         79160.41 ,  79203.89 ,  79216.74 ,  79247.46 ,  78933.94 ,
         79182.625,  79145.625,  79076.55 ,  79055.45 ,  78787.125,
         79103.47 ,  79020.11 ,  79060.92 ,  79104.97 ,  79112.016,
         79098.89 ,  79254.336,  79112.16 ,  79176.914,  79426.83 ,
         79208.305,  78876.63 ,  79088.16 ,  79206.19 ,  78883.33 ,
         79163.61 ,  79138.78 ,  79084.43 ,  79140.82 ,  79047.12 ,
         79075.4  ,  79223.79 ,  79090.55 ,  79141.77 ,  79079.734,
         79140.97 ,  79143.97 ,  79129.11 ,  79087.91 ,  79174.78 ,
         79191.66 ,  79166.016,  79081.36 ,  79202.57 ,  79127.49 ,
         79126.69 ,  79097.125,  79094.75 ,  79113.234,  79089.445,
         79137.58 ,  79132.734,  79146.44 ,  79150.02 ,  79090.06 ,
         79118.84 ,  79090.39 ,  79125.016,  79130.79 ,  79070.94 ,
         79151.08 ,  79070.03 ,  79116.67 ,  79129.414,  79103.48 ,
         79165.31 ,  79079.06 ,  79127.305,  79145.14 ,  79122.26 ,
         79118.375,  79157.914,  79108.41 ,  79096.836,  79137.02 ,
         79112.984,  79130.71 ,  79115.82 ,  79121.48 ,  79112.71 ,
         79091.61 ,  79113.76 ,  79117.7  ,  79106.15 ,  79129.16 ,
         79139.44 ,  79100.96 ,  79112.32 ,  79123.016,  79122.836,
         79115.34 ,  79139.22 ,  79106.984,  79119.56 ,  79135.33 ,
         79106.12 ]], dtype=float32)

In [8]: th._rawdata[:,0]                                                                                                                                                                                                                      
Out[8]: array([360864.88, 469922.5 ,  79119.61], dtype=float32)

In [9]: th.central_value                                                                                                                                                                                                                      
Out[9]: array([360864.88, 469922.5 ,  79119.61], dtype=float32)

In [10]: th._rawdata[:,0] == th.central_value                                                                                                                                                                                                 
Out[10]: array([ True,  True,  True])

In [11]: pdf = API.pdf(pdf='NNPDF31_nnlo_as_0118_hessian')                                                                                                                                                                                    

In [12]: pdf.stats_class(th._rawdata)                                                                                                                                                                                                         
Out[12]: <validphys.core.SymmHessianStats at 0x7f5193f50c50>


In [15]: stats = pdf.stats_class(th._rawdata.T)                                                                                                                                                                                               

In [16]: stats.central_value()                                                                                                                                                                                                                
Out[16]: array([360864.88, 469922.5 ,  79119.61], dtype=float32)

In [17]: th._rawdata[:,0]                                                                                                                                                                                                                     
Out[17]: array([360864.88, 469922.5 ,  79119.61], dtype=float32)

In [18]: stats.errorbar68()                                                                                                                                                                                                                   
Out[18]: 
(array([358080.97, 466160.75,  78382.3 ], dtype=float32),
 array([363648.78, 473684.25,  79856.92], dtype=float32))


In [20]: import numpy as np                                                                                                                                                                                                                   



In [26]: np.c_[stats.errorbar68()].T - stats.central_value()                                                                                                                                                                                  
Out[26]: 
array([[-2783.9062, -3761.75  ,  -737.3125],
       [ 2783.9062,  3761.75  ,   737.3125]], dtype=float32)

Note that the differences are exactly symmetric, which is not the case for the table you showed.

@Zaharid
Contributor

Zaharid commented May 9, 2019

Uh, this is also broken in the current master. We should add tests for these tables:

~/nngit/nnpdf/validphys2/src/validphys/results.py in experiment_result_table_no_table(experiments, pdf, experiments_index)
    182 
    183 
--> 184         data_result = DataResult(loaded_exp)
    185         th_result = ThPredictionsResult.from_convolution(pdf, experiment,
    186                                                          loaded_data=loaded_exp)

TypeError: __init__() missing 2 required positional arguments: 'covmat' and 'sqrtcovmat'

@Zaharid
Contributor

Zaharid commented May 9, 2019

The table should really use collect rather than reimplement the functionality poorly.

@wilsonmr
Contributor Author

wilsonmr commented May 9, 2019

I agree that the function being broken wrt master is a problem, and I will look at this. However, I don't see your symmetry thing. Are you, for example, talking about table 2, row 2:


4.530E+5 | 4.493E+5 | 4.568E+5

difference being lower: 0.037 and upper: 0.038 ?

When I print the number exactly before reportengine formats them I see

453038.06250  449262.415538  456813.709462

which gives symmetric differences of 3775.646962. I think we can see that it's just the way the numbers are rounded.
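
The rounding effect can be reproduced with a short check. This is only an illustrative sketch: it takes the three numbers quoted above and assumes the table shows four significant figures, which is what the quoted row suggests.

```python
central, lower, upper = 453038.0625, 449262.415538, 456813.709462

# Full precision: the interval is symmetric about the central value.
full_minus = central - lower
full_plus = upper - central

# What a table with four significant figures shows.
shown = [float(f"{x:.3E}") for x in (central, lower, upper)]
shown_minus = shown[0] - shown[1]  # 3700.0
shown_plus = shown[2] - shown[0]   # 3800.0
```

The displayed differences of 3700 and 3800 correspond to the 0.037 and 0.038 (in units of 1E+5) read off the table, even though the underlying differences agree.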

@Zaharid
Contributor

Zaharid commented May 9, 2019 via email

@Zaharid
Contributor

Zaharid commented May 9, 2019

(I was going to check that but I wanted the API thing and then found out it was not working with master)

@mariaubiali
Contributor

Hi @Zaharid @wilsonmr

I really wanted to try out the new functionality implemented by Michael. For this purpose I tried to install the nnpdf-dev environment in conda on my Linux machine.
I followed the instructions and everything worked fine:

'$ conda create -n nnpdf-dev
$ conda activate nnpdf-dev
$ conda install --only-deps nnpdf
$ conda install gxx_linux-64
$ echo $CONDA_PREFIX
/store/HEP/mu227/miniconda3/envs/nnpdf-dev
$ conda install pkg-config swig=3.0.10 cmake
$ cd nnpdfgit/nnpdf/conda-bld
$ mkdir conda-bld
$ cd conda-bld'

At this point the compilation breaks and I do not understand why

'$ nnpdfgit/nnpdf/conda-bld $ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
-- Setting build type to 'Release' as none was specified.
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/x86_64-conda_cos6-linux-gnu-cc
-- Check for working C compiler: /store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/x86_64-conda_cos6-linux-gnu-cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/pkg-config (found version "0.29.2")
-- Checking for one of the modules 'libarchive'
-- Checking for one of the modules 'sqlite3'
-- Checking for one of the modules 'gsl'
-- Checking for one of the modules 'yaml-cpp'
CMake Error at /store/HEP/mu227/miniconda3/envs/nnpdf-dev/share/cmake-3.14/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find SWIG: Found unsuitable version "3.0.8", but required is at
least "3.0.10" (found /usr/bin/swig3.0)
Call Stack (most recent call first):
/store/HEP/mu227/miniconda3/envs/nnpdf-dev/share/cmake-3.14/Modules/FindPackageHandleStandardArgs.cmake:376 (_FPHSA_FAILURE_MESSAGE)
/store/HEP/mu227/miniconda3/envs/nnpdf-dev/share/cmake-3.14/Modules/FindSWIG.cmake:64 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
libnnpdf/wrapper/CMakeLists.txt:1 (find_package)

-- Configuring incomplete, errors occurred!
See also "/store/HEP/mu227/nnpdfgit/nnpdf/conda-bld/CMakeFiles/CMakeOutput.log".'

This is weird because if I type that it seems to be OK

'$ which cmake
/store/HEP/mu227/miniconda3/envs/nnpdf-dev/bin/cmake
$ cmake --version
cmake version 3.14.0'

The bottom line is that I cannot test the function unless I manage to compile with cmake.
However, if you merge the modifications of @wilsonmr into master, I can try it out in my non-developer environment.

@Zaharid
Contributor

Zaharid commented May 9, 2019 via email

@Zaharid
Contributor

Zaharid commented May 9, 2019 via email

@wilsonmr
Contributor Author

Ok I agree we should add some tests but I believe I have fixed the various functions to work with collect in this area of results.py

@wilsonmr
Contributor Author

Well, the tests are broken by this change, but if we wanted to implement #459 I'd rather do that and then rebase this branch.

@mariaubiali
Contributor

Hi @Zaharid @wilsonmr

thanks! With this fix it works..
However I have this further problem with this environment... which I don't know where it comes from!!

(nnpdf-dev) mu227@queens:~/hep/NNPDFplot/phenoTH $ vp-get fit 190319-ern-nnlo-central-166-global
[INFO]: Could not find a resource (fit): Could not find fit '190319-ern-nnlo-central-166-global' in '/store/HEP/mu227/miniconda3/envs/nnpdf-dev/share/NNPDF/results'. Folder '/store/HEP/mu227/miniconda3/envs/nnpdf-dev/share/NNPDF/results/190319-ern-nnlo-central-166-global' not found. Attempting to download it.
[ERROR]: Resource not in the remote repository: Could not find fit '190319-ern-nnlo-central-166-global' in remote index fitdata.json Could not find resource (fit): '190319-ern-nnlo-central-166-global'.

Any idea?

Thanks!!!

@mariaubiali
Contributor

OK, somehow deactivating and activating again the environment did the job!
Now I run the validphys script using the function implemented by @wilsonmr and it works!
One question: would it be complicated to add a function like experiment_result_table_1sigma to get the 1-sigma error band?

@Zaharid
Contributor

Zaharid commented May 20, 2019

Something that would be nice to work on (for some definition thereof) and useful would be some sort of generic functionality for composing tables. It could look like:

data:
  - ...
  - ...
  - ...

stats_columns: ['central value', 'median', '68cl', {'quantile': 75}]

actions_:
   - data_stats_table

But meanwhile I'll be merging this one.
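
To make the idea concrete, here is one way such column specs could be dispatched. This is purely a sketch under stated assumptions: `COLUMNS`, `quantile` and `data_stats_row` are hypothetical names, nothing like this exists in validphys as far as this thread shows, and taking the mean as the "central value" is an assumption that only makes sense for Monte Carlo replicas.

```python
import statistics


def quantile(values, q):
    """q-th percentile (0-100) by linear interpolation between closest ranks."""
    xs = sorted(values)
    pos = q / 100 * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Hypothetical registry mapping column names to functions of the replica values.
COLUMNS = {
    "central value": statistics.mean,
    "median": statistics.median,
    "68cl": lambda v: (quantile(v, 16), quantile(v, 84)),
}


def data_stats_row(values, stats_columns):
    """Build one table row from a list of column specs (strings or one-key dicts)."""
    row = {}
    for spec in stats_columns:
        if isinstance(spec, dict):
            # Parametrised spec such as {'quantile': 75}.
            (name, arg), = spec.items()
            if name != "quantile":
                raise ValueError(f"Unknown column spec: {spec}")
            row[f"quantile {arg}"] = quantile(values, arg)
        else:
            row[spec] = COLUMNS[spec](values)
    return row


row = data_stats_row(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    ["central value", "median", {"quantile": 75}],
)
```

Plain strings look up a fixed statistic, while one-key dicts carry a parameter, mirroring the mixed list in the config above.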

@Zaharid
Contributor

Zaharid commented May 20, 2019

Good. It is nice to have PRs that both add functionality and decrease the line count.

Contributor

@Zaharid Zaharid left a comment

However something has to be done with the tests.

Contributor

@Zaharid Zaharid left a comment

Please fix the tests.

Contributor

@Zaharid Zaharid left a comment

Please fix the tests.

@wilsonmr
Contributor Author

Ok whilst fixing the tests I have noticed that experiments_sqrtcovmat was bugged before

see:

data.SetT0(t0set.load_t0())

we have:

loaded_exp = experiment.load()
if t0set:
    #Copy data to avoid chaos
    data = type(loaded_exp)(loaded_exp)
    log.debug("Setting T0 predictions for %s" % loaded_exp)
    data.SetT0(t0set.load_t0())
mat = loaded_exp.get_sqrtcovmat()

so we were never using t0 since data never gets used... I will fix this as well
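
The pattern can be reproduced in isolation. The sketch below uses a hypothetical stand-in class (`FakeExperiment`, not the libnnpdf one) and shows one possible fix, namely returning the result from the object that actually had T0 set. The change made in this PR is not shown in the thread, so this is an assumption about its shape.

```python
class FakeExperiment:
    """Hypothetical stand-in for a loaded experiment; not the libnnpdf class."""

    def __init__(self, other=None):
        # Copy-construct when given another instance, mimicking type(obj)(obj).
        self.t0 = other.t0 if other is not None else None

    def set_t0(self, t0):
        self.t0 = t0

    def describe_covmat(self):
        return "t0 covmat" if self.t0 is not None else "experimental covmat"


def covmat_buggy(loaded_exp, t0=None):
    if t0 is not None:
        data = type(loaded_exp)(loaded_exp)  # copy, then modify the copy
        data.set_t0(t0)
    return loaded_exp.describe_covmat()  # bug: the untouched original is used


def covmat_fixed(loaded_exp, t0=None):
    data = loaded_exp
    if t0 is not None:
        data = type(loaded_exp)(loaded_exp)
        data.set_t0(t0)
    return data.describe_covmat()  # the (possibly modified) copy is used


buggy = covmat_buggy(FakeExperiment(), t0="some pdf")
fixed = covmat_fixed(FakeExperiment(), t0="some pdf")
```

A test that compares the result with and without a T0 set would have caught the original behaviour, since the two outputs were identical.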

@Zaharid
Contributor

Zaharid commented May 21, 2019 via email

@Zaharid
Contributor

Zaharid commented May 21, 2019

OK, I'll let the CI run and hopefully can merge ASAP.

@wilsonmr
Contributor Author

Sounds good

@Zaharid Zaharid merged commit 0b6bc0f into master May 21, 2019
@Zaharid Zaharid deleted the exp_res_tab branch May 22, 2019 09:51

Development

Successfully merging this pull request may close these issues.

Adding PDF uncertainty in validphys

4 participants