Adding action to plot chi2 dist for aggregate of datasets by siranipour · Pull Request #700 · NNPDF/nnpdf

siranipour · 2020-04-02T14:19:39Z

Closes #687

I have yet to work on the KDE business. Here is an example runcard, mainly for my own reference, because it currently lives in /tmp:

pdf: NNPDF31_nnlo_as_0118_DISonly
fit: NNPDF31_nnlo_as_0118_DISonly

experiments:
  from_: fit

theoryid: 53

use_cuts: fromfit

template_text: |
  {@plot_chi2dist_experiments@}

actions_:
  - report(main=True)

siranipour · 2020-04-02T14:28:27Z

This will deprecate

nnpdf/validphys2/src/validphys/dataplots.py

Line 32 in 7eb1089

def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):

right?

siranipour · 2020-04-02T15:08:13Z

Note to self: for some reason the std_error is a vector of 0s; I will work on this later. It is probably the same issue that makes squeeze() necessary.
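
One way a standard error can come out as all zeros is sketched below. It assumes the χ² array carries a stray length-1 axis, which this thread does not confirm as the actual cause. The array contents and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical per-replica chi2 values stored with a stray leading axis
# of length 1, i.e. shape (1, n_replicas).
chi2_per_replica = np.array([[1.1, 0.9, 1.3, 1.0]])

# Standard deviation along the length-1 axis: every entry is zero,
# because each "sample" along that axis has a single element.
std_wrong = chi2_per_replica.std(axis=0)

# Dropping the singleton axis first gives the spread over replicas.
std_right = chi2_per_replica.squeeze().std()

print(std_wrong)   # an array of zeros with shape (4,)
print(std_right)   # a single positive number
```

If the real array is shaped this way, calling squeeze() (or reducing over the other axis) before taking the standard deviation would be one possible fix.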

Zaharid · 2020-04-03T10:27:44Z

> This will deprecate
>
> nnpdf/validphys2/src/validphys/dataplots.py
>
> Line 32 in 7eb1089
>
> def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):
>
> right?

I think it is good to have an action that works on a single dataset, for convenience.

Zaharid · 2020-04-07T15:05:03Z

@siranipour is this working now?

siranipour · 2020-04-07T15:15:58Z

Well, kinda. I just don't know why the std_error was a vector of 0s. I haven't looked at this since the previous commit.

siranipour · 2020-04-10T09:56:46Z

> > This will deprecate
> >
> > nnpdf/validphys2/src/validphys/dataplots.py
> >
> > Line 32 in 7eb1089
> >
> > def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):
> >
> > right?
>
> I think it is good to have an action that works on a single dataset, for convenience.

Btw, why is results a provider of that action? It's unused.

Zaharid · 2020-04-10T09:57:49Z

Looks like a bug.

siranipour · 2020-04-10T10:05:21Z

> Looks like a bug.

Will remove in this PR

siranipour · 2020-04-13T15:01:27Z

This now works. There was a bug in total_experiments_chi2data, which I've now fixed thanks to a comment by Nathan back in 2018 at 18.30 (what a hard worker).

siranipour · 2020-04-13T15:02:09Z

I'll see what I can do re the KDE plots and hopefully have this merged soon

siranipour · 2020-04-13T15:41:34Z

Ok KDE plots added, let me know if it's what you had in mind. This should be ready to merge.

You can get the KDE plot using

pdf: NNPDF31_nnlo_as_0118_DISonly
fit: NNPDF31_nnlo_as_0118_DISonly

experiments:
  from_: fit

theoryid: 53

use_cuts: fromfit

template_text: |
  {@kde_chi2dist_experiments@}

actions_:
  - report(main=True)
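
For readers unfamiliar with what a KDE plot shows compared with the histogram, here is a minimal sketch of a Gaussian kernel density estimate over replica χ² values. It is not the code added in this PR. The sample values, the function name gaussian_kde_1d, and the choice of Scott's rule for the bandwidth are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian KDE of `samples` on the points in `grid`.

    Hypothetical helper; the bandwidth uses Scott's rule for 1D data.
    """
    samples = np.asarray(samples, dtype=float).squeeze()  # (n, 1) -> (n,)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated on every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Made-up replica chi2 values, shaped (n, 1) to mimic a column vector
rng = np.random.default_rng(0)
chi2_values = rng.normal(loc=1.2, scale=0.05, size=(100, 1))

grid = np.linspace(0.8, 1.6, 801)
density = gaussian_kde_1d(chi2_values, grid)

# The estimated density should integrate to roughly one over the grid
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 2))
```

The curve in `density` is a smoothed counterpart of the histogram: each replica contributes a small bump rather than a count in a bin.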

Zaharid · 2020-04-14T09:33:57Z

The KDE plot could use some decoration such as titles, axis labels, and maybe the same sort of legend that the histogram has.

siranipour · 2020-04-14T10:28:54Z

Done

wilsonmr · 2020-04-14T10:57:42Z

Looks nice. This is another thing that will need to be changed with #651; we should really get that finished off so that people can start using the new interface.

wilsonmr

Given that the code for plot_chi2dist and plot_chi2dist_experiments is almost identical, I wonder if it'd be neater to have the plotting code in a function which is then called by plot_chi2dist and plot_chi2dist_experiments, because as far as I can tell the only difference is the title?

The KDE code also looks pretty similar. I wonder if we could have the same base code accept a keyword for whether to use hist or KDE? There seems to be a lot of unnecessary boilerplate here which could be put in some base function.

EDIT: sorry, I mixed up the functions I was referring to in the original comment; it should make sense now.

siranipour · 2020-04-14T11:21:45Z

But plot_chi2dist and plot_chi2dist_experiments have different providers

wilsonmr · 2020-04-14T11:29:48Z

but look:

# Sketch only: assumes plt, log, MCStats, chi2_stat_labels and figure
# are already in scope, as in dataplots.py.
def base_function(experiment_or_dataset_chi2, stats, pdf):
    label = pdf.label # should we use label or name here?
    fig, ax = plt.subplots()
    alldata, central, _ = experiment_or_dataset_chi2
    if not isinstance(alldata, MCStats):
        ax.set_facecolor("#ffcccc")
        log.warning("Chi² distribution plots have a "
                "different meaning for non MC sets.")
        label += " (%s!)" % pdf.ErrorType

    label += '\n'+ '\n'.join(str(chi2_stat_labels[k])+(' %.2f' % v) for (k,v) in stats.items())
    ax.set_xlabel(r"Replica $\chi^2$")
    ax.hist(alldata.data, label=label, zorder=100)
    l = ax.legend()
    l.set_zorder(1000)
    return fig, ax

@figure
def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
    # pdf has to be forwarded: base_function takes three arguments
    fig, ax = base_function(total_experiments_chi2data, experiments_chi2_stats, pdf)
    ax.set_title(r"Experiments $\chi^2$ distribution")
    return fig

It's trivial to add another action for the dataset one, and not that difficult to make the base function do either KDE or histogram.
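
The idea of one base function serving both plot types could look roughly like the sketch below. This is an illustration under stated assumptions, not the refactor that was merged: the names chi2_distribution_plot, plot_hist, plot_kde and the plot_type keyword are hypothetical, the input is a plain sequence of numbers rather than validphys objects, and the KDE is a hand-rolled Gaussian estimate standing in for whatever routine the real code uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

def chi2_distribution_plot(chi2_values, label, title, plot_type="hist"):
    """Hypothetical shared helper: draw a histogram or a KDE curve."""
    if plot_type not in ("hist", "kde"):
        raise ValueError(f"plot_type must be 'hist' or 'kde', not {plot_type!r}")
    values = np.asarray(chi2_values, dtype=float).squeeze()
    fig, ax = plt.subplots()
    if plot_type == "hist":
        ax.hist(values, label=label, zorder=100)
    else:
        # Gaussian KDE with a Scott's-rule bandwidth (an assumption)
        bw = values.std(ddof=1) * values.size ** (-1 / 5)
        grid = np.linspace(values.min() - 3 * bw, values.max() + 3 * bw, 200)
        z = (grid[:, None] - values[None, :]) / bw
        density = np.exp(-0.5 * z**2).sum(axis=1)
        density /= values.size * bw * np.sqrt(2 * np.pi)
        ax.plot(grid, density, label=label)
    ax.set_xlabel(r"Replica $\chi^2$")
    ax.set_title(title)
    legend = ax.legend()
    legend.set_zorder(1000)
    return fig, ax

# Thin wrappers that differ only in the plot type they request
def plot_hist(values, label):
    return chi2_distribution_plot(
        values, label, r"Experiments $\chi^2$ distribution", "hist")

def plot_kde(values, label):
    return chi2_distribution_plot(
        values, label, r"Experiments $\chi^2$ distribution", "kde")

example = [1.1, 1.2, 1.15, 1.3, 1.25]  # made-up chi2 values
fig_hist, ax_hist = plot_hist(example, "example set")
fig_kde, ax_kde = plot_kde(example, "example set")
```

Keeping the decoration (axis label, title, legend) in the shared helper means the histogram and KDE variants cannot drift apart, which is the boilerplate concern raised in the review.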

wilsonmr · 2020-04-14T11:30:30Z

+def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
+    """Plot the distribution of chi²s of the members of the pdfset."""
+    fig, ax = plt.subplots()
+    label = pdf.name


as per my old code - should we use pdf.label or pdf.name here?

Doesn't look like there is a difference on the examples I tried.

I think it would make a difference if you declared PDF like in vp-comparefits however

pdf: {id: id_of_the_base_fit, label: "whatever you like"}
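
To make the distinction concrete, here is a toy model of a label that falls back to the name when none is given. The class PDFSpec is purely hypothetical and is not validphys's actual PDF class; it only illustrates why the two attributes would agree for a plain declaration and differ once a label is supplied.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PDFSpec:
    """Hypothetical stand-in for a PDF object with a name and optional label."""
    name: str
    custom_label: Optional[str] = None

    @property
    def label(self) -> str:
        # Fall back to the name when no label was declared
        return self.custom_label if self.custom_label is not None else self.name

plain = PDFSpec("NNPDF31_nnlo_as_0118_DISonly")
relabelled = PDFSpec("NNPDF31_nnlo_as_0118_DISonly", custom_label="whatever you like")

print(plain.label == plain.name)            # True
print(relabelled.label == relabelled.name)  # False
```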

siranipour · 2020-04-14T11:34:19Z

> but look:
>
> # Sketch only: assumes plt, log, MCStats, chi2_stat_labels and figure
> # are already in scope, as in dataplots.py.
> def base_function(experiment_or_dataset_chi2, stats, pdf):
>     label = pdf.label # should we use label or name here?
>     fig, ax = plt.subplots()
>     alldata, central, _ = experiment_or_dataset_chi2
>     if not isinstance(alldata, MCStats):
>         ax.set_facecolor("#ffcccc")
>         log.warning("Chi² distribution plots have a "
>                 "different meaning for non MC sets.")
>         label += " (%s!)" % pdf.ErrorType
>
>     label += '\n'+ '\n'.join(str(chi2_stat_labels[k])+(' %.2f' % v) for (k,v) in stats.items())
>     ax.set_xlabel(r"Replica $\chi^2$")
>     ax.hist(alldata.data, label=label, zorder=100)
>     l = ax.legend()
>     l.set_zorder(1000)
>     return fig, ax
>
> @figure
> def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
>     # pdf has to be forwarded: base_function takes three arguments
>     fig, ax = base_function(total_experiments_chi2data, experiments_chi2_stats, pdf)
>     ax.set_title(r"Experiments $\chi^2$ distribution")
>     return fig
>
> It's trivial to add another action for the dataset one, and not that difficult to make the base function do either KDE or histogram.

Ahhhh I see, two mins, will do this now

siranipour · 2020-04-14T12:01:03Z

Thanks for the suggestion @wilsonmr, very good point. I have now refactored accordingly. Let me know what you think.

siranipour requested a review from Zaharid April 2, 2020 14:19

siranipour force-pushed the experiments_chi2_dist branch from 29a8bc0 to e5d6f2f Compare April 2, 2020 14:21

siranipour force-pushed the experiments_chi2_dist branch from 256603d to d78fd68 Compare April 10, 2020 09:45

siranipour force-pushed the experiments_chi2_dist branch from d78fd68 to ca7c9d5 Compare April 13, 2020 13:49

siranipour added 5 commits April 13, 2020 14:50

Adding action to print chi2 dist for aggregate of datasets

5279f42

Correcting legend for histogram plot

ca7c9d5

Removing results provider from plot_chi2dist (unused)

90be0e8

Correcting shape of numpy broadcasting

f0b49a4

Adding title to plot

1f22441

Adding kde plots for chi2 distributions

8800deb

Zaharid changed the title ~~[WIP] Adding action to print chi2 dist for aggregate of datasets~~ Adding action to print chi2 dist for aggregate of datasets Apr 14, 2020

Zaharid requested a review from wilsonmr April 14, 2020 09:30

Adding title, axis labels, and legend to KDE plot

4624253

Adding x axis labels to histogram plots

3a4b2ab

siranipour commented Apr 14, 2020

Comment thread validphys2/src/validphys/results.py

Zaharid approved these changes Apr 14, 2020

wilsonmr reviewed Apr 14, 2020

Comment thread validphys2/src/validphys/dataplots.py Outdated

wilsonmr reviewed Apr 14, 2020

Refactoring such that we get rid of boilerplate

3c2435b

siranipour changed the title ~~Adding action to print chi2 dist for aggregate of datasets~~ Adding action to plot chi2 dist for aggregate of datasets Apr 14, 2020

scarrazza merged commit c78fe1c into master Apr 15, 2020

siranipour deleted the experiments_chi2_dist branch April 15, 2020 14:49
