Skip to content

Adding action to plot chi2 dist for aggregate of datasets#700

Merged
scarrazza merged 9 commits into
masterfrom
experiments_chi2_dist
Apr 15, 2020
Merged

Adding action to plot chi2 dist for aggregate of datasets#700
scarrazza merged 9 commits into
masterfrom
experiments_chi2_dist

Conversation

@siranipour
Copy link
Copy Markdown
Contributor

Closes #687

Yet to work on the KDE business. Example runcard, mainly for myself, because it's currently in /tmp

pdf: NNPDF31_nnlo_as_0118_DISonly
fit: NNPDF31_nnlo_as_0118_DISonly

experiments:
  from_: fit

theoryid: 53

use_cuts: fromfit

template_text: |
  {@plot_chi2dist_experiments@}

actions_:
  - report(main=True)

@siranipour siranipour requested a review from Zaharid April 2, 2020 14:19
@siranipour siranipour force-pushed the experiments_chi2_dist branch from 29a8bc0 to e5d6f2f Compare April 2, 2020 14:21
@siranipour
Copy link
Copy Markdown
Contributor Author

This will deprecate

def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):

right?

@siranipour
Copy link
Copy Markdown
Contributor Author

Note to self: for some reason the std_error is a vector of 0s, will work on this later. Problem is probably the same problem as why we need squeeze()

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Apr 3, 2020

This will deprecate

def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):

right?

I think it it is good to have an action that works on a single dataset, for convenience.

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Apr 7, 2020

@siranipour is this working now?

@siranipour
Copy link
Copy Markdown
Contributor Author

Well kinda, I just don't why the std_error was a vector of 0s. Haven't looked at this since the previous commit

@siranipour siranipour force-pushed the experiments_chi2_dist branch from 256603d to d78fd68 Compare April 10, 2020 09:45
@siranipour
Copy link
Copy Markdown
Contributor Author

siranipour commented Apr 10, 2020

This will deprecate

def plot_chi2dist(results, dataset, abs_chi2_data, chi2_stats, pdf):

right?

I think it it is good to have an action that works on a single dataset, for convenience.

Btw why is results a provider of that action? It's unused.

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Apr 10, 2020

Looks like a bug.

@siranipour
Copy link
Copy Markdown
Contributor Author

Looks like a bug.

Will remove in this PR

@siranipour siranipour force-pushed the experiments_chi2_dist branch from d78fd68 to ca7c9d5 Compare April 13, 2020 13:49
@siranipour
Copy link
Copy Markdown
Contributor Author

This now works, there was a bug in total_experiments_chi2data which I've now fixed thanks to a comment by Nathan back in 2018 at 18.30 (what a hard worker).

@siranipour
Copy link
Copy Markdown
Contributor Author

I'll see what I can do re the KDE plots and hopefully have this merged soon

@siranipour
Copy link
Copy Markdown
Contributor Author

siranipour commented Apr 13, 2020

Ok KDE plots added, let me know if it's what you had in mind. This should be ready to merge.

You can get the KDE plot using

pdf: NNPDF31_nnlo_as_0118_DISonly
fit: NNPDF31_nnlo_as_0118_DISonly

experiments:
  from_: fit

theoryid: 53

use_cuts: fromfit

template_text: |
  {@kde_chi2dist_experiments@}

actions_:
  - report(main=True)

@Zaharid Zaharid changed the title [WIP] Adding action to print chi2 dist for aggregate of datasets Adding action to print chi2 dist for aggregate of datasets Apr 14, 2020
@Zaharid Zaharid requested a review from wilsonmr April 14, 2020 09:30
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Apr 14, 2020

The KDE plot could use some decoration such as titles, axis labels, and maybe the same sort of legend that the histogram has.

@siranipour
Copy link
Copy Markdown
Contributor Author

Done

Comment thread validphys2/src/validphys/results.py
@wilsonmr
Copy link
Copy Markdown
Contributor

looks nice - another thing that will need to be changed with #651 we should really get that finished off so that people can start using the new interface.

Copy link
Copy Markdown
Contributor

@wilsonmr wilsonmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the code for plot_chi2dist and plot_chi2dist_experiments is almost identical - I wonder if it'd be neater to have the plotting code in a function which is then called by plot_chi2dist and plot_chi2dist_experiments because as far as I can tell the only difference is the title?

The KDE code also looks pretty similar - I wonder if we could have the same base code accept a keyword whether to use hist or KDE? I guess there seems to be a lot of unneccesary boilerplate here which could be put in some base function

EDIT: sorry I mixed up the funcitons I was referring to in original comment - should make sense now

Comment thread validphys2/src/validphys/dataplots.py Outdated
@siranipour
Copy link
Copy Markdown
Contributor Author

But plot_chi2dist and plot_experiments_chi2dist have different providers

@wilsonmr
Copy link
Copy Markdown
Contributor

but look:

def base_function(experiment_or_dataset_chi2, stats, pdf):
    label = pdf.label # should we use label or name here?
    fig, ax = plt.subplots()
    alldata, central, _ = experiment_or_dataset_chi2
    if not isinstance(alldata, MCStats):
        ax.set_facecolor("#ffcccc")
        log.warning("Chi² distribution plots have a "
                "different meaning for non MC sets.")
        label += " (%s!)" % pdf.ErrorType

    label += '\n'+ '\n'.join(str(chi2_stat_labels[k])+(' %.2f' % v) for (k,v) in stats.items())
    ax.set_xlabel(r"Replica $\chi^2$")
    ax.hist(alldata.data, label=label, zorder=100)
    l = ax.legend()
    l.set_zorder(1000)
    return fig, ax

@figure
def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
    fig, ax = base_function(total_experiments_chi2data, experiments_chi2_stats)
    ax.set_title(r"Experiments $\chi^2$ distribution")
    return fig

trivial to add another action for dataset one and not that difficult to make the base function do either KDE or histogram

Comment thread validphys2/src/validphys/dataplots.py Outdated
def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
"""Plot the distribution of chi²s of the members of the pdfset."""
fig, ax = plt.subplots()
label = pdf.name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per my old code - should we use pdf.label or pdf.name here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like there is a difference on the examples I tried.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make a difference if you declared PDF like in vp-comparefits however

pdf: {id: id_of_the_base_fit, label: "whatever you like"}

@siranipour
Copy link
Copy Markdown
Contributor Author

but look:

def base_function(experiment_or_dataset_chi2, stats, pdf):
    label = pdf.label # should we use label or name here?
    fig, ax = plt.subplots()
    alldata, central, _ = experiment_or_dataset_chi2
    if not isinstance(alldata, MCStats):
        ax.set_facecolor("#ffcccc")
        log.warning("Chi² distribution plots have a "
                "different meaning for non MC sets.")
        label += " (%s!)" % pdf.ErrorType

    label += '\n'+ '\n'.join(str(chi2_stat_labels[k])+(' %.2f' % v) for (k,v) in stats.items())
    ax.set_xlabel(r"Replica $\chi^2$")
    ax.hist(alldata.data, label=label, zorder=100)
    l = ax.legend()
    l.set_zorder(1000)
    return fig, ax

@figure
def plot_chi2dist_experiments(total_experiments_chi2data, experiments_chi2_stats, pdf):
    fig, ax = base_function(total_experiments_chi2data, experiments_chi2_stats)
    ax.set_title(r"Experiments $\chi^2$ distribution")
    return fig

trivial to add another action for dataset one and not that difficult to make the base function do either KDE or histogram

Ahhhh I see, two mins will do this now

@siranipour
Copy link
Copy Markdown
Contributor Author

Thanks for the suggestion @wilsonmr very good point. I have now refactored accordingly. Let me know what you think.

@siranipour siranipour changed the title Adding action to print chi2 dist for aggregate of datasets Adding action to plot chi2 dist for aggregate of datasets Apr 14, 2020
@scarrazza scarrazza merged commit c78fe1c into master Apr 15, 2020
@siranipour siranipour deleted the experiments_chi2_dist branch April 15, 2020 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a distribution plot for total chi²

4 participants