Decide how WIS should be computed and returned #386

@nikosbosse

There are several options for the relationship between input and output for the interval score:

  1. One option is to think of it as a many-to-one relationship: One forecast comprises several quantiles / intervals. But you only get one score per forecast (same as e.g. for CRPS)
  2. Another option is to treat every quantile / interval as a separate forecast. Then you could get one score per quantile / interval.
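To make the two mappings concrete, here is a minimal sketch in Python (not the package's R code). The function names are made up for illustration, and the quantile score is assumed to be the pinball loss scaled by 2, so that averaging it gives a WIS-like number.

```python
def quantile_score(observed, predicted, quantile_level):
    """Quantile score for a single quantile (assumed: pinball loss times 2)."""
    indicator = 1.0 if observed <= predicted else 0.0
    return 2.0 * (indicator - quantile_level) * (predicted - observed)


def score_one_to_one(observed, predictions, quantile_levels):
    """Option 2: every quantile is its own forecast, one score per quantile."""
    return [
        quantile_score(observed, p, q)
        for p, q in zip(predictions, quantile_levels)
    ]


def score_many_to_one(observed, predictions, quantile_levels):
    """Option 1: all quantiles form one forecast, one score per forecast."""
    scores = score_one_to_one(observed, predictions, quantile_levels)
    return sum(scores) / len(scores)


# Hypothetical forecast: three quantiles for an observed value of 10.
per_quantile = score_one_to_one(10, [6, 9, 14], [0.25, 0.5, 0.75])
per_forecast = score_many_to_one(10, [6, 9, 14], [0.25, 0.5, 0.75])
```

With option 2 the output has the same length as the input; with option 1 the three quantiles collapse into a single number.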

The current implementation of score() returns one score per quantile. That's why I created a horribly named function wis_one_to_one() which also does that. The function wis() only returns one score per forecast, not one score per quantile.

Current options are:

  • use wis() within score() and change the current behaviour from "one score per quantile" to "one score per forecast"
  • create an additional format, "interval" in which one forecast really is one interval. Then people could run score() on that and would get one score per interval. (we'd have to think a bit about how we handle the median when summarising, see below)
  • use wis_one_to_one() or a better named cousin in score() and return one score per quantile as we used to do in the past.
  • Merge wis() and wis_one_to_one() into one function. That's easily possible, but there would be a lot of arguments, which is ugly (like `count_median_twice`, `output`, `mapping = c("one-to-one", "many-to-one")`).

Related: Computation of coverage, see #389

Additional background:
An additional complication is the difference between the quantile score and the interval score. You can compute a quantile score for a single quantile, but you can't compute overprediction, dispersion and underprediction for it.
You can do that for a single interval (which comprises two quantiles), so you can easily compute one score per interval.
If you want to return one score per quantile, but also have over/underprediction + dispersion, then you actually need to compute an interval score and then merge the result back onto the two quantiles.
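A rough sketch of what "compute an interval score and merge the result back" could look like, in Python rather than the package's R code. The function names, the dict layout and the weighting by alpha / 2 are assumptions made for illustration only.

```python
def interval_score(observed, lower, upper, interval_range):
    """Decomposed interval score for one central prediction interval.

    interval_range is in percent (e.g. 50 for a 50% interval) and must be
    below 100. Components are weighted by alpha / 2 (an assumption here).
    """
    alpha = (100 - interval_range) / 100
    weight = alpha / 2
    dispersion = weight * (upper - lower)
    # Forecast too high: the observation falls below the lower bound.
    overprediction = weight * (2 / alpha) * max(lower - observed, 0)
    # Forecast too low: the observation falls above the upper bound.
    underprediction = weight * (2 / alpha) * max(observed - upper, 0)
    return {
        "interval_score": dispersion + overprediction + underprediction,
        "dispersion": dispersion,
        "overprediction": overprediction,
        "underprediction": underprediction,
    }


def merge_back_to_quantiles(observed, lower, upper, interval_range):
    """Attach the interval's components to both quantiles that form it."""
    alpha = (100 - interval_range) / 100
    components = interval_score(observed, lower, upper, interval_range)
    return [
        {"quantile_level": alpha / 2, **components},
        {"quantile_level": 1 - alpha / 2, **components},
    ]


# Hypothetical 50% interval [6, 14] with an observation of 16 above it.
rows = merge_back_to_quantiles(16, 6, 14, 50)
```

Both rows carry identical components, which is exactly the awkward part: the decomposition belongs to the interval, not to either quantile on its own.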

Adding to that complication is that there are two ways to conceptualise the WIS:
a) as an average of quantile scores or
b) as an average of interval scores.
The implication is how you treat the median: If you think of the median as a 0% prediction interval (which I think is illegal and you're not allowed to do, but which we do computationally) then you end up with a difference between average quantile score and average interval score. If you average over interval scores, then the median effectively appears twice (once as lower and once as upper bound), whereas when you average over quantile scores, it appears only once. Therefore we have an argument count_median_twice.
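A small numeric sketch of that difference, again in Python and not the package's implementation. `wis_sketch` and its signature are hypothetical; the median is treated computationally as a 0% interval (alpha = 1), as described above.

```python
def weighted_interval_score(observed, lower, upper, alpha):
    """Interval score of a central (1 - alpha) interval, weighted by alpha / 2."""
    penalty = (2 / alpha) * (max(lower - observed, 0) + max(observed - upper, 0))
    return (alpha / 2) * ((upper - lower) + penalty)


def wis_sketch(observed, median, intervals, count_median_twice=False):
    """intervals: list of (lower, upper, alpha) tuples, excluding the median."""
    interval_scores = [
        weighted_interval_score(observed, lower, upper, alpha)
        for lower, upper, alpha in intervals
    ]
    # Median as a "0% interval": lower == upper, alpha == 1, gives |y - m|.
    median_score = weighted_interval_score(observed, median, median, 1.0)
    if count_median_twice:
        # b) average of interval scores: the median counts as a full interval,
        #    i.e. as both a lower and an upper bound.
        return (median_score + sum(interval_scores)) / (len(interval_scores) + 1)
    # a) average of quantile scores: the median is a single quantile, so it
    #    gets half the weight of an interval (which consists of two quantiles).
    return (0.5 * median_score + sum(interval_scores)) / (len(interval_scores) + 0.5)


# Hypothetical forecast: median 9, 50% interval [6, 14], observed value 10.
as_quantile_average = wis_sketch(10, 9, [(6, 14, 0.5)])
as_interval_average = wis_sketch(10, 9, [(6, 14, 0.5)], count_median_twice=True)
```

For this forecast the two conventions give roughly 1.67 and 1.5 respectively, so the choice is visible in the returned score and not just a matter of interpretation.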
