Conversation
Force-pushed from c5517fa to 44606e7.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main     #762   +/-  ##
=======================================
  Coverage   95.60%   95.60%
=======================================
  Files          21       21
  Lines        1569     1569
=======================================
  Hits         1500     1500
  Misses         69       69
```

☔ View full report in Codecov by Sentry.
Force-pushed from a353255 to cf63a41.
@seabbs this is not the best vignette it could be, but I think it makes sense to merge a first version and iterate.
@@ -0,0 +1,470 @@
```yaml
---
title: "Scoring rules in `scoringutils`"
author: "Nikos Bosse"
```
In epinowcast we have moved away from tracking vignette authorship, as it just gets out of date and is hard to agree on. I don't mind either way here.
# Introduction
This vignette gives an overview of the default scoring rules made available through the `scoringutils` package. You can, of course, also use your own scoring rules, provided they follow the same format. If you want to obtain more detailed information about how the package works, have a look at the [revised version](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link) of our `scoringutils` paper.
It's really weird to me that we keep pointing at this Google Drive. I think if we are going to do this we should add it to `inst` as a PDF and point at that for now.

I think it would make the most sense to just make it a package vignette (created an issue: #784). And once we upload it to CRAN we should update the arXiv preprint.
Scoring rules are functions that take a forecast and an observation as input and return a single numeric value. For point forecasts, they take the form $S(\hat{y}, y)$, where $\hat{y}$ is the forecast and $y$ is the observation. For probabilistic forecasts, they usually take the form $S(F, y)$, where $F$ is the cumulative distribution function (CDF) of the predictive distribution and $y$ is the observation. By convention, scoring rules are negatively oriented, meaning that smaller values are better (the best possible score is usually zero). In that sense, the score can be understood as a penalty.
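As a toy illustration of the point-forecast form $S(\hat{y}, y)$ and negative orientation (plain base R, not part of the vignette; the function name is made up):

```r
# A simple negatively oriented point scoring rule: squared error,
# S(y_hat, y) = (y_hat - y)^2 -- smaller values are better
squared_error <- function(y_hat, y) {
  (y_hat - y)^2
}

observed <- 10
squared_error(12, observed)    # 4    -- larger penalty for the worse forecast
squared_error(10.5, observed)  # 0.25 -- smaller penalty for the better one
```

A perfect forecast ($\hat{y} = y$) receives the best possible score of zero.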
Many scoring rules for probabilistic forecasts are so-called (strictly) proper scoring rules. Essentially, this means that they cannot be "cheated": a forecaster evaluated by a strictly proper scoring rule is always incentivised to report her honest best belief about the future and cannot, in expectation, improve her score by reporting something else. More formally: let $G$ be the true, unobserved data-generating distribution. A scoring rule is proper if the ideal forecast $F = G$ achieves an expected score under $G$ that is at least as good as that of any other forecast $F' \neq F$. It is strictly proper if, under $G$, every other forecast $F' \neq F$ receives a strictly worse expected score, making $F = G$ the unique optimal forecast.
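A quick Monte Carlo sketch of this idea in base R (illustrative only, not vignette code): the log score, which is strictly proper, penalises a forecast that deviates from the true data-generating distribution more heavily in expectation.

```r
set.seed(1)
y <- rnorm(1e5)  # draws from the true data-generating distribution G = N(0, 1)

# Log score (negatively oriented): -log f(y), averaged over draws from G
honest  <- mean(-dnorm(y, mean = 0,   sd = 1, log = TRUE))  # F  = G
shifted <- mean(-dnorm(y, mean = 0.5, sd = 1, log = TRUE))  # F' != G

honest < shifted  # TRUE: the dishonest forecast scores worse in expectation
```

The expected gap here is $0.5^2 / 2 = 0.125$, so the honest forecast wins well beyond Monte Carlo noise.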
seabbs left a comment:

Yes, I agree. This looks like a good start.
# Metrics for point forecasts
See a list of the default metrics for point forecasts by calling `?metrics_point`.
I think it might be better to talk about "reading the docs for `metrics_point`" rather than "calling", as lots of people will be on the website.
Scoring point forecasts can be a tricky business. Depending on the choice of scoring rule, a forecaster who is clearly worse than another might consistently receive better scores (see @gneitingMakingEvaluatingPoint2011 for an illustrative example).
Every scoring rule for a point forecast is implicitly minimised by a specific aspect of the predictive distribution. The mean squared error, for example, is only a meaningful scoring rule if the forecaster actually reported the mean of their predictive distribution as a point forecast. If the forecaster reported the median, then the mean absolute error would be the appropriate scoring rule. If the scoring rule and the predictive task do not align, misleading results ensue. Consider the following example:
Ref to the transformed scores paper?
```r
# `observed`, `predicted_mu` and `predicted_not_mu` are defined
# earlier in the vignette (not shown in this diff)
mean(Metrics::se(observed, predicted_mu))
mean(Metrics::se(observed, predicted_not_mu))
```
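To complement this, a small sketch of the mean-vs-median point made in the text (assuming the `Metrics` package used above is installed; the skewed example data is made up): squared error rewards reporting the mean of the predictive distribution, while absolute error rewards reporting the median.

```r
set.seed(42)
observed <- rexp(1e5)  # a skewed distribution: mean = 1, median = log(2)

# Mean squared error is minimised by reporting the mean...
mean(Metrics::se(observed, mean(observed)))
mean(Metrics::se(observed, median(observed)))

# ...while mean absolute error is minimised by reporting the median
mean(Metrics::ae(observed, median(observed)))
mean(Metrics::ae(observed, mean(observed)))
```

Swapping the scoring rule flips which of the two point forecasts looks better, which is exactly the misalignment the paragraph warns about.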
---
# Sample-based forecasts
We need to talk about the different kinds of probabilistic forecasts in the intro, as currently this is not explained.
Thank you! I opened a new issue to address your comments: #785. I currently set it to version 2.1, but feel free to move it to 2.0 if you think we should address it earlier.
Description
Related to #758.
The old vignettes related to our metrics and scoring rules were outdated. This PR creates a first proposal for a replacement. The vignette is not completely done (e.g. a section on the PIT is missing, and there is more one could say about when to pick which score).
However, I think it makes sense to have this version in even if it's not perfect yet.
Other changes
This PR creates an updated vignette for the scoring rules.
Checklist

- I have used `lintr::lint_package()` to check for style issues introduced by my changes.