Conversation
Force-pushed from c5517fa to 44606e7.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main     #762   +/-  ##
=======================================
  Coverage   95.60%   95.60%
=======================================
  Files          21       21
  Lines        1569     1569
=======================================
  Hits         1500     1500
  Misses         69       69
```

☔ View full report in Codecov by Sentry.
Force-pushed from a353255 to cf63a41.
@seabbs this is not the best vignette it could be, but I think it makes sense to merge a first version and iterate.
@@ -0,0 +1,470 @@
```yaml
---
title: "Scoring rules in `scoringutils`"
author: "Nikos Bosse"
```
In epinowcast we have moved away from tracking vignette authorship, as it just gets out of date and is hard to agree on. I don't mind either way here.
# Introduction
This vignette gives an overview of the default scoring rules made available through the `scoringutils` package. You can, of course, also use your own scoring rules, provided they follow the same format. If you want to obtain more detailed information about how the package works, have a look at the [revised version](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link) of our `scoringutils` paper.
It's really weird to me that we keep pointing at this Google Drive. I think if we are going to do this we should add it to `inst` as a PDF and point at that for now.

I think it would make the most sense to just make it a package vignette (created an issue: #784). And once we upload it to CRAN we should update the arXiv preprint.
Scoring rules are functions that take a forecast and an observation as input and return a single numeric value. For point forecasts, they take the form $S(\hat{y}, y)$, where $\hat{y}$ is the forecast and $y$ is the observation. For probabilistic forecasts, they usually take the form $S(F, y)$, where $F$ is the cumulative distribution function (CDF) of the predictive distribution and $y$ is the observation. By convention, scoring rules are negatively oriented, meaning that smaller values are better (the best possible score is usually zero). In that sense, the score can be understood as a penalty.
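As a toy illustration of the point-forecast form $S(\hat{y}, y)$ and negative orientation (plain base R, not part of the vignette; the function name is made up):

```r
# A simple negatively oriented point scoring rule: squared error,
# S(y_hat, y) = (y_hat - y)^2 -- smaller values are better
squared_error <- function(y_hat, y) {
  (y_hat - y)^2
}

observed <- 10
squared_error(12, observed)    # 4    -- larger penalty for the worse forecast
squared_error(10.5, observed)  # 0.25 -- smaller penalty for the better one
```

A perfect forecast ($\hat{y} = y$) receives the best possible score of zero.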
Many scoring rules for probabilistic forecasts are so-called (strictly) proper scoring rules. Essentially, this means that they cannot be "cheated": a forecaster evaluated by a strictly proper scoring rule is always incentivised to report her honest best belief about the future and cannot, in expectation, improve her score by reporting something else. More formally: let $G$ be the true, unobserved data-generating distribution. A scoring rule is proper if the ideal forecast $F = G$ achieves an expected score under $G$ that is at least as good as that of any other forecast $F' \neq F$. It is strictly proper if, under $G$, every other forecast $F' \neq F$ receives a strictly worse expected score, making $F = G$ the unique optimal forecast.
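A quick Monte Carlo sketch of this idea in base R (illustrative only, not vignette code): the log score, which is strictly proper, penalises a forecast that deviates from the true data-generating distribution more heavily in expectation.

```r
set.seed(1)
y <- rnorm(1e5)  # draws from the true data-generating distribution G = N(0, 1)

# Log score (negatively oriented): -log f(y), averaged over draws from G
honest  <- mean(-dnorm(y, mean = 0,   sd = 1, log = TRUE))  # F  = G
shifted <- mean(-dnorm(y, mean = 0.5, sd = 1, log = TRUE))  # F' != G

honest < shifted  # TRUE: the dishonest forecast scores worse in expectation
```

The expected gap here is $0.5^2 / 2 = 0.125$, so the honest forecast wins well beyond Monte Carlo noise.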
seabbs left a comment:

Yes, I agree. This looks like a good start.
# Metrics for point forecasts
See a list of the default metrics for point forecasts by calling `?metrics_point`.
I think it might be better to talk about "reading the docs for `metrics_point`" rather than "calling", as lots of people will be on the website.
Scoring point forecasts can be a tricky business. Depending on the choice of scoring rule, a forecaster who is clearly worse than another might consistently receive better scores (see @gneitingMakingEvaluatingPoint2011 for an illustrative example).
Every scoring rule for a point forecast is implicitly minimised by a specific aspect of the predictive distribution. The mean squared error, for example, is only a meaningful scoring rule if the forecaster actually reported the mean of their predictive distribution as a point forecast. If the forecaster reported the median, then the mean absolute error would be the appropriate scoring rule. If the scoring rule and the predictive task do not align, misleading results ensue. Consider the following example:
Ref to the transformed scores paper?
```r
# `observed`, `predicted_mu` and `predicted_not_mu` are defined
# earlier in the vignette (not shown in this diff)
mean(Metrics::se(observed, predicted_mu))
mean(Metrics::se(observed, predicted_not_mu))
```
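To complement this, a small sketch of the mean-vs-median point made in the text (assuming the `Metrics` package used above is installed; the skewed example data is made up): squared error rewards reporting the mean of the predictive distribution, while absolute error rewards reporting the median.

```r
set.seed(42)
observed <- rexp(1e5)  # a skewed distribution: mean = 1, median = log(2)

# Mean squared error is minimised by reporting the mean...
mean(Metrics::se(observed, mean(observed)))
mean(Metrics::se(observed, median(observed)))

# ...while mean absolute error is minimised by reporting the median
mean(Metrics::ae(observed, median(observed)))
mean(Metrics::ae(observed, mean(observed)))
```

Swapping the scoring rule flips which of the two point forecasts looks better, which is exactly the misalignment the paragraph warns about.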
---
# Sample-based forecasts
We need to talk about the different kinds of probabilistic forecasts in the intro, as currently this is not explained.
Thank you! I opened a new issue to address your comments: #785. I currently set it to version 2.1, but feel free to move it to 2.0 if you think we should address it earlier.
Description
Related to #758.
The old vignettes related to our metrics and scoring rules were outdated. This PR creates a first proposal for a replacement. The vignette is not completely done (e.g. a section on the PIT is missing, and there is more one could say about when to pick which score).
However, I think it makes sense to have this version in even if it's not perfect yet.
Other changes
This PR creates an updated vignette for the scoring rules.
Checklist

- I have used `lintr::lint_package()` to check for style issues introduced by my changes.