
Issue #758 - Update metrics vignette #762

Merged
seabbs merged 9 commits into main from
update-metrics-vignette
Apr 8, 2024
Conversation


@nikosbosse nikosbosse commented Mar 28, 2024

Description

Related to #758.
The old vignettes related to our metrics and scoring rules were outdated. This PR creates a first proposal for a replacement. The vignette is not completely done (e.g. a section on the PIT is missing and there is more that one could say related to when to pick which score).
However, I think it makes sense to have this version in even if it's not perfect yet.

Other changes

  • slightly updated the order of scores in the documentation for default metrics
  • small correction in the docs for a function

This PR creates an updated vignette for the scoring rules.

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title as follows: issue-number: PR title
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • I have built the package locally and rebuilt the docs using roxygen2.
  • My code follows the established coding standards and I have run lintr::lint_package() to check for style issues introduced by my changes.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

@nikosbosse nikosbosse force-pushed the update-metrics-vignette branch from c5517fa to 44606e7 Compare March 28, 2024 11:48
@codecov

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.60%. Comparing base (948f251) to head (0f38601).
Report is 2 commits behind head on main.

❗ Current head 0f38601 differs from pull request most recent head 832a420. Consider uploading reports for the commit 832a420 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #762   +/-   ##
=======================================
  Coverage   95.60%   95.60%           
=======================================
  Files          21       21           
  Lines        1569     1569           
=======================================
  Hits         1500     1500           
  Misses         69       69           


@seabbs seabbs force-pushed the update-metrics-vignette branch from a353255 to cf63a41 Compare March 28, 2024 20:57
@nikosbosse nikosbosse marked this pull request as ready for review April 5, 2024 14:36
@nikosbosse nikosbosse changed the title Update metrics vignette Issue #758 - Update metrics vignette Apr 5, 2024
@nikosbosse
Collaborator Author

@seabbs this is not the best vignette it could be, but I think it makes sense to merge a first version and iterate.

@seabbs seabbs self-requested a review April 8, 2024 09:21
@seabbs seabbs enabled auto-merge (squash) April 8, 2024 09:21
@@ -0,0 +1,470 @@
---
title: "Scoring rules in `scoringutils`"
author: "Nikos Bosse"
Contributor

In epinowcast we have moved away from tracking vignette authorship, as it just gets out of date and is hard to agree on. I don't mind either way here.


# Introduction

This vignette gives an overview of the default scoring rules made available through the `scoringutils` package. You can, of course, also use your own scoring rules, provided they follow the same format. If you want more detailed information about how the package works, have a look at the [revised version](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link) of our `scoringutils` paper.
Contributor

It's really weird to me that we keep pointing at this Google Drive. I think if we are going to do this, we should add it to `inst` as a PDF and point at that for now.

Collaborator Author

I think it would make the most sense to just make it a package vignette (created an issue: #784). And once we upload it to CRAN, we should update the arXiv preprint.


Scoring rules are functions that take a forecast and an observation as input and return a single numeric value. For point forecasts, they take the form $S(\hat{y}, y)$, where $\hat{y}$ is the forecast and $y$ is the observation. For probabilistic forecasts, they usually take the form $S(F, y)$, where $F$ is the cumulative distribution function (CDF) of the predictive distribution and $y$ is the observation. By convention, scoring rules are usually negatively oriented, meaning that smaller values are better (the best possible score is usually zero). In that sense, the score can be understood as a penalty.
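As a minimal sketch of the convention above (not part of the vignette; the numbers are arbitrary), here are two common negatively oriented scoring rules for point forecasts, written as penalties in base R:

```r
# Two common point-forecast scoring rules, written as penalties
# (smaller is better, a perfect forecast scores zero).
observed <- 3.0
forecast <- 2.5

absolute_error <- abs(forecast - observed)   # 0.5
squared_error  <- (forecast - observed)^2    # 0.25
```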

Many scoring rules for probabilistic forecasts are so-called (strictly) proper scoring rules. Essentially, this means that they cannot be "cheated": a forecaster evaluated by a strictly proper scoring rule is always incentivised to report her honest best belief about the future and cannot, in expectation, improve her score by reporting something else. A more formal definition is the following: let $G$ be the true, unobserved data-generating distribution. A scoring rule is proper if, under $G$, no forecast $F' \neq G$ in expectation receives a better score than the ideal forecast $F = G$. A scoring rule is strictly proper if, under $G$, the ideal forecast $F = G$ is in expectation the unique best forecast, i.e. no $F' \neq G$ receives a score that is better than or equal to that of $F$.
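Propriety can be checked numerically in simple cases. The following sketch (not from the vignette; the event probability is chosen arbitrarily) scores a binary outcome with the Brier score: if the true event probability is $p$, the expected score $p(1-q)^2 + (1-p)q^2$ of a reported probability $q$ is minimised by the honest report $q = p$:

```r
# Expected Brier score when the true event probability is p and the
# forecaster reports probability q; honest reporting (q = p) minimises it.
p <- 0.7
q <- seq(0, 1, by = 0.01)
expected_brier <- p * (1 - q)^2 + (1 - p) * q^2

q[which.min(expected_brier)]  # ~0.7: the honest report wins
```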
Contributor

This needs a reference

Contributor

@seabbs seabbs left a comment

Yes I agree. This looks like a good start


# Metrics for point forecasts

See a list of the default metrics for point forecasts in the documentation for `metrics_point()`.
Contributor

I think it might be better to talk about "reading the docs for `metrics_point()`" rather than calling it, as lots of people will be on the website.


Scoring point forecasts can be a tricky business. Depending on the choice of scoring rule, a forecaster who is clearly worse than another might consistently receive better scores (see @gneitingMakingEvaluatingPoint2011 for an illustrative example).

Every scoring rule for a point forecast is implicitly minimised, in expectation, by a specific functional of the predictive distribution. The mean squared error, for example, is only a meaningful scoring rule if the forecaster actually reported the mean of their predictive distribution as the point forecast. If the forecaster reported the median, then the mean absolute error would be the appropriate scoring rule. If the scoring rule and the predictive task do not align, misleading results ensue. Consider the following example:
Contributor

ref to transformed scores paper?

```r
# excerpt from the vignette example: `observed`, `predicted_mu` and
# `predicted_not_mu` are defined earlier in the (elided) code chunk
mean(Metrics::se(observed, predicted_mu))
mean(Metrics::se(observed, predicted_not_mu))
```
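To make the alignment point concrete, here is a self-contained sketch (not part of the PR diff; the data-generating process is chosen purely for illustration): for a skewed distribution, squared error rewards reporting the mean, while absolute error rewards reporting the median.

```r
set.seed(42)
# Illustrative skewed data-generating process: Gamma(shape = 2, rate = 0.5),
# which has mean 4 and median ~3.36.
observed <- rgamma(1e5, shape = 2, rate = 0.5)

pred_mean   <- 4
pred_median <- qgamma(0.5, shape = 2, rate = 0.5)

# Squared error favours the forecaster who reported the mean ...
mean((observed - pred_mean)^2) < mean((observed - pred_median)^2)
# ... while absolute error favours the one who reported the median.
mean(abs(observed - pred_median)) < mean(abs(observed - pred_mean))
```

Both comparisons should evaluate to `TRUE`: neither forecaster is "better", they are simply being judged against different functionals of the same predictive distribution.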

Contributor

random white space


---

# Sample-based forecasts
Contributor

Need to talk about the different kinds of probabilistic forecasts in the intro, as this is currently not explained.

@seabbs seabbs disabled auto-merge April 8, 2024 09:36
@seabbs seabbs merged commit 00ae48c into main Apr 8, 2024
@seabbs seabbs deleted the update-metrics-vignette branch April 8, 2024 09:36
@nikosbosse
Collaborator Author

Thank you! I opened a new issue to address your comments: #785. I currently set it to version 2.1, but feel free to move it to 2.0 if you think we should address this earlier.

