YoY double-counting/asymmetry bug for leap days #283

@kandersolar

While playing around with #282, I noticed an asymmetry in how leap days contribute to the distribution of YoY slopes. Ignoring filtering and first/last-year complications for a moment, each day in the aggregated series is supposed to contribute to one forward and one backward slope. However, there seems to be a small bug related to leap days where a single point can contribute to three slopes instead of two.

To reproduce:

import pandas as pd
import rdtools
import matplotlib.pyplot as plt
daily_pm = pd.Series(1, index=pd.date_range('2014-01-01', '2017-12-31', freq='D'))
daily_pm.loc['2015-02-28'] = 0  # outlier point that interacts with a leap day
rd, ci, calc_info = rdtools.degradation.degradation_year_on_year(daily_pm)

fig = rdtools.plotting.degradation_summary_plots(rd, ci, calc_info, daily_pm)
fig.axes[1].set_ylim(0, 10)  # shrink y-axis to show detail

[Figure: output of degradation_summary_plots, with the histogram y-axis limited to 0-10]

Note that in the histogram plot, the left-most bin has a height of 1 and the right-most bin has a height of 2. So a single outlier day that interacts with a leap day creates one large negative slope but two large positive slopes. Examining the df variable inside rdtools.degradation.degradation_year_on_year confirms this: 2015-02-28 gets paired with 2014-02-28 and 2016-02-28 as expected, but it also gets paired with 2016-02-29:

df.loc['2015-02'].tail():

                   dt  energy   dt_right  energy_right dt_shifted  time_diff_years    yoy
dt
2015-02-24 2015-02-24     1.0 2014-02-24           1.0 2015-02-24              1.0    0.0
2015-02-25 2015-02-25     1.0 2014-02-25           1.0 2015-02-25              1.0    0.0
2015-02-26 2015-02-26     1.0 2014-02-26           1.0 2015-02-26              1.0    0.0
2015-02-27 2015-02-27     1.0 2014-02-27           1.0 2015-02-27              1.0    0.0
2015-02-28 2015-02-28     0.0 2014-02-28           1.0 2015-02-28              1.0 -100.0

df.loc['2016-02'].tail():

                   dt  energy   dt_right  energy_right dt_shifted  time_diff_years         yoy
dt
2016-02-25 2016-02-25     1.0 2015-02-25           1.0 2016-02-25          1.00000    0.000000
2016-02-26 2016-02-26     1.0 2015-02-26           1.0 2016-02-26          1.00000    0.000000
2016-02-27 2016-02-27     1.0 2015-02-27           1.0 2016-02-27          1.00000    0.000000
2016-02-28 2016-02-28     1.0 2015-02-28           0.0 2016-02-28          1.00000  100.000000
2016-02-29 2016-02-29     1.0 2015-02-28           0.0 2016-02-28          1.00274   99.726776
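
The duplicate pairing in the last two rows can be reproduced outside rdtools. The following is a minimal sketch, not the rdtools implementation: it assumes the pairing is an as-of join of each date against a one-year-shifted copy of the dates (as the `dt_shifted` and `dt_right` columns suggest). The helper name `pair_dates`, the short date range, and the 8-day `tolerance` are assumptions made for illustration.

```python
import pandas as pd

def pair_dates(direction):
    """Pair each day with a day one year earlier using an as-of join.

    Hypothetical helper for illustration only; the 8-day tolerance is an
    assumption, used to keep unmatched rows from pairing with far-off dates.
    """
    # Short daily range that spans the 2016 leap day
    dates = pd.date_range('2015-02-26', '2016-03-02', freq='D')
    df = pd.DataFrame({'dt': dates, 'energy': 1.0})
    # Shift every date forward by one calendar year
    df['dt_shifted'] = df['dt'] + pd.DateOffset(years=1)
    merged = pd.merge_asof(
        df[['dt', 'energy']], df,
        left_on='dt', right_on='dt_shifted',
        suffixes=('', '_right'),
        direction=direction,
        tolerance=pd.Timedelta('8D'),
    )
    # Rows with no partner within the tolerance have dt_right = NaT
    return merged.dropna(subset=['dt_right'])

# Backward (the merge_asof default): which later days pair with 2015-02-28?
backward = pair_dates('backward')
feb28_partners = backward.loc[
    backward['dt_right'] == pd.Timestamp('2015-02-28'), 'dt'
].dt.strftime('%Y-%m-%d').tolist()
print(feb28_partners)  # ['2016-02-28', '2016-02-29']

# Forward: which later days pair with 2015-03-01?
forward = pair_dates('forward')
mar01_partners = forward.loc[
    forward['dt_right'] == pd.Timestamp('2015-03-01'), 'dt'
].dt.strftime('%Y-%m-%d').tolist()
print(mar01_partners)  # ['2016-02-29', '2016-03-01']
```

In this toy setup, 2016-02-29 has no exact one-year-earlier partner, so the as-of join attaches it to the nearest neighbor in the chosen direction. Switching to direction='forward' moves the duplicate from Feb 28 to Mar 1 rather than removing it.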

I suspect, but did not verify, that this has to do with pd.merge_asof's default choice of direction='backward'. Possible solutions:

  • do nothing because this probably has negligible impact on results
  • filter out leap days from the series before calculating YoY slopes
  • something else?
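
The second option could be sketched as follows. This is only an illustration of dropping Feb 29 from a daily series with a boolean mask; the names `is_leap_day` and `daily_pm_no_leap` are made up here, and whether degradation_year_on_year behaves as intended on a series with that gap has not been checked.

```python
import pandas as pd

# Same synthetic daily series as in the reproduction above
daily_pm = pd.Series(
    1.0, index=pd.date_range('2014-01-01', '2017-12-31', freq='D')
)

# Boolean mask marking Feb 29 entries (only 2016-02-29 in this range)
is_leap_day = (daily_pm.index.month == 2) & (daily_pm.index.day == 29)

# Drop leap days before any YoY calculation
daily_pm_no_leap = daily_pm[~is_leap_day]

print(len(daily_pm), len(daily_pm_no_leap))  # 1461 1460
```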
