Description
While playing around with #282, I noticed an asymmetry in how leap days contribute to the distribution of YoY slopes. Ignoring filtering and first/last-year complications for a moment, each day in the aggregated series should contribute to exactly one forward and one backward slope. However, there seems to be a small bug related to leap days whereby a single point can contribute to three slopes instead of two.
To reproduce:

```python
import pandas as pd
import rdtools
import matplotlib.pyplot as plt

# Four years of constant daily performance; 2016 is a leap year
daily_pm = pd.Series(1, index=pd.date_range('2014-01-01', '2017-12-31', freq='d'))
daily_pm.loc['2015-02-28'] = 0  # outlier point that interacts with a leap day
rd, ci, calc_info = rdtools.degradation.degradation_year_on_year(daily_pm)
fig = rdtools.plotting.degradation_summary_plots(rd, ci, calc_info, daily_pm)
fig.axes[1].set_ylim(0, 10)  # shrink y-axis to show detail
plt.show()
```

Note that in the histogram plot, the left-most bin has height 1 and the right-most bin has height 2. So a single outlier day that interacts with a leap day creates one large negative slope but two large positive slopes. Examining the `df` variable inside `rdtools.degradation.degradation_year_on_year` confirms this: Feb 28 gets paired with Feb 28, but it is also paired with Feb 29 in one direction:
`df.loc['2015-02'].tail()`:

```
                    dt  energy    dt_right  energy_right  dt_shifted  time_diff_years     yoy
dt
2015-02-24  2015-02-24     1.0  2014-02-24           1.0  2015-02-24              1.0     0.0
2015-02-25  2015-02-25     1.0  2014-02-25           1.0  2015-02-25              1.0     0.0
2015-02-26  2015-02-26     1.0  2014-02-26           1.0  2015-02-26              1.0     0.0
2015-02-27  2015-02-27     1.0  2014-02-27           1.0  2015-02-27              1.0     0.0
2015-02-28  2015-02-28     0.0  2014-02-28           1.0  2015-02-28              1.0  -100.0
```
`df.loc['2016-02'].tail()`:

```
                    dt  energy    dt_right  energy_right  dt_shifted  time_diff_years         yoy
dt
2016-02-25  2016-02-25     1.0  2015-02-25           1.0  2016-02-25          1.00000    0.000000
2016-02-26  2016-02-26     1.0  2015-02-26           1.0  2016-02-26          1.00000    0.000000
2016-02-27  2016-02-27     1.0  2015-02-27           1.0  2016-02-27          1.00000    0.000000
2016-02-28  2016-02-28     1.0  2015-02-28           0.0  2016-02-28          1.00000  100.000000
2016-02-29  2016-02-29     1.0  2015-02-28           0.0  2016-02-28          1.00274   99.726776
```
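The pairing in the tables above can be reproduced with `pd.merge_asof` alone, outside rdtools. This is a minimal sketch under stated assumptions: judging from the `dt_shifted` column, it assumes each point's date is shifted forward one year with `pd.DateOffset(years=1)` and matched against the original dates using `merge_asof`'s default `direction='backward'`; it omits any tolerance rdtools may apply. It counts how many slopes each date ends up in.

```python
import pandas as pd

# Three years of daily dates; 2016 is the only leap year in this range.
# Values are irrelevant here: only the date pairing matters.
left = pd.DataFrame({'dt': pd.date_range('2014-01-01', '2016-12-31', freq='D'),
                     'energy': 1.0})

# Assumption: every point is shifted forward one calendar year, as the
# dt_shifted column suggests. DateOffset maps 2016-02-29 to 2017-02-28.
right = left.copy()
right['dt_shifted'] = right['dt'] + pd.DateOffset(years=1)

# For each date, take the latest shifted date at or before it
# (merge_asof's default direction='backward').
pairs = pd.merge_asof(left, right, left_on='dt', right_on='dt_shifted',
                      suffixes=('', '_right'))
pairs = pairs.dropna(subset=['dt_right'])  # 2014 dates have no earlier partner

# Number of YoY slopes each date participates in, as either endpoint.
usage = pd.concat([pairs['dt'], pairs['dt_right']]).value_counts()

window = pairs['dt'].between(pd.Timestamp('2016-02-27'), pd.Timestamp('2016-03-01'))
print(pairs.loc[window, ['dt', 'dt_right']])
print('2015-02-28 used in', usage[pd.Timestamp('2015-02-28')], 'slopes')
print('2015-06-15 used in', usage[pd.Timestamp('2015-06-15')], 'slopes')
```

Because no date in 2015 shifts onto 2016-02-29, the backward search for that day falls back to the entry from 2015-02-28, so 2015-02-28 ends up in three slopes while an ordinary day such as 2015-06-15 ends up in two.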
I suspect, but did not verify, that this has to do with `pd.merge_asof`'s default choice of `direction='backward'`. Possible solutions:
- do nothing because this probably has negligible impact on results
- filter out leap days from the series before calculating YoY slopes
- something else?
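The second option, dropping leap days before computing YoY slopes, could look like the minimal sketch below. `drop_leap_days` is a hypothetical helper, not an rdtools function, and whether removing Feb 29 has side effects elsewhere in the rdtools pipeline is not checked here.

```python
import pandas as pd


def drop_leap_days(series: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of rdtools): return the series with
    every Feb 29 removed from its DatetimeIndex."""
    is_leap_day = (series.index.month == 2) & (series.index.day == 29)
    return series[~is_leap_day]


daily_pm = pd.Series(1.0, index=pd.date_range('2014-01-01', '2017-12-31', freq='D'))
daily_pm.loc['2015-02-28'] = 0.0  # same outlier as in the reproduction above
filtered = drop_leap_days(daily_pm)
print(len(daily_pm), len(filtered))
# The filtered series would then be used in place of daily_pm, e.g.
# rdtools.degradation.degradation_year_on_year(filtered)
```

With Feb 29 gone, no day is left without an exact one-year partner, so each remaining day should again be paired once in each direction, at the cost of discarding one day of data per leap year.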
