Skip to content

Implement SKLearn interface#272

Merged
jaredsnyder merged 72 commits into
mainfrom
kpi_move_segments_to_base
Oct 8, 2024
Merged

Implement SKLearn interface#272
jaredsnyder merged 72 commits into
mainfrom
kpi_move_segments_to_base

Conversation

@jaredsnyder
Copy link
Copy Markdown
Contributor

@jaredsnyder jaredsnyder commented Aug 23, 2024

Changes:

  • data pull via metric club removed from all classes
  • start_date and end_date attributes removed from all classes, time filtering will now happen in kpi_forecasting.py before passing data to model classes
  • New class, BaseEnsembleForecast, created to deal with segmented models like FunnelForecast used to implement
  • New class, ProphetAutotunerForecast, created to implement automated hyperparameter tuning
  • FunnelForecast recreated as a BaseEnsembleForecast that uses a ProphetAutotunerForecast as the base model
  • summarize and write_functions, along with all the functions called within them, moved outside of classes

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is
    referenced, the pull request should include the bug number in the title)
  • Scan the PR and verify that no changes (particularly to
    .circleci/config.yml) will cause environment variables (particularly
    credentials) to be exposed in test logs
  • Ensure the container image will be using permissions granted to
    telemetry-airflow
    responsibly.

jaredsnyder and others added 30 commits July 23, 2024 14:41
change signatures of `fit` and `predict` to take arguments that default to attributes

Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting.py Outdated
Comment thread jobs/kpi-forecasting/kpi_forecasting/models/base_forecast.py Outdated
def _auto_tuning(
self, observed_df, segment_settings: SegmentModelSettings
) -> Dict[str, float]:
def _auto_tuning(self, observed_df) -> ProphetForecast:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful idea. My favorite model tuning fact is that random search is usually more efficient than grid search, especially across higher-dimensional spaces. The KPI Prophet models have been tuned with random search. If this method is computationally intensive or long-running, switching to random search could be a quick improvement.

image

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think a big improvement would be to refactor to use a standardized parameter optimization library like Optuna or something so we'd get different methods for free

Comment on lines +128 to +132
if self.holidays == []:
self.holidays = None
self.holidays_raw = None
elif not self.holidays:
self.holidays_raw = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these different conditionals checking for? The first one is clearly checking for an empty list. Is the second one checking if self.holidays is None?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that its possible to have an empty list for holidays in the config, and in that case things would get screwed up in the elif case. So this ensures that an empty list is treated the same as just leaving it out of the config.

The whole holidays_raw and holidays thing is annoying. It comes down to the fact we have an intermediate class to parse the holidays in the config into the dataframe (oh prophet...) that is expected. And if I want to easily create new prophet classes it's easier to distinguish between the "raw" non-df holiday format and the 'not raw' df format that can be passed to prophet under the holidays keyword argument. It's annoying but I couldn't think of a cleaner way.

Comment on lines +285 to +289
if self.growth == "logistic":
self.logistic_growth_floor = observed_df["y"].min() * 0.5
observed_df["floor"] = self.logistic_growth_floor
self.logistic_growth_cap = observed_df["y"].max() * 1.5
observed_df["cap"] = self.logistic_growth_cap
Copy link
Copy Markdown
Contributor

@bochocki bochocki Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere? These floor/cap scalars seem quite arbitrary, I'd be hesitant to use the scalars as-is, and the way this is parameterized the scalars aren't changeable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand on this a little: logistic growth is convenient for some models because it enables exponential growth at values near the floor and saturating growth at values near the cap. For intermediate regions, the growth rate is roughly linear.

If the distance between the bounds is too large, you lose the dynamic benefits of a logistic curve and most of the growth will be modeled as roughly linear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is used in the funnel forecast

Comment thread jobs/kpi-forecasting/kpi_forecasting/models/prophet_forecast.py Outdated
Copy link
Copy Markdown
Contributor

@bochocki bochocki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve left some comments —- overall, this looks good. However, I am slightly concerned that the code complexity seems to have increased significantly. This might make it harder to maintain without a dedicated developer. It would be helpful to consider ways to simplify or document the code further for ease of future maintenance.

From the notebook, it seems like the KPI forecasts remain the same before and after these changes. If that’s confirmed, I think we’re in a good position to move forward with merging once the outstanding comments are addressed.

jaredsnyder and others added 5 commits September 26, 2024 09:55
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
@jaredsnyder
Copy link
Copy Markdown
Contributor Author

WRT code complexity: Yeah that is the definite downside to trying to "promote" models with segments so they'd be easier to use. I can take another pass at documenting/commenting so it's easier to work with, and can brainstorm ways to clean it up. We could also meet to try and come up with something if you think that'd be useful

Copy link
Copy Markdown
Contributor

@bochocki bochocki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick turnaround! I don't have any good ideas for reducing the code complexity right now, but maybe it's something we can keep in the back of our minds to improve slowly over time.

@jaredsnyder
Copy link
Copy Markdown
Contributor Author

Another thing I want to look into is trying to use DARTs (https://unit8co.github.io/darts/) which might eliminate a lot of the wrapper code around prophet, and maybe some of the stuff for handling data too

@bochocki
Copy link
Copy Markdown
Contributor

Darts does look neat! I tried to evaluate it as part of the KPI model selection exercise that we used to decide on prophet, but at the time they didn't have M1 support and that was enough of a blocker for local development that I didn't explore it further.

@jaredsnyder jaredsnyder merged commit b5740d8 into main Oct 8, 2024
@jaredsnyder jaredsnyder deleted the kpi_move_segments_to_base branch October 8, 2024 19:38
bochocki added a commit that referenced this pull request Oct 9, 2024
bochocki added a commit that referenced this pull request Oct 9, 2024
jaredsnyder pushed a commit that referenced this pull request Oct 9, 2024
This reverts commit b5740d8.

(cherry picked from commit 73e76df)
jaredsnyder added a commit that referenced this pull request Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants