Make misfire grace time and reporter start time configurable #445

zollman · 2016-11-16T19:37:47Z

Type: feature

Why:
When working with a large number of accounts, sometimes the run_reporter
is never run for some accounts.

There were two issues with the app scheduler configuration:

1) Jobs are scheduled before the scheduler starts and when there are
   a lot of accounts, some of the scheduled times fall before the
   scheduler is started.
2) There is a hardcoded misfire grace time of 30 seconds, meaning that
   if a job is scheduled and does not start within 30 seconds because
   of thread contention or other issues it will be cancelled.

This change addresses the need by:
Providing configurable times for the schedule start delay and the
misfire grace time

Potential Side Effects:
No known side effects
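
To make the two issues above concrete, here is a minimal sketch of how a misfire grace time decides whether a late job still runs. It is a simplified model in plain Python, not APScheduler's actual implementation; the function name `skipped_jobs`, the one-second spacing between jobs, and the 75-second late start are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def skipped_jobs(scheduled_times, pickup_time, misfire_grace_time):
    """Return the scheduled times that would be treated as misfires.

    Simplified model: a job is skipped when it is picked up more than
    `misfire_grace_time` seconds after its scheduled time.
    """
    grace = timedelta(seconds=misfire_grace_time)
    return [t for t in scheduled_times if pickup_time - t > grace]


# Hypothetical numbers: 60 accounts, one job per account, scheduled one second apart.
base = datetime(2016, 11, 16, 12, 0, 0)
scheduled = [base + timedelta(seconds=i) for i in range(60)]

# Assume the jobs are only picked up 75 seconds after the first one was due,
# for example because the scheduler started late or all threads were busy.
late_pickup = base + timedelta(seconds=75)

print(len(skipped_jobs(scheduled, late_pickup, misfire_grace_time=30)))    # prints 45
print(len(skipped_jobs(scheduled, late_pickup, misfire_grace_time=3600)))  # prints 0
```

Under these assumed numbers, a 30-second grace time drops every job that is more than 30 seconds late, while a one-hour grace time lets all of them run.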


scriptsrc · 2016-11-16T22:34:29Z

Do you have a recommendation for when we should modify these defaults?

What should they be when monitoring 30 or 45 or 60 accounts?

# Apscheduler Configurations
# Length of time, in seconds, before a scheduled job is cancelled due to thread contention or other issues
MISFIRE_GRACE_TIME=30
# Delay, in seconds, until reporter starts
REPORTER_START_DELAY=10

aebie · 2016-11-17T21:50:32Z

@MonkeySecurity For the reporter start delay, we have it configured to the number of accounts + 2. That might be a bit of overkill, but we found that if the scheduler starts before all jobs are scheduled, the remaining ones don't run until the next interval. If the interval for some watchers is set to a large value, such as daily, this can be problematic.

The MISFIRE_GRACE_TIME is a little trickier because it's related to both the number of threads and the average reporter run time. What we were seeing is that as the number of accounts grew and we added more threads, the reporter ran slower because of some inherent bottlenecks in boto when running multiple threads in the same process. By the time we got to 60 accounts it was basically just thrashing. We reduced the number of threads to about half the number of accounts, but then ran into the situation where some accounts never got run because they would time out waiting for a thread. We made the MISFIRE_GRACE_TIME an hour and found that accounts ran reasonably fast and none of the accounts starved.
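
As a rough sketch, the rules of thumb in this comment can be written as a small helper. The function name `suggested_settings` and the `threads` key are hypothetical (the thread makes no mention of how the thread count is configured), and the values reflect one deployment's experience rather than official defaults.

```python
def suggested_settings(num_accounts):
    """Rule-of-thumb values taken from the comment above; not official defaults."""
    return {
        # Start delay of "number of accounts + 2" seconds.
        "REPORTER_START_DELAY": num_accounts + 2,
        # One hour, so jobs waiting for a free thread are not cancelled.
        "MISFIRE_GRACE_TIME": 3600,
        # Hypothetical key: about half as many threads as accounts.
        "threads": max(1, num_accounts // 2),
    }


for n in (30, 45, 60):
    print(n, suggested_settings(n))
```

For 60 accounts this gives a start delay of 62 seconds, a grace time of 3600 seconds, and 30 threads.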

I think this is a temporary fix because of the Lambda-based architecture you described, and also because we have another change that should be coming at some point where we stagger reporter runs across the interval.

scriptsrc · 2016-11-18T18:16:39Z

Glad I asked. That's excellent to know. I think I'll update my config accordingly.

scriptsrc merged commit 99d5c9e into Netflix:develop Nov 18, 2016

zollman deleted the 7869_run_reporter_config branch November 30, 2016 20:48

scriptsrc mentioned this pull request Dec 2, 2016

Release v0.8.0 #458

Merged
