Skip to content

Closes #379: Add a task to monitor data source health.#388

Merged
emtwo merged 1 commit intomasterfrom
emtwo/heartbeat
Jun 20, 2018
Merged

Closes #379: Add a task to monitor data source health.#388
emtwo merged 1 commit intomasterfrom
emtwo/heartbeat

Conversation

@emtwo
Copy link
Copy Markdown

@emtwo emtwo commented Apr 27, 2018

This still needs some tests, a bit of code refactoring, and changing the task schedule to be less frequent than 30 seconds.

Otherwise, it satisfies the basic requirement to save data source info to redis and output to status.json

@arikfr
Copy link
Copy Markdown

arikfr commented Apr 27, 2018

All the code used here except for the change in monitor.py and the schedule for the task, can be placed in its own module. I think it will be easier for you when syncing with upstream (and later easier to move to a plugin).

@emtwo
Copy link
Copy Markdown
Author

emtwo commented Apr 27, 2018

@arikfr good point, thank you!

@emtwo emtwo force-pushed the emtwo/heartbeat branch from b1074f3 to 6b7a2fa Compare May 1, 2018 21:54
@emtwo
Copy link
Copy Markdown
Author

emtwo commented May 2, 2018

One important note here is that right now this code assumes that a data source is query-able with basic SQL and that the schema is returned by get_schema() in the same format. I want to look into these things because I suspect there are non-sql data sources which may need to be handled differently.

@emtwo emtwo force-pushed the emtwo/heartbeat branch from 6b7a2fa to 8d163bd Compare May 17, 2018 16:57
@emtwo
Copy link
Copy Markdown
Author

emtwo commented May 17, 2018

I've updated the PR to leverage the existing test_connection() function here: https://github.com/mozilla/redash/blob/master/redash/query_runner/__init__.py#L105

So now it uses the existing no-op queries hardcoded for all data sources by default and has the additional ability to customize queries used for a data source.

In status.json the new field will look something like this:

  "data_sources": {
    "postgres": {
      "SELECT 1": {
        "last_run": 1526571509.069719, 
        "last_run_human": "2018-05-17 15:38:29", 
        "runtime": 0.003139972686767578, 
        "status": "SUCCESS"
      }, 
      "select * from api_keys;": {
        "last_run": 1526570795.251458, 
        "last_run_human": "2018-05-17 15:26:35", 
        "runtime": null, 
        "status": "FAIL"
      }
    }
  }

@robotblake @jezdez what do you think of this approach?

@emtwo
Copy link
Copy Markdown
Author

emtwo commented May 25, 2018

Added a commit which changes the status.json format a bit, allowing for metadata and keying on data source by ID as well as setting custom health queries by data source ID. Here is what the new format looks like:

  "data_sources": {
    "1": {
      "metadata": {
        "name": "postgres"
      }, 
      "queries": {
        "SELECT 1": {
          "last_run": 1527265788.569606, 
          "last_run_human": "2018-05-25 16:29:48", 
          "runtime": 0.0092010498046875, 
          "status": "SUCCESS"
        }, 
        "select * from users": {
          "last_run": 1527265818.619137, 
          "last_run_human": "2018-05-25 16:30:18", 
          "runtime": null, 
          "status": "FAIL"
        }
      }
    }
},

@emtwo emtwo changed the title [WIP] Closes #379: Add a task to monitor data source health. Closes #379: Add a task to monitor data source health. May 25, 2018
@robotblake robotblake requested a review from jasonthomas May 30, 2018 15:25
Copy link
Copy Markdown
Member

@jasonthomas jasonthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we perform an AND operation on all the query statuses for a given datasource and expose that as an overall status for that datasource? My main reason for doing this is so to reduce the complexity of the monitoring code. From the ops perspective we should consider a datasource in a failed state if one or more queries have failed.

@emtwo
Copy link
Copy Markdown
Author

emtwo commented May 31, 2018

thanks @jasonthomas for the feedback! Yes I like that idea! How about keeping the data that's currently visible in the json example I showed above for some bonus context, with an additional higher-level status field? For example:

  "data_sources": {
    "1": {
      "status": "FAIL",
      "metadata": {
        "name": "postgres"
      }, 
      "queries": {
        "SELECT 1": {
          "last_run": 1527265788.569606, 
          "last_run_human": "2018-05-25 16:29:48", 
          "runtime": 0.0092010498046875, 
          "status": "SUCCESS"
        }, 
        "select * from users": {
          "last_run": 1527265818.619137, 
          "last_run_human": "2018-05-25 16:30:18", 
          "runtime": null, 
          "status": "FAIL"
        }
      }
    }
},

@emtwo emtwo force-pushed the emtwo/heartbeat branch 2 times, most recently from 80017eb to a027741 Compare May 31, 2018 16:53
@jasonthomas
Copy link
Copy Markdown
Member

That works for me. Thanks!

@emtwo
Copy link
Copy Markdown
Author

emtwo commented Jun 1, 2018

@jasonthomas I've made that change and also set it to run the heartbeat once every 12 hours (for now). Does that sound reasonable?

@jasonthomas
Copy link
Copy Markdown
Member

I was thinking more of every 5-15 minutes. Can we make it so that is configurable via environment variable? We can make the default 12 hours and ops can override it.

@emtwo emtwo force-pushed the emtwo/heartbeat branch from d3f8970 to d831c9a Compare June 4, 2018 15:14
@emtwo emtwo force-pushed the emtwo/heartbeat branch from d831c9a to db4f9eb Compare June 4, 2018 15:18
@emtwo
Copy link
Copy Markdown
Author

emtwo commented Jun 6, 2018

@jasonthomas I've updated to include an environment variable for the heartbeat frequency. Please let me know if this all looks good or if there's anything else you'd like to see! Thanks!

Copy link
Copy Markdown
Member

@jasonthomas jasonthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@emtwo emtwo merged commit 5e9cd86 into master Jun 20, 2018
@jezdez jezdez deleted the emtwo/heartbeat branch June 20, 2018 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants