Closes #379: Add a task to monitor data source health. by emtwo · Pull Request #388 · mozilla/redash

emtwo · 2018-04-27T14:34:27Z

This still needs tests, some refactoring, and a task schedule that runs less often than every 30 seconds.

Otherwise, it satisfies the basic requirement: it saves data source info to Redis and outputs it to status.json.

arikfr · 2018-04-27T14:58:17Z

All the code used here, except for the change in monitor.py and the schedule for the task, can be placed in its own module. I think that will make it easier for you to sync with upstream (and later to move it to a plugin).

emtwo · 2018-04-27T15:05:22Z

@arikfr good point, thank you!

emtwo · 2018-05-02T15:12:59Z

One important note: right now this code assumes that a data source is queryable with basic SQL and that get_schema() returns the schema in the same format for every data source. I want to look into both assumptions because I suspect there are non-SQL data sources that may need to be handled differently.

emtwo · 2018-05-17T17:04:37Z

I've updated the PR to leverage the existing test_connection() function here: https://github.com/mozilla/redash/blob/master/redash/query_runner/__init__.py#L105

So now it uses the existing no-op queries hardcoded for all data sources by default and has the additional ability to customize queries used for a data source.

In status.json the new field will look something like this:

  "data_sources": {
    "postgres": {
      "SELECT 1": {
        "last_run": 1526571509.069719, 
        "last_run_human": "2018-05-17 15:38:29", 
        "runtime": 0.003139972686767578, 
        "status": "SUCCESS"
      }, 
      "select * from api_keys;": {
        "last_run": 1526570795.251458, 
        "last_run_human": "2018-05-17 15:26:35", 
        "runtime": null, 
        "status": "FAIL"
      }
    }
  }
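For illustration, one entry of that shape could be produced roughly as follows. This is a minimal sketch, not the code in this PR: `run_health_query` and the `run_query` callable are hypothetical names, and the timestamp is formatted in UTC on the assumption that this is how `last_run_human` is derived.

```python
import time
from datetime import datetime, timezone


def run_health_query(run_query, query_text, now=time.time):
    """Run one health query and return a record shaped like the entries above.

    `run_query` is a hypothetical callable that raises on failure; it stands
    in for whatever the data source's query runner actually exposes.
    """
    started = now()
    record = {
        "last_run": started,
        # Assumption: the human-readable timestamp is rendered in UTC.
        "last_run_human": datetime.fromtimestamp(
            started, tz=timezone.utc
        ).strftime("%Y-%m-%d %H:%M:%S"),
        "runtime": None,   # stays None when the query fails
        "status": "FAIL",  # pessimistic default, overwritten on success
    }
    try:
        run_query(query_text)
    except Exception:
        return record
    record["runtime"] = now() - started
    record["status"] = "SUCCESS"
    return record
```

A failed query keeps `runtime` as `None`, which serializes to `null` as in the second entry above.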

@robotblake @jezdez what do you think of this approach?

emtwo · 2018-05-25T16:36:41Z

Added a commit which changes the status.json format a bit, allowing for metadata and keying on data source by ID as well as setting custom health queries by data source ID. Here is what the new format looks like:

  "data_sources": {
    "1": {
      "metadata": {
        "name": "postgres"
      }, 
      "queries": {
        "SELECT 1": {
          "last_run": 1527265788.569606, 
          "last_run_human": "2018-05-25 16:29:48", 
          "runtime": 0.0092010498046875, 
          "status": "SUCCESS"
        }, 
        "select * from users": {
          "last_run": 1527265818.619137, 
          "last_run_human": "2018-05-25 16:30:18", 
          "runtime": null, 
          "status": "FAIL"
        }
      }
    }
  },

jasonthomas

Can we perform an AND operation on all the query statuses for a given datasource and expose that as an overall status for that datasource? My main reason for doing this is to reduce the complexity of the monitoring code. From the ops perspective, we should consider a datasource to be in a failed state if one or more queries have failed.

emtwo · 2018-05-31T16:48:45Z

Thanks @jasonthomas for the feedback! Yes, I like that idea! How about keeping the data that's currently visible in the JSON example I showed above for some bonus context, and adding a higher-level status field? For example:

  "data_sources": {
    "1": {
      "status": "FAIL",
      "metadata": {
        "name": "postgres"
      }, 
      "queries": {
        "SELECT 1": {
          "last_run": 1527265788.569606, 
          "last_run_human": "2018-05-25 16:29:48", 
          "runtime": 0.0092010498046875, 
          "status": "SUCCESS"
        }, 
        "select * from users": {
          "last_run": 1527265818.619137, 
          "last_run_human": "2018-05-25 16:30:18", 
          "runtime": null, 
          "status": "FAIL"
        }
      }
    }
  },
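The top-level status could be derived from the per-query records along these lines. This is only a sketch of the AND rule discussed above; `overall_status` is a hypothetical helper name, and treating a data source with no queries as "SUCCESS" is an assumption, not something settled in this thread.

```python
def overall_status(queries):
    """Return "SUCCESS" only if every query record succeeded, else "FAIL".

    `queries` maps query text to a record with a "status" key, as in the
    "queries" object above. Assumption: an empty mapping counts as
    "SUCCESS" because all() of an empty sequence is True.
    """
    succeeded = all(
        record.get("status") == "SUCCESS" for record in queries.values()
    )
    return "SUCCESS" if succeeded else "FAIL"


example_queries = {
    "SELECT 1": {"status": "SUCCESS"},
    "select * from users": {"status": "FAIL"},
}
print(overall_status(example_queries))  # prints FAIL
```

With this rule, monitoring only needs to read one field per data source, and the per-query details stay available for debugging.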

jasonthomas · 2018-05-31T17:30:55Z

That works for me. Thanks!

emtwo · 2018-06-01T16:48:31Z

@jasonthomas I've made that change and also set it to run the heartbeat once every 12 hours (for now). Does that sound reasonable?

jasonthomas · 2018-06-01T17:14:41Z

I was thinking more like every 5-15 minutes. Can we make it configurable via an environment variable? We can make the default 12 hours and ops can override it.

emtwo · 2018-06-06T16:48:34Z

@jasonthomas I've updated to include an environment variable for the heartbeat frequency. Please let me know if this all looks good or if there's anything else you'd like to see! Thanks!
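Reading the frequency from the environment could look something like the sketch below. The variable name `HEALTH_CHECK_INTERVAL_SECONDS` and the helper `heartbeat_interval` are hypothetical and chosen only for illustration; the 12-hour default follows the discussion above.

```python
import os

# Default agreed on in the thread: run the heartbeat every 12 hours.
DEFAULT_INTERVAL_SECONDS = 12 * 60 * 60


def heartbeat_interval(environ=os.environ, name="HEALTH_CHECK_INTERVAL_SECONDS"):
    """Return the heartbeat interval in seconds.

    `name` is a hypothetical environment variable. An unset or blank value
    falls back to the default; a non-positive value is rejected.
    """
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return DEFAULT_INTERVAL_SECONDS
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of seconds")
    return value
```

Ops could then set the variable to `300` for a 5-minute heartbeat without a code change.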

jasonthomas

lgtm!

emtwo force-pushed the emtwo/heartbeat branch from b1074f3 to 6b7a2fa Compare May 1, 2018 21:54

emtwo force-pushed the emtwo/heartbeat branch from 6b7a2fa to 8d163bd Compare May 17, 2018 16:57

rafrombrc mentioned this pull request May 23, 2018

Investigate ways to resolve running jobs queue not reflecting reality #287

Closed

emtwo force-pushed the emtwo/heartbeat branch from c2b438c to d9cb33f Compare May 25, 2018 16:32

emtwo changed the title ~~[WIP] Closes #379: Add a task to monitor data source health.~~ Closes #379: Add a task to monitor data source health. May 25, 2018

robotblake requested a review from jasonthomas May 30, 2018 15:25

jasonthomas reviewed May 31, 2018

emtwo force-pushed the master branch from 3cf4c0f to b553699 Compare May 31, 2018 16:01

emtwo force-pushed the emtwo/heartbeat branch 2 times, most recently from 80017eb to a027741 Compare May 31, 2018 16:53

emtwo force-pushed the emtwo/heartbeat branch from d3f8970 to d831c9a Compare June 4, 2018 15:14

Closes #379: Add a task to monitor data source health.

db4f9eb

emtwo force-pushed the emtwo/heartbeat branch from d831c9a to db4f9eb Compare June 4, 2018 15:18

This was referenced Jun 6, 2018

Make data source health monitoring an extension #415

Closed

Closes #4: Add datasource health extension. mozilla/redash-stmo#5

Merged

jasonthomas approved these changes Jun 13, 2018

emtwo merged commit 5e9cd86 into master Jun 20, 2018

jezdez deleted the emtwo/heartbeat branch June 20, 2018 15:45
