Closes #379: Add a task to monitor data source health (#388)
Conversation
All the code used here except for the change in
@arikfr good point, thank you!
One important note here is that right now this code assumes that a data source is queryable with basic SQL and that get_schema() returns the schema in the same format for every data source. I want to look into these assumptions because I suspect there are non-SQL data sources which may need to be handled differently.
I've updated the PR to leverage the existing no-op queries hardcoded for all data sources by default, with the additional ability to customize the query used for a given data source. @robotblake @jezdez what do you think of this approach?
Added a commit which changes the status.json format a bit: it now allows for metadata, keys on data source ID, and supports setting custom health queries by data source ID. Here is what the new format looks like:
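The JSON example itself didn't survive extraction; as a rough sketch, a format keyed by data source ID with a top-level metadata block might look like the following (all field names here are illustrative assumptions, not the PR's actual schema):

```python
import json

# Hypothetical status.json payload: a "metadata" block plus one entry
# per data source, keyed by its ID, holding per-query health results.
status = {
    "metadata": {"last_run_at": 1518035382},
    "1": {
        "name": "prod-postgres",
        "queries": {
            "SELECT 1;": {"status": "SUCCESS", "runtime": 0.02},
        },
    },
}

print(json.dumps(status, indent=2))
```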
jasonthomas left a comment:
Can we perform an AND operation on all the query statuses for a given data source and expose that as an overall status for that data source? My main reason for doing this is to reduce the complexity of the monitoring code. From the ops perspective, we should consider a data source to be in a failed state if one or more of its queries have failed.
Thanks @jasonthomas for the feedback! Yes, I like that idea! How about keeping the data that's currently visible in the JSON example I showed above for some bonus context, with an additional higher-level status field? For example:
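The reduction being discussed can be sketched as an AND over the per-query statuses: the data source is healthy only if every health query succeeded. A minimal illustration (function and field names are hypothetical, not the PR's code):

```python
def overall_status(query_results):
    """Reduce per-query statuses to one data source status.

    A data source is considered FAIL if one or more of its health
    queries failed; otherwise SUCCESS (an AND over query statuses).
    """
    ok = all(r.get("status") == "SUCCESS" for r in query_results.values())
    return "SUCCESS" if ok else "FAIL"


queries = {
    "SELECT 1;": {"status": "SUCCESS"},
    "SELECT COUNT(*) FROM events;": {"status": "FAIL"},
}
print(overall_status(queries))  # one query failed, so: FAIL
```

This keeps the per-query detail for context while giving ops a single field to alert on.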
Force-pushed from 80017eb to a027741.
That works for me. Thanks!
@jasonthomas I've made that change and also set it to run the heartbeat once every 12 hours (for now). Does that sound reasonable?
I was thinking more of every 5-15 minutes. Can we make it configurable via an environment variable? We can make the default 12 hours and ops can override it.
@jasonthomas I've updated to include an environment variable for the heartbeat frequency. Please let me know if this all looks good or if there's anything else you'd like to see! Thanks!
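The environment-variable override described above could be read with a pattern like this (the variable name and default are illustrative assumptions; the actual setting name isn't shown in this thread):

```python
import os

# Hypothetical setting: heartbeat frequency in seconds, defaulting to
# 12 hours, which ops can override via an environment variable.
HEALTH_QUERIES_REFRESH_SCHEDULE = int(
    os.environ.get("HEALTH_QUERIES_REFRESH_SCHEDULE", 12 * 60 * 60)
)

print(HEALTH_QUERIES_REFRESH_SCHEDULE)
```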
This still needs some tests, a bit of code refactoring, and a change to make the task schedule less frequent than every 30 seconds.
Otherwise, it satisfies the basic requirement: save data source info to Redis and output it to status.json.