
Application monitoring needed to catch when APIs are down (low cost, alerting) #213

@MikeTheCanuck

It turns out that we took out the Housing-Affordability APIs during an infrastructure change. Shame on me for missing that: it was a subtle mistake, and after so many changes that had no destructive impact, I just didn't thoroughly evaluate the API health.

This is normal, and there's no way any distributed system like ours should have to rely on humans to remember to validate every piece of the stack every time a change rolls out.

We need to find a low- or no-cost monitoring solution for our production assets that lets us achieve the following:

  • validate the health of each endpoint similarly to our smoke tests - e.g. do we get a 200 from each container (aka is the web server running)? Do we receive compliant JSON from each endpoint (aka is the Django app answering with something it got from the database)?
  • canary queries - e.g. is there a specific query for each endpoint that will remain mostly stable, and will demonstrate that the database is returning expected data?
  • validate the health of the React apps - e.g. do we get a 200 from each React app? Do we receive a reasonable "HTML response" (or some other lightweight way to show the React app is sending valid data to the requesting browser)?
  • validate the database listener - do we get a response on port 5432? Is there a way to show that each database is up and responding (without having to hard-code creds in our testing harness)?
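
To make the checks above concrete, here is a minimal sketch of what such probes could look like using only the Python standard library. The function names (`check_http`, `check_tcp`, `is_valid_json`, `looks_like_html`) are hypothetical and not part of our codebase, and the "is it HTML" test is a deliberately cheap heuristic. Note that the TCP probe only shows that something is listening on the port; it needs no credentials, but it does not prove the database can answer queries. Scheduling and alerting are out of scope here.

```python
import http.client
import json
import socket
import urllib.error
import urllib.request


def is_valid_json(body: bytes) -> bool:
    """Return True if the response body parses as JSON."""
    try:
        json.loads(body)
        return True
    except ValueError:  # covers JSONDecodeError and bad encodings
        return False


def looks_like_html(body: bytes) -> bool:
    """Cheap heuristic for the React apps: the body contains an <html tag."""
    return b"<html" in body.lower()


def check_http(url: str, validate=None, timeout: float = 5.0) -> bool:
    """Return True if url answers 200 and the body passes the optional validator.

    `validate` is any callable taking the raw body (bytes) and returning a bool,
    so a canary query can plug in its own check on the returned data.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read()
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Non-2xx responses, DNS failures, timeouts, and malformed URLs all count as down.
        return False
    return validate(body) if validate else True


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

As a usage illustration with placeholder hostnames: `check_http("https://api.example.org/housing-affordability/", validate=is_valid_json)` would cover the endpoint check, `check_http("https://app.example.org/", validate=looks_like_html)` the React app check, and `check_tcp("db.example.org", 5432)` the database listener. A canary query could pass a validator that parses the JSON and compares one stable field against a known value.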
