429 indicates an over-limit condition which will have been logged by the component that detected it.
I worry this will make problem investigation harder. Right now, when our monitoring shows a certain error rate for a component, say n Hz, the logs for that component will have a corresponding number of error entries (n per second). So in the example, the error currently shows up in the metrics of the ingester, distributor and authfe, and there are corresponding error messages in the logs of all three. With the change, the metrics will be the same, but the error log messages will only show up in the ingester. We usually investigate errors "front to back": we'd typically first look at the error rate in authfe, look at its logs to figure out what the error is and which component, if any, it originates from, then look at the error rate of that component, then at its logs, and so on until we get to the bottom. Only logging errors at the bottom component will break this method of investigation.
```diff
 i := &interceptor{ResponseWriter: w, statusCode: http.StatusOK}
 next.ServeHTTP(i, r)
-if 100 <= i.statusCode && i.statusCode < 400 {
+if 100 <= i.statusCode && (i.statusCode < 400 || i.statusCode == 429) {
```
@rade A valid concern. However, note that you can still employ a front-to-back approach by looking at the metrics to chase down which service the 429s come from.
This PR was obviated by #59, which moved all 400-level errors to debug.
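The broader policy attributed to #59 amounts to widening the non-error range from 1xx-3xx plus 429 to all of 1xx-4xx. The sketch below is not the code from #59, which isn't shown here: `levelFor` and its `"debug"`/`"error"` labels are hypothetical, and it assumes that non-error responses are also logged at debug.

```go
package main

import "fmt"

// levelFor (hypothetical) picks a log level from an HTTP status code under
// the policy attributed to #59: every 4xx, 429 included, goes to debug.
// Logging 1xx-3xx at debug too is an assumption of this sketch; 5xx and
// out-of-range codes are logged as errors.
func levelFor(status int) string {
	if 100 <= status && status < 500 {
		return "debug"
	}
	return "error"
}

func main() {
	for _, s := range []int{200, 404, 429, 500, 503} {
		fmt.Printf("%d -> %s\n", s, levelFor(s))
	}
	// Output:
	// 200 -> debug
	// 404 -> debug
	// 429 -> debug
	// 500 -> error
	// 503 -> error
}
```

A single range check avoids a growing list of special-cased codes. The trade-off @rade raised still applies: a 4xx passed back through authfe and the distributor shows up in their metrics but not in their error-level logs.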
429 indicates an over-limit condition which will have been logged by the component that detected it, so we don't need to log it again on the calling side.
Example: