Closed
Description
I was just trying to get a graph of our frontend's QPS over the last week using the data we have in Cortex. I could get as far as 2 days, but even then Grafana would sometimes render an error page. At 7 days, it errors consistently.
Looking at the Cortex logs for the querier service, I see:
time="2017-02-06T14:09:14Z" level=warning msg="Error fetching from cache: read tcp 10.244.254.146:52292->10.244.229.94:11211: i/o timeout" source="chunk_store.go:469"
time="2017-02-06T14:09:23Z" level=error msg="Error in MergeQuerier.QueryRange: InternalError: We encountered an internal error. Please try again.\n\tstatus code: 500, request id: 8CED988DD7ED850A" source="querier.go:130"
and
time="2017-02-06T14:07:08Z" level=warning msg="Error fetching from cache: read tcp 10.244.228.139:42174->10.244.253.92:11211: i/o timeout" source="chunk_store.go:469"
time="2017-02-06T14:07:28Z" level=error msg="Error in MergeQuerier.QueryRange: RequestError: send request failed\ncaused by: Get https://weaveworks-prod-chunks.s3.amazonaws.com/2/15428021661599280118%3A1486162592195%3A1486170002195: http: server closed idle connection" source="querier.go:130"
time="2017-02-06T14:07:28Z" level=error msg="Error in MergeQuerier.QueryRange: RequestError: send request failed\ncaused by: Get https://weaveworks-prod-chunks.s3.amazonaws.com/2/12619149369118128877%3A1486247414537%3A1486262234537: EOF" source="querier.go:130"
The error Grafana sees looks like:
{
"status": "error",
"errorType": "execution",
"error": "InternalError: We encountered an internal error. Please try again.\n\tstatus code: 500, request id: 0BA1F8D00B3A2DDC",
"message": "InternalError: We encountered an internal error. Please try again.\n\tstatus code: 500, request id: 0BA1F8D00B3A2DDC"
}
I guess there are two possible implications of this behaviour:
- maybe we should do some sort of more sophisticated error handling on reads (e.g. retries)
- maybe we need to provide special logic for longer range queries with more data in order to make them perform reasonably