Conversation
Force-pushed 91ab6d2 to c1bd9b3
jml left a comment:
A couple of questions, but nothing that needs to change.
If you want to bounce back for another round of review, feel free, but no need to do so.
pkg/chunk/aws_storage_client.go (outdated)
I'm more familiar with choosing the level of parallelism (e.g. how many concurrent goroutines) than choosing the size of each concurrent job, as you're doing here. I don't have opinions on which is better. Why did you decide to do it this way?
Choosing the level of parallelism requires a global queue of work across all queries. I'd like to do that, but don't feel I can complete it in the current sprint.
Computing the "gang size" to target a certain parallelism per query is harder to tune (since we want to keep the batches sent to DynamoDB fairly large), and the end result, given many queries running in parallel, would still have highly variable overall parallelism.
pkg/chunk/aws_storage_client.go (outdated)
Obligatory type theory nerdery: this is more accurately chunksTimesError. I'm not actually suggesting you change this.
pkg/chunk/aws_storage_client.go (outdated)
We have a few other implementations of "parallel map a function that returns (a, err)". Some of the others use a buffered channel. I don't think it's necessary here, because the loop immediately following will drain `results` as quickly as possible.
No action required, just flagging in case my reading is wrong and you notice something on a second glance.
Yes, a buffer would be extra memory management overhead for no obvious benefit.
Run multiple goroutines in parallel to drive DynamoDB harder. Group chunk fetches into "gangs" to allow some configuration over how hard we hit DynamoDB
Force-pushed c1bd9b3 to bec45d9
Run multiple goroutines in parallel to drive DynamoDB harder: group chunk fetches into "gangs" to allow some configuration over how hard we hit DynamoDB
This is a very simple scheme; it could be done much more intelligently, but I think it's worth deploying something simple.
Example query shown below, run against dev. Although the overall time is only a little shorter, this run includes one DynamoDB batch fetch that took 3 seconds, similar to what is described in #602. In the presence of throttling, it becomes more efficient to run several queries in parallel, which may hit different tables or shards.
Query is:

```
sum(rate(container_cpu_usage_seconds_total{image!="",namespace!=""}[5m])) BY (namespace)
```

with `start=1510770800`, `end=1510770830`.

Trace before: [trace screenshot omitted]

and after: [trace screenshot omitted]