Batch index lookups (take #2) #981
Have rebased; this should be more reviewable now.
pkg/chunk/gcp/storage_client.go
Outdated
…ch-index-lookups"" This reverts commit 8b74f90. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Use iterators on the batches, so the batches themselves aren't mutated and can be cached.
- Actually write back to the cache in the index caching!
- Add a test that we've written back to the cache.
- Use bytes.Compare, bytes.Equal etc in the filteringBatchIter to reduce copies.

This reverts commit 8b74f90.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@gouthamve I had to do a fair amount of work to fix the merge clash, so it's probably worth another look. Also removed the committed proto and added some gogo optimisations.
…wrong. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
    span, ctx := ot.StartSpanFromContext(ctx, "Index cache lookups")
    for _, query := range queries {
        key := queryKey(query)
Hmm, so queryKey only depends on TableName and HashKey. So if two different queries share both of those, we're in trouble. Though it might not be the case now (which I highly doubt), future schemas might make this point moot. Hence, I think we should build the key here out of every field of the query rather than just TableName and HashKey.
            continue
        }
        keys = append(keys, key)
        queriesByKey[key] = query
Hmm, we're actually in a position to silently drop a query here. Two different queries might produce the same key. This key is used for caching, but I think it's wrong to use it to uniquely identify a query.
So if two different queries produce the same cacheable key, we'll silently drop the first query.
Fair enough, I think you're right. Will build a test.
LGTM!
    if err := c.storage.QueryPages(ctx, query, func(resp ReadBatch) (shouldContinue bool) {
        for i := 0; i < resp.Len(); i++ {
    err := c.storage.QueryPages(ctx, queries, func(query IndexQuery, resp ReadBatch) bool {
Since QueryPages() can now call the callback on multiple goroutines, entries needs to be protected.
This bug causes queries to return different answers randomly.
Another go at #969.
This time, batches expose iterators, so that when we cache batches they aren't mutable.