Skip to content

Batch index lookups (take #2)#981

Merged
tomwilkie merged 9 commits intocortexproject:masterfrom
grafana:batch-index-lookups
Sep 14, 2018
Merged

Batch index lookups (take #2)#981
tomwilkie merged 9 commits intocortexproject:masterfrom
grafana:batch-index-lookups

Conversation

@tomwilkie
Copy link
Contributor

Another go at #969.

This time, batches expose iterators, so that when we cache batches they aren't mutable.

@tomwilkie tomwilkie changed the title Batch index lookups Batch index lookups (take #2) Sep 3, 2018
@tomwilkie
Copy link
Contributor Author

Have rebased, this should be more reviewable now.

Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space.

…ch-index-lookups""

This reverts commit 8b74f90.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie tomwilkie force-pushed the batch-index-lookups branch 3 times, most recently from b87454a to 0a0e44e Compare September 11, 2018 14:31
- Use iterators on the batches, so the batches themselves aren't mutated and can be cached.
- Actually write back to the cache in the index caching!
- Add a test that we've written back to the cache.
- Use bytes.Compare, bytes.Equal etc in the filteringBatchIter to reduce copies.

This reverts commit 8b74f90.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie
Copy link
Contributor Author

@gouthamve had to do a fair amount of work to fix the merge clash, so probably worth another look. Also, remove the committed proto and added some gogo optimisations.

…wrong.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

span, ctx := ot.StartSpanFromContext(ctx, "Index cache lookups")
for _, query := range queries {
key := queryKey(query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so queryKey only depends on TableName and HashKey. So if two different queries have those two same, we're in trouble. Though it might not be the case now (which I highly doubt), future schemas might make this point moot. Hence, I think we should make they key here out of everything rather than just TableName and HashKey.

continue
}
keys = append(keys, key)
queriesByKey[key] = query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we're actually in a position to silently drop a query here. The idea is that two different queries, might produce the same key. This key is used for caching, but I think it's wrong for using to uniquely identify a query.

So if two different queries produce the same cacheable key, we'll silently drop the first query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough I think you're right. Will build a test.

@gouthamve
Copy link
Contributor

LGTM!

@tomwilkie tomwilkie merged commit 8bfb9ee into cortexproject:master Sep 14, 2018
@tomwilkie tomwilkie deleted the batch-index-lookups branch September 14, 2018 12:03

if err := c.storage.QueryPages(ctx, query, func(resp ReadBatch) (shouldContinue bool) {
for i := 0; i < resp.Len(); i++ {
err := c.storage.QueryPages(ctx, queries, func(query IndexQuery, resp ReadBatch) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since QueryPages() can now call the callback on multiple goroutines, entries needs to be protected.
This bug causes queries to return different answers randomly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants