Index series, not chunks#875

Merged
tomwilkie merged 6 commits into cortexproject:master from grafana:schema-v9 on Aug 23, 2018

Conversation

@tomwilkie
Contributor

@tomwilkie tomwilkie commented Jul 11, 2018

Fixes #433 , fixes #884

Consists of 4 changes:


Move the compositeSchema abstraction up to compositeStore.

Promote the composite schema abstraction to "composite chunk store" - a chunk store which delegates to different chunk stores based on time. This allows us to vary the store implementation over time, and not just the schema. This will unblock the new bigtable storage adapter (using columns instead of rows), and allow us to more easily implement the iterative intersections and indexing of series instead of chunks.

Corner case when writing chunks which span multiple stores: they are written to both stores, and instead of using the chunk start/end we use the schema start/end. This will lead to duplication of index entries on schema migrations, but that is already the case for day boundaries anyway. It will also lead to duplicate writes of the chunk on schema migrations - these should be deduped by the underlying store.


Index series, not chunks

We should index the series, not chunks; this will reduce the number of entries in the index by replication factor * (bucket size / chunk size), or 3 * 24hrs / 6hrs - ie 12x. This will however mean we need another index from series to chunks, introducing 1 extra write and N extra reads per query. The expectation is a reduction in query latency (and bigtable query usage, and memory cost) by 12x, and then an increase by 2x as we have to do a bunch of extra queries.

This change introduces the seriesStore, a new chunk store implementation that, combined with the v9 schema, indexes series not chunks.

I tried to adapt the original chunk store to support this style of indexing - easy on the write path, but the read path became even more of a rat's nest. So I factored out the common bits as best I could and made a new chunk store.
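A toy model of the resulting two-level index, with in-memory maps standing in for index rows (the names are hypothetical, not the seriesStore internals): label entries point at series IDs once per series, and a second index maps each series to its chunks, which is where the extra reads per query come from.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesIndex is a toy model of the v9 layout: label pairs map to series
// IDs (one entry per series, not per chunk), and a second index maps
// series IDs to chunk IDs.
type seriesIndex struct {
	byLabel  map[string]map[string]bool // "name=value" -> set of series IDs
	bySeries map[string][]string        // series ID -> chunk IDs
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{byLabel: map[string]map[string]bool{}, bySeries: map[string][]string{}}
}

func (idx *seriesIndex) add(seriesID, chunkID string, labels []string) {
	for _, l := range labels {
		if idx.byLabel[l] == nil {
			idx.byLabel[l] = map[string]bool{}
		}
		idx.byLabel[l][seriesID] = true // one entry per series, however many chunks
	}
	idx.bySeries[seriesID] = append(idx.bySeries[seriesID], chunkID)
}

// chunksFor intersects the series sets for each matcher, then does the
// extra series -> chunks lookups the new schema requires.
func (idx *seriesIndex) chunksFor(matchers []string) []string {
	var series map[string]bool
	for _, m := range matchers {
		next := map[string]bool{}
		for id := range idx.byLabel[m] {
			if series == nil || series[id] {
				next[id] = true
			}
		}
		series = next
	}
	var chunks []string
	for id := range series {
		chunks = append(chunks, idx.bySeries[id]...)
	}
	sort.Strings(chunks)
	return chunks
}

func main() {
	idx := newSeriesIndex()
	idx.add("s1", "chunk1", []string{"__name__=http_requests", "job=api"})
	idx.add("s1", "chunk2", []string{"__name__=http_requests", "job=api"})
	idx.add("s2", "chunk3", []string{"__name__=http_requests", "job=db"})
	fmt.Println(idx.chunksFor([]string{"__name__=http_requests", "job=api"}))
	// prints [chunk1 chunk2]
}
```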


Add heapCache, a cache that uses a heap for evictions.

heapCache is a simple string -> interface{} cache which uses a heap to manage evictions. O(log N) inserts and updates, O(1) gets.
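A minimal sketch of the idea using Go's container/heap, with a logical clock ordering evictions (illustrative only; not the PR's actual heapCache). Inserts and updates cost O(log N) because they push into or fix the heap; gets are an O(1) map lookup.

```go
package main

import (
	"container/heap"
	"fmt"
)

type entry struct {
	key     string
	value   interface{}
	updated int64 // logical timestamp; the oldest entry is evicted first
	index   int   // position in the heap, maintained by Swap/Push
}

type entryHeap []*entry

func (h entryHeap) Len() int           { return len(h) }
func (h entryHeap) Less(i, j int) bool { return h[i].updated < h[j].updated }
func (h entryHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i]; h[i].index = i; h[j].index = j }
func (h *entryHeap) Push(x interface{}) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}
func (h *entryHeap) Pop() interface{} {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

type heapCache struct {
	maxSize int
	clock   int64
	entries map[string]*entry
	heap    entryHeap
}

func newHeapCache(maxSize int) *heapCache {
	return &heapCache{maxSize: maxSize, entries: map[string]*entry{}}
}

// Put is O(log N): heap.Fix on update, heap.Push (plus a possible eviction) on insert.
func (c *heapCache) Put(key string, value interface{}) {
	c.clock++
	if e, ok := c.entries[key]; ok {
		e.value, e.updated = value, c.clock
		heap.Fix(&c.heap, e.index)
		return
	}
	if len(c.entries) >= c.maxSize {
		evicted := heap.Pop(&c.heap).(*entry)
		delete(c.entries, evicted.key)
	}
	e := &entry{key: key, value: value, updated: c.clock}
	heap.Push(&c.heap, e)
	c.entries[key] = e
}

// Get is O(1): a plain map lookup.
func (c *heapCache) Get(key string) (interface{}, bool) {
	e, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	return e.value, true
}

func main() {
	c := newHeapCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("a", 10) // updating "a" makes "b" the oldest entry
	c.Put("c", 3)  // cache full: evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```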


Skip index queries for high cardinality labels.

Firstly, cache the length of the index rows we query (by hash and range key). Secondly, fail for rows with > 100k entries, either because the cache told us so, or because we read them. Finally, allow matchers to fail on cardinality errors but proceed with the query (as long as at least one matcher succeeds), and then filter the results.

Notably, after this change, queries on two high-cardinality labels that would result in a small number of series will fail.

@tomwilkie tomwilkie mentioned this pull request Jul 12, 2018
@tomwilkie tomwilkie force-pushed the schema-v9 branch 6 times, most recently from 2572bf4 to bcd0443 on July 12, 2018
@tomwilkie tomwilkie changed the title [WIP] v9 Schema: Index series, not chunks v9 Schema: Index series, not chunks Jul 12, 2018
@tomwilkie
Contributor Author

Got this deployed in our dev env and it seems to work. Hard to draw any conclusions as there aren't any taxing queries, but there seems to be a drop in bigtable latency in line with the reduction in the amount of data we're reading.

@tomwilkie
Contributor Author

Running in prod for a few days now, and we've seen the worst queries go from minutes to seconds, in line with a 6-10x latency improvement. Once we rotate out the index tables, I'll report on their size.

@tomwilkie
Contributor Author

There was a bug in the way we handled chunks spanning the transition; it's fixed now.

Contributor

@bboreham bboreham left a comment

A few thoughts.

}

- chunkStore, err := chunk.NewStore(chunkStoreConfig, schemaConfig, storageClient)
+ chunkStore, err := chunk.NewCompositeStore(chunkStoreConfig, schemaConfig, storageClient)
Contributor

ISTM that you could have left the exported name as NewStore() to minimise impact elsewhere.

Contributor Author

Good idea, done.

// Put implements ChunkStore
func (c *store) Put(ctx context.Context, chunks []Chunk) error {
	for _, chunk := range chunks {
		if err := c.PutOne(ctx, chunk.From, chunk.Through, chunk); err != nil {
			return err
		}
	}
	return nil
}
Contributor

This seems like rather a large change in performance characteristics, if you have a number of chunks to write.


var schemas = []struct {
name string
fn func(cfg SchemaConfig) Schema
Contributor

I was expecting storeFn to come in this commit, even though they would all be the same at this point.

@cboggs
Contributor

cboggs commented Jul 20, 2018

Seems to me that this schema should also allow (or lay the groundwork for) successful queries without invariant metric names... am I thinking on the right path there?

@tomwilkie tomwilkie force-pushed the schema-v9 branch 2 times, most recently from 420ed55 to 01c790c on July 25, 2018
@tomwilkie
Contributor Author

Seems to me that this schema should also allow (or lay the groundwork for) successful queries without invariant metric names... am I thinking on the right path there?

It could, but it still doesn't avoid the problem of hotspotting the "name" row. Although it would reduce the load on that row by 12x, so it might be doable at this point.

@tomwilkie tomwilkie force-pushed the schema-v9 branch 5 times, most recently from 4c6a07b to 5c2e17d on July 25, 2018
@tomwilkie tomwilkie changed the title v9 Schema: Index series, not chunks Index series, not chunks Jul 25, 2018
@tomwilkie
Contributor Author

This seems like rather a large change in performance characteristics, if you have a number of chunks to write.

Yes, it potentially is. I don't think it will affect normal operation, as I think we only flush one chunk at a time normally - we've been running this for a week or so and haven't noticed anything. But it may affect shutdown flushes.

I think the correct solution is to potentially split out the ChunkStore (get and put chunks by ID) and ChunkIndex (write entries and find entries by matchers), something that is starting to happen with the seriesStore anyway. Then the composite store can write the index entries and the chunks once, in a batch. WDYT?
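A sketch of what that split might look like (hypothetical names; this is not code from the PR): with the two responsibilities separated, the composite store could collect all index entries up front and write the chunks and the entries once each, in batches, instead of one PutOne per chunk.

```go
package main

import "fmt"

type Chunk struct{ ID string }

type IndexEntry struct{ HashKey, RangeKey string }

// ChunkStore gets and puts chunks by ID.
type ChunkStore interface {
	PutChunks(chunks []Chunk) error
}

// ChunkIndex writes index entries and finds them by matchers.
type ChunkIndex interface {
	PutEntries(entries []IndexEntry) error
}

// putBatched writes all chunks and all their index entries in one call
// each, instead of one write per chunk.
func putBatched(store ChunkStore, index ChunkIndex, chunks []Chunk) error {
	entries := make([]IndexEntry, 0, len(chunks))
	for _, c := range chunks {
		entries = append(entries, IndexEntry{HashKey: c.ID})
	}
	if err := store.PutChunks(chunks); err != nil {
		return err
	}
	return index.PutEntries(entries)
}

// memBackend is a trivial in-memory backend implementing both interfaces.
type memBackend struct {
	chunks  []Chunk
	entries []IndexEntry
}

func (m *memBackend) PutChunks(cs []Chunk) error { m.chunks = append(m.chunks, cs...); return nil }
func (m *memBackend) PutEntries(es []IndexEntry) error {
	m.entries = append(m.entries, es...)
	return nil
}

func main() {
	b := &memBackend{}
	_ = putBatched(b, b, []Chunk{{ID: "c1"}, {ID: "c2"}})
	fmt.Println(len(b.chunks), len(b.entries)) // 2 2
}
```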

@tomwilkie
Contributor Author

Do we really need to write another cache?

How many do we have? I see the chunk caches (which are fundamentally different things), a vendored Prometheus treecache (something to do with ZooKeeper) and some caches in k8s/client-go. I guess the gRPC connection pool is a cache, but that's a different beast too. We also cache a single result in the chunk iterator, but that's trivial.

I'm not actually aware of any other caches (like this) in use in the codebase. We could use github.com/bluele/gcache, but I was never a fan of that library.

OTOH, it's not clear that using a heap for evictions is the best idea; we could thread a linked list through the entries and reorder it to get LRU.
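The linked-list alternative is straightforward with Go's container/list; a rough sketch (not code from this PR): moving an entry to the front of the list on every access gives O(1) puts, gets and evictions.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry struct {
	key   string
	value interface{}
}

// lruCache threads a linked list through the entries: the front of the
// list is the most recently used entry, the back is evicted first.
type lruCache struct {
	maxSize int
	ll      *list.List
	entries map[string]*list.Element
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{maxSize: maxSize, ll: list.New(), entries: map[string]*list.Element{}}
}

func (c *lruCache) Put(key string, value interface{}) {
	if el, ok := c.entries[key]; ok {
		el.Value.(*lruEntry).value = value
		c.ll.MoveToFront(el)
		return
	}
	if c.ll.Len() >= c.maxSize {
		back := c.ll.Back() // least recently used
		c.ll.Remove(back)
		delete(c.entries, back.Value.(*lruEntry).key)
	}
	c.entries[key] = c.ll.PushFront(&lruEntry{key, value})
}

func (c *lruCache) Get(key string) (interface{}, bool) {
	el, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el) // accessing an entry makes it most recently used
	return el.Value.(*lruEntry).value, true
}

func main() {
	c := newLRUCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // "a" is now most recently used
	c.Put("c", 3) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```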

@tomwilkie
Contributor Author

I've tidied up this PR so it should be more reviewable. I see two things left to do: figure out what to do with chunks on store boundaries (try and bring back the batching) and accurately record the cardinality of rows for multi-day queries.

Let me know if you get a chance to take a look @bboreham @cboggs @gouthamve.

@bboreham
Contributor

By "we" I meant the Go community.

I've used github.com/patrickmn/go-cache successfully in https://github.com/weaveworks/scope, to replace github.com/bluele/gcache for performance reasons.

@bboreham
Contributor

I think we only flush one chunk at once normally

If the flush queue gets very large (and we've had it in the millions many times) this assumption breaks down.

Then the composite store can write the index entries and the chunks once, in a batch. WDYT?

I wrote some thoughts at #684 (comment)

Contributor

Would it make more sense to PutChunks, then calculate all the index entries and write them at once? Might alleviate the performance characteristic changes Bryan commented about.

Contributor

typo, data should be date

level.Debug(log).Log("Chunk IDs", len(chunkIDs))

// Protect ourselves against OOMing.
if len(chunkIDs) > c.cfg.QueryChunkLimit {
Contributor

Would it make more sense to do this after filtering out the chunks by time?

@csmarchbanks
Contributor

I also have some concerns about putting one chunk vs many chunks, and added a comment with a possible idea. I like the idea of splitting ChunkStore and ChunkIndex, or some of Bryan's suggestions, but perhaps not as part of this PR.

Tidy up some of the logging.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie
Contributor Author

I've used github.com/patrickmn/go-cache successfully in https://github.com/weaveworks/scope, to replace github.com/bluele/gcache for performance reasons.

I've looked at go-cache; it still uses a background goroutine to periodically expunge entries from the cache. AFAICT this means the cache can grow without bound between these periods, and it also locks the entire cache to do so. The heap cache isn't ideal; I'm going to update it to use a simple FIFO list for evictions, but I think it's better than go-cache.

fifoCache is a simple string -> interface{} cache which uses a fifo to manage evictions.  O(1) inserts, updates and gets.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
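A minimal sketch of such a FIFO cache (illustrative only; the fifoCache that landed in the PR may be structured differently, e.g. as a ring buffer): entries are evicted in insertion order, giving O(1) puts, updates and gets with a hard bound on size and no background goroutine.

```go
package main

import "fmt"

// fifoCache is a simple string -> interface{} cache that evicts entries
// in insertion order once maxSize is reached.
type fifoCache struct {
	maxSize int
	order   []string // insertion order; the head is evicted first
	entries map[string]interface{}
}

func newFIFOCache(maxSize int) *fifoCache {
	return &fifoCache{maxSize: maxSize, entries: map[string]interface{}{}}
}

// Put inserts or updates a key. Updates do not change eviction order,
// so all operations are O(1).
func (c *fifoCache) Put(key string, value interface{}) {
	if _, ok := c.entries[key]; !ok {
		if len(c.order) >= c.maxSize {
			oldest := c.order[0] // evict the oldest insertion
			c.order = c.order[1:]
			delete(c.entries, oldest)
		}
		c.order = append(c.order, key)
	}
	c.entries[key] = value
}

// Get is a plain map lookup.
func (c *fifoCache) Get(key string) (interface{}, bool) {
	v, ok := c.entries[key]
	return v, ok
}

func main() {
	c := newFIFOCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // evicts "a", the oldest insert
	_, ok := c.Get("a")
	fmt.Println(ok) // false
}
```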
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Contributor

@csmarchbanks csmarchbanks left a comment

I am approving this since it has some nice changes. If anyone is worried about the performance characteristic change, they can stay on v6 until those are updated.

@tomwilkie tomwilkie merged commit 2f1e56b into cortexproject:master Aug 23, 2018
@tomwilkie tomwilkie deleted the schema-v9 branch August 23, 2018 10:28


Development

Successfully merging this pull request may close these issues.

Don't query very high cardinality labels
Chunk Store should index series, not chunks

5 participants