Index series, not chunks#875

Merged
tomwilkie merged 6 commits into cortexproject:master from grafana:schema-v9 on Aug 23, 2018

Conversation

@tomwilkie
Contributor

@tomwilkie tomwilkie commented Jul 11, 2018

Fixes #433 , fixes #884

Consists of 4 changes:


Move the compositeSchema abstraction up to compositeStore.

Promote the composite schema abstraction to "composite chunk store" - a chunk store which delegates to different chunk stores based on time. This allows us to vary the store implementation over time, and not just the schema. This will unblock the new bigtable storage adapter (using columns instead of rows), and allow us to more easily implement the iterative intersections and indexing of series instead of chunks.

Corner case when writing chunks which span multiple stores: they are written to both stores, and instead of using the chunk start/end we use the schema start/end. This will lead to duplication of index entries on schema migrations, but that is already the case for day boundaries anyway. It will also lead to duplicate writes of the chunk on schema migrations - these should be deduped by the underlying store.


Index series, not chunks

We should index the series, not chunks; this will reduce the number of entries in the index by replication factor * (bucket size / chunk size), or 3 * 24hrs / 6hrs - ie 12x. This will however mean we need another index from series to chunks, introducing 1 extra write and N extra reads per query. The expectation is a reduction in query latency (and bigtable query usage, and memory cost) by 12x, and then an increase by 2x as we have to do a bunch of extra queries.

This change introduces the seriesStore, a new chunk store implementation that, combined with the v9 schema, indexes series not chunks.

I tried to adapt the original chunk store to support this style of indexing - easy on the write path, but the read path became even more of a rat's nest. So I factored out the common bits as best I could and made a new chunk store.
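A toy model of the resulting two-level index, with in-memory maps standing in for index rows (the names are hypothetical, not the seriesStore internals): label entries point at series IDs once per series, and a second index maps each series to its chunks, which is where the extra reads per query come from.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesIndex is a toy model of the v9 layout: label pairs map to series
// IDs (one entry per series, not per chunk), and a second index maps
// series IDs to chunk IDs.
type seriesIndex struct {
	byLabel  map[string]map[string]bool // "name=value" -> set of series IDs
	bySeries map[string][]string        // series ID -> chunk IDs
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{byLabel: map[string]map[string]bool{}, bySeries: map[string][]string{}}
}

func (idx *seriesIndex) add(seriesID, chunkID string, labels []string) {
	for _, l := range labels {
		if idx.byLabel[l] == nil {
			idx.byLabel[l] = map[string]bool{}
		}
		idx.byLabel[l][seriesID] = true // one entry per series, however many chunks
	}
	idx.bySeries[seriesID] = append(idx.bySeries[seriesID], chunkID)
}

// chunksFor intersects the series sets for each matcher, then does the
// extra series -> chunks lookups the new schema requires.
func (idx *seriesIndex) chunksFor(matchers []string) []string {
	var series map[string]bool
	for _, m := range matchers {
		next := map[string]bool{}
		for id := range idx.byLabel[m] {
			if series == nil || series[id] {
				next[id] = true
			}
		}
		series = next
	}
	var chunks []string
	for id := range series {
		chunks = append(chunks, idx.bySeries[id]...)
	}
	sort.Strings(chunks)
	return chunks
}

func main() {
	idx := newSeriesIndex()
	idx.add("s1", "chunk1", []string{"__name__=http_requests", "job=api"})
	idx.add("s1", "chunk2", []string{"__name__=http_requests", "job=api"})
	idx.add("s2", "chunk3", []string{"__name__=http_requests", "job=db"})
	fmt.Println(idx.chunksFor([]string{"__name__=http_requests", "job=api"}))
	// prints [chunk1 chunk2]
}
```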


Add heapCache, a cache that uses a heap for evictions.

heapCache is a simple string -> interface{} cache which uses a heap to manage evictions. O(log N) inserts and updates, O(1) gets.
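A minimal sketch of the idea using Go's container/heap, with a logical clock ordering evictions (illustrative only; not the PR's actual heapCache). Inserts and updates cost O(log N) because they push into or fix the heap; gets are an O(1) map lookup.

```go
package main

import (
	"container/heap"
	"fmt"
)

type entry struct {
	key     string
	value   interface{}
	updated int64 // logical timestamp; the oldest entry is evicted first
	index   int   // position in the heap, maintained by Swap/Push
}

type entryHeap []*entry

func (h entryHeap) Len() int           { return len(h) }
func (h entryHeap) Less(i, j int) bool { return h[i].updated < h[j].updated }
func (h entryHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i]; h[i].index = i; h[j].index = j }
func (h *entryHeap) Push(x interface{}) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}
func (h *entryHeap) Pop() interface{} {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

type heapCache struct {
	maxSize int
	clock   int64
	entries map[string]*entry
	heap    entryHeap
}

func newHeapCache(maxSize int) *heapCache {
	return &heapCache{maxSize: maxSize, entries: map[string]*entry{}}
}

// Put is O(log N): heap.Fix on update, heap.Push (plus a possible eviction) on insert.
func (c *heapCache) Put(key string, value interface{}) {
	c.clock++
	if e, ok := c.entries[key]; ok {
		e.value, e.updated = value, c.clock
		heap.Fix(&c.heap, e.index)
		return
	}
	if len(c.entries) >= c.maxSize {
		evicted := heap.Pop(&c.heap).(*entry)
		delete(c.entries, evicted.key)
	}
	e := &entry{key: key, value: value, updated: c.clock}
	heap.Push(&c.heap, e)
	c.entries[key] = e
}

// Get is O(1): a plain map lookup.
func (c *heapCache) Get(key string) (interface{}, bool) {
	e, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	return e.value, true
}

func main() {
	c := newHeapCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("a", 10) // updating "a" makes "b" the oldest entry
	c.Put("c", 3)  // cache full: evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```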


Skip index queries for high cardinality labels.

Firstly, cache the length of the index rows we query (by hash and range key). Secondly, fail for rows with > 100k entries, either because the cache told us so, or because we read them. Finally, allow matchers to fail on cardinality errors but proceed with the query (as long as at least one matcher succeeds), and then filter the results.

Notably, after this change, queries on two high-cardinality labels that would result in a small number of series will fail.

@tomwilkie tomwilkie mentioned this pull request Jul 12, 2018
@tomwilkie tomwilkie force-pushed the schema-v9 branch 6 times, most recently from 2572bf4 to bcd0443 on July 12, 2018
@tomwilkie tomwilkie changed the title [WIP] v9 Schema: Index series, not chunks v9 Schema: Index series, not chunks Jul 12, 2018
@tomwilkie
Contributor Author

Got this deployed in our dev env and it seems to work. Hard to draw any conclusions as there aren't any taxing queries, but there seems to be a drop in bigtable latency in line with the reduction in the amount of data we're reading.

@tomwilkie
Contributor Author

Running in prod for a few days now, and we've seen the worst queries go from minutes to seconds, in line with a 6-10x latency improvement. Once we rotate out the index tables, I'll report on their size.

@tomwilkie
Contributor Author

There was a bug in the way we handled chunks spanning the transition; it's fixed now.

Contributor

@bboreham bboreham left a comment

A few thoughts.

}

- chunkStore, err := chunk.NewStore(chunkStoreConfig, schemaConfig, storageClient)
+ chunkStore, err := chunk.NewCompositeStore(chunkStoreConfig, schemaConfig, storageClient)
Contributor

ISTM that you could have left the exported name as NewStore() to minimise impact elsewhere.

Contributor Author

Good idea, done.

// Put implements ChunkStore
func (c *store) Put(ctx context.Context, chunks []Chunk) error {
	for _, chunk := range chunks {
		if err := c.PutOne(ctx, chunk.From, chunk.Through, chunk); err != nil {
			return err
		}
	}
	return nil
}
Contributor

This seems like rather a large change in performance characteristics, if you have a number of chunks to write.


var schemas = []struct {
name string
fn func(cfg SchemaConfig) Schema
Contributor

I was expecting storeFn to come in this commit, even though they would all be the same at this point.

@cboggs
Contributor

cboggs commented Jul 20, 2018

Seems to me that this schema should also allow (or lay the groundwork for) successful queries without invariant metric names... am I thinking on the right path there?

@tomwilkie tomwilkie force-pushed the schema-v9 branch 2 times, most recently from 420ed55 to 01c790c on July 25, 2018
@tomwilkie
Contributor Author

Seems to me that this schema should also allow (or lay the groundwork for) successful queries without invariant metric names... am I thinking on the right path there?

It could, but it still doesn't avoid the problem of hotspotting the "name" row. Although it would reduce the load on that row by 12x, so it might be doable at this point.

@tomwilkie tomwilkie force-pushed the schema-v9 branch 5 times, most recently from 4c6a07b to 5c2e17d on July 25, 2018
@tomwilkie tomwilkie changed the title v9 Schema: Index series, not chunks Index series, not chunks Jul 25, 2018
@tomwilkie
Contributor Author

This seems like rather a large change in performance characteristics, if you have a number of chunks to write.

Yes, it potentially is. I don't think it will affect normal operation, as I think we only flush one chunk at a time normally - we've been running this for a week or so and haven't noticed anything. But it may affect shutdown flushes.

I think the correct solution is to potentially split out the ChunkStore (get and put chunks by ID) and ChunkIndex (write entries and find entries by matchers), something that is starting to happen with the seriesStore anyway. Then the composite store can write the index entries and the chunks once, in a batch. WDYT?
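A sketch of what that split might look like (hypothetical names; this is not code from the PR): with the two responsibilities separated, the composite store could collect all index entries up front and write the chunks and the entries once each, in batches, instead of one PutOne per chunk.

```go
package main

import "fmt"

type Chunk struct{ ID string }

type IndexEntry struct{ HashKey, RangeKey string }

// ChunkStore gets and puts chunks by ID.
type ChunkStore interface {
	PutChunks(chunks []Chunk) error
}

// ChunkIndex writes index entries and finds them by matchers.
type ChunkIndex interface {
	PutEntries(entries []IndexEntry) error
}

// putBatched writes all chunks and all their index entries in one call
// each, instead of one write per chunk.
func putBatched(store ChunkStore, index ChunkIndex, chunks []Chunk) error {
	entries := make([]IndexEntry, 0, len(chunks))
	for _, c := range chunks {
		entries = append(entries, IndexEntry{HashKey: c.ID})
	}
	if err := store.PutChunks(chunks); err != nil {
		return err
	}
	return index.PutEntries(entries)
}

// memBackend is a trivial in-memory backend implementing both interfaces.
type memBackend struct {
	chunks  []Chunk
	entries []IndexEntry
}

func (m *memBackend) PutChunks(cs []Chunk) error { m.chunks = append(m.chunks, cs...); return nil }
func (m *memBackend) PutEntries(es []IndexEntry) error {
	m.entries = append(m.entries, es...)
	return nil
}

func main() {
	b := &memBackend{}
	_ = putBatched(b, b, []Chunk{{ID: "c1"}, {ID: "c2"}})
	fmt.Println(len(b.chunks), len(b.entries)) // 2 2
}
```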

@tomwilkie
Contributor Author

Do we really need to write another cache?

How many do we have? I see the chunk caches (which are fundamentally different things), a vendored Prometheus treecache (something to do with ZooKeeper) and some caches in k8s/client-go. I guess the gRPC connection pool is a cache, but that's a different beast too. We also cache a single result in the chunk iterator, but that's trivial.

I'm not actually aware of any other caches (like this) in use in the codebase. We could use github.com/bluele/gcache, but I was never a fan of that library.

OTOH, it's not clear that using a heap for evictions is the best idea; we could thread a linked list through the entries and reorder it to get LRU.
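The linked-list alternative is straightforward with Go's container/list; a rough sketch (not code from this PR): moving an entry to the front of the list on every access gives O(1) puts, gets and evictions.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry struct {
	key   string
	value interface{}
}

// lruCache threads a linked list through the entries: the front of the
// list is the most recently used entry, the back is evicted first.
type lruCache struct {
	maxSize int
	ll      *list.List
	entries map[string]*list.Element
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{maxSize: maxSize, ll: list.New(), entries: map[string]*list.Element{}}
}

func (c *lruCache) Put(key string, value interface{}) {
	if el, ok := c.entries[key]; ok {
		el.Value.(*lruEntry).value = value
		c.ll.MoveToFront(el)
		return
	}
	if c.ll.Len() >= c.maxSize {
		back := c.ll.Back() // least recently used
		c.ll.Remove(back)
		delete(c.entries, back.Value.(*lruEntry).key)
	}
	c.entries[key] = c.ll.PushFront(&lruEntry{key, value})
}

func (c *lruCache) Get(key string) (interface{}, bool) {
	el, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el) // accessing an entry makes it most recently used
	return el.Value.(*lruEntry).value, true
}

func main() {
	c := newLRUCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // "a" is now most recently used
	c.Put("c", 3) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```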

@tomwilkie
Contributor Author

I've tidied up this PR so it should be more reviewable. I see two things left to do: figure out what to do with chunks on store boundaries (try and bring back the batching) and accurately record the cardinality of rows for multi-day queries.

Let me know if you get a chance to take a look @bboreham @cboggs @gouthamve.

@bboreham
Contributor

By "we" I meant the Go community.

I've used github.com/patrickmn/go-cache successfully in https://github.com/weaveworks/scope, to replace github.com/bluele/gcache for performance reasons.

@bboreham
Contributor

I think we only flush one chunk at once normally

If the flush queue gets very large (and we've had it in the millions many times) this assumption breaks down.

Then the composite store can write the index entries and the chunks once, in a batch. WDYT?

I wrote some thoughts at #684 (comment)

Contributor

Would it make more sense to PutChunks, then calculate all the index entries and write them at once? Might alleviate the performance characteristic changes Bryan commented about.

Contributor

typo, data should be date

level.Debug(log).Log("Chunk IDs", len(chunkIDs))

// Protect ourselves against OOMing.
if len(chunkIDs) > c.cfg.QueryChunkLimit {
Contributor

Would it make more sense to do this after filtering out the chunks by time?

@csmarchbanks
Contributor

I also have some concerns about putting one chunk vs many chunks, and added a comment with a possible idea. I like the idea of splitting ChunkStore and ChunkIndex, or some of Bryan's suggestions, but perhaps not as part of this PR.

Tidy up some of the logging.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie
Contributor Author

I've used github.com/patrickmn/go-cache successfully in https://github.com/weaveworks/scope, to replace github.com/bluele/gcache for performance reasons.

I've looked at go-cache; it still uses a background goroutine to periodically expunge entries from the cache. AFAICT this means the cache can grow without bound between these periods, and it also locks the entire cache to do so. The heap cache isn't ideal; I'm going to update it to use a simple FIFO list for evictions, but I think it's better than go-cache.

fifoCache is a simple string -> interface{} cache which uses a fifo to manage evictions.  O(1) inserts, updates and gets.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
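A minimal sketch of such a FIFO cache (illustrative only; the fifoCache that landed in the PR may be structured differently, e.g. as a ring buffer): entries are evicted in insertion order, giving O(1) puts, updates and gets with a hard bound on size and no background goroutine.

```go
package main

import "fmt"

// fifoCache is a simple string -> interface{} cache that evicts entries
// in insertion order once maxSize is reached.
type fifoCache struct {
	maxSize int
	order   []string // insertion order; the head is evicted first
	entries map[string]interface{}
}

func newFIFOCache(maxSize int) *fifoCache {
	return &fifoCache{maxSize: maxSize, entries: map[string]interface{}{}}
}

// Put inserts or updates a key. Updates do not change eviction order,
// so all operations are O(1).
func (c *fifoCache) Put(key string, value interface{}) {
	if _, ok := c.entries[key]; !ok {
		if len(c.order) >= c.maxSize {
			oldest := c.order[0] // evict the oldest insertion
			c.order = c.order[1:]
			delete(c.entries, oldest)
		}
		c.order = append(c.order, key)
	}
	c.entries[key] = value
}

// Get is a plain map lookup.
func (c *fifoCache) Get(key string) (interface{}, bool) {
	v, ok := c.entries[key]
	return v, ok
}

func main() {
	c := newFIFOCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // evicts "a", the oldest insert
	_, ok := c.Get("a")
	fmt.Println(ok) // false
}
```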
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Contributor

@csmarchbanks csmarchbanks left a comment

I am approving this since it has some nice changes. If anyone is worried about the performance characteristic change, they can stay on v6 until those are updated.

@tomwilkie tomwilkie merged commit 2f1e56b into cortexproject:master Aug 23, 2018
@tomwilkie tomwilkie deleted the schema-v9 branch August 23, 2018 10:28


Development

Successfully merging this pull request may close these issues.

Don't query very high cardinality labels
Chunk Store should index series, not chunks

5 participants