[WIP] Use BigTable Columns as range keys #753
gouthamve wants to merge 3 commits into cortexproject:master from
Conversation
See cortexproject#714 for motivation. One thing to note is that, unlike existing schemas, the structure of the stored data changes here. This causes some backwards-incompatible changes, as the `StorageClient` interface doesn't carry any time information in `QueryPages`, making it impossible to know whether we need to range over the row key or over the columns to retrieve the data. To maintain backwards compatibility, use `chunk.storage-client=gcpv1`.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Thanks Goutham! Will take a look tomorrow.
@gouthamve please give the motivation for this change.
Sorry, I explained everything in the commit message but forgot to post it here too. The motivation comes from #714: this would mean accessing a smaller number of rows per query.
pkg/chunk/gcp/fixtures.go (outdated):

	srv     *bttest.Server
	name    string

	version int
I'm generally less of a fan of "version" fields, and more of a fan of feature flags, mainly because feature flags are more self-explanatory.
pkg/chunk/gcp/storage_client.go (outdated):

	// this format, so we need to do a proper migration.
	rowKey := hashValue + separator + string(rangeValue)
	hasher := sha256.New()
	hasher.Write([]byte(hashValue))
SHA256 is pretty slow, and we don't need its cryptographic properties. I'd probably have picked FNV (fast and pretty uniform), and then appended the row key to the final hashed value to guarantee uniqueness.
pkg/chunk/gcp/storage_client.go (outdated):

	func (s *storageClient) NewWriteBatch() chunk.WriteBatch {
		return bigtableWriteBatch{
	func (s *storageClientV2) NewWriteBatch() chunk.WriteBatch {
		return bigtableWriteBatchV2{
Do we really need two implementations? Or can we abstract away row and column key generation into a set of functions that switch off the flag?
	if len(query.RangeValuePrefix) > 0 {
		rowRange = bigtable.PrefixRange(query.HashValue + separator + string(query.RangeValuePrefix))
		rOpts = append(rOpts, bigtable.RowFilter(bigtable.ColumnFilter(string(query.RangeValuePrefix)+".*"))) // TODO: Check again and anchor.
Hmm yeah this is not particularly desirable. @mbrukman do you know of a better way?
I'm a bit confused, this is doing a ColumnFilter with a value prefix?
Yep. We want to filter by prefix.
I was just confused because this is setting up a filter on column names, but the RangeValuePrefix implies that the filter should be used for values instead.
This is just a Cortex-ism; our data model at this level is super simple, consisting of three-tuples of (hash value, range value, 'cell' value). We always specify the hash value (mapped to the row key in bigtable), want to do range queries over the range value, and equality filtering on the 'cell' value.
Sorry for the confusion.
Makes sense, thanks for the explanation!
I'm not sure there's a significantly better way to do this... It's probably worth doing this filtering on the bigtable side, but if the amount of data that would get filtered out by this ColumnFilter is small, consider measuring it against doing it all client-side.
I wonder, would it be valid to apply the same trick @bcotton added to the row ranges: use a range query from prefix to prefix+null?
pkg/chunk/gcp/storage_client.go (outdated):

	}, bigtable.RowFilter(bigtable.FamilyFilter(columnFamily)))
	hasher := sha256.New()
	hasher.Write([]byte(query.HashValue))
	hashValue := string(hasher.Sum(nil))
Would be good to share this with the code to generate the row key for writes.
pkg/chunk/gcp/storage_client.go (outdated):

	for i := range val {
		val[i].Column = strings.TrimPrefix(val[i].Column, columnFamily+":")
		// TODO: Hacky hacky ^
You could push this into the readbatch.RangeValue function (if you want to).
	Name:             c.batch[0].tableName,
	ProvisionedRead:  100000,
	ProvisionedWrite: 100000,
}))
Not a huge fan of using the mock store to get the expected result; I think I'd prefer having the test case spell out explicitly what we expect to return.
Hmm, it actually increases the test verbosity by a fair amount. I'd rather add test cases to increase confidence in the in-mem version, but it's your call.
Done a first pass; let me know when you've addressed the comments. Also, we should test this out and get some performance results before we merge; it's not guaranteed to be an improvement.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
@tomwilkie PTAL. I've addressed most of the comments. I've made it a single implementation as much as possible, but: where is this queryFn passed? Do we want to have large fn definitions in the struct initialiser?

For the tests, I've done the testing using the in-mem/mock implementation, as in TSDB, where the mock itself has tests to make sure it works as intended. I can do that, or just test the implementation independently. Your call.
What is the status of this PR?
We're testing it this week.

> On Tue, Apr 3, 2018 at 10:44 AM Bryan Boreham wrote: What is the status of this PR?
How did the testing go?
Looking through this, it appears the change is not backwards compatible right now? Did I miss something, or are there plans to use a -from flag? Also, if you need some more testing done, FreshTracks would be happy to help out!
We dropped the ball on this: we did some testing and couldn't explain a difference (an increase) in the size of the bigtables. It's still a WIP.
This is blocked on pushing the schema switching up a layer (to the chunk store) so we can have a migration path, as discussed on a recent community call.
@tomwilkie I still have TODOs that I want to tackle in refactorings. Also, I used SHA256 to hash; let me know if that's okay.
One curious thing to check is the prefix queries; I haven't found anything other than regex to handle that.
/cc @bboreham