Skip to content

Write only as many chunk bytes as needed#1044

Merged
bboreham merged 3 commits intomasterfrom
shrunk-chunks
Nov 30, 2018
Merged

Write only as many chunk bytes as needed#1044
bboreham merged 3 commits intomasterfrom
shrunk-chunks

Conversation

@bboreham
Copy link
Contributor

@bboreham bboreham commented Sep 30, 2018

Fixes #631

Note we had not previously edited the chunk code copy-pasted from Prometheus (v1).

Only fully implemented for varbit chunks at present; I don't think we're intending to use other kinds.

Varbit chunks hold metadata in a header and a footer at the end of the chunk, however the footer is only used while appending more samples, so can be discarded when the chunk is "closed" and written to store.

A flag is added to make the new behaviour optional on writing, since shorter chunks cannot be read by older versions of Cortex so if you roll back after turning it on, you lose the ability to query any data written with short chunks.

The shorter encoding is also used when transferring chunks from one ingester to another on rolling updates.

@tomwilkie
Copy link
Contributor

Would you mind adding some unit tests?

@bboreham
Copy link
Contributor Author

I should have given some thought to making this optional. Once you write some smaller chunks, you need the code change to read them back, so you'd need to be sure you weren't rolling back past this change.

@bboreham
Copy link
Contributor Author

Would you mind adding some unit tests?

The existing unit tests exercise this code. What extra were you thinking of?

@tomwilkie
Copy link
Contributor

The existing unit tests exercise this code.

Theres nothing in this package which exercises this code. The tests in pkg/chunk/ cover it, but aren't exactly looking for bugs in the chunk encoding - for instance, TestChunkCodec only puts a single sample in the chunk.

I think it would be good to have a test which appends an increasing number of samples to a chunk, encodes it via a Marshal, decodes it and then checks all the samples are there.

@tomwilkie
Copy link
Contributor

Also needs a DCO.

@tomwilkie
Copy link
Contributor

Note we had not previously edited the chunk code copy-pasted from Prometheus (v1).

#1029 edited the code here too.

@tomwilkie
Copy link
Contributor

I think it would be good to have a test which append..

I added such a test in #1048

@tomwilkie
Copy link
Contributor

@bboreham now that #1048 is merged, want me to update this?

@bboreham
Copy link
Contributor Author

Sure, if you like. I left it alone because I saw only a 5% overall reduction in storage size, but I still think it's worth doing if you have time.

- Only effective for varbit chunks at present.
- Also allow undersized varbit chunks to be unmarshalled.
- Use varbit encoding in chunk tests, since that's what we use most commonly in production
- Remove chunk.Unmarshal(io.Reader), its only used in tests.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie
Copy link
Contributor

There you go @bboreham; I ended up remove the Unmarshal func on chunks, as it was only used in tests.

@tomwilkie
Copy link
Contributor

(Its an LGTM from me, but I'll let you merge it @bboreham)

@bboreham
Copy link
Contributor Author

Current status is I believe I have tried this and I believe it isn't working, in that it still outputs a bunch of zeros at the end of the chunk.
Further research required.

@bboreham
Copy link
Contributor Author

I take it back: the code where I thought this had been used was turned off.

@bboreham
Copy link
Contributor Author

I did a more effective test. Everything worked when I had ingesters rolled forward and queriers to match, but when I tried to roll back the ingesters the chunk hand-over failed:

level=warn ts=2018-11-29T14:31:32.59474578Z caller=gokit.go:46 method=/cortex.Ingester/TransferChunks duration=25.357151ms err="insufficient bytes copied from buffer during unmarshaling, want 1024, got 155" msg="gRPC\n"

this is something I hadn't thought about: it's actually a nice efficiency gain for the old-style chunks to do the hand-over without padding, but you can't fail back to older ingesters.

So, what to do? Make it optional on the write path so we can update everything and then turn it on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done more like PreallocConfig in the client pkg? https://github.com/cortexproject/cortex/blob/master/pkg/ingester/client/timeseries.go#L13

@tomwilkie
Copy link
Contributor

One nit, otherwise LGTM

This allows all components to be rolled out in a mode which accepts
either size of chunk, then changed over to write the new way at a
later date.

Signed-off-by: Bryan Boreham <bryan@weave.works>
Also un-export MarshalLen since it is not called from outside.

Signed-off-by: Bryan Boreham <bryan@weave.works>
@tomwilkie
Copy link
Contributor

Nice, thanks! LGTM

@bboreham bboreham merged commit 3c868b8 into master Nov 30, 2018
@bboreham
Copy link
Contributor Author

This is a bit broken, because you need the flag on the reading side but I only plumbed it in to ingester.

What I wanted was that all components could read chunks of either style, and the flag changes what ingester writes.

@bboreham
Copy link
Contributor Author

bboreham commented Dec 2, 2018

Still broken - the chunks passed from one ingester to another in hand-over cannot be appended to.
Don't turn the feature on until this is fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stop writing trailing zeros out to chunk store

2 participants