Skip to content

Introduce config to cache segments locally before uncompressing#4302

Closed
niketh wants to merge 2 commits intoapache:masterfrom
niketh:cache-segments
Closed

Introduce config to cache segments locally before uncompressing#4302
niketh wants to merge 2 commits intoapache:masterfrom
niketh:cache-segments

Conversation

@niketh
Copy link
Copy Markdown
Contributor

@niketh niketh commented May 19, 2017

No description provided.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented May 19, 2017

Just a general comment overall to discuss, but I'm starting to dislike having so many 'optional' features and increasingly adding more configuration. It is just becoming difficult to maintain and understand what is important to turn on when. I think Druid should strive to have less configuration, not more.

@niketh
Copy link
Copy Markdown
Contributor Author

niketh commented May 19, 2017

@fjy I agree. But in this specific case, S3DataSegmentPuller always downloads to a temp directory before it can uncompress. This behaviour might not be required for everyone

@niketh niketh force-pushed the cache-segments branch 2 times, most recently from 0eb3f1d to 3b9880b Compare May 19, 2017 22:55
@niketh niketh closed this May 30, 2017
@niketh niketh reopened this May 30, 2017
|`druid.segmentCache.announceIntervalMillis`|How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.|5000 (5 seconds)|
|`druid.segmentCache.numLoadingThreads`|How many segments to load concurrently from from deep storage.|1|
|`druid.segmentCache.numBootstrapThreads`|How many segments to load concurrently from local storage at startup.|1|
|`druid.segmentCache.cacheSegmentsLocally`|Download segments to temp before uncompressing.|false|
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this means or when I would want to set it, which I think means it should be written more clearly.

@niketh
Copy link
Copy Markdown
Contributor Author

niketh commented May 31, 2017

@gianm Is the documentation clearer now?

@drcrallen
Copy link
Copy Markdown
Contributor

drcrallen commented May 31, 2017

I don't think the API for load spec and data puller should change.

This assumes all pullers will always require a copy and a processing of the data, which is not true. It is possible to have a load spec that wires up data without needing to copy/process the data in the way that the s3 puller does currently.

As such caching is a puller property, not a LoadSpec property.

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jun 1, 2017

I agree with @drcrallen if we add this config, it should be through adding a Config object to the S3DataSegmentPuller's constructor without any change to the interfaces.

Alternatively, taking what @fjy said (which was parrotted by @himanshug internally), we could just eliminate the download-first strategy altogether. @drcrallen you guys use the S3 puller, would you be against eliminating download-first altogether?

@niketh
Copy link
Copy Markdown
Contributor Author

niketh commented Jun 5, 2017

Closed in favour of #4364

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants