Introduce config to cache segments locally before uncompressing#4302
Introduce config to cache segments locally before uncompressing#4302niketh wants to merge 2 commits intoapache:masterfrom
Conversation
|
Just a general comment overall to discuss, but I'm starting to dislike having so many 'optional' features and increasingly adding more configuration. It is just becoming difficult to maintain and understand what is important to turn on when. I think Druid should strive to have less configuration, not more. |
|
@fjy I agree. But in this specific case, S3DataSegmentPuller always downloads to a temp directory before it can uncompress. This behaviour might not be required for everyone |
0eb3f1d to
3b9880b
Compare
| |`druid.segmentCache.announceIntervalMillis`|How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.|5000 (5 seconds)| | ||
| |`druid.segmentCache.numLoadingThreads`|How many segments to load concurrently from from deep storage.|1| | ||
| |`druid.segmentCache.numBootstrapThreads`|How many segments to load concurrently from local storage at startup.|1| | ||
| |`druid.segmentCache.cacheSegmentsLocally`|Download segments to temp before uncompressing.|false| |
There was a problem hiding this comment.
I'm not sure what this means or when I would want to set it, which I think means it should be written more clearly.
|
@gianm Is the documentation clearer now? |
|
I don't think the API for load spec and data puller should change. This assumes all pullers will always require a copy and a processing of the data, which is not true. It is possible to have a load spec that wires up data without needing to copy/process the data in the way that the s3 puller does currently. As such caching is a puller property, not a LoadSpec property. |
|
I agree with @drcrallen if we add this config, it should be through adding a Config object to the S3DataSegmentPuller's constructor without any change to the interfaces. Alternatively, taking what @fjy said (which was parrotted by @himanshug internally), we could just eliminate the download-first strategy altogether. @drcrallen you guys use the S3 puller, would you be against eliminating download-first altogether? |
|
Closed in favour of #4364 |
No description provided.