Affected Version
0.14.0-incubating
Description
There seems to be room for improvement here: adding logic to distribute segments across multiple segment cache locations could speed up overall I/O when the locations are backed by different physical drives. Right now, the algorithm appears to pick one location, fill it to capacity, and only then move on to the next one. A rough sketch of an alternative selection strategy follows the disk-usage snapshots below.
An example test configuration:
druid.server.maxSize=35000000000
druid.segmentCache.locations=[{"path"\:"/mnt/var/druid/segment-cache","maxSize"\:"5000000000"}, {"path"\:"/mnt1/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt2/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt3/var/druid/segment-cache","maxSize"\:"10000000000"}]
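For reference, the per-location maxSize values sum to 35 GB, matching druid.server.maxSize above. A minimal sketch of that arithmetic (the CacheLocation record here is a hypothetical stand-in for illustration, not Druid's actual segment-cache config class):

import java.util.List;

// Hypothetical stand-in for the configured cache locations; just enough to
// check the example numbers, not Druid's actual config class.
record CacheLocation(String path, long maxSizeBytes) {}

public class CacheCapacityCheck {
    public static void main(String[] args) {
        List<CacheLocation> locations = List.of(
                new CacheLocation("/mnt/var/druid/segment-cache", 5_000_000_000L),
                new CacheLocation("/mnt1/var/druid/segment-cache", 10_000_000_000L),
                new CacheLocation("/mnt2/var/druid/segment-cache", 10_000_000_000L),
                new CacheLocation("/mnt3/var/druid/segment-cache", 10_000_000_000L));

        long totalBytes = locations.stream().mapToLong(CacheLocation::maxSizeBytes).sum();
        // 5 GB + 10 GB + 10 GB + 10 GB = 35 GB, i.e. druid.server.maxSize=35000000000
        System.out.println("total segment cache capacity = " + totalBytes + " bytes");
    }
}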
Some point-in-time snapshots of disk usage (df -h output, header omitted):
t=1
/dev/nvme0n1 1.7T 79M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 3.9G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 77M 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 77M 1.7T 1% /mnt3
t=2
/dev/nvme0n1 1.7T 79M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.2G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 952M 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 77M 1.7T 1% /mnt3
t=3
/dev/nvme0n1 1.7T 79M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.2G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 5.4G 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 77M 1.7T 1% /mnt3
t=4
/dev/nvme0n1 1.7T 79M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.2G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 9.2G 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 937M 1.7T 1% /mnt3
t=5
/dev/nvme0n1 1.7T 79M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.2G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 9.2G 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 5.1G 1.7T 1% /mnt3
t=6
/dev/nvme0n1 1.7T 859M 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.2G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 9.2G 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 9.2G 1.7T 1% /mnt3
t=7 (steady state, all locations are 'full')
/dev/nvme0n1 1.7T 4.6G 1.7T 1% /mnt
/dev/nvme1n1 1.7T 9.3G 1.7T 1% /mnt1
/dev/nvme2n1 1.7T 9.2G 1.7T 1% /mnt2
/dev/nvme3n1 1.7T 9.2G 1.7T 1% /mnt3
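The snapshots above show the sequential fill: /mnt1 fills first, then /mnt2, then /mnt3, and finally /mnt. Below is a minimal sketch of one alternative, assuming a simple "most available space" selection rule; the CacheLocationState class and its byte bookkeeping are assumptions made for this sketch, not Druid's actual segment-cache internals.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: spread new segments across every location that still has room,
// always picking the one with the most free space, instead of filling one location
// before moving on.
class CacheLocationState {
    final String path;
    final long maxSizeBytes;
    long usedBytes;

    CacheLocationState(String path, long maxSizeBytes) {
        this.path = path;
        this.maxSizeBytes = maxSizeBytes;
    }

    long availableBytes() {
        return maxSizeBytes - usedBytes;
    }
}

public class MostAvailableSpaceSelector {
    // Pick the location with the most free space that can still hold the segment.
    static Optional<CacheLocationState> select(List<CacheLocationState> locations, long segmentSize) {
        return locations.stream()
                .filter(loc -> loc.availableBytes() >= segmentSize)
                .max(Comparator.comparingLong(CacheLocationState::availableBytes));
    }

    public static void main(String[] args) {
        List<CacheLocationState> locations = List.of(
                new CacheLocationState("/mnt/var/druid/segment-cache", 5_000_000_000L),
                new CacheLocationState("/mnt1/var/druid/segment-cache", 10_000_000_000L),
                new CacheLocationState("/mnt2/var/druid/segment-cache", 10_000_000_000L),
                new CacheLocationState("/mnt3/var/druid/segment-cache", 10_000_000_000L));

        // Simulate loading ten ~500 MB segments: they rotate among the locations with
        // the most headroom rather than all landing on one drive first.
        for (int i = 0; i < 10; i++) {
            final long segmentSize = 500_000_000L;
            select(locations, segmentSize).ifPresent(loc -> {
                loc.usedBytes += segmentSize;
                System.out.println("segment -> " + loc.path);
            });
        }
    }
}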
Additionally, I'm seeing some misleading WARN-level logs when the loader attempts to use a location that is already at capacity, fails, and then moves on to the next location:
2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt1/var/druid/segment-cache:268,514,257]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt2/var/druid/segment-cache:237,382,486]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://imply-awstest-druid/a336020b-9c30-4ecd-bd15-4cd4433998c3/segments/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1/index.zip] to outDir[/mnt3/var/druid/segment-cache/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1]
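The "too large for storage ... Check your druid.segmentCache.locations maxSize param" warnings fire for every location that is merely full before the loader falls through to /mnt3, which does have room. One way the logging could read less alarmingly, sketched below under the same assumptions as above (it reuses the hypothetical CacheLocationState class and is not Druid's actual StorageLocation code), is to warn only when no configured location can accept the segment:

import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Illustrative only: a location that is simply at capacity is skipped quietly at a
// lower log level; the WARN about maxSize is reserved for the case where no
// configured location can hold the segment at all.
public class QuietLocationSelection {
    private static final Logger LOG = Logger.getLogger(QuietLocationSelection.class.getName());

    static Optional<CacheLocationState> reserve(List<CacheLocationState> locations,
                                                String segmentId, long segmentSize) {
        for (CacheLocationState loc : locations) {
            if (loc.availableBytes() >= segmentSize) {
                loc.usedBytes += segmentSize;
                return Optional.of(loc);
            }
            // Expected during normal operation once a location fills up.
            LOG.fine(() -> "Skipping full location " + loc.path + " for segment " + segmentId);
        }
        LOG.warning("Segment " + segmentId + " (" + segmentSize + " bytes) does not fit in any "
                + "segment cache location; check druid.segmentCache.locations maxSize.");
        return Optional.empty();
    }
}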