
Suboptimal usage of multiple segment cache locations #7641

@dclim


Affected Version

0.14.0-incubating

Description

There seems to be room for improvement in the logic that distributes segments among multiple segment cache locations, which could speed up overall I/O when the locations are backed by different physical drives. Currently, the algorithm appears to pick one location, fill it to capacity, and then move on to the next one.

An example test configuration:

druid.server.maxSize=35000000000
druid.segmentCache.locations=[{"path"\:"/mnt/var/druid/segment-cache","maxSize"\:"5000000000"}, {"path"\:"/mnt1/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt2/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt3/var/druid/segment-cache","maxSize"\:"10000000000"}]

Some point-in-time snapshots of disk usage:

t=1
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  3.9G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T   77M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=2
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  952M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=3
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  5.4G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=4
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  937M  1.7T   1% /mnt3

t=5
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  5.1G  1.7T   1% /mnt3

t=6
/dev/nvme0n1    1.7T  859M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3

t=7 (steady state, all locations are 'full')
/dev/nvme0n1    1.7T  4.6G  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.3G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3
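The fill-then-move-on pattern in the snapshots above can be contrasted with a strategy that spreads segments across locations. This is a toy model, not Druid's actual code: `sequential_fill` mimics the observed behavior, and `least_bytes_used` is one possible alternative; the location names and segment sizes are hypothetical, loosely modeled on the test configuration.

```python
# Toy model (NOT Druid's actual implementation) contrasting the observed
# "fill one location, then move on" strategy with a least-bytes-used
# strategy that spreads segments across drives.

def sequential_fill(locations, segment_sizes):
    """Pick the first location with enough free space (observed behavior)."""
    used = {path: 0 for path, _ in locations}
    for size in segment_sizes:
        for path, max_size in locations:
            if used[path] + size <= max_size:
                used[path] += size
                break
    return used

def least_bytes_used(locations, segment_sizes):
    """Pick the least-loaded location with room (one possible improvement)."""
    used = {path: 0 for path, _ in locations}
    for size in segment_sizes:
        candidates = [path for path, max_size in locations
                      if used[path] + size <= max_size]
        if candidates:
            used[min(candidates, key=lambda p: used[p])] += size
    return used

# Hypothetical sizes loosely based on the test configuration above.
locations = [("/mnt", 5_000_000_000),
             ("/mnt1", 10_000_000_000),
             ("/mnt2", 10_000_000_000),
             ("/mnt3", 10_000_000_000)]
segments = [500_000_000] * 20  # twenty 500 MB segments

print(sequential_fill(locations, segments))   # first drives absorb everything
print(least_bytes_used(locations, segments))  # load spread across all drives
```

Under this model, sequential fill leaves two of the four drives untouched until the others hit `maxSize`, while least-bytes-used keeps all four drives serving I/O from the start.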

Additionally, I'm seeing some misleading WARN-level logs when it attempts to use a location that is at capacity and fails, after which it moves on to the next location:

2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt1/var/druid/segment-cache:268,514,257]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt2/var/druid/segment-cache:237,382,486]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://imply-awstest-druid/a336020b-9c30-4ecd-bd15-4cd4433998c3/segments/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1/index.zip] to outDir[/mnt3/var/druid/segment-cache/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1]
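The WARN messages above appear to come from each candidate location logging its own capacity failure before the loop tries the next one, so a warning fires even when the load ultimately succeeds elsewhere. A minimal sketch of that assumed control flow (not Druid's actual implementation; the sizes and paths mirror the log excerpt):

```python
# Sketch of the assumed selection loop (NOT Druid's actual code): each
# full location logs a WARN before the next location is tried, which is
# what makes the message misleading when a later location succeeds.
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("StorageLocation")

def reserve(locations, segment_id, size):
    """Return the first location with enough free space, WARNing per miss."""
    for path, free in locations:
        if size > free:
            # Fires even when a later location will succeed.
            log.warning("Segment[%s:%d] too large for storage[%s:%d]. "
                        "Check your druid.segmentCache.locations maxSize param",
                        segment_id, size, path, free)
            continue
        return path
    return None  # only this case is a genuine failure

# Free-space numbers taken from the log excerpt above.
locations = [("/mnt1/var/druid/segment-cache", 268_514_257),
             ("/mnt2/var/druid/segment-cache", 237_382_486),
             ("/mnt3/var/druid/segment-cache", 9_000_000_000)]
chosen = reserve(locations, "tpch_lineitem_segment", 548_961_952)
print(chosen)  # load succeeds on /mnt3 despite two WARNs
```

If this is indeed the shape of the code, the per-location message might be better logged at DEBUG, reserving WARN (or ERROR) for the case where every location is full.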
