Broken feature: appending linearly partitioned segments into a hash partitioned datasource #9352

@jihoonson

Affected Version

0.16, 0.17, master

Description

Before 0.16, Druid allowed you to create a datasource with the HashedPartitionsSpec and then run a task that appends to that datasource with linear partitioning (using maxRowsPerSegment). This was possible because segments created with HashedPartitionsSpec have the HashBasedNumberedShardSpec, which extends NumberedShardSpec, which in turn is used for linearly partitioned segments (see https://github.com/apache/druid/blob/0.15.1-incubating/server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java#L691-L700).
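
To make the inheritance argument concrete, here is a minimal sketch of why a type check against NumberedShardSpec would also accept hash partitioned segments. The classes below are simplified stand-ins that only borrow the Druid names; `OtherShardSpec` and `nextShardSpecForAppend` are hypothetical, and the `instanceof` check is an assumption about the shape of the linked code, not a copy of it.

```java
final class AppendAllocationSketch {

    /** Simplified stand-in for a shard spec; only the partition number matters here. */
    interface ShardSpec {
        int getPartitionNum();
    }

    /** Stand-in for the shard spec used by linearly partitioned segments. */
    static class NumberedShardSpec implements ShardSpec {
        private final int partitionNum;
        private final int partitions;

        NumberedShardSpec(int partitionNum, int partitions) {
            this.partitionNum = partitionNum;
            this.partitions = partitions;
        }

        @Override
        public int getPartitionNum() {
            return partitionNum;
        }

        int getPartitions() {
            return partitions;
        }
    }

    /** Stand-in for the hash-based spec; the point is that it IS-A NumberedShardSpec. */
    static class HashBasedNumberedShardSpec extends NumberedShardSpec {
        HashBasedNumberedShardSpec(int partitionNum, int partitions) {
            super(partitionNum, partitions);
        }
    }

    /** Hypothetical spec outside the NumberedShardSpec hierarchy. */
    static class OtherShardSpec implements ShardSpec {
        private final int partitionNum;

        OtherShardSpec(int partitionNum) {
            this.partitionNum = partitionNum;
        }

        @Override
        public int getPartitionNum() {
            return partitionNum;
        }
    }

    /**
     * Assumed allocation rule: if the existing segment with the highest partition
     * number is a NumberedShardSpec (or any subclass), append a plain
     * NumberedShardSpec with the next partition number. Otherwise refuse (null).
     */
    static ShardSpec nextShardSpecForAppend(ShardSpec maxExisting) {
        if (maxExisting instanceof NumberedShardSpec) {
            NumberedShardSpec numbered = (NumberedShardSpec) maxExisting;
            return new NumberedShardSpec(numbered.getPartitionNum() + 1, numbered.getPartitions());
        }
        return null;
    }

    public static void main(String[] args) {
        // The last hash partitioned segment of a 3-partition time chunk.
        ShardSpec next = nextShardSpecForAppend(new HashBasedNumberedShardSpec(2, 3));
        System.out.println(next.getClass().getSimpleName() + " #" + next.getPartitionNum());
    }
}
```

Under this assumption, the appended segment ends up with a plain NumberedShardSpec next to the existing HashBasedNumberedShardSpec segments, which is the mixed layout this issue describes.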

This feature was broken in #7547, which is technically a bug. However, I wonder whether we really want to support this in the future, for the following reasons.

  • Allowing mixed partitioning methods for one datasource is confusing and not very useful.
  • This feature introduces an ambiguous concept of "core partitions". Only a hash partitioned datasource has core partitions, which are the set of segments created by the initial task. All segments in the core partitions should have the same HashBasedNumberedShardSpec, while all other segments should have the NumberedShardSpec. In timeline management, a hash partitioned datasource is regarded as visible in brokers if and only if all segments in the core partitions are available in historicals, no matter how many segments are in the non-core partitions. I think this concept is not that useful and makes things complicated.
  • This feature allows you to append only linearly partitioned segments to a hash partitioned datasource. Other combinations or directions are not allowed.
  • Finally, #9241 (Append support for hash/range-partitioned segments) was recently proposed, which seems more promising.
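
The "core partitions" visibility rule from the second point can be sketched as follows. This is a simplified model rather than Druid's timeline code: the class and method names are hypothetical, and it assumes the core partitions are exactly the partition numbers `0` to `numCorePartitions - 1`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CorePartitionsSketch {

    /**
     * Returns true if and only if every core partition is among the available
     * partition numbers. Non-core (appended) partitions are ignored entirely.
     *
     * Assumption: core partitions are numbered 0 .. numCorePartitions - 1.
     */
    static boolean isChunkVisible(List<Integer> availablePartitionNums, int numCorePartitions) {
        Set<Integer> coreSeen = new HashSet<>();
        for (int partitionNum : availablePartitionNums) {
            if (partitionNum >= 0 && partitionNum < numCorePartitions) {
                coreSeen.add(partitionNum);
            }
        }
        return coreSeen.size() == numCorePartitions;
    }

    public static void main(String[] args) {
        int numCorePartitions = 2;

        // Core partition 1 is missing, so the appended partitions 2 and 3 do not help.
        System.out.println(isChunkVisible(Arrays.asList(0, 2, 3), numCorePartitions));

        // Both core partitions are available; no appended partition is loaded yet.
        System.out.println(isChunkVisible(Arrays.asList(0, 1), numCorePartitions));
    }
}
```

In this model the first call returns `false` and the second returns `true`, which shows the asymmetry: appended segments never affect visibility, while a single missing core segment hides the whole set.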

I would like to promote #9241 rather than fix this bug. Any thoughts are welcome.
