
[Proposal] Consolidated segment metadata management #6849

@jihoonson


Motivation

Druid currently stores segment metadata in two places: the metadata store and deep storage. In the metadata store, segment metadata is stored in the segments table. In deep storage, it is stored in a descriptor.json file.

Druid core retrieves segment metadata only from the metadata store. Only the insert-segment-to-db tool reads the descriptor.json files, which it uses to find segment files in deep storage.

However, storing metadata in two different places has several drawbacks.

  1. Additional effort is required to make sure that the same segment metadata is stored in the metadata store and deep storage (see #4170, "S3DataSegmentPusher writes incomplete descriptor.json segment data to S3").
  2. Deep storage migration is complex because metadata must be updated in both the metadata store and deep storage.
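
To make the two drawbacks concrete, here is a minimal sketch of how the two copies can drift apart during a deep storage migration. It is a toy in-memory model, not Druid code: the class `DualMetadataSketch`, its methods, the segment ID, and the bucket names are all hypothetical, and each store is reduced to a map from segment ID to segment path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model (not Druid's API): each store maps a segment ID
// to the path of the segment file in deep storage.
class DualMetadataSketch {
    // Stand-in for the segments table in the metadata store.
    final Map<String, String> metadataStore = new HashMap<>();
    // Stand-in for the descriptor.json files kept in deep storage.
    final Map<String, String> descriptors = new HashMap<>();

    // Publishing a segment has to write the same metadata to both places.
    void publish(String segmentId, String path) {
        metadataStore.put(segmentId, path);
        descriptors.put(segmentId, path);
    }

    // First half of a migration: rewrite paths in the metadata store only.
    void migrateMetadataStore(String oldPrefix, String newPrefix) {
        metadataStore.replaceAll((id, path) -> path.replace(oldPrefix, newPrefix));
    }

    // Second half of a migration: rewrite paths in the descriptors as well.
    void migrateDescriptors(String oldPrefix, String newPrefix) {
        descriptors.replaceAll((id, path) -> path.replace(oldPrefix, newPrefix));
    }

    // True only when both copies hold identical metadata.
    boolean isConsistent() {
        return metadataStore.equals(descriptors);
    }
}
```

In this model, calling `migrateMetadataStore` without `migrateDescriptors` leaves `isConsistent()` false until the second step also runs. With a single copy of the metadata, that second step and the consistency check would not be needed.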

The insert-segment-to-db tool seems to have been introduced for recovery when the metadata store is broken (#1861), but that kind of failure should be handled another way, e.g., by replicating the metadata store.

Public Interfaces

DataSegmentFinder will be removed.

Proposed Changes

Segment metadata will be stored only in the metadata store.

Compatibility, Deprecation, and Migration Plan

This is not a backward-compatible change. The descriptor.json file will no longer be stored in deep storage, and the insert-segment-to-db tool will be removed.

Deep storage migration would become simpler since segment metadata would need to be updated only in the metadata store.

Test Plan

N/A

Rejected Alternatives

N/A
