add dump-segment tool mode to dump v10 segment metadata #18901

Merged: gianm merged 1 commit into apache:master from clintropolis:v10-dump-segment-tool on Jan 13, 2026

clintropolis commented Jan 9, 2026

Description

Since v10 segments no longer have a meta.smoosh, this PR adds an option to the dump segment tool to show v10 metadata. As a convenience, I have added a bin/dump-segment to the Druid packaging, so it's easy to run against segments if you have a Druid installation handy. The output is just the serialized JSON of the stored metadata and can be piped into jq. For example:

$ ./bin/dump-segment --dump metadata_v10 -d ~/workspace/data/druid/segmentsCache/wikipedia-v10-no-rollup_2016-06-27T00\:00\:00.000Z_2016-06-28T00\:00\:00.000Z_2025-12-31T22\:19\:27.478Z/druid.segment | jq .
{
  "containers": [
    {
      "startOffset": 0,
      "size": 7069778
    }
  ],
  "files": {
    "__base/__time": {
      "container": 0,
      "startOffset": 0,
      "size": 120208
    },
    "__base/added": {
      "container": 0,
      "startOffset": 4241057,
      "size": 14
    },
...
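
For anyone who wants to post-process this output without jq, here is a minimal Python sketch that totals the sizes in the structure shown above. The `summarize` helper and the `SAMPLE` constant are hypothetical names used for illustration only, and the sample holds just the two file entries visible in the truncated output, so the totals are not those of the real segment.

```python
import json


def summarize(metadata: dict) -> dict:
    """Return simple totals for a parsed metadata_v10 document.

    Relies only on the fields visible in the example output:
    "containers" (a list of objects with "size") and "files"
    (a mapping of internal file name to an object with "size").
    """
    containers = metadata.get("containers", [])
    files = metadata.get("files", {})
    return {
        "containers": len(containers),
        "files": len(files),
        "container_bytes": sum(c["size"] for c in containers),
        "file_bytes": sum(f["size"] for f in files.values()),
    }


# Truncated sample copied from the output above; a real dump lists many more files.
SAMPLE = json.loads("""
{
  "containers": [{"startOffset": 0, "size": 7069778}],
  "files": {
    "__base/__time": {"container": 0, "startOffset": 0, "size": 120208},
    "__base/added": {"container": 0, "startOffset": 4241057, "size": 14}
  }
}
""")

if __name__ == "__main__":
    print(json.dumps(summarize(SAMPLE), indent=2))
```

To run it against a real segment, the same function could be fed `json.load(sys.stdin)` with the tool's output piped in.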

jq can also handle more involved queries, such as listing the largest internal files:

$ ./bin/dump-segment --dump metadata_v10 -d ~/workspace/data/druid/segmentsCache/wikipedia-v10-no-rollup_2016-06-27T00\:00\:00.000Z_2016-06-28T00\:00\:00.000Z_2025-12-31T22\:19\:27.478Z/druid.segment | jq '.files | to_entries | map({file:.key, size:.value.size}) | sort_by(.size) | reverse'
[
  {
    "file": "__base/diffUrl.__stringDictionary",
    "size": 1908699
  },
  {
    "file": "__base/comment.__stringDictionary",
    "size": 1001152
  },
  {
    "file": "__base/page.__stringDictionary",
    "size": 732776
  },
  {
    "file": "__base/diffUrl.__valueIndexes",
    "size": 635276
  },
  {
    "file": "__base/page.__valueIndexes",
    "size": 586076
  },
  {
...
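
The jq pipeline above can be roughly mirrored in Python, which may be handy where jq is not installed. This is only a sketch: `largest_files` and `SAMPLE` are hypothetical names, and the sample keeps just the `size` field for the five files shown in the output, since their `container` and `startOffset` values do not appear in this PR.

```python
def largest_files(metadata: dict, limit: int = 5) -> list:
    """List internal files by size, largest first.

    Roughly equivalent to:
      .files | to_entries | map({file:.key, size:.value.size})
             | sort_by(.size) | reverse
    with an extra `limit` to keep the result short.
    """
    entries = [
        {"file": name, "size": info["size"]}
        for name, info in metadata["files"].items()
    ]
    entries.sort(key=lambda entry: entry["size"], reverse=True)
    return entries[:limit]


# Sizes copied from the output above, deliberately listed out of order.
SAMPLE = {
    "files": {
        "__base/page.__valueIndexes": {"size": 586076},
        "__base/comment.__stringDictionary": {"size": 1001152},
        "__base/diffUrl.__valueIndexes": {"size": 635276},
        "__base/diffUrl.__stringDictionary": {"size": 1908699},
        "__base/page.__stringDictionary": {"size": 732776},
    }
}

if __name__ == "__main__":
    for entry in largest_files(SAMPLE, limit=3):
        print(entry["size"], entry["file"])
```

Files of equal size may come out in a different relative order than with jq, because the two sorts break ties differently.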

gianm merged commit c199efa into apache:master Jan 13, 2026
101 of 109 checks passed
clintropolis deleted the v10-dump-segment-tool branch January 13, 2026 00:30
clintropolis added a commit to clintropolis/druid that referenced this pull request Jan 13, 2026
kgyrtkirk added this to the 36.0.0 milestone Jan 20, 2026