Potential idea: add PIF format with CIGAR-less features #5224

cmdcolin · 2025-11-15T00:19:23Z

Whole-genome overviews can be slow because the large volume of CIGAR data and other tags slows down both the data fetching and the display code.

This proposes having make-pif write CIGAR-stripped copies of the features into separate "prefix areas" of the PIF format.

Background

The PIF format is a special tabix file with the data sorted by both query and target coordinates. It already has two "prefix areas" of the tabix file:

prefix q: query by query genome coordinates
prefix t: query by target genome coordinates
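
As a rough illustration of the two prefix areas, here is a minimal sketch of how one PAF record could be written out twice, once per prefix. The function name `prefixed_lines` is hypothetical, and the exact layout (prefix character prepended to the first-column sequence name, query and target columns swapped for the `t` copy, tags copied unchanged) is an assumption for illustration, not a description of what make-pif actually emits.

```python
def prefixed_lines(paf_line: str) -> list[str]:
    """Return 'q' and 't' prefixed copies of one PAF record.

    Assumed layout (illustrative only): the prefix is prepended to the
    sequence name in column 1 so that a tabix index on column 1 keeps the
    two areas apart.
    """
    cols = paf_line.rstrip("\n").split("\t")
    # Standard PAF: columns 1-4 describe the query, 5 is strand, 6-9 the target
    qname, qlen, qstart, qend, strand = cols[0:5]
    tname, tlen, tstart, tend = cols[5:9]
    rest = cols[9:]  # matches, block length, mapq, then optional tags

    # 'q' copy: query coordinates lead, so lookups are by query position
    q_line = ["q" + qname, qlen, qstart, qend, strand,
              tname, tlen, tstart, tend, *rest]
    # 't' copy: target coordinates lead, so lookups are by target position
    t_line = ["t" + tname, tlen, tstart, tend, strand,
              qname, qlen, qstart, qend, *rest]
    return ["\t".join(q_line), "\t".join(t_line)]
```

After sorting and indexing, a region lookup on a prefixed name such as `tchr1` would then only touch the target-sorted copies, under the same layout assumption.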

This PR

This PR adds two new prefixes

prefix a: query by query genome coordinates, without CIGAR
prefix b: query by target genome coordinates, without CIGAR

The hope is for fast whole-genome overviews, plotted without CIGAR, that are relatively faithful to the data and can still be zoomed in to show CIGAR at arbitrary zoom levels.
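
A minimal sketch of what deriving a CIGAR-less line could look like, assuming the existing lines carry a one-character prefix on the first column and store the CIGAR as a PAF-style `cg:Z:` tag. The helper name `strip_cigar` is hypothetical and this is not taken from the PR's code.

```python
def strip_cigar(prefixed_line: str, new_prefix: str) -> str:
    """Copy a prefixed line under a new prefix, dropping the cg:Z: tag.

    Assumes (for illustration) that column 1 starts with a one-character
    prefix and that optional tags begin at column 13, as in PAF.
    """
    cols = prefixed_line.rstrip("\n").split("\t")
    # Swap the one-character prefix, e.g. 'q' -> 'a' or 't' -> 'b'
    cols[0] = new_prefix + cols[0][1:]
    # Keep the 12 fixed columns; filter the CIGAR out of the optional tags
    fixed, tags = cols[:12], cols[12:]
    tags = [tag for tag in tags if not tag.startswith("cg:Z:")]
    return "\t".join(fixed + tags)
```

Since the CIGAR is usually by far the longest field on a line, the `a` and `b` copies should be much smaller to fetch and parse, which is the point of the proposal.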

Alternatives

An alternative idea would be to use a "reduced CIGAR" that preserves only the insertions and deletions that are large relative to your zoom level. This is sort of hard to do because each CIGAR operation maps to specific coordinates, so producing the reduced form means iterating through the whole thing.

I think the concept of tracepoints (Gene Myers' blog: https://dazzlerblog.wordpress.com/2015/11/05/trace-points/) could potentially be another alternative, but I don't fully understand them yet. Some pangenome people have been making tooling around tracepoints: https://github.com/AndreaGuarracino/lib_tracepoints
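
To make the "reduced CIGAR" idea concrete, here is a rough sketch under stated assumptions; it is not something this PR implements. The hypothetical `reduce_cigar` keeps insertions and deletions of at least `min_indel` bases and folds everything else into match runs. It preserves the target span only: small deletions are absorbed into `M` and small insertions are dropped, so the query span becomes approximate. Only the `M`, `=`, `X`, `I` and `D` operations are handled. Note that it still has to walk every operation, which is the cost described above.

```python
import re

# Only M, =, X, I, D are recognised; other ops are skipped by this sketch
CIGAR_OP = re.compile(r"(\d+)([MID=X])")


def reduce_cigar(cigar: str, min_indel: int) -> str:
    """Keep indels >= min_indel; merge everything else into M runs.

    The target (reference) span of the result equals that of the input.
    """
    out = []
    run = 0  # pending target-consuming bases to emit as a single M
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "ID" and n >= min_indel:
            # Flush the pending match run before the large indel
            if run:
                out.append(f"{run}M")
                run = 0
            out.append(f"{n}{op}")
        elif op in "M=XD":
            # Matches and small deletions both consume target bases
            run += n
        # Small insertions consume no target bases and are dropped
    if run:
        out.append(f"{run}M")
    return "".join(out)
```

For example, `reduce_cigar("100M2I50M1D30M500D20M", 10)` collapses the small indels and keeps only the 500 bp deletion, giving `181M500D20M`.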

[skip ci] New PIF format with cigar-less prefixes

e85ab80

cmdcolin changed the title ~~Potential idea: add PIF format with CIGAR-less prefixes~~ Potential idea: add PIF format with CIGAR-less features Nov 15, 2025
