hutaobo/HistoSeg

HistoSeg

HistoSeg: StructureMap-guided semantic contours and robust 3D tissue reconstruction via Signed Distance Fields (SDFs).

HistoSeg is centered on same-sample, multi-slice 3D Xenium contour reconstruction. It converts selected or curated tissue structure groups, including groups defined or audited with sfplot Search-and-Find/StructureMap relationships, into continuous semantic contours. Those named 2D contours then become aligned contour stacks, sampled 3D points, smoothed PLY/OBJ meshes, QC metrics, interactive HTML views, and SDF-based gene-structure measurements. The package also includes the H&E and 2D contour workflows needed to prepare and review the structures that feed the 3D reconstruction pipeline.

HistoSeg is organized around its 3D reconstruction surface, with two supporting analysis groups:

  • 3D Reconstruction (histoseg.threed) for same-sample, multi-slice Xenium contour alignment, 3D contour stacks, mesh export, and QC visualization.
  • 2D Contour Analysis (histoseg.contour) for StructureMap-guided semantic contour extraction from spatial/cell-coordinate data, including Pattern1 isolines and multi-structure Xenium exports.
  • H&E Analysis (histoseg.he) for image-based H&E tissue segmentation, neutral tissue partitioning, and aligned-image change detection.

Full documentation: histoseg.readthedocs.io

When To Use Each Feature Group

Use 3D Reconstruction when you are preparing for multi-slice Xenium contour reconstruction from the same sample. It can soft-align a hard-aligned moving contour GeoJSON to a fixed reference slice, or build a pyXenium-backed multi-slice contour stack with 3D points, smoothed PLY/OBJ surface meshes, and an interactive HTML view.

Use Contour Analysis when your input is spatial cell-coordinate data such as Xenium cells.parquet plus cluster assignments, and you want geometry extracted from cell neighborhoods or selected cluster groups. The selected groups can be curated directly or informed by sfplot Search-and-Find / cophenetic StructureMap relationships.

Use H&E Analysis when your input is an H&E image such as PNG, JPG, TIFF, or GeoTIFF and you want masks, overlays, heatmaps, GeoJSON polygons, or region tables.

Installation

pip install -U histoseg

For flagship 3D Xenium stack reconstruction, install the pyXenium-backed 3D extra:

pip install -U "histoseg[threed]"

For reproducible static 3D figure rendering and local documentation builds:

pip install -U "histoseg[threed,viz,docs]"

For development:

git clone https://github.com/hutaobo/HistoSeg.git
cd HistoSeg
pip install -U pip
pip install -e ".[threed,he]"

Conda environments and Docker targets are provided for reproducible runs:

conda env create -f environment.yml
conda env create -f environment-viz.yml

docker build --target core -t histoseg:core .
docker build --target viz -t histoseg:viz .

The core Docker target is for CPU/headless 3D analysis. The viz target adds Mesa/Xvfb and PyVista for static documentation or paper figure rendering.

For local Hugging Face MedSAM-backed H&E segmentation only:

pip install -U "histoseg[he]"

Flagship 3D Reconstruction Quickstart

from histoseg.threed import (
    ThreeDContourReconstructionConfig,
    run_3d_contour_reconstruction,
)

cfg = ThreeDContourReconstructionConfig(
    fixed_geojson="slice_01.geojson",
    moving_hard_aligned_geojson="slice_02_hard_aligned_to_01.geojson",
    out_dir="outputs/3d_soft_alignment",
    group_property="structure",
    diagnostic_structure="Structure 5",
)

result = run_3d_contour_reconstruction(cfg)
print(result.soft_aligned_geojson)
print(result.diagnostic_report_png)

From the command line:

histoseg-3d reconstruct \
  --fixed-geojson slice_01.geojson \
  --moving-hard-aligned-geojson slice_02_hard_aligned_to_01.geojson \
  --out-dir outputs/3d_soft_alignment \
  --group-property structure \
  --diagnostic-structure "Structure 5"
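
Before running alignment, it can help to confirm that both GeoJSON inputs actually carry the property named by group_property. The sketch below is not part of HistoSeg. It assumes each input is a standard GeoJSON FeatureCollection whose features store the structure name under properties["structure"]; count_features_by_group is a hypothetical helper name, and the in-memory example stands in for a real file such as slice_01.geojson.

```python
import json
from collections import Counter


def count_features_by_group(geojson_text, group_property="structure"):
    """Count GeoJSON features per value of `group_property`.

    Hypothetical helper, not a HistoSeg function. Assumes a FeatureCollection
    whose features carry the group name in their `properties` mapping.
    """
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection.get("features", []):
        properties = feature.get("properties") or {}
        # Features without the property are tallied under None so they stay visible.
        counts[properties.get(group_property)] += 1
    return counts


# Tiny in-memory example standing in for a contour GeoJSON file.
example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"structure": "Structure 5"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"structure": "Structure 5"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[4, 4], [5, 4], [5, 5], [4, 4]]]}},
    ],
})

group_counts = count_features_by_group(example)
print(group_counts)
```

A non-zero count under None, or a missing entry for the chosen diagnostic structure, suggests the file uses a different property name than the one passed as group_property.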

For a complete same-sample Xenium stack:

histoseg-3d reconstruct-stack \
  --xenium-root polyp \
  --segmentation-strategy "polyp/contour for alignment/segmentationstrategy.txt" \
  --merged-h5ad "polyp/pdc_merge_leiden/polyp_32samples_processed_leiden.h5ad" \
  --merged-cluster-column leiden_1_0 \
  --out-dir outputs/polyp_3d_reconstruction \
  --z-spacing-um 5 \
  --mesh-smoothing-sigma-um 40 \
  --mesh-export-formats ply,obj

3D Reconstruction writes aligned per-slice GeoJSON files, pairwise alignment metrics, sampled 3D contour points, per-structure PLY/OBJ meshes, mesh QC metrics, and an interactive Plotly HTML visualization. Use --mesh-smoothing-sigma-um 0 to disable 3D smoothing and export the direct Marching Cubes output. The stack CLI defaults to --registration-backend auto, which compares the standard semantic contour seed with a label-free cross-group seed. If differently named contour groups give the best geometric anchor, registration uses them without changing any labels.
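
Both --z-spacing-um and --mesh-smoothing-sigma-um are given in micrometres, while a voxel grid is filtered in voxel units. The sketch below shows one common way to convert a physical sigma into per-axis voxel sigmas and smooth an occupancy volume with scipy.ndimage.gaussian_filter. It is an illustration, not HistoSeg's implementation: smooth_occupancy is a hypothetical helper, and the 10 µm in-plane voxel size is an assumption chosen for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_occupancy(volume, sigma_um, spacing_um):
    """Gaussian-smooth a (z, y, x) volume with a sigma given in micrometres.

    Hypothetical helper, not a HistoSeg function. `spacing_um` is the voxel
    size per axis in micrometres; sigma_um <= 0 returns the input unchanged.
    """
    volume = np.asarray(volume, dtype=float)
    if sigma_um <= 0:
        return volume
    # Dividing by the spacing keeps the kernel isotropic in physical units.
    sigma_vox = [sigma_um / s for s in spacing_um]
    return gaussian_filter(volume, sigma=sigma_vox)


# Assumed grid: 5 um between slices (as in the CLI example), 10 um in-plane.
spacing_um = (5.0, 10.0, 10.0)
occupancy = np.zeros((80, 48, 48))
occupancy[36:44, 20:28, 20:28] = 1.0

smoothed = smooth_occupancy(occupancy, sigma_um=40.0, spacing_um=spacing_um)
unsmoothed = smooth_occupancy(occupancy, sigma_um=0.0, spacing_um=spacing_um)
print(smoothed.max(), unsmoothed.max())
```

With these numbers a 40 µm sigma spans 8 voxels along z but only 4 voxels in-plane, which is why a single voxel-unit sigma would over- or under-smooth one of the axes.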

Render an aligned 3D cell cloud as a browser-shareable Plotly HTML:

histoseg-3d render-cell-cloud \
  --stack-root outputs/polyp_3d_reconstruction \
  --aligned-cells-parquet outputs/polyp_3d_reconstruction/aligned_leiden_3d_cells.parquet \
  --out-html outputs/polyp_3d_reconstruction/leiden_3d_cells.html \
  --label-column leiden_1_0 \
  --max-points 300000

If the aligned cell table does not exist yet, render-cell-cloud can project a merged AnnData first by replacing --aligned-cells-parquet with --h5ad and --out-parquet.

For local inspection of many small gland-like contour components, render a per-gland QC atlas from an existing aligned stack:

histoseg-3d render-gland-qc-atlas \
  --stack-root outputs/polyp_3d_reconstruction \
  --aligned-cells-parquet outputs/polyp_3d_reconstruction/aligned_leiden_3d_cells.parquet \
  --out-dir outputs/polyp_3d_reconstruction/gland_qc \
  --structures "Structure 3" "Structure 4" \
  --max-gland-pages 250

The atlas assigns cross-slice gland_id values from aligned contour components and writes gland_qc_atlas.html, per-gland local 3D zoom pages, and CSV QC tables for slice continuity review. The CSV index always lists every gland; --max-gland-pages only limits how many local HTML pages are rendered, starting with the highest-priority glands.

For lumen-seeded gland/crypt instance segmentation and cross-slice tracking:

histoseg-3d detect-gland-instances \
  --stack-root outputs/polyp_3d_reconstruction \
  --aligned-cells-parquet outputs/polyp_3d_reconstruction/aligned_leiden_3d_cells.parquet \
  --out-dir outputs/polyp_3d_reconstruction/gland_instances \
  --epithelial-structures "Structure 3" "Structure 4" \
  --markers EPCAM MUC2 LGR5 OLFM4 MKI67

The tracker uses one-to-one Hungarian assignment by default. Add --allow-many-to-many only for exploratory branch/merge review; candidate two-to-one links are reported in CSV/HTML QC outputs.
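
As a rough illustration of one-to-one Hungarian assignment between adjacent slices, the sketch below matches components with scipy.optimize.linear_sum_assignment. It is not the tracker's code: the cost (Euclidean distance between centroids), the 50 µm distance gate, and the helper name match_components are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_components(centroids_a, centroids_b, max_distance_um=50.0):
    """One-to-one matching of components between two adjacent slices.

    Hypothetical helper, not a HistoSeg function. The cost is the Euclidean
    distance between centroids; pairs farther apart than `max_distance_um`
    (an assumed gate) are dropped after the assignment.
    """
    a = np.asarray(centroids_a, dtype=float)
    b = np.asarray(centroids_b, dtype=float)
    # cost[i, j] = distance between component i in slice A and j in slice B.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(int(i), int(j)) for i, j in zip(rows, cols)
            if cost[i, j] <= max_distance_um]


slice_a = [(0.0, 0.0), (100.0, 100.0), (300.0, 0.0)]
slice_b = [(98.0, 103.0), (2.0, -1.0)]  # the third component in A has no partner

links = match_components(slice_a, slice_b)
print(links)
```

Each component appears in at most one link, so a gland that splits or merges between slices cannot be represented here; that is the case the many-to-many option is meant to expose for review.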

H&E Analysis Quickstart

from histoseg.he import HESegmentationConfig, run_he_segmentation

result = run_he_segmentation(
    HESegmentationConfig(
        image="/path/to/he.png",
        out_dir="outputs/he_all_elements",
        task="all_elements",
        backend="heuristic",
        n_components=6,
    )
)

print(result.overlay_png)
print(result.geojson)

From the command line:

histoseg-he all-elements \
  --image /path/to/he.png \
  --out-dir outputs/he_all_elements \
  --backend heuristic

H&E Analysis currently supports:

  • single: tissue foreground extraction, or user-prompted region extraction from boxes/points
  • all_elements: neutral tissue component partitioning (component_1, component_2, ...)
  • change: aligned before/after H&E change detection

Contour Analysis Quickstart

from histoseg.contour import Pattern1IsolineConfig, run_pattern1_isoline

cfg = Pattern1IsolineConfig(
    clusters_csv="/path/to/clusters.csv",
    cells_parquet="/path/to/cells.parquet",
    tissue_boundary_csv="/path/to/tissue_boundary.csv",
    out_dir="outputs/pattern1_isoline0p5",
    pattern1_clusters=(10, 23, 19, 27, 14, 20, 25, 26),
)

result = run_pattern1_isoline(cfg)
print(result.preview_png)
print(len(result.contours))

For Xenium transcript-defined niches such as GREM1-positive regions:

from histoseg.contour import GeneTranscriptIsolineConfig, run_gene_transcript_isoline

result = run_gene_transcript_isoline(
    GeneTranscriptIsolineConfig(
        xenium_root="/path/to/polyp",
        out_dir="outputs/gene_isolines",
        genes=("GREM1",),
        sample_glob="A079-C-008_*",
    )
)
print(result.run_log_csv)

From the command line:

histoseg-contour pattern1 \
  --clusters-csv clusters.csv \
  --cells-parquet cells.parquet \
  --out-dir outputs/pattern1 \
  --pattern1-clusters 10,23,19

histoseg-contour gene-isoline \
  --xenium-root polyp \
  --sample-glob "A079-C-008_*" \
  --genes GREM1,COL1A1 \
  --out-dir outputs/gene_isolines

Contour Analysis currently supports:

  • StructureMap-guided semantic contour synthesis from selected or curated structure groups
  • Pattern1 isoline contour generation from clustered cell coordinates
  • gene/transcript isoline contour generation from Xenium transcript tables
  • multi-structure contour partitioning
  • Xenium Explorer annotation exports
  • Hugging Face dataset helper workflows for Xenium-style inputs

Outputs

HistoSeg workflows write reviewable artifacts such as:

  • aligned 3D contour stacks and sampled 3D point clouds
  • PLY/OBJ surface meshes and mesh QC summaries
  • interactive Plotly 3D HTML views
  • PNG previews and overlays
  • label maps and heatmaps
  • GeoJSON polygons
  • CSV/Parquet region or contour tables
  • params.json and metrics.json provenance files

Documentation

Scientific Foundation & Reproducibility

HistoSeg's methods are documented as an implementation-faithful manuscript draft in Online Methods: Semantic Contours, SDF Quantification And Topology-Aware Alignment. The draft describes the full pipeline from sfplot Search-and-Find/StructureMap relationships to HistoSeg semantic isoline contours, topology-aware stack alignment, and the exact anisotropic SDF contract used by the package: scipy.ndimage.distance_transform_edt(..., sampling=(z_um, y_um, x_um)), negative distances inside structure masks, and positive distances outside.
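
The SDF contract above can be sketched with SciPy alone. The example below builds a signed distance field from a boolean (z, y, x) mask using distance_transform_edt with anisotropic sampling, negative inside and positive outside. It is a minimal sketch under stated assumptions, not the package's code: signed_distance_um is a hypothetical helper, and combining two transforms (one on the mask, one on its complement) is one standard way to obtain both signs.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def signed_distance_um(mask, z_um, y_um, x_um):
    """Signed distance in micrometres for a boolean (z, y, x) mask.

    Hypothetical helper, not a HistoSeg function. Negative inside the mask,
    positive outside, matching the sign convention stated above.
    """
    mask = np.asarray(mask, dtype=bool)
    sampling = (z_um, y_um, x_um)
    # Distance from each outside voxel to the nearest inside voxel.
    outside = distance_transform_edt(~mask, sampling=sampling)
    # Distance from each inside voxel to the nearest outside voxel.
    inside = distance_transform_edt(mask, sampling=sampling)
    return outside - inside


# Single inside voxel in a grid with 5 um slice spacing and 1 um in-plane pixels.
mask = np.zeros((5, 5, 5), dtype=bool)
mask[2, 2, 2] = True

sdf = signed_distance_um(mask, z_um=5.0, y_um=1.0, x_um=1.0)
print(sdf[2, 2, 2], sdf[3, 2, 2], sdf[2, 2, 3])
```

One step along z costs 5 µm while one step along x costs 1 µm, which is the effect of the sampling argument. Distances are measured between voxel centres, so values next to the boundary are quantized to the voxel grid.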

The manuscript figure roadmap is maintained in docs/manuscripts/figure_plan.md. It defines the proof target for workflow infrastructure, SDF robustness, and the 32-slice polyp biological discovery case. Users can recreate the visualization-oriented outputs locally with:

pip install -U "histoseg[threed,viz]"

or with the reproducible viz environment:

conda env create -f environment-viz.yml
docker build --target viz -t histoseg:viz .

See reproducibility/README.md for the paper environment, tutorial artifact map, and alignment-hash provenance checks. The local wrapper python reproducibility/run_paper_pipeline.py regenerates the paper-facing cell-cloud HTML and spatial-module clustermaps from the validated 32-slice polyp paths and writes reproducibility/results_manifest.json.

Citation placeholder for the HistoSeg methods paper:

@article{histoseg_method_paper,
  title   = {HistoSeg: robust 3D reconstruction of tissue architecture from multi-slice spatial transcriptomics},
  author  = {Hu, Taobo and HistoSeg contributors},
  journal = {Manuscript in preparation},
  year    = {2026},
  note    = {Zenodo DOI pending}
}

License

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Academic and other noncommercial use is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for details.

Generated from hutaobo/sfplot