22 changes: 11 additions & 11 deletions .gitignore
@@ -8,14 +8,14 @@ build/
dist/
docs/_build/
docs/api/generated/
benchmarking/lr_2026_atera/data/*
!benchmarking/lr_2026_atera/data/.gitkeep
benchmarking/lr_2026_atera/logs/*
!benchmarking/lr_2026_atera/logs/.gitkeep
benchmarking/lr_2026_atera/results/*
!benchmarking/lr_2026_atera/results/.gitkeep
benchmarking/lr_2026_atera/reports/*
!benchmarking/lr_2026_atera/reports/.gitkeep
benchmarking/lr_2026_atera/runs/*
!benchmarking/lr_2026_atera/runs/.gitkeep
benchmarking/lr_2026_atera/pdc_collected/
benchmarking/cci_2026_atera/data/*
!benchmarking/cci_2026_atera/data/.gitkeep
benchmarking/cci_2026_atera/logs/*
!benchmarking/cci_2026_atera/logs/.gitkeep
benchmarking/cci_2026_atera/results/*
!benchmarking/cci_2026_atera/results/.gitkeep
benchmarking/cci_2026_atera/reports/*
!benchmarking/cci_2026_atera/reports/.gitkeep
benchmarking/cci_2026_atera/runs/*
!benchmarking/cci_2026_atera/runs/.gitkeep
benchmarking/cci_2026_atera/pdc_collected/
48 changes: 42 additions & 6 deletions README.md
@@ -5,7 +5,7 @@
<h1 align="center">pyXenium</h1>

<p align="center">
Xenium I/O, multimodal analysis, topology workflows, contour-native spatial profiling, GMI inference, mechanostress analysis, and AI-driven spatial pathology handoff.
Xenium I/O, multimodal analysis, topology workflows, contour-native spatial profiling, GMI inference, mechanostress analysis, and optional external workflow bridges.
</p>

<p align="center">
@@ -28,21 +28,22 @@
<a href="https://github.com/hutaobo/pyXenium/releases">Releases</a>
</p>

pyXenium is a Python toolkit for **10x Genomics Xenium** with eight feature areas:
pyXenium is a Python toolkit for **10x Genomics Xenium** with nine feature areas:

- `pyXenium.io`: Xenium artifact loading, partial export recovery, SData I/O, and SpatialData-compatible export.
- `pyXenium.multimodal`: canonical RNA + protein loading, joint analysis, immune-resistance scoring, and packaged workflows.
- `pyXenium.ligand_receptor`: topology-native ligand-receptor analysis.
- `pyXenium.cci`: topology-native cell-cell interaction analysis.
- `pyXenium.pathway`: pathway topology analysis and pathway activity scoring.
- `pyXenium.contour`: contour import, contour expansion, and contour-aware density profiling around polygon annotations.
- `pyXenium.gmi`: contour-level GMI modeling for sparse main-effect and interaction discovery in spatial transcriptomics.
- `pyXenium.mechanostress`: morphology-derived mechanical stress states, including fibroblast axis strength, tumor-stroma growth patterning, and cell polarity.
- AI-Driven Spatial Pathologist via `spatho`: an optional external workflow layer for AI-driven spatial pathology, built on the Xenium data foundation provided by pyXenium's `XeniumSData` structure.
- `pyXenium.perturb`: SpatialPerturb Bridge for optional Perturb-seq reference projection onto Xenium tissue through the external `SpatialPerturb` package.

Legacy compatibility entry points under `pyXenium.analysis`, `pyXenium.validation`, and
`pyXenium.io.load_xenium_gene_protein(...)` remain importable, but new code should target the
canonical pyXenium namespaces above. The `spatho` workflow is installed and run separately; pyXenium
does not vendor it or add it as a runtime dependency.
canonical pyXenium namespaces above. The `spatho` and `SpatialPerturb` workflows are installed
and run separately; pyXenium does not vendor them or add them as core runtime dependencies.

## Release & Build

@@ -73,6 +74,12 @@ For documentation work:
pip install -e ".[docs]"
```

For the optional SpatialPerturb Bridge runtime on Python 3.9+:

```bash
pip install -e ".[perturb]"
```
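
To confirm the extra resolved correctly, a quick import probe against the bridge names used below is usually enough:

```bash
python -c "from pyXenium.perturb import SpatialPerturbBridgeConfig; print('perturb bridge available')"
```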

## Quick examples

### Xenium I/O
@@ -166,11 +173,39 @@ spatho doctor --config workflow.json
spatho run --config workflow.json
```

In pyXenium, this is documented as the eighth feature area rather than a new package namespace.
In pyXenium, this is documented as an optional external workflow bridge rather than a new
`pyXenium.spatho` namespace.
The handoff is possible because `XeniumSData` keeps the cell table, transcript points,
cell/nucleus boundaries, H&E image metadata, and SpatialData-compatible organization together
for downstream tools.
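
As a rough illustration (the attribute names below are hypothetical placeholders, not the actual `XeniumSData` API), the handoff consumer only needs one object that carries every modality:

```python
# Hypothetical sketch: attribute names are illustrative placeholders, not
# the real XeniumSData API. The point is that a single object carries all
# modalities the external spatho workflow consumes.
def handoff_inventory(sdata) -> dict:
    return {
        "cells": len(sdata.cells),              # cell table (assumed attribute)
        "transcripts": len(sdata.transcripts),  # transcript points (assumed)
        "boundaries": list(sdata.boundaries),   # cell/nucleus boundaries (assumed)
        "he_image": sdata.images.get("he"),     # H&E image metadata (assumed)
    }
```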

### SpatialPerturb Bridge via SpatialPerturb

[`SpatialPerturb`](https://github.com/hutaobo/SpatialPerturb) is an external workflow package
for combining spatial transcriptomics with Perturb-seq references. pyXenium exposes a lightweight
`pyXenium.perturb` bridge that writes a handoff JSON containing stable external CLI commands,
without vendoring the SpatialPerturb algorithms.

```python
from pyXenium.perturb import SpatialPerturbBridgeConfig, write_spatialperturb_handoff

spec = write_spatialperturb_handoff(
    SpatialPerturbBridgeConfig(
        xenium_path="/path/to/Xenium_outs",
        output_dir="spatialperturb_reports/breast_case_01",
        cell_group_path="/path/to/cell_groups.csv",
        roi_geojson_path="/path/to/xenium_explorer_annotations.geojson",
        sample_name="breast_case_01",
    ),
    "spatialperturb_bridge.json",
)
print(spec["command_text"]["run_reference_benchmark"])
```
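
The handoff file is plain JSON, so downstream tooling can inspect it without pyXenium installed. A minimal sketch, relying only on the `command_text` mapping used above (any other keys should be treated as implementation details):

```python
import json

# Read the handoff spec written by write_spatialperturb_handoff above.
with open("spatialperturb_bridge.json") as fh:
    spec = json.load(fh)

# Print every stable external CLI command the bridge prepared.
for name, command in spec["command_text"].items():
    print(f"{name}:\n  {command}")
```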

SpatialPerturb Bridge scores represent Perturb-seq-derived program similarity projected onto
Xenium tissue; they do not imply that a tissue cell carries the corresponding knockout, guide,
or drug perturbation.

## Documentation structure

The docs mirror the package surfaces, high-level workflows, and external handoffs:
@@ -180,6 +215,7 @@ The docs mirror the package surfaces, high-level workflows, and external handoff
- Workflows
- API Reference
- AI-Driven Spatial Pathologist via `spatho`
- SpatialPerturb Bridge via `SpatialPerturb`
- Changelog

Start here: [pyxenium.readthedocs.io](https://pyxenium.readthedocs.io/en/latest/)
85 changes: 85 additions & 0 deletions benchmarking/bmnet_pdc/README.md
@@ -0,0 +1,85 @@
# BM-Net/H&E morphology increment pilot on PDC

This scaffold runs a breast H&E morphology increment pilot for the aligned
Xenium RNA + H&E contour workflow. It is intentionally separate from the
published tutorial path so existing `run_contour_boundary_ecology_pilot`
behavior stays unchanged.

Default dataset:

```text
/cfs/klemming/scratch/h/hutaobo/topolink_cci_benchmark_2026-04/data/source_cache/breast/WTA_Preview_FFPE_Breast_Cancer_outs
```

Default BM-Net pilot root:

```text
/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04
```

## Backends

- `deterministic-smoke`: dependency-free BM-Net-like H&E proxy for validating
  contour cropping, output schema, artifacts, Slurm submission, and downstream
  increment tests. Its outputs are not biological evidence.
- `bmnet-local`: loads a local checkpoint through a MobileNetV3-small BM-Net-like
head. Use this only when a compatible BM-Net checkpoint is available.
- `bmnet-like-trainable`: MobileNetV3-small + classifier head for local
training/fine-tuning experiments.
- `hf-pathology-backbone`: uses a Hugging Face pathology backbone such as
`1aurent/vit_small_patch8_224.lunit_dino` or `wisdomik/QuiltNet-B-32` as a
surrogate feature extractor and writes `pathology__...` features rather than
BM-Net probabilities.
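
As a sketch of what the surrogate route amounts to (assuming the named checkpoint loads through timm's `hf-hub:` prefix; the pilot's actual cropping and normalization may differ):

```python
# Sketch of the hf-pathology-backbone idea: a pretrained pathology ViT used
# purely as a feature extractor. Assumes the checkpoint is timm-loadable via
# the hf-hub: prefix; real preprocessing in the pilot may differ.
import timm
import torch

model = timm.create_model(
    "hf-hub:1aurent/vit_small_patch8_224.lunit_dino",
    pretrained=True,
    num_classes=0,  # drop any classifier head and return raw embeddings
)
model.eval()

patch = torch.rand(1, 3, 224, 224)  # stand-in for a cropped H&E contour patch
with torch.no_grad():
    features = model(patch)  # one embedding row -> pathology__... feature columns
```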

## PDC workflow

From the staged repo on Dardel:

```bash
export PDC_ROOT=/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04
export PDC_XENIUM_ROOT=/cfs/klemming/scratch/h/hutaobo/topolink_cci_benchmark_2026-04/data/source_cache/breast/WTA_Preview_FFPE_Breast_Cancer_outs

bash benchmarking/bmnet_pdc/scripts/bootstrap_pdc_env.sh
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh --backend deterministic-smoke --include-full
```

For a real BM-Net checkpoint:

```bash
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh \
    --backend bmnet-local \
    --checkpoint /cfs/klemming/scratch/h/hutaobo/models/bmnet/bmnet.pt \
    --include-full
```

For the Hugging Face surrogate backbone discovered during setup:

```bash
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh \
    --backend hf-pathology-backbone \
    --hf-model 1aurent/vit_small_patch8_224.lunit_dino \
    --smoke-max-contours 20
```

The smoke job limits the run to 50 contours by default. The full job is
submitted with an `afterok` dependency when `--include-full` is used.
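
The chaining itself is standard Slurm; a minimal sketch (the job script names are placeholders, not the files `submit_pdc_bmnet_pilot.sh` actually generates):

```bash
# Placeholder job scripts; the real submit script's internals may differ.
smoke_id=$(sbatch --parsable smoke_pilot.sbatch)
sbatch --dependency=afterok:"${smoke_id}" full_pilot.sbatch
```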

## Outputs

Each run directory writes:

- `contour_features_with_bmnet.csv`
- `bmnet_patch_predictions.csv`
- `program_scores.csv`
- `xenium_native_morphology.csv`
- `he_morphology_features.csv`
- `feature_redundancy.csv`
- `incremental_prediction.csv`
- `partial_associations.csv`
- `matched_review_table.csv`
- `morphology_increment_summary.json`
- `bmnet_pdc_run_summary.json`

`morphology_increment_summary.json` includes `model_metadata` so downstream
reports can distinguish trained BM-Net, Hugging Face surrogate, and smoke-only
outputs.
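
A downstream report can branch on that field; a minimal sketch, assuming only the documented `model_metadata` key (its sub-fields are backend-specific and not specified here):

```python
import json
from pathlib import Path

summary = json.loads(Path("morphology_increment_summary.json").read_text())
# model_metadata is documented above; its internal layout is backend-specific.
print(summary["model_metadata"])
```
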
34 changes: 34 additions & 0 deletions benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml
@@ -0,0 +1,34 @@
name: pyx-bmnet
channels:
- conda-forge
dependencies:
- python=3.11
- pip
- git
- numpy
- pandas
- scipy
- scikit-learn
- seaborn
- anndata
- scanpy
- pyarrow
- h5py
- click
- matplotlib
- shapely
- statsmodels
- tifffile
- imagecodecs
- zarr
- fsspec
- requests
- pyyaml
- aiohttp
- pillow
- pip:
  - -e ../../..
  - torch
  - timm>=1.0
  - transformers>=4.40
  - huggingface_hub>=0.24
34 changes: 34 additions & 0 deletions benchmarking/bmnet_pdc/scripts/bootstrap_pdc_env.sh
@@ -0,0 +1,34 @@
#!/usr/bin/env bash
set -euo pipefail

PDC_ROOT="${PDC_ROOT:-/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04}"
REPO_DIR="${REPO_DIR:-${PDC_ROOT}/repo}"
CONDA_PREFIX="${CONDA_PREFIX:-${PDC_ROOT}/conda/envs/pyx-bmnet}"
CONDA_PKGS_DIR="${CONDA_PKGS_DIR:-${PDC_ROOT}/conda/pkgs}"
LOG_DIR="${PDC_ROOT}/logs"

mkdir -p "${LOG_DIR}" "${PDC_ROOT}/conda/envs" "${CONDA_PKGS_DIR}" "${PDC_ROOT}/tmp"
exec > >(tee -a "${LOG_DIR}/bootstrap_pdc_env.log") 2>&1

echo "[bmnet-pdc] bootstrap started $(date -Is)"
echo "[bmnet-pdc] pdc_root=${PDC_ROOT}"
echo "[bmnet-pdc] repo_dir=${REPO_DIR}"
echo "[bmnet-pdc] conda_prefix=${CONDA_PREFIX}"

module load PDC/24.11
module load miniconda3/25.3.1-1-cpeGNU-24.11

export CONDA_PKGS_DIRS="${CONDA_PKGS_DIR}"
export TMPDIR="${PDC_ROOT}/tmp"

cd "${REPO_DIR}"

if [[ -d "${CONDA_PREFIX}" ]]; then
  echo "[bmnet-pdc] updating conda prefix"
  conda env update --prefix "${CONDA_PREFIX}" --file benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml --prune
else
  echo "[bmnet-pdc] creating conda prefix"
  conda env create --prefix "${CONDA_PREFIX}" --file benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml
fi

echo "[bmnet-pdc] bootstrap completed $(date -Is)"
79 changes: 79 additions & 0 deletions benchmarking/bmnet_pdc/scripts/run_bmnet_morphology_increment.py
@@ -0,0 +1,79 @@
#!/usr/bin/env python
from __future__ import annotations

import argparse
import json
from pathlib import Path

from pyXenium.multimodal import run_bmnet_morphology_increment_pilot


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run BM-Net/H&E morphology increment pilot.")
    parser.add_argument("--dataset-root", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--contour-geojson", default=None)
    parser.add_argument("--contour-key", default="s1_s5_contours")
    parser.add_argument("--contour-id-key", default="polygon_id")
    parser.add_argument("--contour-coordinate-space", default="xenium_pixel")
    parser.add_argument("--contour-pixel-size-um", type=float, default=None)
    parser.add_argument("--he-image-key", default="he")
    parser.add_argument("--cells-parquet", default="cells.parquet")
    parser.add_argument("--clusters-relpath", default="WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv")
    parser.add_argument("--cluster-column-name", default="cluster")
    parser.add_argument(
        "--backend",
        default="deterministic-smoke",
        choices=["deterministic-smoke", "bmnet-local", "bmnet-like-trainable", "hf-pathology-backbone"],
    )
    parser.add_argument("--checkpoint", default=None)
    parser.add_argument("--hf-model", default="1aurent/vit_small_patch8_224.lunit_dino")
    parser.add_argument("--timm-architecture", default="mobilenetv3_small_100")
    parser.add_argument("--timm-pretrained", action="store_true")
    parser.add_argument("--max-contours", type=int, default=None)
    parser.add_argument("--inner-rim-um", type=float, default=20.0)
    parser.add_argument("--outer-rim-um", type=float, default=30.0)
    parser.add_argument("--skip-pathomics", action="store_true")
    parser.add_argument("--include-transcripts", action="store_true")
    parser.add_argument("--program-library", default="breast_boundary_bmnet_v1")
    parser.add_argument("--random-state", type=int, default=0)
    parser.add_argument("--min-contours", type=int, default=8)
    return parser


def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    result = run_bmnet_morphology_increment_pilot(
        dataset_root=args.dataset_root,
        output_dir=args.output_dir,
        contour_geojson=args.contour_geojson,
        contour_key=args.contour_key,
        contour_id_key=args.contour_id_key,
        contour_coordinate_space=args.contour_coordinate_space,
        contour_pixel_size_um=args.contour_pixel_size_um,
        he_image_key=args.he_image_key,
        cells_parquet=args.cells_parquet,
        clusters_relpath=args.clusters_relpath,
        cluster_column_name=args.cluster_column_name,
        backend=args.backend,
        checkpoint=args.checkpoint,
        hf_model=args.hf_model,
        timm_architecture=args.timm_architecture,
        timm_pretrained=args.timm_pretrained,
        max_contours=args.max_contours,
        inner_rim_um=args.inner_rim_um,
        outer_rim_um=args.outer_rim_um,
        include_pathomics=not args.skip_pathomics,
        include_transcripts=args.include_transcripts,
        program_library=args.program_library,
        random_state=args.random_state,
        min_contours=args.min_contours,
    )
    summary = result["summary"]
    print(json.dumps({"artifact_dir": result["artifact_dir"], "summary": summary}, indent=2, default=str))
    summary_path = Path(summary["artifact_files"]["run_summary"])
    return 0 if summary_path.exists() else 1


if __name__ == "__main__":
    raise SystemExit(main())
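
A typical manual smoke invocation, reusing the environment variables from the PDC workflow section (the output directory here is an arbitrary example):

```bash
python benchmarking/bmnet_pdc/scripts/run_bmnet_morphology_increment.py \
    --dataset-root "${PDC_XENIUM_ROOT}" \
    --output-dir "${PDC_ROOT}/runs/manual_smoke" \
    --backend deterministic-smoke \
    --max-contours 20
```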