22 changes: 11 additions & 11 deletions .gitignore
@@ -8,14 +8,14 @@ build/
dist/
docs/_build/
docs/api/generated/
benchmarking/lr_2026_atera/data/*
!benchmarking/lr_2026_atera/data/.gitkeep
benchmarking/lr_2026_atera/logs/*
!benchmarking/lr_2026_atera/logs/.gitkeep
benchmarking/lr_2026_atera/results/*
!benchmarking/lr_2026_atera/results/.gitkeep
benchmarking/lr_2026_atera/reports/*
!benchmarking/lr_2026_atera/reports/.gitkeep
benchmarking/lr_2026_atera/runs/*
!benchmarking/lr_2026_atera/runs/.gitkeep
benchmarking/lr_2026_atera/pdc_collected/
benchmarking/cci_2026_atera/data/*
!benchmarking/cci_2026_atera/data/.gitkeep
benchmarking/cci_2026_atera/logs/*
!benchmarking/cci_2026_atera/logs/.gitkeep
benchmarking/cci_2026_atera/results/*
!benchmarking/cci_2026_atera/results/.gitkeep
benchmarking/cci_2026_atera/reports/*
!benchmarking/cci_2026_atera/reports/.gitkeep
benchmarking/cci_2026_atera/runs/*
!benchmarking/cci_2026_atera/runs/.gitkeep
benchmarking/cci_2026_atera/pdc_collected/
48 changes: 42 additions & 6 deletions README.md
@@ -5,7 +5,7 @@
<h1 align="center">pyXenium</h1>

<p align="center">
Xenium I/O, multimodal analysis, topology workflows, contour-native spatial profiling, GMI inference, mechanostress analysis, and AI-driven spatial pathology handoff.
Xenium I/O, multimodal analysis, topology workflows, contour-native spatial profiling, GMI inference, mechanostress analysis, and optional external workflow bridges.
</p>

<p align="center">
@@ -28,21 +28,22 @@
<a href="https://github.com/hutaobo/pyXenium/releases">Releases</a>
</p>

pyXenium is a Python toolkit for **10x Genomics Xenium** with eight feature areas:
pyXenium is a Python toolkit for **10x Genomics Xenium** with nine feature areas:

- `pyXenium.io`: Xenium artifact loading, partial export recovery, SData I/O, and SpatialData-compatible export.
- `pyXenium.multimodal`: canonical RNA + protein loading, joint analysis, immune-resistance scoring, and packaged workflows.
- `pyXenium.ligand_receptor`: topology-native ligand-receptor analysis.
- `pyXenium.cci`: topology-native cell-cell interaction analysis.
- `pyXenium.pathway`: pathway topology analysis and pathway activity scoring.
- `pyXenium.contour`: contour import, contour expansion, and contour-aware density profiling around polygon annotations.
- `pyXenium.gmi`: contour-level GMI modeling for sparse main-effect and interaction discovery in spatial transcriptomics.
- `pyXenium.mechanostress`: morphology-derived mechanical stress states, including fibroblast axis strength, tumor-stroma growth patterning, and cell polarity.
- AI-Driven Spatial Pathologist via `spatho`: an optional external workflow layer for AI-driven spatial pathology, built on the Xenium data foundation provided by pyXenium's `XeniumSData` structure.
- `pyXenium.perturb`: SpatialPerturb Bridge for optional Perturb-seq reference projection onto Xenium tissue through the external `SpatialPerturb` package.

Legacy compatibility entry points under `pyXenium.analysis`, `pyXenium.validation`, and
`pyXenium.io.load_xenium_gene_protein(...)` remain importable, but new code should target the
canonical pyXenium namespaces above. The `spatho` workflow is installed and run separately; pyXenium
does not vendor it or add it as a runtime dependency.
canonical pyXenium namespaces above. The `spatho` and `SpatialPerturb` workflows are installed
and run separately; pyXenium does not vendor them or add them as core runtime dependencies.

## Release & Build

@@ -73,6 +74,12 @@ For documentation work:
pip install -e ".[docs]"
```

For the optional SpatialPerturb Bridge runtime on Python 3.9+:

```bash
pip install -e ".[perturb]"
```
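
To confirm the extra resolved correctly, a quick import probe against the bridge names used below is usually enough:

```bash
python -c "from pyXenium.perturb import SpatialPerturbBridgeConfig; print('perturb bridge available')"
```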

## Quick examples

### Xenium I/O
@@ -166,11 +173,39 @@ spatho doctor --config workflow.json
spatho run --config workflow.json
```

In pyXenium, this is documented as the eighth feature area rather than a new package namespace.
In pyXenium, this is documented as an optional external workflow bridge rather than a new
`pyXenium.spatho` namespace.
The handoff is possible because `XeniumSData` keeps the cell table, transcript points,
cell/nucleus boundaries, H&E image metadata, and SpatialData-compatible organization together
for downstream tools.
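
As a rough illustration (the attribute names below are hypothetical placeholders, not the actual `XeniumSData` API), the handoff consumer only needs one object that carries every modality:

```python
# Hypothetical sketch: attribute names are illustrative placeholders, not
# the real XeniumSData API. The point is that a single object carries all
# modalities the external spatho workflow consumes.
def handoff_inventory(sdata) -> dict:
    return {
        "cells": len(sdata.cells),              # cell table (assumed attribute)
        "transcripts": len(sdata.transcripts),  # transcript points (assumed)
        "boundaries": list(sdata.boundaries),   # cell/nucleus boundaries (assumed)
        "he_image": sdata.images.get("he"),     # H&E image metadata (assumed)
    }
```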

### SpatialPerturb Bridge via SpatialPerturb

[`SpatialPerturb`](https://github.com/hutaobo/SpatialPerturb) is an external workflow package
for combining spatial transcriptomics with Perturb-seq references. pyXenium exposes a lightweight
`pyXenium.perturb` bridge that writes a handoff JSON containing stable external CLI commands,
without vendoring the SpatialPerturb algorithms.

```python
from pyXenium.perturb import SpatialPerturbBridgeConfig, write_spatialperturb_handoff

spec = write_spatialperturb_handoff(
    SpatialPerturbBridgeConfig(
        xenium_path="/path/to/Xenium_outs",
        output_dir="spatialperturb_reports/breast_case_01",
        cell_group_path="/path/to/cell_groups.csv",
        roi_geojson_path="/path/to/xenium_explorer_annotations.geojson",
        sample_name="breast_case_01",
    ),
    "spatialperturb_bridge.json",
)
print(spec["command_text"]["run_reference_benchmark"])
```
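
The handoff file is plain JSON, so downstream tooling can inspect it without pyXenium installed. A minimal sketch, relying only on the `command_text` mapping used above (any other keys should be treated as implementation details):

```python
import json

# Read the handoff spec written by write_spatialperturb_handoff above.
with open("spatialperturb_bridge.json") as fh:
    spec = json.load(fh)

# Print every stable external CLI command the bridge prepared.
for name, command in spec["command_text"].items():
    print(f"{name}:\n  {command}")
```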

SpatialPerturb Bridge scores represent Perturb-seq-derived program similarity projected onto
Xenium tissue; they do not imply that a tissue cell carries the corresponding knockout, guide,
or drug perturbation.

## Documentation structure

The docs mirror the package surfaces, high-level workflows, and external handoffs:
@@ -180,6 +215,7 @@ The docs mirror the package surfaces, high-level workflows, and external handoff
- Workflows
- API Reference
- AI-Driven Spatial Pathologist via `spatho`
- SpatialPerturb Bridge via `SpatialPerturb`
- Changelog

Start here: [pyxenium.readthedocs.io](https://pyxenium.readthedocs.io/en/latest/)
85 changes: 85 additions & 0 deletions benchmarking/bmnet_pdc/README.md
@@ -0,0 +1,85 @@
# BM-Net/H&E morphology increment pilot on PDC

This scaffold runs a breast H&E morphology increment pilot for the aligned
Xenium RNA + H&E contour workflow. It is intentionally separate from the
published tutorial path so existing `run_contour_boundary_ecology_pilot`
behavior stays unchanged.

Default dataset:

```text
/cfs/klemming/scratch/h/hutaobo/topolink_cci_benchmark_2026-04/data/source_cache/breast/WTA_Preview_FFPE_Breast_Cancer_outs
```

Default BM-Net pilot root:

```text
/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04
```

## Backends

- `deterministic-smoke`: dependency-free BM-Net-like H&E proxy for validating
  contour cropping, output schema, artifacts, Slurm submission, and downstream
  increment tests. Its outputs are not biological evidence.
- `bmnet-local`: loads a local checkpoint through a MobileNetV3-small BM-Net-like
head. Use this only when a compatible BM-Net checkpoint is available.
- `bmnet-like-trainable`: MobileNetV3-small + classifier head for local
training/fine-tuning experiments.
- `hf-pathology-backbone`: uses a Hugging Face pathology backbone such as
`1aurent/vit_small_patch8_224.lunit_dino` or `wisdomik/QuiltNet-B-32` as a
surrogate feature extractor and writes `pathology__...` features rather than
BM-Net probabilities.
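
As a sketch of what the surrogate route amounts to (assuming the named checkpoint loads through timm's `hf-hub:` prefix; the pilot's actual cropping and normalization may differ):

```python
# Sketch of the hf-pathology-backbone idea: a pretrained pathology ViT used
# purely as a feature extractor. Assumes the checkpoint is timm-loadable via
# the hf-hub: prefix; real preprocessing in the pilot may differ.
import timm
import torch

model = timm.create_model(
    "hf-hub:1aurent/vit_small_patch8_224.lunit_dino",
    pretrained=True,
    num_classes=0,  # drop any classifier head and return raw embeddings
)
model.eval()

patch = torch.rand(1, 3, 224, 224)  # stand-in for a cropped H&E contour patch
with torch.no_grad():
    features = model(patch)  # one embedding row -> pathology__... feature columns
```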

## PDC workflow

From the staged repo on Dardel:

```bash
export PDC_ROOT=/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04
export PDC_XENIUM_ROOT=/cfs/klemming/scratch/h/hutaobo/topolink_cci_benchmark_2026-04/data/source_cache/breast/WTA_Preview_FFPE_Breast_Cancer_outs

bash benchmarking/bmnet_pdc/scripts/bootstrap_pdc_env.sh
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh --backend deterministic-smoke --include-full
```

For a real BM-Net checkpoint:

```bash
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh \
    --backend bmnet-local \
    --checkpoint /cfs/klemming/scratch/h/hutaobo/models/bmnet/bmnet.pt \
    --include-full
```

For the Hugging Face surrogate backbone discovered during setup:

```bash
bash benchmarking/bmnet_pdc/scripts/submit_pdc_bmnet_pilot.sh \
    --backend hf-pathology-backbone \
    --hf-model 1aurent/vit_small_patch8_224.lunit_dino \
    --smoke-max-contours 20
```

The smoke job limits the run to 50 contours by default. The full job is
submitted with an `afterok` dependency when `--include-full` is used.
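
The chaining itself is standard Slurm; a minimal sketch (the job script names are placeholders, not the files `submit_pdc_bmnet_pilot.sh` actually generates):

```bash
# Placeholder job scripts; the real submit script's internals may differ.
smoke_id=$(sbatch --parsable smoke_pilot.sbatch)
sbatch --dependency=afterok:"${smoke_id}" full_pilot.sbatch
```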

## Outputs

Each run directory writes:

- `contour_features_with_bmnet.csv`
- `bmnet_patch_predictions.csv`
- `program_scores.csv`
- `xenium_native_morphology.csv`
- `he_morphology_features.csv`
- `feature_redundancy.csv`
- `incremental_prediction.csv`
- `partial_associations.csv`
- `matched_review_table.csv`
- `morphology_increment_summary.json`
- `bmnet_pdc_run_summary.json`

`morphology_increment_summary.json` includes `model_metadata` so downstream
reports can distinguish trained BM-Net, Hugging Face surrogate, and smoke-only
outputs.
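
A downstream report can branch on that field; a minimal sketch, assuming only the documented `model_metadata` key (its sub-fields are backend-specific and not specified here):

```python
import json
from pathlib import Path

summary = json.loads(Path("morphology_increment_summary.json").read_text())
# model_metadata is documented above; its internal layout is backend-specific.
print(summary["model_metadata"])
```
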
34 changes: 34 additions & 0 deletions benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml
@@ -0,0 +1,34 @@
name: pyx-bmnet
channels:
- conda-forge
dependencies:
- python=3.11
- pip
- git
- numpy
- pandas
- scipy
- scikit-learn
- seaborn
- anndata
- scanpy
- pyarrow
- h5py
- click
- matplotlib
- shapely
- statsmodels
- tifffile
- imagecodecs
- zarr
- fsspec
- requests
- pyyaml
- aiohttp
- pillow
- pip:
  - -e ../../..
  - torch
  - timm>=1.0
  - transformers>=4.40
  - huggingface_hub>=0.24
34 changes: 34 additions & 0 deletions benchmarking/bmnet_pdc/scripts/bootstrap_pdc_env.sh
@@ -0,0 +1,34 @@
#!/usr/bin/env bash
set -euo pipefail

PDC_ROOT="${PDC_ROOT:-/cfs/klemming/scratch/h/hutaobo/pyxenium_bmnet_morphology_2026-04}"
REPO_DIR="${REPO_DIR:-${PDC_ROOT}/repo}"
CONDA_PREFIX="${CONDA_PREFIX:-${PDC_ROOT}/conda/envs/pyx-bmnet}"
CONDA_PKGS_DIR="${CONDA_PKGS_DIR:-${PDC_ROOT}/conda/pkgs}"
LOG_DIR="${PDC_ROOT}/logs"

mkdir -p "${LOG_DIR}" "${PDC_ROOT}/conda/envs" "${CONDA_PKGS_DIR}" "${PDC_ROOT}/tmp"
exec > >(tee -a "${LOG_DIR}/bootstrap_pdc_env.log") 2>&1

echo "[bmnet-pdc] bootstrap started $(date -Is)"
echo "[bmnet-pdc] pdc_root=${PDC_ROOT}"
echo "[bmnet-pdc] repo_dir=${REPO_DIR}"
echo "[bmnet-pdc] conda_prefix=${CONDA_PREFIX}"

module load PDC/24.11
module load miniconda3/25.3.1-1-cpeGNU-24.11

export CONDA_PKGS_DIRS="${CONDA_PKGS_DIR}"
export TMPDIR="${PDC_ROOT}/tmp"

cd "${REPO_DIR}"

if [[ -d "${CONDA_PREFIX}" ]]; then
  echo "[bmnet-pdc] updating conda prefix"
  conda env update --prefix "${CONDA_PREFIX}" --file benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml --prune
else
  echo "[bmnet-pdc] creating conda prefix"
  conda env create --prefix "${CONDA_PREFIX}" --file benchmarking/bmnet_pdc/envs/pyx-bmnet-pdc.yml
fi

echo "[bmnet-pdc] bootstrap completed $(date -Is)"
79 changes: 79 additions & 0 deletions benchmarking/bmnet_pdc/scripts/run_bmnet_morphology_increment.py
@@ -0,0 +1,79 @@
#!/usr/bin/env python
from __future__ import annotations

import argparse
import json
from pathlib import Path

from pyXenium.multimodal import run_bmnet_morphology_increment_pilot


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run BM-Net/H&E morphology increment pilot.")
    parser.add_argument("--dataset-root", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--contour-geojson", default=None)
    parser.add_argument("--contour-key", default="s1_s5_contours")
    parser.add_argument("--contour-id-key", default="polygon_id")
    parser.add_argument("--contour-coordinate-space", default="xenium_pixel")
    parser.add_argument("--contour-pixel-size-um", type=float, default=None)
    parser.add_argument("--he-image-key", default="he")
    parser.add_argument("--cells-parquet", default="cells.parquet")
    parser.add_argument("--clusters-relpath", default="WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv")
    parser.add_argument("--cluster-column-name", default="cluster")
    parser.add_argument(
        "--backend",
        default="deterministic-smoke",
        choices=["deterministic-smoke", "bmnet-local", "bmnet-like-trainable", "hf-pathology-backbone"],
    )
    parser.add_argument("--checkpoint", default=None)
    parser.add_argument("--hf-model", default="1aurent/vit_small_patch8_224.lunit_dino")
    parser.add_argument("--timm-architecture", default="mobilenetv3_small_100")
    parser.add_argument("--timm-pretrained", action="store_true")
    parser.add_argument("--max-contours", type=int, default=None)
    parser.add_argument("--inner-rim-um", type=float, default=20.0)
    parser.add_argument("--outer-rim-um", type=float, default=30.0)
    parser.add_argument("--skip-pathomics", action="store_true")
    parser.add_argument("--include-transcripts", action="store_true")
    parser.add_argument("--program-library", default="breast_boundary_bmnet_v1")
    parser.add_argument("--random-state", type=int, default=0)
    parser.add_argument("--min-contours", type=int, default=8)
    return parser


def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    result = run_bmnet_morphology_increment_pilot(
        dataset_root=args.dataset_root,
        output_dir=args.output_dir,
        contour_geojson=args.contour_geojson,
        contour_key=args.contour_key,
        contour_id_key=args.contour_id_key,
        contour_coordinate_space=args.contour_coordinate_space,
        contour_pixel_size_um=args.contour_pixel_size_um,
        he_image_key=args.he_image_key,
        cells_parquet=args.cells_parquet,
        clusters_relpath=args.clusters_relpath,
        cluster_column_name=args.cluster_column_name,
        backend=args.backend,
        checkpoint=args.checkpoint,
        hf_model=args.hf_model,
        timm_architecture=args.timm_architecture,
        timm_pretrained=args.timm_pretrained,
        max_contours=args.max_contours,
        inner_rim_um=args.inner_rim_um,
        outer_rim_um=args.outer_rim_um,
        include_pathomics=not args.skip_pathomics,
        include_transcripts=args.include_transcripts,
        program_library=args.program_library,
        random_state=args.random_state,
        min_contours=args.min_contours,
    )
    summary = result["summary"]
    print(json.dumps({"artifact_dir": result["artifact_dir"], "summary": summary}, indent=2, default=str))
    summary_path = Path(summary["artifact_files"]["run_summary"])
    return 0 if summary_path.exists() else 1


if __name__ == "__main__":
    raise SystemExit(main())
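
A typical manual smoke invocation, reusing the environment variables from the PDC workflow section (the output directory here is an arbitrary example):

```bash
python benchmarking/bmnet_pdc/scripts/run_bmnet_morphology_increment.py \
    --dataset-root "${PDC_XENIUM_ROOT}" \
    --output-dir "${PDC_ROOT}/runs/manual_smoke" \
    --backend deterministic-smoke \
    --max-contours 20
```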