UGRN: Towards Universal Gene Regulatory Network Inference

Official PyTorch implementation of "Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models", accepted to ICML 2026.


Overview

Gene Regulatory Network (GRN) inference from single-cell RNA sequencing data remains a fundamental challenge in computational biology. Existing methods typically operate in a closed-world setting: a specialized model is optimized on a fixed gene set and struggles to generalize to unseen genes or heterogeneous datasets due to dimension mismatches.

UGRN introduces a universal, transfer-learning-based framework that leverages frozen single-cell foundation models (scFMs) for generalizable feature extraction. A lightweight downstream "translator" $f_\phi$ is then trained to predict regulatory interactions, enabling seamless generalization to open-world scenarios with unseen genes and cross-dataset transfer.

Figure 1. (a) Traditional GRN inference operates in a closed-world setting, where the optimized model $f_\theta$ struggles with dimension mismatches on unseen genes from heterogeneous datasets. (b) The UGRN setting uses frozen scFMs for universal feature extraction, so that the regulatory predictions of the "translator" $f_\phi$ generalize to open-world scenarios involving unseen genes and datasets.


Method

UGRN provides three complementary feature extraction strategies:

| Mode | Description |
| --- | --- |
| Embedding | Concatenated gene embeddings from the frozen scFM |
| Perturbation | In-silico expression perturbation to probe model response |
| Gradient (Ours) | Integrated gradients for directional regulatory signals |
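
To make the gradient mode more concrete, the following is a minimal sketch of integrated gradients on a toy function. It is not the repository's implementation: `toy_target_expression` is a hypothetical stand-in for the frozen scFM's prediction of one target gene, and gradients are taken by finite differences, whereas a real run would presumably use PyTorch autograd through the model. The sign of each attribution indicates whether raising a regulator's expression pushes the prediction up or down, which is what makes the signal directional.

```python
import math
from typing import Callable, List, Sequence


def numerical_grad(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient of f at x (stand-in for autograd)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f: Callable[[Sequence[float]], float],
                         x: Sequence[float], baseline: Sequence[float],
                         steps: int = 64) -> List[float]:
    """Approximate IG_i = (x_i - b_i) * mean over the path of df/dx_i."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the straight path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, numerical_grad(f, point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_target_expression(expr: Sequence[float]) -> float:
    """Hypothetical model: an activator, a repressor, and an unrelated gene."""
    weights = [1.5, -0.8, 0.0]
    z = sum(w * e for w, e in zip(weights, expr))
    return 1.0 / (1.0 + math.exp(-z))


expr = [1.0, 2.0, 0.5]          # observed expression of three candidate regulators
baseline = [0.0, 0.0, 0.0]      # all-zero expression as the reference input
attr = integrated_gradients(toy_target_expression, expr, baseline)
print([round(a, 4) for a in attr])
```

In this toy setting the activator receives a positive attribution, the repressor a negative one, and the unrelated gene zero. The attributions also sum to approximately the difference between the prediction at `expr` and at `baseline`, which is the completeness property of integrated gradients.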

The full UGRN method ensembles perturbation and gradient features via weighted late fusion, achieving state-of-the-art cross-dataset generalization on the GENELink benchmark.
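
Weighted late fusion can be sketched as a convex combination of the per-edge scores produced by two separately trained predictors. The function name `late_fusion`, the `pert_weight` parameter, and the example scores below are illustrative assumptions; the weights and any score normalization used in `transfer_ours.py` may differ.

```python
from typing import List, Sequence


def late_fusion(pert_scores: Sequence[float],
                grad_scores: Sequence[float],
                pert_weight: float = 0.5) -> List[float]:
    """Blend two score lists edge by edge; pert_weight is the share given
    to the perturbation-based predictor, the rest goes to the gradient one."""
    if len(pert_scores) != len(grad_scores):
        raise ValueError("both predictors must score the same candidate edges")
    if not 0.0 <= pert_weight <= 1.0:
        raise ValueError("pert_weight must lie in [0, 1]")
    return [pert_weight * p + (1.0 - pert_weight) * g
            for p, g in zip(pert_scores, grad_scores)]


# Hypothetical scores for three candidate TF-target edges.
pert = [0.90, 0.20, 0.55]
grad = [0.70, 0.40, 0.15]
fused = late_fusion(pert, grad, pert_weight=0.6)
print([round(s, 3) for s in fused])
```

Because fusion happens on output scores rather than on features, each predictor can be trained and tuned independently before the weight is chosen.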


Installation

git clone https://github.com/simpleshinobu/UGRN.git
cd UGRN
pip install -r requirements.txt

Requirements

  • Python >= 3.9
  • PyTorch >= 2.0.0
  • See requirements.txt for full dependencies

Data & Model Preparation

1. Pre-trained Single-cell Foundation Model

Download the pre-trained model checkpoint and place it under ./checkpoints/:

mkdir -p checkpoints
# Download model.pt and args.json

Download: Google Drive

Expected structure:

checkpoints/
├── model.pt
└── args.json

vocab.json is included in this repository and will be loaded automatically.

2. GENELink Benchmark Dataset

Download the benchmark archive and extract it to ./data/:

mkdir -p data
# Download Benchmark Dataset.zip and extract to data/

Download: Google Drive

Expected structure:

data/
└── GENELink/
    └── Dataset/
        └── Benchmark Dataset/
            ├── STRING Dataset/
            │   └── hESC/TFs+500/
            │       ├── Label.csv
            │       ├── TF.csv
            │       ├── Target.csv
            │       └── BL--ExpressionData.csv
            ├── Non-Specific Dataset/
            ├── Lofgof Dataset/
            └── Specific Dataset/
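
Before launching a run, it may help to confirm that both layouts above are in place. The sketch below is a hypothetical preflight helper, not part of the repository. It assumes that a source such as `STRING:hESC` maps to `STRING Dataset/hESC/TFs+500`, which is inferred from the tree above and may not match how the scripts resolve paths.

```python
from pathlib import Path
from typing import Iterable, List

CHECKPOINT_FILES = ("model.pt", "args.json")
DATASET_FILES = ("Label.csv", "TF.csv", "Target.csv", "BL--ExpressionData.csv")


def dataset_dir(genelink_root: str, train_source: str,
                subset: str = "TFs+500") -> Path:
    """Map 'STRING:hESC' to '<root>/STRING Dataset/hESC/TFs+500' (assumed)."""
    network, dataset = train_source.split(":", 1)
    return Path(genelink_root) / f"{network} Dataset" / dataset / subset


def missing_files(directory: Path, names: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent from directory."""
    return [name for name in names if not (directory / name).is_file()]


def preflight(checkpoint_dir: str, genelink_root: str,
              train_source: str) -> List[str]:
    """List every expected file that cannot be found."""
    ckpt = Path(checkpoint_dir)
    data = dataset_dir(genelink_root, train_source)
    problems = [str(ckpt / n) for n in missing_files(ckpt, CHECKPOINT_FILES)]
    problems += [str(data / n) for n in missing_files(data, DATASET_FILES)]
    return problems


for path in preflight("checkpoints",
                      "data/GENELink/Dataset/Benchmark Dataset",
                      "STRING:hESC"):
    print("missing:", path)
```

An empty result means every file named in the two expected structures was found.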

Quick Start

Transfer Base (Single Feature Mode)

Run with perturbation features (default):

python transfer_base.py \
  --model_ckpt checkpoints/model.pt \
  --genelink_root "data/GENELink/Dataset/Benchmark Dataset" \
  --train_source STRING:hESC \
  --save_root results/transfer_base

Run with embedding features:

python transfer_base.py \
  --model_ckpt checkpoints/model.pt \
  --genelink_root "data/GENELink/Dataset/Benchmark Dataset" \
  --feature_type embedding \
  --train_source STRING:hESC \
  --save_root results/transfer_base_emb

Transfer Ours (Full Ensemble)

Run the complete UGRN ensemble (perturbation + gradient):

python transfer_ours.py \
  --model_ckpt checkpoints/model.pt \
  --genelink_root "data/GENELink/Dataset/Benchmark Dataset" \
  --train_source STRING:hESC \
  --save_root results/transfer_ours

Run individual modes:

# Perturbation only
python transfer_ours.py \
  --model_ckpt checkpoints/model.pt \
  --genelink_root "data/GENELink/Dataset/Benchmark Dataset" \
  --mode perturbation \
  --train_source STRING:hESC \
  --save_root results/transfer_ours_pert

# Gradient only
python transfer_ours.py \
  --model_ckpt checkpoints/model.pt \
  --genelink_root "data/GENELink/Dataset/Benchmark Dataset" \
  --mode grad \
  --train_source STRING:hESC \
  --save_root results/transfer_ours_grad
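
The three `transfer_ours.py` invocations above differ only in `--mode` and `--save_root`, so they can be generated from one helper. This is a convenience sketch that uses only the flags shown in this README; `ugrn_command` is a hypothetical name.

```python
import shlex
from typing import List, Optional


def ugrn_command(save_root: str, mode: Optional[str] = None) -> List[str]:
    """Build the argument list for one transfer_ours.py run.
    Omitting mode leaves the script on its default (the full ensemble)."""
    cmd = [
        "python", "transfer_ours.py",
        "--model_ckpt", "checkpoints/model.pt",
        "--genelink_root", "data/GENELink/Dataset/Benchmark Dataset",
        "--train_source", "STRING:hESC",
        "--save_root", save_root,
    ]
    if mode is not None:
        cmd += ["--mode", mode]
    return cmd


runs = {
    "results/transfer_ours": None,               # perturbation + gradient
    "results/transfer_ours_pert": "perturbation",
    "results/transfer_ours_grad": "grad",
}
for save_root, mode in runs.items():
    # shlex.join quotes the dataset path, which contains a space.
    print(shlex.join(ugrn_command(save_root, mode)))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the runs in sequence; keeping the command as a list avoids manual quoting of the `Benchmark Dataset` path.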

Key Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --model_ckpt | Path to pre-trained model checkpoint | ./checkpoints/model.pt |
| --genelink_root | Path to GENELink dataset root | ./data/GENELink/Dataset/Benchmark Dataset |
| --train_source | Training source as Network:Dataset | STRING:hESC |
| --feature_type | Feature type (embedding / perturbation / combined) | Script-dependent |
| --mode | Modes for transfer_ours (perturbation / grad) | perturbation grad |
| --runs | Number of random runs | 3 |
| --seed | Random seed | 42 |
| --save_root | Directory to save results | results/... |
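
For readers who want to see how these flags fit together, here is an illustrative `argparse` mirror of the arguments documented for `transfer_ours.py`. It is an assumption about the interface, not the script's actual parser: in particular, treating `--mode` as a list-valued flag is inferred from its two-word default, and `--save_root` is left without a default because the documented value is abbreviated.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser reproducing the documented flags and defaults."""
    p = argparse.ArgumentParser(description="Sketch of the documented UGRN flags")
    p.add_argument("--model_ckpt", default="./checkpoints/model.pt")
    p.add_argument("--genelink_root",
                   default="./data/GENELink/Dataset/Benchmark Dataset")
    p.add_argument("--train_source", default="STRING:hESC",
                   help="Training source as Network:Dataset")
    # Assumed list-valued: the documented default names both modes.
    p.add_argument("--mode", nargs="+", choices=["perturbation", "grad"],
                   default=["perturbation", "grad"])
    p.add_argument("--runs", type=int, default=3)
    p.add_argument("--seed", type=int, default=42)
    # The documented default is abbreviated as "results/...", so none is set here.
    p.add_argument("--save_root", default=None)
    return p


args = build_parser().parse_args(["--mode", "grad", "--save_root", "results/demo"])
network, dataset = args.train_source.split(":", 1)
print(args.mode, network, dataset, args.runs, args.seed)
```

Under this reading, omitting `--mode` selects both modes and therefore the full ensemble, which matches the Quick Start commands.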

Repository Structure

UGRN/
├── ugrn/                      # Core package
│   ├── __init__.py
│   ├── model.py               # scModel: single-cell foundation model wrapper
│   ├── vocab.py               # GeneVocab: gene token vocabulary
│   ├── dataset.py             # GENELink data loading utilities
│   └── config.py              # Dataset and experiment configurations
├── transfer_base.py           # Baseline: embedding / perturbation features
├── transfer_ours.py           # UGRN: perturbation + gradient ensemble
├── vocab.json                 # Gene vocabulary
├── requirements.txt
└── README.md

Citation

If you find this work useful, please consider citing:

@inproceedings{qi2026ugrn,
  title={Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models},
  author={Qi, Jiaxin and Li, Hang and Cui, Yan and Zheng, Yuhua and Huang, Jianqiang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}

License

This project is released under the MIT License.
