Argus

Ultra-fast exhaustive CRISPR off-target search engine

With GPU acceleration, Argus searches the 3.1 billion bases of the human genome in seconds, roughly 27x to 815x faster than the CPU run in the hg38 benchmarks below.


Features

Core Capabilities

  • Blazing Fast: Searches the entire human genome in 0.5-23 seconds on GPU, versus hours with traditional tools
  • 🎯 100% Sensitivity: Exhaustive search with no heuristics, so no off-target within the edit-distance threshold is missed
  • 📊 Rich Annotations: 23 biological features including CFD activity scores, MIT specificity, seed disruptions, PAM analysis
  • 🔄 Batch Processing: Screen 100+ guides simultaneously with near-linear scaling
  • 💾 Memory Efficient: <2 GB GPU VRAM for human genome at any batch size
  • 🧬 Flexible: Supports any genome, edit distance threshold, PAM filter, or strand
  • 🔬 RNA-Ready: Accepts uracil (U) in spacers with automatic U→T normalization
  • 📈 Summary Mode: Quick aggregated counts by distance without per-hit details
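
The U→T normalization described above can be illustrated with a short Python sketch. It is not Argus's implementation: the helper names (`normalize_spacer`, `normalize_batch`) are hypothetical, and the assumption that only A, C, G, and T are valid after normalization is ours, not something this README states.

```python
# Illustrative sketch only; helper names and the A/C/G/T-only rule are assumptions.
VALID_BASES = set("ACGT")


def normalize_spacer(spacer: str) -> str:
    """Uppercase a spacer and replace uracil (U) with thymine (T)."""
    return spacer.strip().upper().replace("U", "T")


def normalize_batch(spacers):
    """Split a batch into (valid, rejected) instead of failing on the first bad spacer."""
    valid, rejected = [], []
    for raw in spacers:
        seq = normalize_spacer(raw)
        if seq and set(seq) <= VALID_BASES:
            valid.append(seq)
        else:
            rejected.append(raw)  # keep the original text for error reporting
    return valid, rejected


valid, rejected = normalize_batch(
    ["GAGUCCGAGCAGAAGAAGAA", "gagtccgagcagaagaagaa", "GAGTCCXAGC"]
)
print(valid)
print(rejected)
```

Collecting rejected spacers rather than raising keeps one malformed entry from aborting the whole batch.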

Technical Highlights

  • Myers' Bit-Parallel Algorithm: 64-way parallelism per thread, GPU-optimized
  • Sparse Output Optimization: Only transfers hits (~0.00001% of data), eliminating I/O bottleneck
  • Memory-Mapped I/O: Instant genome loading via .argus binary index format
  • Zero-Copy Architecture: GPU accesses genome via unified memory, no explicit transfers
  • Graceful Validation: Per-spacer error handling prevents single bad input from crashing batch
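
For readers unfamiliar with Myers' bit-parallel algorithm, here is a minimal Python sketch of the textbook search variant (Myers 1999, in Hyyrö's formulation). It is not the Argus GPU kernel, which this README does not show. The function name `myers_search` is hypothetical, and a Python integer masked to the pattern length stands in for the 64-bit machine word.

```python
def myers_search(pattern: str, text: str, k: int):
    """Return (end_index, edit_distance) for every text position where the
    pattern matches a substring ending there with edit distance <= k."""
    m = len(pattern)
    mask = (1 << m) - 1          # keeps values inside an m-bit "word"
    high = 1 << (m - 1)          # bit for the last pattern row

    # Peq[c]: bit i is set when pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas and bottom-row score
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0: a match may start anywhere in the text (search variant).
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits


spacer = "GAGTCCGAGCAGAAGAAGAA"
print(myers_search(spacer, "TT" + spacer + "TT", 0))
```

Each text character updates all pattern rows with a handful of word operations, which is where the per-thread parallelism comes from.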

Quick Start

1. Set up environment

# Create conda environment
conda create -n argus python=3.10 -y
conda activate argus

# Install dependencies
conda install -c conda-forge cmake compilers gxx=11 spdlog catch2 -y
pip install pandas numpy matplotlib scikit-learn biopython pysam

2. Build

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

3. Test installation

# Run unit tests
ctest --test-dir build --output-on-failure

# GPU safety check
./tests/gpu_safety_harness.sh

4. 30-Second Demo

# Index a test genome
./build/src/argus --index-genome tests/data/synthetic/genome.fa

# Search for off-targets (single guide)
./build/src/argus \
  --genome tests/data/synthetic/genome.fa.argus \
  --pattern GAGTCCGAGCAGAAGAAGAA \
  --threshold 3 \
  --format tsv

# Expected: <1 second, finds all sites within edit distance 3

Performance

Human Genome (hg38) Benchmarks:

Spacers      CPU Time   GPU Time   Speedup
1 guide      67.9s      0.51s      133x
10 guides    70.5s      1.24s      57x
100 guides   240s       8.77s      27x
280 guides   ~5 hours   23.3s      815x

Key Metrics:

  • Throughput: 12 spacers/second (large batches)
  • Memory: <2 GB GPU VRAM for any batch size
  • Speedup vs Traditional Tools: >3,000x (vs. Cas-OFFinder)
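
The speedup and throughput figures follow from the benchmark table by simple division. The Python sketch below reproduces that arithmetic. The helper names are ours, and the 280-guide row is excluded from the speedup check because its CPU time is only given approximately.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Speedup is CPU wall time divided by GPU wall time."""
    return cpu_seconds / gpu_seconds


def throughput(n_spacers: int, gpu_seconds: float) -> float:
    """Spacers processed per second of GPU wall time."""
    return n_spacers / gpu_seconds


# (guides, CPU seconds, GPU seconds) copied from the benchmark table
exact_rows = [(1, 67.9, 0.51), (10, 70.5, 1.24), (100, 240.0, 8.77)]
for guides, cpu, gpu in exact_rows:
    print(f"{guides} guides: {speedup(cpu, gpu):.0f}x")

# Large-batch throughput from the 280-guide GPU time
print(f"{throughput(280, 23.3):.1f} spacers/second")
```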

See Benchmark Results for comprehensive analysis.


Real-World Example

# Step 1: Index human genome (one-time, ~30 seconds)
./build/src/argus --index-genome hg38.fa

# Step 2: Screen CRISPR library (100 guides)
./build/src/argus \
  --genome hg38.fa.argus \
  --spacer-file my_guides.txt \
  --threshold 3 \
  --pam-filter both \
  --output offtargets.tsv

# Step 3: Analyze results
# Output includes: chromosome, position, edit distance, 
# seed disruptions, PAM type, alignment details
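
One possible way to start Step 3 is to tally hits per guide and edit distance in Python. The sketch assumes the TSV begins with a header row and uses the `spacer` and `distance` column names from the Output Format section. The function name `hits_by_distance` is hypothetical, and the rows in `SAMPLE_TSV` are made-up placeholders, not real Argus output.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows for illustration only (not real search results).
SAMPLE_TSV = (
    "spacer\tchrom\tstart\tend\tdistance\tstrand\n"
    "guideA\tchrT\t100\t120\t0\t+\n"
    "guideA\tchrT\t900\t920\t2\t-\n"
    "guideA\tchrT\t1500\t1520\t2\t+\n"
    "guideB\tchrT\t300\t320\t3\t+\n"
)


def hits_by_distance(handle):
    """Map each spacer name to a Counter of {edit distance: number of hits}."""
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        counts.setdefault(row["spacer"], Counter())[int(row["distance"])] += 1
    return counts


summary = hits_by_distance(io.StringIO(SAMPLE_TSV))
for spacer, per_distance in summary.items():
    print(spacer, dict(sorted(per_distance.items())))
```

For a real run, replace the `io.StringIO` object with `open("offtargets.tsv", newline="")`.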

Use Cases:

  • ✅ Guide specificity screening (pre-synthesis)
  • ✅ Library design validation (100s-1000s of guides)
  • ✅ Therapeutic guide safety assessment
  • ✅ Multiplex editing optimization
  • ✅ Off-target prediction for GUIDE-seq/CIRCLE-seq validation



System Requirements

Minimum

  • OS: Linux (Ubuntu 20.04+), macOS (x86_64 or ARM64 with Rosetta)
  • CPU: x86_64 with AVX2 support
  • Memory: 8 GB RAM
  • Storage: 10 GB for software + genome indices

Recommended

  • GPU: NVIDIA GPU with CUDA compute capability ≥7.0 (V100, RTX 20-series or newer, A100)
  • CUDA: Toolkit 11.8 or later
  • Memory: 16+ GB RAM (32 GB for large genomes)
  • Storage: NVMe SSD for optimal genome loading

Command-Line Reference

Basic Usage

# Single spacer search
argus --genome GENOME.argus --pattern SEQUENCE --threshold N

# Batch search (multiple spacers)
argus --genome GENOME.argus --spacer-file GUIDES.txt --threshold N

Common Options

Option                    Description                                 Default
--threshold N             Maximum edit distance (0-255)               Required
--format FMT              Output format: tsv, bed                     tsv
--pam-filter FILTER       PAM filter: none, ngg, nag, both, any       none
--strand STRAND           Search strand: plus, minus, both            both
--forward-only            Skip reverse-complement search              Off
--treat-u-as-t            Normalize uracil to thymine in spacers      On
--summary                 Output aggregated hit counts by distance    Off
--summary-format FMT      Summary format: json, tsv                   json
--max-hits N              Limit output per spacer                     Unlimited
--no-compute-mismatches   Skip alignment annotations (faster)         Off (annotations computed)
--output FILE             Output file                                 stdout
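
When Argus is driven from a script, it can help to validate option values before launching the binary. The Python sketch below only assembles an argument list from flags documented in the table above. The helper `build_argus_command` is hypothetical and not part of Argus, and the value checks mirror the table rather than the tool's own validation.

```python
import shlex

PAM_FILTERS = {"none", "ngg", "nag", "both", "any"}
STRANDS = {"plus", "minus", "both"}


def build_argus_command(genome_index, spacer_file, threshold,
                        pam_filter="none", strand="both",
                        output=None, binary="argus"):
    """Assemble an argv list for a batch search; raise ValueError on bad values."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be between 0 and 255")
    if pam_filter not in PAM_FILTERS:
        raise ValueError(f"unknown PAM filter: {pam_filter}")
    if strand not in STRANDS:
        raise ValueError(f"unknown strand: {strand}")
    cmd = [binary,
           "--genome", genome_index,
           "--spacer-file", spacer_file,
           "--threshold", str(threshold),
           "--pam-filter", pam_filter,
           "--strand", strand]
    if output is not None:
        cmd += ["--output", output]
    return cmd


cmd = build_argus_command("hg38.fa.argus", "my_guides.txt", 3,
                          pam_filter="both", output="offtargets.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the binary is on the path.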

See User Guide for complete reference.


Output Format

Argus produces tab-separated output with 19 biological annotation columns:

spacer  chrom   start   end     pattern distance  strand  aligned_seq  mismatch_pos  ...
VEGFA   chr6    43737006 43737026 GAGTCCGAGCAG... 2  +  GAGTCCAAGCAG...  7  ...

Key Columns:

  • distance - Edit distance (Levenshtein)
  • pam_type - NGG, NAG, OTHER, or INCOMPLETE
  • seed_edits - Disruptions in seed region (positions 1-10)
  • distal_edits - Disruptions in distal region (positions 11-20)
  • alignment_ambiguous - Multiple optimal alignments exist
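
To make the `pam_type`, `seed_edits`, and `distal_edits` columns concrete, the Python sketch below shows one way such values could be derived. It rests on several assumptions: that INCOMPLETE means fewer than three PAM bases were available, that positions are 1-based (consistent with `mismatch_pos` 7 in the sample row above), and that the function names are ours. Argus's actual rules may differ.

```python
def classify_pam(pam: str) -> str:
    """Classify the 3 bases next to a hit as NGG, NAG, OTHER, or INCOMPLETE."""
    pam = pam.upper()
    if len(pam) < 3:
        return "INCOMPLETE"   # assumption: the PAM runs off the end of the sequence
    if pam[1:3] == "GG":
        return "NGG"
    if pam[1:3] == "AG":
        return "NAG"
    return "OTHER"


def split_edits(edit_positions, seed_end=10):
    """Count edits in the seed (1..seed_end) and distal (> seed_end) regions.

    Positions are 1-based, matching 'positions 1-10' and 'positions 11-20'.
    """
    seed = sum(1 for p in edit_positions if 1 <= p <= seed_end)
    distal = sum(1 for p in edit_positions if p > seed_end)
    return seed, distal


print(classify_pam("TGG"), classify_pam("CAG"), classify_pam("TTT"), classify_pam("TG"))
print(split_edits([7]))
print(split_edits([3, 15, 20]))
```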

See Output Formats for complete description.


Citation

If you use Argus in your research, please cite:

@software{argus2026,
  author = {Simpson, Danny},
  title = {Argus: Ultra-fast exhaustive CRISPR off-target search with GPU acceleration},
  year = {2026},
  version = {0.4.0},
  url = {https://github.com/simpsondl/argus}
}

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Areas of Interest:

  • Multi-GPU support for larger genomes
  • Additional nuclease PAM filters (Cpf1, etc.)
  • Python/R API bindings
  • Web interface for cloud deployment
  • Performance optimizations

License

This project is released under the MIT License.


Acknowledgments

  • Myers' Algorithm: Eugene W. Myers (1999) - Bit-parallel approximate string matching
  • GPU Architecture: NVIDIA CUDA team
  • Testing: GeCKO CRISPR library for benchmark spacers
  • Community: Early adopters and beta testers
