Ultra-fast exhaustive CRISPR off-target search engine
Argus delivers 10-800x speedup with GPU acceleration, searching 3.1 billion bases in seconds.
- ⚡ Blazing Fast: Search entire human genome in 0.5-23 seconds (GPU) vs hours (traditional)
- 🎯 100% Sensitivity: Exhaustive search with no heuristics—never misses an off-target
- 📊 Rich Annotations: 23 biological features including CFD activity scores, MIT specificity, seed disruptions, PAM analysis
- 🔄 Batch Processing: Screen 100+ guides simultaneously with near-linear scaling
- 💾 Memory Efficient: <2 GB GPU VRAM for human genome at any batch size
- 🧬 Flexible: Supports any genome, edit distance threshold, PAM filter, or strand
- 🔬 RNA-Ready: Accepts uracil (U) in spacers with automatic U→T normalization
- 📈 Summary Mode: Quick aggregated counts by distance without per-hit details
- Myers' Bit-Parallel Algorithm: 64-way parallelism per thread, GPU-optimized
- Sparse Output Optimization: Only transfers hits (~0.00001% of data), eliminating I/O bottleneck
- Memory-Mapped I/O: Instant genome loading via `.argus` binary index format
- Zero-Copy Architecture: GPU accesses genome via unified memory, no explicit transfers
- Graceful Validation: Per-spacer error handling prevents single bad input from crashing batch
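To make the bit-parallel idea above concrete, here is a minimal Python sketch of Myers' bit-vector search. It is an illustration only, not Argus's actual kernel: the function name `myers_search` is hypothetical, and Python's arbitrary-precision integers stand in for the 64-bit machine words a GPU thread would use. Each bit of `pv`/`mv` tracks one row of the dynamic-programming column, so a whole column is updated with a handful of word operations.

```python
def myers_search(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, edit_distance) for every text position where some
    substring ending there is within edit distance k of the pattern."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep only the m low bits (Python ints are unbounded)
    high_bit = 1 << (m - 1)      # bit for the last pattern row

    # Peq[c]: bit i is set when pattern[i] == c
    peq: dict[str, int] = {}
    for i, base in enumerate(pattern):
        peq[base] = peq.get(base, 0) | (1 << i)

    pv, mv = mask, 0             # vertical +1 / -1 delta vectors
    score = m                    # edit distance in the last row
    hits = []
    for j, base in enumerate(text):
        eq = peq.get(base, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high_bit:
            score += 1
        elif mh & high_bit:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the text
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits


if __name__ == "__main__":
    # Toy text (made up for illustration) containing the Quick Start spacer
    spacer = "GAGTCCGAGCAGAAGAAGAA"
    toy_genome = "TTTT" + spacer + "CCCC"
    print(myers_search(spacer, toy_genome, 0))
```

A 20-nt spacer fits comfortably in one 64-bit word, which is why one word per thread is enough for typical guides in a compiled implementation.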
# Create conda environment
conda create -n argus python=3.10 -y
conda activate argus
# Install dependencies
conda install -c conda-forge cmake compilers gxx=11 spdlog catch2 -y
pip install pandas numpy matplotlib scikit-learn biopython pysam

cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j$(nproc)

# Run unit tests
ctest --test-dir build --output-on-failure
# GPU safety check
./tests/gpu_safety_harness.sh

# Index a test genome
./build/src/argus --index-genome tests/data/synthetic/genome.fa
# Search for off-targets (single guide)
./build/src/argus \
--genome tests/data/synthetic/genome.fa.argus \
--pattern GAGTCCGAGCAGAAGAAGAA \
--threshold 3 \
--format tsv
# Expected: <1 second, finds all sites within edit distance 3

Human Genome (hg38) Benchmarks:
| Spacers | CPU Time | GPU Time | Speedup |
|---|---|---|---|
| 1 guide | 67.9s | 0.51s | 133x |
| 10 guides | 70.5s | 1.24s | 57x |
| 100 guides | 240s | 8.77s | 27x |
| 280 guides | ~5 hours | 23.3s | 815x |
Key Metrics:
- Throughput: 12 spacers/second (large batches)
- Memory: <2 GB GPU VRAM for any batch size
- Speedup vs Traditional Tools: >3,000x (compared with Cas-OFFinder)
See Benchmark Results for comprehensive analysis.
# Step 1: Index human genome (one-time, ~30 seconds)
./build/src/argus --index-genome hg38.fa
# Step 2: Screen CRISPR library (100 guides)
./build/src/argus \
--genome hg38.fa.argus \
--spacer-file my_guides.txt \
--threshold 3 \
--pam-filter both \
--output offtargets.tsv
# Step 3: Analyze results
# Output includes: chromosome, position, edit distance,
# seed disruptions, PAM type, alignment details

Use Cases:
- ✅ Guide specificity screening (pre-synthesis)
- ✅ Library design validation (100s-1000s of guides)
- ✅ Therapeutic guide safety assessment
- ✅ Multiplex editing optimization
- ✅ Off-target prediction for GUIDE-seq/CIRCLE-seq validation
- User Guide - Complete installation, usage, and troubleshooting
- Benchmark Results - Performance analysis on hg38 genome
- Achievements Summary - Technical innovations for 3 audiences
- Testing Strategy - Test suite architecture and coverage
- Project Context - Architecture decisions and design rationale
- Active Tasks - Current development status and roadmap
- Installation Guide
- CLI Reference
- Output Format Documentation
- Performance Optimization Tips
- Troubleshooting
Minimum:
- OS: Linux (Ubuntu 20.04+), macOS (x86_64 or ARM64 with Rosetta)
- CPU: x86_64 with AVX2 support
- Memory: 8 GB RAM
- Storage: 10 GB for software + genome indices

Recommended (for GPU acceleration):
- GPU: NVIDIA GPU with CUDA compute capability ≥7.0 (RTX 2000+, A100, V100)
- CUDA: Toolkit 11.8 or later
- Memory: 16+ GB RAM (32 GB for large genomes)
- Storage: NVMe SSD for optimal genome loading
# Single spacer search
argus --genome GENOME.argus --pattern SEQUENCE --threshold N
# Batch search (multiple spacers)
argus --genome GENOME.argus --spacer-file GUIDES.txt --threshold N

| Option | Description | Default |
|---|---|---|
| `--threshold N` | Maximum edit distance (0-255) | Required |
| `--format FMT` | Output format: tsv, bed | tsv |
| `--pam-filter FILTER` | PAM filter: none, ngg, nag, both, any | none |
| `--strand STRAND` | Search strand: plus, minus, both | both |
| `--forward-only` | Skip reverse-complement search | Off |
| `--treat-u-as-t` | Normalize uracil to thymine in spacers | On |
| `--summary` | Output aggregated hit counts by distance | Off |
| `--summary-format FMT` | Summary format: json, tsv | json |
| `--max-hits N` | Limit output per spacer | Unlimited |
| `--no-compute-mismatches` | Skip alignment annotations (faster) | Compute |
| `--output FILE` | Output file | stdout |
See User Guide for complete reference.
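As a rough illustration of what `--treat-u-as-t` and per-spacer validation might look like when preparing a guide file, the sketch below normalizes U to T and collects errors instead of aborting the whole batch. The function names and the rule that only A, C, G, and T are accepted are assumptions made for this example; Argus's own validation rules may differ (for instance, it may accept ambiguity codes).

```python
VALID_BASES = set("ACGT")  # assumption: only unambiguous DNA bases are accepted


def normalize_spacer(spacer: str, treat_u_as_t: bool = True) -> str:
    """Uppercase a spacer and optionally convert RNA uracil (U) to thymine (T)."""
    seq = spacer.strip().upper()
    if treat_u_as_t:
        seq = seq.replace("U", "T")
    return seq


def validate_batch(spacers, treat_u_as_t: bool = True):
    """Split a batch into valid sequences and (index, raw_input, reason) errors.

    A bad spacer is reported and skipped, so one malformed line does not
    stop the remaining guides from being processed.
    """
    valid, errors = [], []
    for index, raw in enumerate(spacers):
        seq = normalize_spacer(raw, treat_u_as_t)
        if not seq:
            errors.append((index, raw, "empty spacer"))
            continue
        bad = sorted(set(seq) - VALID_BASES)
        if bad:
            errors.append((index, raw, "invalid characters: " + "".join(bad)))
        else:
            valid.append(seq)
    return valid, errors


if __name__ == "__main__":
    guides = ["GAGUCCGAGCAGAAGAAGAA", "GAGTCXGA", ""]
    ok, problems = validate_batch(guides)
    print("valid:", ok)
    for index, raw, reason in problems:
        print(f"skipped line {index}: {raw!r} ({reason})")
```

Running a check like this before a large screen makes it easier to see which guides were dropped and why.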
Argus produces tab-separated output with 19 biological annotation columns:
spacer chrom start end pattern distance strand aligned_seq mismatch_pos ...
VEGFA chr6 43737006 43737026 GAGTCCGAGCAG... 2 + GAGTCCAAGCAG... 7 ...

Key Columns:
- `distance` - Edit distance (Levenshtein)
- `pam_type` - NGG, NAG, OTHER, or INCOMPLETE
- `seed_edits` - Disruptions in seed region (positions 1-10)
- `distal_edits` - Disruptions in distal region (positions 11-20)
- `alignment_ambiguous` - Multiple optimal alignments exist
See Output Formats for complete description.
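For downstream analysis, the per-hit TSV can be collapsed into counts by edit distance, similar in spirit to what `--summary` reports. The sketch below uses only the Python standard library and assumes the output starts with a header row containing the `spacer` and `distance` column names shown above. The helper name `summarize_hits` is hypothetical, and apart from the VEGFA row taken from the example above, the sample rows are invented placeholders, not real results.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_hits(tsv_text: str) -> dict[str, dict[int, int]]:
    """Count hits per spacer and edit distance from tab-separated output.

    Assumes a header row that includes 'spacer' and 'distance' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    summary: dict[str, Counter] = defaultdict(Counter)
    for row in reader:
        summary[row["spacer"]][int(row["distance"])] += 1
    # Sort distances so the summary reads 0, 1, 2, ...
    return {spacer: dict(sorted(counts.items())) for spacer, counts in summary.items()}


if __name__ == "__main__":
    # Only the first data row comes from the example above; the rest are
    # invented placeholders used to exercise the function.
    rows = [
        ["spacer", "chrom", "start", "end", "distance", "strand"],
        ["VEGFA", "chr6", "43737006", "43737026", "2", "+"],
        ["VEGFA", "chr1", "1000", "1020", "0", "+"],
        ["VEGFA", "chr2", "2000", "2020", "2", "-"],
        ["EMX1", "chr3", "3000", "3020", "3", "+"],
    ]
    sample = "\n".join("\t".join(r) for r in rows) + "\n"
    for spacer, counts in summarize_hits(sample).items():
        print(spacer, counts)
```

To apply this to a real run, read the file written by `--output` and pass its contents to the function; extra annotation columns are ignored.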
If you use Argus in your research, please cite:
@software{argus2026,
author = {Simpson, Danny},
title = {Argus: Ultra-fast exhaustive CRISPR off-target search with GPU acceleration},
year = {2026},
version = {0.4.0},
url = {https://github.com/simpsondl/argus}
}

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Areas of Interest:
- Multi-GPU support for larger genomes
- Additional nuclease PAM filters (Cpf1, etc.)
- Python/R API bindings
- Web interface for cloud deployment
- Performance optimizations
This project is released under the MIT License.
- Myers' Algorithm: Eugene W. Myers (1999) - Bit-parallel approximate string matching
- GPU Architecture: NVIDIA CUDA team
- Testing: GeCKO CRISPR library for benchmark spacers
- Community: Early adopters and beta testers
- Author: Danny Simpson
- Issues: GitHub Issues
- Discussions: GitHub Discussions