Argus

Ultra-fast exhaustive CRISPR off-target search engine

Argus uses GPU acceleration to search the 3.1 billion bases of the human genome in seconds, a 10-800x speedup over CPU execution.

Features • Quick Start • Performance • Documentation • Citation

Features

Core Capabilities

⚡ Blazing Fast: Search the entire human genome in 0.5-23 seconds on a GPU, versus hours with traditional tools
🎯 100% Sensitivity: Exhaustive search with no heuristics, so every site within the chosen edit-distance threshold is reported
📊 Rich Annotations: 23 biological features including CFD activity scores, MIT specificity, seed disruptions, PAM analysis
🔄 Batch Processing: Screen 100+ guides simultaneously with near-linear scaling
💾 Memory Efficient: <2 GB GPU VRAM for human genome at any batch size
🧬 Flexible: Supports any genome, edit distance threshold, PAM filter, or strand
🔬 RNA-Ready: Accepts uracil (U) in spacers with automatic U→T normalization
📈 Summary Mode: Quick aggregated counts by distance without per-hit details
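
The U→T normalization mentioned above can be pictured with a short Python sketch. This is not Argus's implementation: the function names and the strict A/C/G/T alphabet are assumptions made for illustration, and the real tool may accept other symbols.

```python
# Illustrative sketch only; names and the accepted alphabet are assumptions.
VALID_BASES = set("ACGT")


def normalize_spacer(spacer: str) -> str:
    """Upper-case a spacer and map uracil (U) to thymine (T)."""
    return spacer.strip().upper().replace("U", "T")


def validate_batch(spacers):
    """Normalize each spacer; collect bad entries instead of aborting the batch.

    Returns (good, errors), where errors holds (index, raw_input) pairs.
    """
    good, errors = [], []
    for index, raw in enumerate(spacers):
        seq = normalize_spacer(raw)
        if seq and set(seq) <= VALID_BASES:
            good.append(seq)
        else:
            errors.append((index, raw))
    return good, errors
```

Collecting errors per spacer, rather than raising on the first one, mirrors the idea that a single malformed guide should not stop a whole batch.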

Technical Highlights

Myers' Bit-Parallel Algorithm: 64-way parallelism per thread, GPU-optimized
Sparse Output Optimization: Only transfers hits (~0.00001% of data), eliminating I/O bottleneck
Memory-Mapped I/O: Instant genome loading via .argus binary index format
Zero-Copy Architecture: GPU accesses genome via unified memory, no explicit transfers
Graceful Validation: Per-spacer error handling prevents single bad input from crashing batch
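
For readers unfamiliar with Myers' bit-parallel algorithm, the following is a minimal single-threaded Python sketch of the search variant, using the commonly cited Hyyrö formulation. It is not the Argus GPU kernel. The function name is hypothetical, and because Python integers are unbounded, a mask emulates the fixed-width machine word that gives the 64-way parallelism.

```python
def myers_search(pattern: str, text: str, k: int):
    """Return (end_index, distance) for each text position where the pattern
    matches with edit distance <= k. Matches may start anywhere in the text."""
    m = len(pattern)
    if not 0 < m <= 64:
        raise ValueError("pattern length must be 1..64 to fit one machine word")
    mask = (1 << m) - 1   # keep only the m low bits (emulates a fixed-width word)
    high = 1 << (m - 1)   # bit corresponding to the last pattern row

    # peq[c] has bit i set when pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    # pv / mv encode +1 / -1 vertical differences of the current DP column;
    # score is the DP value in the last row.
    pv, mv, score = mask, 0, m
    hits = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask
        mh = pv & xh
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask   # no carry-in bit: the top DP row is all zeros
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits
```

Each text character costs a constant number of word operations regardless of the threshold, which is what makes an exhaustive scan of every genome position practical.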

Quick Start

1. Set up environment

# Create conda environment
conda create -n argus python=3.10 -y
conda activate argus

# Install dependencies
conda install -c conda-forge cmake compilers gxx=11 spdlog catch2 -y
pip install pandas numpy matplotlib scikit-learn biopython pysam

2. Build

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

3. Test installation

# Run unit tests
ctest --test-dir build --output-on-failure

# GPU safety check
./tests/gpu_safety_harness.sh

4. 30-Second Demo

# Index a test genome
./build/src/argus --index-genome tests/data/synthetic/genome.fa

# Search for off-targets (single guide)
./build/src/argus \
  --genome tests/data/synthetic/genome.fa.argus \
  --pattern GAGTCCGAGCAGAAGAAGAA \
  --threshold 3 \
  --format tsv

# Expected: <1 second, finds all sites within edit distance 3

Performance

Human Genome (hg38) Benchmarks:

Spacers	CPU Time	GPU Time	Speedup
1 guide	67.9s	0.51s	133x
10 guides	70.5s	1.24s	57x
100 guides	240s	8.77s	27x
280 guides	~5 hours	23.3s	815x

Key Metrics:

Throughput: 12 spacers/second (large batches)
Memory: <2 GB GPU VRAM for any batch size
Speedup vs Traditional Tools: >3,000x (vs Cas-OFFinder)
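
The speedup and throughput figures follow directly from the benchmark table. The short Python check below recomputes them from the listed times; the helper names are made up for this example, and the 280-guide CPU time is back-calculated from the stated 815x because the table gives it only as "~5 hours".

```python
# (guides, cpu_seconds, gpu_seconds) copied from the benchmark table
rows = [(1, 67.9, 0.51), (10, 70.5, 1.24), (100, 240.0, 8.77)]


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """CPU time divided by GPU time."""
    return cpu_seconds / gpu_seconds


def throughput(guides: int, gpu_seconds: float) -> float:
    """Spacers searched per second of GPU time."""
    return guides / gpu_seconds


for guides, cpu_s, gpu_s in rows:
    print(f"{guides:>3} guides: {speedup(cpu_s, gpu_s):.0f}x")

# Large-batch throughput from the 280-guide row
print(f"throughput: {throughput(280, 23.3):.1f} spacers/second")

# CPU time implied by the reported 815x speedup at 23.3 s GPU time
implied_cpu_hours = 815 * 23.3 / 3600
print(f"implied CPU time: {implied_cpu_hours:.2f} hours")
```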

See Benchmark Results for comprehensive analysis.

Real-World Example

# Step 1: Index human genome (one-time, ~30 seconds)
./build/src/argus --index-genome hg38.fa

# Step 2: Screen CRISPR library (100 guides)
./build/src/argus \
  --genome hg38.fa.argus \
  --spacer-file my_guides.txt \
  --threshold 3 \
  --pam-filter both \
  --output offtargets.tsv

# Step 3: Analyze results
# Output includes: chromosome, position, edit distance, 
# seed disruptions, PAM type, alignment details
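
One way to approach Step 3 is to tally hits per guide and edit distance. The sketch below uses only the Python standard library. It assumes the TSV starts with a header row containing `spacer` and `distance` columns, as in the sample shown under Output Format. The function name is hypothetical.

```python
import csv
from collections import Counter, defaultdict


def hits_by_distance(lines):
    """Count off-target hits per spacer and edit distance.

    `lines` is any iterable of TSV lines (for example an open file) whose
    first line is a header with at least `spacer` and `distance` columns.
    """
    counts = defaultdict(Counter)
    for row in csv.DictReader(lines, delimiter="\t"):
        counts[row["spacer"]][int(row["distance"])] += 1
    return counts


# Usage, assuming offtargets.tsv was produced by the command in Step 2:
#
#     with open("offtargets.tsv", newline="") as handle:
#         for spacer, by_distance in hits_by_distance(handle).items():
#             print(spacer, dict(sorted(by_distance.items())))
```

When only these counts are needed, the `--summary` option produces aggregated counts by distance directly.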

Use Cases:

✅ Guide specificity screening (pre-synthesis)
✅ Library design validation (100s-1000s of guides)
✅ Therapeutic guide safety assessment
✅ Multiplex editing optimization
✅ Off-target prediction for GUIDE-seq/CIRCLE-seq validation

Documentation

User Guides

User Guide - Complete installation, usage, and troubleshooting
Benchmark Results - Performance analysis on hg38 genome
Achievements Summary - Technical innovations for 3 audiences
Testing Strategy - Test suite architecture and coverage

Developer Resources

Project Context - Architecture decisions and design rationale
Active Tasks - Current development status and roadmap

Quick Links

System Requirements

Minimum

OS: Linux (Ubuntu 20.04+), macOS (x86_64 or ARM64 with Rosetta)
CPU: x86_64 with AVX2 support
Memory: 8 GB RAM
Storage: 10 GB for software + genome indices

Command-Line Reference

Basic Usage

# Single spacer search
argus --genome GENOME.argus --pattern SEQUENCE --threshold N

# Batch search (multiple spacers)
argus --genome GENOME.argus --spacer-file GUIDES.txt --threshold N

Common Options

Option	Description	Default
`--threshold N`	Maximum edit distance (0-255)	Required
`--format FMT`	Output format: `tsv`, `bed`	`tsv`
`--pam-filter FILTER`	PAM filter: `none`, `ngg`, `nag`, `both`, `any`	`none`
`--strand STRAND`	Search strand: `plus`, `minus`, `both`	`both`
`--forward-only`	Skip reverse-complement search	Off
`--treat-u-as-t`	Normalize uracil to thymine in spacers	On
`--summary`	Output aggregated hit counts by distance	Off
`--summary-format FMT`	Summary format: `json`, `tsv`	`json`
`--max-hits N`	Limit output per spacer	Unlimited
`--no-compute-mismatches`	Skip alignment annotations (faster)	Compute
`--output FILE`	Output file	stdout

See User Guide for complete reference.
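
When driving Argus from a script, it can help to assemble the argument list in one place. The Python sketch below uses only flags listed in the table above. The function name is hypothetical, the default binary path is taken from the Quick Start build, and the range checks simply restate the documented values.

```python
PAM_FILTERS = {"none", "ngg", "nag", "both", "any"}


def build_argus_command(genome_index, spacer_file, threshold,
                        pam_filter="none", output=None,
                        binary="./build/src/argus"):
    """Return an argv list for a batch search; raises ValueError on bad input."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be between 0 and 255")
    if pam_filter not in PAM_FILTERS:
        raise ValueError(f"unknown PAM filter: {pam_filter}")
    cmd = [
        binary,
        "--genome", genome_index,
        "--spacer-file", spacer_file,
        "--threshold", str(threshold),
        "--pam-filter", pam_filter,
    ]
    if output is not None:
        cmd += ["--output", output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.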

Output Format

Argus produces tab-separated output with 19 biological annotation columns:

spacer  chrom   start   end     pattern distance  strand  aligned_seq  mismatch_pos  ...
VEGFA   chr6    43737006 43737026 GAGTCCGAGCAG... 2  +  GAGTCCAAGCAG...  7  ...

Key Columns:

distance - Edit distance (Levenshtein)
pam_type - NGG, NAG, OTHER, or INCOMPLETE
seed_edits - Disruptions in seed region (positions 1-10)
distal_edits - Disruptions in distal region (positions 11-20)
alignment_ambiguous - Multiple optimal alignments exist

See Output Formats for complete description.
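
To make the `pam_type`, `seed_edits`, and `distal_edits` columns concrete, here is a small Python sketch of how such values could be derived. It rests on assumptions rather than Argus's actual rules: the PAM is taken as the three bases next to the protospacer, `INCOMPLETE` is read as "fewer than three bases available", positions are 1-based as in the column descriptions, and both function names are hypothetical.

```python
def classify_pam(pam: str) -> str:
    """Classify a PAM string as NGG, NAG, OTHER, or INCOMPLETE (assumed rules)."""
    pam = pam.upper()
    if len(pam) < 3:
        return "INCOMPLETE"
    if pam[1:3] == "GG":
        return "NGG"
    if pam[1:3] == "AG":
        return "NAG"
    return "OTHER"


def count_region_edits(positions):
    """Split 1-based edit positions into (seed, distal) counts.

    Seed is positions 1-10 and distal is positions 11-20, matching the
    column descriptions above.
    """
    seed = sum(1 for p in positions if 1 <= p <= 10)
    distal = sum(1 for p in positions if 11 <= p <= 20)
    return seed, distal
```

For example, edits at positions 7 and 15 would count as one seed edit and one distal edit under these rules.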

Citation

If you use Argus in your research, please cite:

@software{argus2026,
  author = {Simpson, Danny},
  title = {Argus: Ultra-fast exhaustive CRISPR off-target search with GPU acceleration},
  year = {2026},
  version = {0.4.0},
  url = {https://github.com/simpsondl/argus}
}

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Areas of Interest:

Multi-GPU support for larger genomes
Additional nuclease PAM filters (Cpf1, etc.)
Python/R API bindings
Web interface for cloud deployment
Performance optimizations

License

This project is released under the MIT License.

Acknowledgments

Myers' Algorithm: Eugene W. Myers (1999) - Bit-parallel approximate string matching
GPU Architecture: NVIDIA CUDA team
Testing: GeCKO CRISPR library for benchmark spacers
Community: Early adopters and beta testers

Contact

Author: Danny Simpson
Issues: GitHub Issues
Discussions: GitHub Discussions
