Comparative genomics of effector proteins across multiple genome assemblies.
When studying plant pathogens, a key question is: which effector proteins are shared across isolates, and which are unique? Understanding effector repertoires helps identify core virulence factors, track pathogen evolution, and discover candidate avirulence genes.
paneffectR solves this by:
- Clustering proteins into orthogroups - Finding equivalent proteins across assemblies using sequence similarity
- Building presence/absence matrices - Creating structured data showing which proteins exist in which assemblies
- Filtering by effector scores - Focusing on high-confidence effector predictions (when using omnieff output)
- Generating publication-ready visualizations - Heatmaps, UpSet plots, and dendrograms
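The first step, finding equivalent proteins across assemblies, is often done with reciprocal best hits (RBH), which the "diamond_rbh" method name in the workflow examples suggests. Below is a generic base-R sketch of pairwise RBH, not paneffectR's implementation. The helpers best_hits() and find_rbh() are hypothetical, and each hit table is assumed to be DIAMOND tabular output read into a data frame with qseqid, sseqid and bitscore columns.

```r
# Generic reciprocal-best-hit (RBH) sketch in base R; not paneffectR code.
# Each hit table is assumed to be DIAMOND tabular output read into a data
# frame with (at least) the columns qseqid, sseqid and bitscore.

# Keep the single highest-bitscore subject for every query.
# Ties are broken by the original row order.
best_hits <- function(hits) {
  hits <- hits[order(hits$qseqid, -hits$bitscore), ]
  hits[!duplicated(hits$qseqid), c("qseqid", "sseqid")]
}

# A pair (a, b) is an RBH when b is a's best hit in B and a is b's best hit in A.
find_rbh <- function(a_vs_b, b_vs_a) {
  ab <- best_hits(a_vs_b)
  ba <- best_hits(b_vs_a)
  # Join A->B best hits to B->A best hits with the query/subject columns swapped
  rbh <- merge(ab, ba, by.x = c("qseqid", "sseqid"), by.y = c("sseqid", "qseqid"))
  names(rbh) <- c("protein_a", "protein_b")
  rbh
}

# Toy hit tables (made-up IDs and scores)
a_vs_b <- data.frame(
  qseqid   = c("a1", "a1", "a2"),
  sseqid   = c("b1", "b2", "b2"),
  bitscore = c(250, 90, 180)
)
b_vs_a <- data.frame(
  qseqid   = c("b1", "b2", "b3"),
  sseqid   = c("a1", "a2", "a2"),
  bitscore = c(245, 175, 100)
)

res <- find_rbh(a_vs_b, b_vs_a)
res  # b3 is dropped: its best hit is a2, but a2's best hit is b2
```

RBH only yields pairs between two assemblies. Turning those pairs into orthogroups that span many assemblies needs an additional grouping step, and this sketch makes no claim about how paneffectR performs it.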
While designed for effector analysis, paneffectR works with any protein sets for general pan-genome comparisons.
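For a general pan-genome comparison, the presence/absence matrix and the core/accessory split can be illustrated in base R. The sketch assumes a hypothetical membership table with one row per protein, giving its orthogroup and assembly. This is not paneffectR's internal data structure. An orthogroup counts as present in an assembly if at least one of its proteins comes from that assembly. The labels follow common pan-genome definitions, which may differ from paneffectR's: core means present in every assembly, unique means present in exactly one, and accessory covers everything else. Distances between assemblies use dist(method = "binary"), which computes 1 minus the Jaccard similarity.

```r
# Hypothetical orthogroup membership table (made-up data): one row per
# protein, recording its orthogroup and the assembly it came from.
membership <- data.frame(
  orthogroup = c("OG1", "OG1", "OG1", "OG2", "OG2", "OG3", "OG4", "OG4"),
  assembly   = c("isoA", "isoB", "isoC", "isoA", "isoB", "isoC", "isoA", "isoA"),
  protein    = c("a1", "b1", "c1", "a2", "b2", "c2", "a3", "a4")
)

# Count proteins per orthogroup (rows) and assembly (columns) ...
counts <- table(orthogroup = membership$orthogroup,
                assembly   = membership$assembly)
# ... then collapse counts to 0/1 presence/absence
pa <- unclass(counts)
pa[] <- as.integer(pa > 0)

# Classify orthogroups by how many assemblies contain them
present_in <- rowSums(pa)
category <- ifelse(present_in == ncol(pa), "core",
                   ifelse(present_in == 1, "unique", "accessory"))
category

# Jaccard distance between assemblies: dist() compares rows, so transpose
# to put assemblies in rows. method = "binary" gives 1 - Jaccard similarity.
d <- dist(t(pa), method = "binary")
round(as.matrix(d), 3)

# Average-linkage clustering of assemblies (an arbitrary choice here)
tree <- hclust(d, method = "average")
# plot(tree) would draw the dendrogram
```

OG4 has two proteins in isoA but still scores 1, because a presence/absence matrix records whether an orthogroup occurs in an assembly, not how many copies it has.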
paneffectR requires Bioconductor packages. Install them first:
install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "Biostrings"))

Then install paneffectR from GitHub:
# install.packages("devtools")
devtools::install_github("TeamMacLean/paneffectR")

paneffectR uses DIAMOND for fast protein sequence alignment. Install it via mamba/conda:
mamba install -c bioconda diamond

How paneffectR finds DIAMOND:
The package searches for DIAMOND in this order:
- Explicit path - Pass tool_path = "/path/to/diamond" to cluster_proteins()
- Conda/mamba prefix - Pass conda_prefix = "./my_env" for project-local environments
- System PATH - Falls back to Sys.which("diamond")
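The search order above can be sketched in base R as follows. locate_diamond() is a hypothetical helper written for illustration, not a paneffectR function. It assumes the Unix-style <prefix>/bin layout that conda and mamba use on Linux and macOS. When a supplied path does not exist, it quietly falls through to the next option, whereas the package itself may report an error instead.

```r
# Hypothetical helper mirroring the documented search order:
# explicit path -> conda/mamba prefix -> system PATH.
locate_diamond <- function(tool_path = NULL, conda_prefix = NULL) {
  # 1. An explicit path wins if it points at an existing file
  if (!is.null(tool_path) && file.exists(tool_path)) {
    return(normalizePath(tool_path))
  }
  # 2. Conda/mamba environments place executables under <prefix>/bin
  if (!is.null(conda_prefix)) {
    candidate <- file.path(conda_prefix, "bin", "diamond")
    if (file.exists(candidate)) return(normalizePath(candidate))
  }
  # 3. Fall back to the system PATH; Sys.which() returns "" when not found
  found <- unname(Sys.which("diamond"))
  if (nzchar(found)) found else NA_character_
}

# Usage, matching the three scenarios in the examples:
# locate_diamond()                                   # PATH only
# locate_diamond(conda_prefix = "./this_project_env")
# locate_diamond(tool_path = "/opt/diamond/bin/diamond")
```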
This flexibility supports various installation scenarios:
# System-wide installation (found via PATH)
clusters <- cluster_proteins(proteins)
# Project-local mamba environment
clusters <- cluster_proteins(proteins, conda_prefix = "./this_project_env")
# Explicit path
clusters <- cluster_proteins(proteins, tool_path = "/opt/diamond/bin/diamond")

To create a project-local environment:

mamba create -p ./this_project_env -c bioconda diamond

Quick start:

library(paneffectR)
# Load proteins from multiple assemblies
proteins <- load_proteins(
fasta_dir = "path/to/fastas/",
score_dir = "path/to/scores/"
)
# Cluster into orthogroups
clusters <- cluster_proteins(proteins, method = "diamond_rbh")
# Build presence/absence matrix (filter to high-scoring effectors)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)
# Visualize
ht <- plot_heatmap(pa)
ComplexHeatmap::draw(ht)
plot_upset(pa, min_size = 2)

Take output from the omnieff pipeline, find orthologous effectors across assemblies, and filter by prediction confidence:
# Load omnieff output (FASTAs + scores)
proteins <- load_proteins(
fasta_dir = "omnieff_output/reformatted/",
score_dir = "omnieff_output/scored/"
)
# Cluster and build matrix with score threshold
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)
# Visualize effector repertoires
plot_heatmap(pa) |> ComplexHeatmap::draw()

Compare any protein sets without effector scores:
# Load raw FASTAs
proteins <- load_proteins(fasta_dir = "my_assemblies/")
# Binary presence/absence analysis
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, type = "binary")
# Identify core vs accessory proteins
plot_upset(pa, min_size = 2)
plot_dendro(pa, distance_method = "jaccard")

Documentation:

- Getting Started - Core workflow tutorial
- Effector Analysis - Working with omnieff output
- Pan-Genome Analysis - General protein comparisons
- Algorithm Deep Dive - Technical details for bioinformaticians
- Function Reference - Complete API documentation
If you use paneffectR in your research, please cite:
MacLean, D. (2026). paneffectR: Comparative Genomics of Effector Proteins. R package version 0.1.0. https://github.com/TeamMacLean/paneffectR
License: MIT
