ProteO-Linker

ProteO-Linker is a web-based tool for analyzing protein expression. It provides a Shiny interface, built in R, for examining data generated by the Olink proteomics platform. Users can upload their data and generate custom plots of protein expression.

Olink's Proximity Extension Assay (PEA) uses a high-throughput panel of antibodies conjugated to oligonucleotide barcodes to target proteins of interest. Antibody binding events are counted per barcode using Illumina NGS, and the counts are used to calculate the abundance of each protein in a sample, giving a measure of target protein expression.

How to Use

  1. Clone the repository

    git clone https://github.com/collaborativebioinformatics/Proteolinker.git
    cd Proteolinker
  2. Install dependencies

    Make sure you have R (≥ 4.0) and Shiny installed.

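    Install the required packages from within R. ComplexHeatmap is distributed through Bioconductor rather than CRAN, and grid ships with base R, so neither belongs in the install.packages() call:

    install.packages(c("shiny", "ggplot2", "dplyr", "tidyr", "readr", "OlinkAnalyze", "gridExtra", "purrr", "DT", "plotly", "BiocManager"))
    BiocManager::install("ComplexHeatmap")  # Bioconductor package, not on CRAN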
  3. Launch the app

    From R or RStudio, run:

    shiny::runApp("app.R")
  4. Upload data

    • Select your Olink NPX or parquet file(s)
    • The app will validate input files and provide a summary of Sample/Assay failures or warnings.
    • Select a LOD filtering algorithm (optional)
    • Upload either an existing differential expression (DE) analysis or a manifest that defines the groups for a new DE analysis
  5. Explore results

    • Run pathway enrichment analysis
    • Generate data files and plots
    • Download plots (PNG) or results (CSV/PDF)
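
The validation summary described in step 4 can be pictured with a small base R sketch. This is not the app's actual code: the column names (SampleID, Assay, NPX, QC_Warning), the status labels, and the helper summarise_qc() are assumptions chosen for illustration, and a real Olink export may use different names.

```r
# Toy long-format table. Column names and status labels are assumptions
# for illustration; a real export may differ.
npx <- data.frame(
  SampleID   = rep(c("S1", "S2", "S3"), each = 2),
  Assay      = rep(c("IL6", "TNF"), times = 3),
  NPX        = c(1.2, 3.4, 0.8, 2.9, NA, 3.1),
  QC_Warning = c("PASS", "PASS", "WARN", "WARN", "FAIL", "PASS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: confirm the required columns exist, then count
# rows per sample and QC status.
summarise_qc <- function(df,
                         required = c("SampleID", "Assay", "NPX", "QC_Warning")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  as.data.frame(
    table(SampleID = df$SampleID, Status = df$QC_Warning),
    responseName = "n",
    stringsAsFactors = FALSE
  )
}

qc <- summarise_qc(npx)

# Keep only sample/status combinations that occurred and did not pass.
flagged <- qc[qc$n > 0 & qc$Status != "PASS", ]
print(flagged)
```

Tabulating status per sample keeps the summary compact; the same idea extends to assay-level warnings by tabulating on the Assay column instead.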

Pathway Enrichment Analysis Pipeline

Overview & Objective

The goal of ProteO-Linker is to provide a straightforward workflow for examining protein expression data generated by Olink assays. The tool performs quality checks, applies filtering, and supports downstream analysis such as differential expression and pathway enrichment. Users can visualize results through multiple plot types and export figures or tables for reporting. The objective is to lower the barrier for exploratory analysis of Olink data while keeping the workflow modular and easy to adapt.

Methodology

ProteO-Linker implements a stepwise pipeline:

  1. Quality Control (QC) of Input Files

    • For each uploaded parquet file, two checks are performed:
      1. Verify that the file is complete and correctly formatted
      2. Report failed or warned samples/assays
    • Provides users with a summary of QC status for each file
  2. LOD Filtering

    • Two filtering methods are available:
      1. Fixed LOD (values provided by Olink)
      2. Standard deviation–based LOD (customizable, current default is 3 SD from mean)
    • If applied, a volcano plot of all 6 sample controls is generated, showing proteins above and below LOD
  3. Differential Expression (DE) Analysis

    • Users may upload an existing DE analysis (a list of differentially expressed proteins)
    • Alternatively, users can upload a manifest that defines the comparison groups (for example, male vs. female or treated vs. untreated), and the app will run the DE analysis
    • In-app DE analysis will use a t-test
    • DE results can be visualized with volcano plots (protein name, fold-change, p-value)
    • These outputs will be used as input for pathway enrichment
  4. Pathway Enrichment

    • Enrichment analysis highlights pathways associated with significant proteins
    • Will use GSEA; additional methods (FDR, Fisher’s test) planned for future development
  5. Visualization

    • Histograms/heatmaps of protein expression linked to KEGG, GO terms, Reactome, or all three
    • Interactive volcano plots for DE analysis
    • Volcano plots visualizing LOD filtering results
  6. Outputs

    • PNG plots for visualization
    • CSV files of selected proteins, samples, or enriched pathways
    • Future development: optional PDF report summarizing results
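
The LOD filtering and DE steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the app's implementation: it assumes the SD-based LOD is derived from negative-control measurements as mean plus k standard deviations, that DE uses R's default Welch t-test, and that NPX is log2-scaled so a difference of group means reads as a log2 fold change. All data values, column names, and the helpers sd_lod() and de_one() are invented for the example.

```r
# Negative-control NPX values per assay (invented for illustration).
neg_ctrl <- list(
  IL6 = c(0.10, 0.20, 0.15, 0.05, 0.25, 0.15),
  TNF = c(0.50, 0.60, 0.40, 0.55, 0.45, 0.50)
)

# SD-based LOD (assumed definition): control mean plus k standard deviations.
sd_lod <- function(x, k = 3) mean(x) + k * sd(x)
lod <- vapply(neg_ctrl, sd_lod, numeric(1))

# Sample NPX in long format; Group would come from the uploaded manifest.
samples <- data.frame(
  SampleID = rep(paste0("S", 1:6), times = 2),
  Group    = rep(rep(c("treated", "untreated"), each = 3), times = 2),
  Assay    = rep(c("IL6", "TNF"), each = 6),
  NPX      = c(3.1, 3.4, 2.9, 1.0, 0.3, 0.9,   # IL6
               2.0, 2.2, 1.9, 2.1, 2.0, 2.3),  # TNF
  stringsAsFactors = FALSE
)

# Flag each measurement against its assay-specific LOD, then drop values below it.
samples$AboveLOD <- unname(samples$NPX > lod[samples$Assay])
kept <- samples[samples$AboveLOD, ]

# Per-assay Welch t-test (t.test default). Group levels sort alphabetically,
# so the estimate is treated minus untreated.
de_one <- function(d) {
  tt <- t.test(NPX ~ Group, data = d)
  data.frame(
    Assay  = d$Assay[1],
    log2FC = unname(tt$estimate[1] - tt$estimate[2]),
    p      = tt$p.value
  )
}
de <- do.call(rbind, lapply(split(kept, kept$Assay), de_one))

# Benjamini-Hochberg adjustment across assays.
de$p_adj <- p.adjust(de$p, method = "BH")
print(de)
```

The log2FC and p columns are the quantities a volcano plot would display, and the resulting table has the shape that a downstream enrichment step would consume.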

Workflow

Flowchart

Future Directions

We plan to enhance the differential expression analysis by incorporating multivariate approaches and expanding the selection of pathway enrichment methods. More broadly, this is one of several analytical pipelines we aim to develop further.

Additional ideas include:

  • Multi-omics integration: Combine proteomics with RNA-Seq or genomics data to uncover regulatory mechanisms.
  • Protein-protein interaction (PPI) network analysis: Visualize and analyze protein interaction networks to identify hub proteins that may serve as key regulators within biological pathways.
  • Biomarker signature discovery: Apply machine learning techniques to identify panels of proteins that can predict or diagnose disease states or treatment responses.

Contributors:

  • Kimberly Walker, HGSC-BCM (Team Leader)
  • Tomilola Aderupoko, Hood College, MD
  • Vijetha Balakundi, McLean Hospital, Boston | Broad Institute of MIT and Harvard, Cambridge
  • Neda Ghohabi Esfahani, Department of Bioengineering, Northeastern University, Boston, MA
  • Michael Muchow, DMV Petri Dish
  • Aniket Naik, University of Pittsburgh
  • Qiaoyan Wang, HGSC-BCM

Acronyms

Abbreviation  Definition
DE            Differential Expression
FDR           False Discovery Rate
GO            Gene Ontology
GSEA          Gene Set Enrichment Analysis
KEGG          Kyoto Encyclopedia of Genes and Genomes
LOD           Limit of Detection
LOQ           Limit of Quantification
NGS           Next-Generation Sequencing
NPX           Normalized Protein eXpression
PEA           Proximity Extension Assay
PPI           Protein-Protein Interaction
SD            Standard Deviation

NCBI Codeathon Disclaimer

This software was created as part of an NCBI codeathon, a hackathon-style event focused on rapid innovation. While we encourage you to explore and adapt this code, please be aware that NCBI does not provide ongoing support for it.
