UCLA-SEAL/WhyFlow

WhyFlow: Interrogative Debugger for Taint Analysis

ICSE 2026 License

Overview

WhyFlow is an interrogative debugging tool for taint analysis that enables developers to ask why, why-not, and what-if questions about dataflows. This artifact accompanies our ICSE 2026 paper: "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis".

WhyFlow addresses the challenge of making sense of taint analysis results by providing:

  • Interrogative Debugging: Ask questions about the existence or absence of specific dataflows
  • Speculative Analysis: Explore the impact of different third-party library models and configurations
  • Visual Sensemaking: Graph-based visualization with color-coded annotations for global connectivity reasoning
  • Interactive Q&A Interface: Template-based queries with contextualized selections for sources, sinks, and APIs
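The three question types can be illustrated on a toy taint graph. The following is a minimal sketch, not WhyFlow's implementation: the graph, the node names, and the functions `find_path`, `why_not`, and `what_if` are all hypothetical and exist only to show the idea. A why question asks for a path from source to sink, a why-not question reports how far taint gets from the source and which nodes could still reach the sink, and a what-if question re-runs the search after adding a speculative edge, such as a model for a third-party library call.

```python
from collections import deque

# Hypothetical taint graph: node -> list of successor nodes.
# Node names are illustrative and do not come from the WhyFlow artifact.
# The edge through the library call is missing because the library is unmodeled.
GRAPH = {
    "source:getParameter": ["decode"],
    "decode": ["libCall:in"],
    "libCall:out": ["sink:exec"],
}


def find_path(graph, source, sink):
    """'Why' query: breadth-first search returning one shortest path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in graph.get(node, []):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None


def reachable_from(graph, start):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def why_not(graph, source, sink):
    """'Why-not' query: show both sides of the gap between source and sink."""
    reverse = {}
    for node, succs in graph.items():
        for succ in succs:
            reverse.setdefault(succ, []).append(node)
    return {
        "reachable_from_source": reachable_from(graph, source),
        "reaching_sink": reachable_from(reverse, sink),
    }


def what_if(graph, extra_edges, source, sink):
    """'What-if' query: search again on a copy with speculative edges added."""
    speculative = {node: list(succs) for node, succs in graph.items()}
    for src, dst in extra_edges:
        speculative.setdefault(src, []).append(dst)
    return find_path(speculative, source, sink)


if __name__ == "__main__":
    src, snk = "source:getParameter", "sink:exec"
    print("why:", find_path(GRAPH, src, snk))
    print("why-not:", why_not(GRAPH, src, snk))
    print("what-if:", what_if(GRAPH, [("libCall:in", "libCall:out")], src, snk))
```

In this toy example the why query finds no path, the why-not query shows that taint stops at `libCall:in`, and the what-if query recovers the flow once the library call is assumed to propagate taint from its input to its output.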

Key Features

  • Interactive question-answer debugging interface for taint analysis
  • Support for why, why-not, and what-if queries about dataflows
  • Integration with CodeQL and Souffle Datalog for static analysis
  • Visual graph representation of taint flows with color-coded paths
  • Efficient handling of large-scale analysis results using MongoDB
  • User study data and statistical analysis scripts included

Repository Structure

WhyFlow/
├── taint_debug_app/          # Main WhyFlow application
│   ├── taint_debug/          # Meteor web application
│   │   ├── client/           # Frontend UI components
│   │   ├── server/           # Backend API and data loading
│   │   └── imports/          # Shared code and collections
│   ├── analysis_files/       # Analysis data and fact files
│   ├── app_souffle_queries/  # Souffle Datalog query files
│   └── souffle_output/       # Generated query outputs
├── Subject_Prog_CodeQL_Taint/# Subject program and CodeQL results
│   ├── src/                  # Source code (Apache Dubbo)
│   ├── codeql-custom-queries-java/ # Custom CodeQL queries
│   └── *.json, *.csv         # CodeQL analysis results
├── statistical_tests/        # User study statistical analysis
│   ├── statistical_tests.py  # Python scripts for analysis
│   └── *.csv                 # User study data and results
├── data/                     # User study materials
│   ├── data/                 # Questionnaire responses
│   ├── extension_queries/    # Additional query examples
│   ├── tutorials/            # Tutorial materials
│   └── *.png, *.ipynb        # Plots and analysis notebooks
└── souffle_output/           # Additional Souffle outputs

Prerequisites

  • Meteor (v2.13 or higher)
  • Node.js (v14 or higher)
  • MongoDB (bundled with Meteor; no separate installation needed)
  • Souffle (optional, for running custom Datalog queries)
  • CodeQL (optional, for analyzing new programs)
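Before installing, it can help to check which of these tools are already on your `PATH`. The script below is a small sketch that is not part of the repository. It assumes each tool uses its default executable name (`meteor`, `node`, `souffle`, `codeql`), and it checks only for presence, not for the minimum versions listed above. MongoDB is not checked because it ships with Meteor.

```python
import shutil

# Assumption: these are the default executable names of each tool.
REQUIRED = ["meteor", "node"]
OPTIONAL = ["souffle", "codeql"]


def check_tools(names, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def report(required=REQUIRED, optional=OPTIONAL, which=shutil.which):
    """Return (missing required tools, missing optional tools) as two lists."""
    missing_required = [n for n, p in check_tools(required, which).items() if p is None]
    missing_optional = [n for n, p in check_tools(optional, which).items() if p is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    missing_required, missing_optional = report()
    for name in missing_required:
        print(f"missing (required): {name}")
    for name in missing_optional:
        print(f"missing (optional): {name}")
    if not missing_required:
        print("All required tools found.")
```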

Installation

1. Install Meteor

# macOS/Linux
curl https://install.meteor.com/ | sh

# Windows
# Download installer from https://www.meteor.com/install

2. Clone the Repository

git clone https://github.com/UCLA-SEAL/WhyFlow.git
cd WhyFlow

3. Install Dependencies

# Install root-level dependencies
npm install

# Install WhyFlow app dependencies
cd taint_debug_app/taint_debug
meteor npm install
cd ../..

Running WhyFlow

Start the Application

cd taint_debug_app/taint_debug
meteor run

Once Meteor finishes starting up, the application is available at http://localhost:3000 (Meteor's default port).

Environment Variables (Optional)

Set these variables for custom configurations:

export PWD=/path/to/WhyFlow
export SOURCE_CODE_ROOT_DIR=/path/to/subject/program

Using WhyFlow

  1. Access the Interface: Open http://localhost:3000 in your browser
  2. Select Query Type: Choose from templated why, why-not, or what-if questions
  3. Contextualize Query: Select specific sources, sinks, and third-party APIs from dropdowns
  4. View Results: Explore results in the graph view with color-coded annotations
  5. Iterate: Refine queries based on initial results for deeper investigation

Reproducing the User Study

Statistical Analysis

The statistical_tests/ directory contains all user study data and analysis scripts:

cd statistical_tests
python3 statistical_tests.py

This regenerates the statistical test results reported in the paper.

Data Visualization

Generate plots from the user study data:

cd data
jupyter notebook plots.ipynb

Extending WhyFlow

Adding Custom Queries

Place Souffle Datalog query files in taint_debug_app/app_souffle_queries/

Analyzing New Programs

  1. Run CodeQL analysis on your target program
  2. Export results in JSON/CSV format
  3. Place results in Subject_Prog_CodeQL_Taint/
  4. Update paths in the Meteor application configuration
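Before wiring new results into the application, a quick sanity check of the exported CSV can catch an empty or malformed export. The sketch below is not part of the repository. It assumes the headerless nine-column layout that `codeql database analyze --format=csv` emits (name, description, severity, message, path, start line, start column, end line, end column). The CSV files shipped in this repository may use a different layout, so adjust the column handling if they do. The `Alert` type, the helper functions, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple

# Hypothetical sample rows in the assumed nine-column, headerless layout.
SAMPLE = (
    '"Query A","desc","error","tainted value reaches sink","/src/A.java","10","5","10","20"\n'
    '"Query A","desc","error","tainted value reaches sink","/src/A.java","42","9","42","30"\n'
    '"Query B","desc","warning","possible flow","/src/B.java","7","1","7","15"\n'
)


class Alert(NamedTuple):
    name: str
    severity: str
    message: str
    path: str
    start_line: int


def load_alerts(handle):
    """Parse alerts from an open CSV handle, skipping blank or short rows."""
    alerts = []
    for row in csv.reader(handle):
        if len(row) < 9:
            continue  # not a full alert row under the assumed layout
        name, _description, severity, message, path, start_line = row[:6]
        alerts.append(Alert(name, severity, message, path, int(start_line)))
    return alerts


def alerts_per_file(alerts):
    """Count how many alerts were reported in each source file."""
    return Counter(alert.path for alert in alerts)


if __name__ == "__main__":
    alerts = load_alerts(io.StringIO(SAMPLE))
    for path, count in alerts_per_file(alerts).most_common():
        print(f"{count:4d}  {path}")
```

To run the same check on a real export, open the file with `open(path, newline="")` and pass the handle to `load_alerts` in place of the in-memory sample.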

Customizing the UI

Modify the Meteor application in taint_debug_app/taint_debug/:

  • client/ - Frontend React components
  • server/ - Backend API methods
  • imports/ - Shared collections and utilities

Data Availability

This repository includes:

  • ✅ User study questionnaire responses
  • ✅ Statistical analysis scripts and results
  • ✅ Subject program (Apache Dubbo) with CodeQL results
  • ✅ Tutorial materials and task descriptions
  • ✅ NASA-TLX and accuracy data

Citation

If you use WhyFlow in your research, please cite our paper:

@inproceedings{yetistiren2026whyflow,
  title={WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis},
  author={Yetiştiren, Burak and Kang, Hong Jin and Kim, Miryung},
  booktitle={Proceedings of the 48th International Conference on Software Engineering},
  year={2026},
  organization={ACM}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or issues, please:

Acknowledgments

This work is supported by the National Science Foundation under grant numbers 2426162, 2106838, and 2106404, with additional support from Amazon and Samsung.
