Skip to content

GilRaitses/indysim

Repository files navigation

InDySim: Stimulus-Driven Behavioral Simulation of Drosophila Larvae

Event-Hazard Modeling and Design of Experiments for Predictive Simulation

Project Status Documentation


🎯 Project Overview

InDySim (Interface Dynamics Simulation) is a simulation modeling framework that predicts how Drosophila larvae respond to time-varying LED stimuli. The project develops event-hazard models using generalized linear models (GLMs) with temporal kernels to simulate behavioral trajectories under controlled experimental conditions.

Core Simulation Capabilities

  • Event-hazard modeling of behavioral events (reorientations, pauses, reversals)
  • Stimulus-response dynamics captured via raised-cosine temporal kernels
  • Parameterized stimulus simulation (intensity, pulse duration, gap duration, ramp)
  • Statistical validation with turn rate, PSTH, and IEI comparisons
  • Within-condition trajectory generation for power analysis and exploration

⚠️ Scope and Limitations

IMPORTANT: See docs/PROJECT_LIMITATIONS.md for full details.

Current Scope: Within-Condition Validation

The LNP (Linear-Nonlinear-Poisson) model has been trained and validated on a single experimental condition:

Parameter Value
LED1 intensity 0-250 PWM (ramp)
Pulse duration 10 seconds
Gap duration 20 seconds
Cycle period 30 seconds

Validation Status (December 2025)

Model refitted with triphasic kernel (early + intermediate + late):

Metric Empirical Simulated Threshold Status
PSTH correlation - 0.888 β‰₯ 0.645 PASS
W-ISE - 0.067 ≀ 0.304 PASS
Early suppression 0.55 0.55 - PASS (1% diff)
Late suppression 0.32 0.36 - PASS (4% diff)
Suppression pattern Building Building - PASS

Bootstrap validation thresholds derived from empirical self-consistency (100 iterations).

Key improvements (2025-12-11):

  • Added intermediate kernel bases at [2.0, 2.5]s to bridge early-late gap
  • Relaxed early kernel constraint (only first basis non-negative)
  • Added W-ISE and bootstrap threshold validation
  • Early suppression improved from 27% off to 1% off

See docs/MODEL_SUMMARY.md for full model details.

What This Project Does NOT Claim

  • Cross-condition prediction: The model cannot predict behavior at different pulse durations (e.g., 30s), gap durations (e.g., 60s), or intensities (e.g., 500 PWM) without additional data
  • GMR61 circuit interpretation: The genotype's behavioral phenotype is novel and not characterized in prior literature
  • Optimal condition identification: The model cannot rank untested DOE conditions

Note: This is a solo project. All work by Gil Raitses.

Novel Biological Finding

GMR61 > CsChrimson activation is associated with a net reduction in detected reorientations (79% of turns during LED OFF). This is a novel observation with unknown mechanism (freezing? adaptation? detection artifact?). See docs/PROJECT_LIMITATIONS.md for details.


πŸ”¬ Simulation & Modeling Approach

Event-Hazard GLM Framework

The core model predicts time-varying hazard rates for behavioral events:

Ξ»_E(t) = exp{Ξ²β‚€,E + Ο†_E^T[s ⋆ ΞΊ](t) + x(t)^TΞ²_E}

Where:

  • Ξ»_E(t): Hazard rate for event type E ∈ {turn, stop, reverse} at time t
  • Ξ²β‚€,E: Baseline log-hazard for event type E
  • s(t): Stimulus feature vector (intensity, on/off state, recent history)
  • ΞΊ: Temporal kernel (raised-cosine basis expansion)
  • s ⋆ ΞΊ: Convolution of stimulus with kernel (captures latency and adaptation)
  • x(t): Contextual features (speed, orientation, wall distance)
  • Ξ²_E: Feature coefficients (estimated from data)

Temporal Kernel Design

Raised-cosine kernels capture:

  • Latency effects: Peak response at delay Ο„β‚€ β‰ˆ 0.5-2 seconds
  • Adaptation: Decay over longer delays (Ο„ > 5 seconds)
  • Anticipation: Pre-stimulus effects (if any)

Design of Experiments (DOE)

Factors:

  • Stimulus Intensity: 3 levels (PWM 250, 500, 1000)
  • Pulse Duration: 5 levels (10s, 15s, 20s, 25s, 30s)
  • Inter-Pulse Interval: 3 levels (5s, 10s, 20s)

Design: Full factorial (3 Γ— 5 Γ— 3 = 45 conditions) with 30 replications each

Response Variables (KPIs):

  • Turn rate (reorientations per minute)
  • Latency to first turn
  • Stop fraction
  • Pause rate
  • Path tortuosity
  • Spatial dispersal
  • Mean spine curve energy
  • Reversal rate and duration

πŸ“Š Recent Achievements (December 2025)

βœ… Platform Liberation: MATLAB β†’ Python Pipeline

Status: Validated and Production-Ready

Successfully transferred the entire analysis pipeline from MATLAB to Python with numerical equivalence validation:

Validation Layer Status Details
H5 Schema Validation βœ… PASS 10/10 experiments validated
Camera Calibration βœ… PASS 7/7 fields exact match
SpeedRunVel Computation βœ… PASS Identical values (< 1e-10 tolerance)
Reversal Detection βœ… PASS Identical results (count, timing, duration)
Turn Detection βœ… PASS 45Β° threshold, identical event counts

Key Resolution: Critical fix identified that using raw position data (points/loc) instead of smoothed data (derived_quantities/sloc) caused 5-7x errors in SpeedRunVel computation. Python pipeline now uses correct smoothed position data.

βœ… Enhanced Analysis Pipeline

New Capabilities:

  • Stimulus-window analysis: Per-track and population-level metrics within LED on/off windows
  • Reversal detection: SpeedRunVel < 0 for > 3s duration
  • Turn detection: 45Β° angle threshold with directional classification
  • Concurrency estimation: Active tracks per time bin
  • Master H5 export: Combined analysis ready for simulator input

Output Structure:

  • Per-file JSON analysis (track-level, window-level, population aggregates)
  • Combined analysis JSON (all experiments)
  • Master H5 file for simulation intake

πŸš€ What Can Be Done With the Validated Data

1. Fit Event-Hazard Models

With 10 validated H5 files containing behavioral trajectories and stimulus timing:

  • Estimate temporal kernels (Ο†_E) for each event type
  • Fit GLM coefficients (Ξ²β‚€,E, Ξ²_E) using stimulus-locked analysis
  • Cross-validate using leave-one-larva-out methodology
  • Characterize latency, adaptation, and intensity-response relationships

2. Run Simulation Experiments

Using the fitted models:

  • Generate simulated trajectories for all 45 DOE conditions
  • Produce 30 replications per condition (1,350 total simulations)
  • Compute KPIs with confidence intervals
  • Compare simulated vs. empirical behavioral metrics

3. Explore Stimulus Parameter Space

The DOE framework enables:

  • Systematic exploration of intensity Γ— duration Γ— interval effects
  • Response surface modeling to predict behavior at untested conditions
  • Optimization of stimulus protocols for desired behavioral outcomes
  • Sensitivity analysis of model parameters

4. Validate Model Predictions

Compare simulation outputs to empirical data:

  • Turn rate predictions vs. observed turn rates
  • Latency distributions vs. empirical latencies
  • Reversal patterns vs. observed reversal events
  • Spatial metrics (tortuosity, dispersal) vs. empirical trajectories

5. Extend to New Genotypes/Conditions

The validated pipeline supports:

  • New genotype analysis (beyond GMR61@GMR61)
  • Different stimulus protocols (varying waveforms, frequencies)
  • Environmental conditions (temperature, humidity effects)
  • Pharmacological interventions (drug effects on behavior)

πŸ“ Repository Structure

InDySim/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ h5_validated/              # βœ… 10 validated H5 files (ready for simulation)
β”‚   β”‚   β”œβ”€β”€ analysis/              # Enhanced analysis outputs
β”‚   β”‚   β”‚   β”œβ”€β”€ *_analysis.json   # Per-file analysis
β”‚   β”‚   β”‚   β”œβ”€β”€ combined_analysis.json
β”‚   β”‚   β”‚   └── master_sim_input.h5
β”‚   β”‚   └── manifest.json          # Provenance tracking
β”‚   β”œβ”€β”€ matlab_data/               # Source MATLAB files (gitignored)
β”‚   └── engineered/                # Processed datasets
β”‚
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ 2025-12-04/
β”‚   β”‚   β”œβ”€β”€ mat2h5/            # MATLAB β†’ H5 conversion pipeline
β”‚   β”‚   β”‚   β”œβ”€β”€ h5_export/      # MATLAB β†’ H5 conversion
β”‚   β”‚   β”‚   β”œβ”€β”€ validation/    # Validation framework
β”‚   β”‚   β”‚   └── engineer_dataset_from_h5.py  # Enhanced analysis
β”‚   β”‚   └── agent/              # Agent-related scripts
β”‚   β”œβ”€β”€ engineer_dataset_from_h5.py # Original analysis script
β”‚   └── queue/                     # Analysis scripts
β”‚
β”œβ”€β”€ config/
β”‚   β”œβ”€β”€ doe_table.csv              # 45-condition DOE specification
β”‚   └── model_config.json          # Model configuration (kernels, KPIs)
β”‚
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ project-proposal-indysim.qmd  # Full project proposal
β”‚   β”œβ”€β”€ logs/                      # Daily progress logs
β”‚   └── work-trees/                # Task planning documents
β”‚
└── output/
    └── figures/                   # Generated visualizations

πŸ”§ Technical Stack

  • Python 3.11+: Data processing, analysis, and simulation
  • H5Py: HDF5 file format for trajectory data
  • NumPy/Pandas: Numerical computing and data manipulation
  • SciPy: Statistical modeling and optimization
  • MAGAT Segmentation: Larval track segmentation (runs, reorientations, head swings)
  • Quarto/LaTeX: Report generation and documentation
  • MATLAB: Reference implementation (validated against)

πŸ“š Key Documentation

Core Documentation

Recent Work Logs


πŸŽ“ Academic Context

Course: ECS630 - Simulation Modeling
Institution: [Your Institution]
Term: Fall 2025

This project applies simulation modeling methods from ECS630 to biological behavioral data, developing a stimulus-response model that simulates larval trajectories under different experimental conditions. The event-hazard framework models stochastic behavioral events, while the DOE methodology explores stimulus parameter space systematically.

Simulation Modeling Concepts Applied

  1. Stochastic Process Modeling: Event-hazard rates as time-varying stochastic processes
  2. Design of Experiments: Systematic exploration of parameter space
  3. Monte Carlo Simulation: Multiple replications for statistical inference
  4. Model Validation: Comparison of simulated vs. empirical data
  5. Statistical Inference: Confidence intervals and hypothesis testing

πŸ§ͺ Dataset

Genotype: GMR61@GMR61 (optogenetic variant)

Validated Experiments: 10 H5 files ready for simulation

  • Location: data/h5_validated/
  • Format: HDF5 with validated schema
  • Provenance: Tracked via manifest.json
  • Size: ~3GB total

Experimental Conditions:

  • ESET 1: T_Re_Sq_0to250PWM_30#C_Bl_7PWM (4 experiments)
  • ESET 2: T_Re_Sq_0to250PWM_30#T_Bl_Sq_5to15PWM_30 (4 experiments)
  • ESET 3: T_Re_Sq_50to250PWM_30#C_Bl_7PWM (4 experiments)
  • ESET 4: T_Re_Sq_50to250PWM_30#T_Bl_Sq_5to15PWM_30 (2 experiments)

🚦 Current Status

βœ… Completed

  • MATLAB to H5 Conversion: All 14 MATLAB ESET files converted to H5 format
  • Platform Liberation: Complete Python pipeline with numerical validation
  • Enhanced Analysis: Stimulus-window analysis, reversal/turn detection
  • Data Validation: 10 experiments validated and ready for simulation
  • Master H5 Export: Combined analysis ready for simulator intake

πŸ”„ Next Steps

  1. Fit Event-Hazard Models: Estimate GLM coefficients from validated data
  2. Run Simulation Experiments: Generate trajectories for 45 DOE conditions
  3. Validate Predictions: Compare simulated vs. empirical behavioral metrics
  4. Generate Reports: Statistical analysis and visualization of results

πŸ”— Quick Links


πŸ“ Usage Example

Run Enhanced Analysis on Validated H5 Files

cd scripts/2025-12-04/mat2h5
python engineer_dataset_from_h5.py data/h5_validated \
  -o data/h5_validated/analysis

Export Master H5 for Simulator

cd scripts/2025-12-04/agent/worktree
python export_master_h5.py \
  --combined data/h5_validated/analysis/combined_analysis.json \
  --output data/h5_validated/master_sim_input.h5

πŸ‘₯ Contributors

  • Gil Raitses - Project Lead

Last Updated: December 4, 2025
Status: βœ… Ready for Simulation - Validated data and pipeline complete


πŸ’‘ Discussion Points for Professor Meeting

What This Project Demonstrates

  1. Simulation Modeling: Event-hazard framework for stochastic behavioral processes
  2. Design of Experiments: Systematic parameter space exploration (45 conditions)
  3. Model Validation: Numerical equivalence between MATLAB and Python implementations
  4. Data Engineering: Robust pipeline from raw data to simulation-ready format
  5. Statistical Inference: Confidence intervals and replication-based validation

Potential Extensions

  1. Multi-genotype Comparison: Extend to additional genotypes beyond GMR61@GMR61
  2. Temporal Dynamics: Explore adaptation and habituation effects over longer timescales
  3. Spatial Modeling: Incorporate arena geometry and wall interactions
  4. Machine Learning: Deep learning approaches for event prediction
  5. Real-time Simulation: Interactive simulation for experimental design

Research Questions Addressable

  • How do stimulus intensity, duration, and interval interact to affect behavioral responses?
  • What are the optimal stimulus protocols for eliciting specific behaviors?
  • How do individual differences affect population-level predictions?
  • Can we predict behavior in novel experimental conditions?

About

InDySim: Interface Dynamics Simulation Model - Manuscript and Analysis Code

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors