Production-ready research pipeline for video anomaly detection with:
- baseline temporal-visual model
- text-prior semantic guidance
- heatmap-prior spatial guidance
- dual-prior fusion with spike-robust event logic
- `single_video_anomaly_system.py`: Main runnable system for a single MP4, with 4 overlays and metrics.
- `dataset_loader.py`: Dataset loading utilities for custom/Avenue/UCSD/ShanghaiTech flows.
- `baseline_model.py`: Baseline detector logic (frozen vision features + anomaly scoring).
- `text_prior_model.py`: Text-prior detector logic (CLIP/fallback text semantics + anomaly ranking).
- `spatial_prior_models.py`: Heatmap-prior and dual-prior detector logic.
- `benchmark_runner.py`: Multi-model benchmark runner for dataset-level comparisons.
- `ARCHITECTURE.md`: Detailed architecture notes and scoring logic.
- `DATASET_GUIDE.py`: Additional setup examples for real dataset layouts.
- `outputs/theft_run_mv_final_guarded_f1/`: Latest final overlays + metrics.
- Extract temporal/appearance cues per frame.
- Build anomaly score from learned classifier calibration.
- Apply spike filtering (minimum ON/OFF persistence) to reduce false bursts.
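The minimum ON/OFF persistence filter above can be sketched as a run-length pass over the per-frame binary decisions. This is an illustrative sketch, not the pipeline's implementation; the `min_on`/`min_off` values are placeholder defaults:

```python
import numpy as np

def runs(x):
    """Yield (start, end, value) for each constant run in a 0/1 array."""
    edges = np.flatnonzero(np.diff(x)) + 1
    starts = np.concatenate(([0], edges))
    ends = np.concatenate((edges, [len(x)]))
    for s, e in zip(starts, ends):
        yield s, e, x[s]

def filter_spikes(binary, min_on=5, min_off=5):
    """Suppress ON bursts shorter than min_on frames; bridge OFF gaps
    shorter than min_off frames so one event is not split in two."""
    out = np.asarray(binary, dtype=int).copy()
    # First bridge short interior OFF gaps (merge fragmented events).
    for s, e, v in list(runs(out)):
        if v == 0 and 0 < s and e < len(out) and e - s < min_off:
            out[s:e] = 1
    # Then drop ON bursts that are still shorter than min_on.
    for s, e, v in list(runs(out)):
        if v == 1 and e - s < min_on:
            out[s:e] = 0
    return out
```

Bridging before dropping matters: two fragments separated by a one-frame dip merge into one long event instead of being discarded as two short spikes.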
- Use detailed theft-focused prompts (forceful snatch/robbery language).
- Compute frame-text similarity (full frame + center crop).
- Blend semantic score with baseline score.
Potential: improves contextual understanding when visual motion alone is ambiguous (e.g., normal close interaction vs forceful theft).
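The similarity-and-blend step can be sketched as below. The function name, the `alpha` weight, and the cosine remapping to [0, 1] are illustrative assumptions; in the pipeline the embeddings would come from CLIP (or the fallback encoder) for the frame and the theft prompts:

```python
import numpy as np

def blend_text_score(frame_emb, prompt_embs, base_score, alpha=0.3):
    """Blend the baseline anomaly score with the best frame-prompt
    cosine similarity (a stand-in for the semantic text prior)."""
    f = frame_emb / (np.linalg.norm(frame_emb) + 1e-8)
    p = prompt_embs / (np.linalg.norm(prompt_embs, axis=1, keepdims=True) + 1e-8)
    sem = float(np.max(p @ f))      # best-matching anomaly prompt
    sem01 = (sem + 1.0) / 2.0       # map cosine [-1, 1] -> [0, 1]
    return (1 - alpha) * base_score + alpha * sem01
```

Taking the max over prompts means one strongly matching phrasing (e.g. a snatch description) is enough to raise the score, rather than averaging it away across many prompts.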
- Build center-focused spatial saliency for likely theft interaction area.
- Generate spatial anomaly score from ROI dynamics.
- Blend with baseline score.
Potential: suppresses irrelevant background activity and emphasizes suspicious motion inside important regions.
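One way to realise the centre-focused saliency and ROI scoring is a Gaussian bump over the frame used as a weight on per-pixel motion magnitude. This is a minimal sketch under that assumption; `sigma_frac` and both function names are illustrative:

```python
import numpy as np

def center_saliency(h, w, sigma_frac=0.25):
    """2-D Gaussian weight map peaking at the frame centre, a stand-in
    for the centre-focused theft-interaction saliency prior."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = sigma_frac * h, sigma_frac * w
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)

def spatial_score(motion_mag, saliency):
    """Saliency-weighted mean of per-pixel motion magnitude: background
    motion far from the ROI contributes almost nothing."""
    return float((motion_mag * saliency).sum() / (saliency.sum() + 1e-8))
```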
- Combine baseline, text prior, and heatmap prior scores.
- Apply event-level spike suppression/hysteresis.
- Use guarded blending to avoid AUROC degradation relative to baseline.
Potential: best practical signal when event meaning (text) and location relevance (heatmap) are both needed.
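Guarded blending can be sketched as an AUROC check on labeled frames: accept the fused score only if it does not lose AUROC against the baseline, otherwise fall back. The rank-based AUROC, the function names, and the `margin` tolerance are illustrative, not the pipeline's exact logic:

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC: probability a positive frame outranks a negative
    one (Mann-Whitney U formulation, no tie correction)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    ranks = np.empty(len(scores))
    ranks[scores.argsort()] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def guarded_blend(base, fused, labels, margin=0.0):
    """Keep the fused score only if its AUROC is within `margin` of the
    baseline's; otherwise fall back to the baseline score."""
    return fused if auroc(fused, labels) >= auroc(base, labels) - margin else base
```

The guard is what makes the fusion safe when a prior is weak: the worst case degenerates to the baseline instead of dragging AUROC below it.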
```
.venv/bin/python single_video_anomaly_system.py \
  --video-path "/Users/abhinavs/Desktop/mv.mp4" \
  --anomaly-ranges "0.21.2-0.22.4,1.04.6-1.05.5,1.41.5-1.44.7,2.24.9-2.28.6,3.35.4-3.36.7,6.23.9-6.26.2,7.02.4-7.03.3,7.51.0-7.54."
```

Outputs are written under `outputs/theft_run_mv_final_guarded_f1/` (or the `--work-dir` you provide).
Includes:
- `test_predictions_overlay_baseline.mp4`
- `test_predictions_overlay_text_prior.mp4`
- `test_predictions_overlay_heatmap_prior.mp4`
- `test_predictions_overlay_both_priors.mp4`
- `metrics.json`
- Event decisions use spike filtering to ignore very short score spikes.
- Threshold/window selection is tuned for event-level robustness, not frame-level jitter.
- Guarded blending keeps AUROC stable (or improved) vs baseline when priors are weak.
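The hysteresis in the event decisions can be sketched as a two-threshold state machine: turn ON only above a high threshold, turn OFF only below a lower one, so scores hovering near a single threshold do not flicker. The `t_on`/`t_off` values here are illustrative; the pipeline tunes its thresholds at event level:

```python
def hysteresis_events(scores, t_on=0.7, t_off=0.5):
    """Per-frame ON/OFF decisions with two thresholds (hysteresis)."""
    state, out = 0, []
    for s in scores:
        if state == 0 and s >= t_on:
            state = 1            # enter event only on a strong score
        elif state == 1 and s < t_off:
            state = 0            # leave event only on a clearly low score
        out.append(state)
    return out
```

With a single threshold at 0.7, the 0.6 frames below would toggle the event off mid-way; hysteresis keeps it on until the score clearly drops.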
Install once:

```
.venv/bin/pip install -r requirements.txt
```

Optional for stronger text priors:

```
.venv/bin/pip install git+https://github.com/openai/CLIP.git
```