event-based_depth_estimation_lnn

Event-based Depth Estimation using Liquid Neural Networks

Overview

This project implements a conditional pix2pix-style architecture enhanced with Liquid Neural Networks (LNNs) to reconstruct depth images from event camera data (DAVIS346), optionally combined with additional modalities:

  • Event frame only
  • Event frame + grayscale DAVIS346
  • Event frame + grayscale DAVIS346 + IMU

The model is based on a U-Net generator, where the bottleneck is modulated by temporal context extracted with a Liquid Neural Network block. This allows the network to better capture temporal dynamics from events and IMU data.
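The snippet below sketches this modulation idea: a toy liquid time-constant cell integrates a temporal context sequence (e.g. IMU samples or per-frame event statistics) and produces a FiLM-style scale/shift applied to the bottleneck features. Class and parameter names (LiquidCell, LiquidBottleneck, ctx_dim) are illustrative, not the repository's actual modules.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LiquidCell(nn.Module):
        """Toy liquid time-constant cell: the hidden state relaxes toward an
        input-driven target with an input-dependent time constant."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.inp = nn.Linear(in_dim, hidden_dim)
            self.rec = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.tau = nn.Linear(in_dim + hidden_dim, hidden_dim)

        def forward(self, x_t, h, dt=1.0):
            tau = F.softplus(self.tau(torch.cat([x_t, h], dim=-1))) + 1e-2
            target = torch.tanh(self.inp(x_t) + self.rec(h))
            return h + dt * (target - h) / tau  # Euler step of dh/dt = (target - h) / tau

    class LiquidBottleneck(nn.Module):
        """FiLM-style scale/shift of U-Net bottleneck features from temporal context."""
        def __init__(self, ctx_dim, hidden_dim, channels):
            super().__init__()
            self.cell = LiquidCell(ctx_dim, hidden_dim)
            self.film = nn.Linear(hidden_dim, 2 * channels)

        def forward(self, feats, ctx_seq):  # feats: (B, C, H, W), ctx_seq: (B, T, ctx_dim)
            h = feats.new_zeros(feats.size(0), self.film.in_features)
            for t in range(ctx_seq.size(1)):
                h = self.cell(ctx_seq[:, t], h)
            scale, shift = self.film(h).chunk(2, dim=-1)
            return feats * (1 + scale[..., None, None]) + shift[..., None, None]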

Features

  • Training pipeline with main_train.py
  • Dataset loader handling DAVIS346 grayscale, events, IMU, and aligned RealSense depth
  • Metrics: L1, MSE, RMSE, MAE, SSIM, PSNR
  • Visualization: predicted vs. ground-truth samples saved every epoch
  • Checkpoint saving for model weights
  • Configurable output: depth as grayscale (with post-colorization) or direct RGB
  • Modular design (dataset, model, train step, training loop)

Requirements

  • Ubuntu 20.04 / 22.04
  • Python 3.10
  • CUDA 11.8 (with NVIDIA driver >= 520)
  • PyTorch with cu118 build
  • See requirements.txt for exact package versions.

Installation

Using Docker (recommended)

  1. Build the image:
    docker build -t event_depth_lnn .
  2. Run the container with GPU:
    docker run --gpus all -it --rm -v $(pwd):/app event_depth_lnn bash
    This mounts the project folder to /app inside the container.

Local (conda)

  1. Create environment:
    conda create -n event_depth python=3.10
    conda activate event_depth
    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
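  2. Optionally verify that the cu118 build can see the GPU:
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"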

Dataset Structure

The dataset is expected to have the following structure:

processed_data/2025_smartcities/
  sequence_YYYYMMDD_HHMMSS/
    images/
      frame_000000.png
      frame_000001.png
      ...
    imu/
      frame_000000.txt
      frame_000001.txt
      ...

Each image contains four quadrants (see the loading sketch after this list):

  • Top-left: RGB from RealSense
  • Top-right: grayscale from DAVIS346
  • Bottom-left: depth map from RealSense
  • Bottom-right: accumulated event frame from DAVIS346
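A minimal loading sketch that slices the four quadrants out of one combined frame, assuming all quadrants have equal size; the repository's dataset loader may crop differently:

    import cv2

    def split_quadrants(path):
        # Split one combined frame into its four equal-sized quadrants.
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        h, w = img.shape[0] // 2, img.shape[1] // 2
        rgb    = img[:h, :w]   # top-left: RealSense RGB
        gray   = img[:h, w:]   # top-right: DAVIS346 grayscale
        depth  = img[h:, :w]   # bottom-left: RealSense depth
        events = img[h:, w:]   # bottom-right: accumulated event frame
        return rgb, gray, depth, events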

Usage

Training

Run training with:

python main_train.py --data_root /path/to/processed_data/2025_smartcities --output_dir ./outputs --mode 3 --epochs 20

Arguments:

  • --data_root: path to dataset root
  • --output_dir: where to save checkpoints and sample images
  • --mode: 1=events only, 2=events+grayscale, 3=events+grayscale+IMU
  • --out_color: if set, the model predicts colorized RGB depth directly instead of a single-channel grayscale map
  • --epochs: number of epochs
  • --batch_size: batch size
  • --lr: learning rate (default 2e-4)

Checkpoints are saved under <output_dir>/checkpoints/.
Sample predictions vs. ground truth are saved under <output_dir>/samples/.
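To inspect a saved checkpoint before reloading it, a hedged sketch (whether the .pth file holds a raw state_dict or a dict wrapping one under a "model" key depends on how main_train.py saves it):

    import torch

    ckpt = torch.load("./outputs/checkpoints/ckpt_epoch020.pth", map_location="cpu")
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt  # unwrap if needed
    print(list(state.keys())[:5])  # inspect parameter names before load_state_dict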

Training with NumPy events

1) EVENTS ONLY (mode=1)

python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode1 \
  --mode 1 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 5 --workers 4

2) FRAMES (GRAY) + EVENTS (mode=2)

python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode2 \
  --mode 2 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 6 --workers 4

3) FRAMES (GRAY) + EVENTS + IMU (mode=3)

python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode3 \
  --mode 3 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 7 --workers 4
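For reference, a sketch of how --event_bins could discretize a raw event stream into a fixed number of temporal channels, assuming events stored as NumPy rows (t, x, y, p); the repository's loader may normalize or accumulate differently:

    import numpy as np

    def events_to_bins(events, h, w, bins=6):
        # Accumulate (t, x, y, p) events into a (bins, h, w) polarity-signed volume.
        t = events[:, 0]
        x, y = events[:, 1].astype(int), events[:, 2].astype(int)
        p = events[:, 3]
        t_norm = (t - t.min()) / max(float(t.max() - t.min()), 1e-9)  # time -> [0, 1]
        b = np.clip((t_norm * bins).astype(int), 0, bins - 1)         # temporal bin index
        vol = np.zeros((bins, h, w), dtype=np.float32)
        np.add.at(vol, (b, y, x), np.where(p > 0, 1.0, -1.0))         # signed accumulation
        return vol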

Inference

After training, use infer_save.py to run inference on sequences and save predictions:

python infer_save.py --data_root /path/to/data --checkpoint ./outputs/checkpoints/ckpt_epoch020.pth --output_dir ./outputs/infer

Visualization

  • Depth predictions can be visualized as grayscale and colorized (plasma colormap).
  • If --out_color was enabled, predictions are saved as RGB.
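A minimal colorization sketch using Matplotlib's plasma colormap (assumes a single-channel prediction; the normalization is illustrative):

    import numpy as np
    import matplotlib.cm as cm

    def colorize_depth(depth):
        # Normalize to [0, 1], apply plasma, and drop the alpha channel.
        d = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-9)
        return (cm.plasma(d)[..., :3] * 255).astype(np.uint8)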

Considerations

  • Ensure calibration between DAVIS346 and RealSense is correct for supervised training.
  • Depth maps from RealSense may need masking of invalid pixels before supervision (see the masking sketch after this list).
  • Event frames are accumulated from DAVIS346 raw events; adjust accumulation window depending on sequence dynamics.
  • Training can be memory intensive; reduce the resize resolution or --batch_size if you run out of GPU memory.
  • For pix2pix-like adversarial training, a PatchGAN discriminator can be added (not enabled by default).
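A masking sketch for the depth point above, assuming invalid RealSense pixels are stored as zeros (the actual sentinel depends on how the depth was exported):

    import torch

    def masked_l1(pred, target):
        # L1 loss restricted to valid depth pixels; assumes invalid pixels are 0.
        mask = (target > 0).float()
        return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp(min=1.0)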

License

This project is released under the MIT License.
