Event-based Depth Estimation using Liquid Neural Networks
This project implements a conditional pix2pix-style architecture enhanced with Liquid Neural Networks (LNNs) to reconstruct depth images from event camera data (DAVIS346), with three supported input configurations:
- Event frame only
- Event frame + grayscale DAVIS346
- Event frame + grayscale DAVIS346 + IMU
The model is based on a U-Net generator, where the bottleneck is modulated by temporal context extracted with a Liquid Neural Network block. This allows the network to better capture temporal dynamics from events and IMU data.
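As a rough illustration of this idea, here is a minimal PyTorch sketch of a liquid cell gating the bottleneck channels. The class names (`LiquidCell`, `LiquidBottleneck`), the single-Euler-step integration, and the sigmoid channel gating are illustrative assumptions, not the repository's actual modules:

```python
import torch
import torch.nn as nn


class LiquidCell(nn.Module):
    """Minimal liquid time-constant cell: each hidden unit decays toward an
    input-dependent target, integrated with one explicit Euler step."""

    def __init__(self, input_dim: int, hidden_dim: int, dt: float = 0.1):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, hidden_dim)
        self.rec_proj = nn.Linear(hidden_dim, hidden_dim)
        self.tau = nn.Parameter(torch.ones(hidden_dim))  # learnable time constants
        self.dt = dt

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        pre = torch.tanh(self.in_proj(x) + self.rec_proj(h))
        # dh/dt = -h / tau + pre, integrated with a single Euler step
        return h + self.dt * (-h / torch.clamp(self.tau, min=1e-2) + pre)


class LiquidBottleneck(nn.Module):
    """Gates U-Net bottleneck channels with temporal context summarized by
    the liquid cell from a per-frame context sequence (event/IMU features)."""

    def __init__(self, ctx_dim: int, hidden_dim: int, bottleneck_ch: int):
        super().__init__()
        self.cell = LiquidCell(ctx_dim, hidden_dim)
        self.to_gate = nn.Linear(hidden_dim, bottleneck_ch)

    def forward(self, feats: torch.Tensor, ctx_seq: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) bottleneck activations; ctx_seq: (B, T, ctx_dim)
        h = feats.new_zeros(feats.size(0), self.cell.tau.numel())
        for t in range(ctx_seq.size(1)):
            h = self.cell(ctx_seq[:, t], h)
        gate = torch.sigmoid(self.to_gate(h))[:, :, None, None]  # (B, C, 1, 1)
        return feats * gate
```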
- Training pipeline with `main_train.py`
- Dataset loader handling DAVIS346 grayscale, events, IMU, and aligned RealSense depth
- Metrics: L1, MSE, RMSE, MAE, SSIM, PSNR (see the sketch after this list)
- Visualization: predicted vs. ground-truth samples saved every epoch
- Checkpoint saving for model weights
- Configurable output: depth as grayscale (with post-colorization) or direct RGB
- Modular design (dataset, model, train step, training loop)
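As a reference for the metrics above, a minimal sketch of how they can be computed per batch (the function name `depth_metrics` is illustrative; SSIM and PSNR use scikit-image):

```python
import torch
import torch.nn.functional as F
from skimage.metrics import structural_similarity, peak_signal_noise_ratio


def depth_metrics(pred: torch.Tensor, target: torch.Tensor) -> dict:
    """Compute metrics for a batch of depth maps scaled to [0, 1].
    pred, target: (B, 1, H, W) tensors."""
    l1 = F.l1_loss(pred, target).item()  # L1 and MAE coincide for this setup
    mse = F.mse_loss(pred, target).item()
    ssim_vals, psnr_vals = [], []
    for p, t in zip(pred.detach().cpu().numpy(), target.detach().cpu().numpy()):
        ssim_vals.append(structural_similarity(p[0], t[0], data_range=1.0))
        psnr_vals.append(peak_signal_noise_ratio(t[0], p[0], data_range=1.0))
    return {"L1": l1, "MAE": l1, "MSE": mse, "RMSE": mse ** 0.5,
            "SSIM": sum(ssim_vals) / len(ssim_vals),
            "PSNR": sum(psnr_vals) / len(psnr_vals)}
```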
- Ubuntu 20.04 / 22.04
- Python 3.10
- CUDA 11.8 (with NVIDIA driver >= 520)
- PyTorch with cu118 build
- See `requirements.txt` for exact package versions.
- Build the image:

```bash
docker build -t event_depth_lnn .
```

- Run the container with GPU:

```bash
docker run --gpus all -it --rm -v $(pwd):/app event_depth_lnn bash
```

This mounts the project folder to `/app` inside the container.
- Create the environment:

```bash
conda create -n event_depth python=3.10
conda activate event_depth
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
```
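After installation, a quick sanity check that the cu118 build actually sees the GPU:

```python
# quick check: PyTorch version, CUDA build, and GPU visibility
import torch

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
```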
The dataset is expected to have the following structure:
```
processed_data/2025_smartcities/
  sequence_YYYYMMDD_HHMMSS/
    images/
      frame_000000.png
      frame_000001.png
      ...
    imu/
      frame_000000.txt
      frame_000001.txt
      ...
```
Each image contains 4 quadrants:
- Top-left: RGB from RealSense
- Top-right: grayscale from DAVIS346
- Bottom-left: depth map from RealSense
- Bottom-right: accumulated event frame from DAVIS346
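A minimal sketch of recovering the individual modalities from one such composite frame, assuming all four quadrants have equal size (function names are illustrative):

```python
import cv2
import numpy as np


def load_quadrants(path: str):
    """Split a composite frame into its four modalities.
    Layout: TL=RealSense RGB, TR=DAVIS346 grayscale,
            BL=RealSense depth, BR=accumulated event frame."""
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    h, w = img.shape[:2]
    h2, w2 = h // 2, w // 2
    rgb    = img[:h2, :w2]
    gray   = img[:h2, w2:]
    depth  = img[h2:, :w2]
    events = img[h2:, w2:]
    return rgb, gray, depth, events


def load_imu(path: str) -> np.ndarray:
    """Read a per-frame IMU text file as a flat float vector
    (the exact column layout depends on the recorder)."""
    return np.loadtxt(path).ravel()
```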
Run training with:
```bash
python main_train.py --data_root /path/to/processed_data/2025_smartcities --output_dir ./outputs --mode 3 --epochs 20
```

Arguments:

- `--data_root`: path to the dataset root
- `--output_dir`: where to save checkpoints and sample images
- `--mode`: 1 = events only, 2 = events + grayscale, 3 = events + grayscale + IMU
- `--out_color`: if set, the model predicts RGB depth directly instead of a single channel
- `--epochs`: number of epochs
- `--batch_size`: batch size
- `--lr`: learning rate (default 2e-4)
Checkpoints are saved under `<output_dir>/checkpoints/`.
Sample predictions vs. ground truth are saved under `<output_dir>/samples/`.
Example runs of `main_train_events.py` for the three input modes, each pinned to a different GPU:

```bash
python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode1 \
  --mode 1 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 5 --workers 4
```

```bash
python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode2 \
  --mode 2 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 6 --workers 4
```

```bash
python main_train_events.py \
  --data_root input/ \
  --output_dir ./output/events_mode3 \
  --mode 3 \
  --epochs 100 \
  --batch_size 128 \
  --resize_h 256 --resize_w 352 \
  --event_bins 6 \
  --device cuda --gpu 7 --workers 4
```
After training, use `infer_save.py` to run inference on sequences and save predictions:

```bash
python infer_save.py --data_root /path/to/data --checkpoint ./outputs/checkpoints/ckpt_epoch020.pth --output_dir ./outputs/infer
```

- Depth predictions can be visualized as grayscale or colorized with the plasma colormap (see the sketch below).
- If `--out_color` was enabled, predictions are saved as RGB.
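As an illustration of the plasma colorization step, a minimal matplotlib-based sketch (the function name and file names are hypothetical):

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np


def colorize_depth(depth: np.ndarray, vmin=None, vmax=None) -> np.ndarray:
    """Map an (H, W) depth array to an RGB uint8 image with the plasma colormap."""
    vmin = float(depth.min()) if vmin is None else vmin
    vmax = float(depth.max()) if vmax is None else vmax
    norm = np.clip((depth - vmin) / max(vmax - vmin, 1e-8), 0.0, 1.0)
    rgba = plt.get_cmap("plasma")(norm)  # (H, W, 4) floats in [0, 1]
    return (rgba[..., :3] * 255).astype(np.uint8)


pred = np.load("pred_depth.npy")  # hypothetical saved prediction
cv2.imwrite("pred_plasma.png", cv2.cvtColor(colorize_depth(pred), cv2.COLOR_RGB2BGR))
```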
- Ensure calibration between DAVIS346 and RealSense is correct for supervised training.
- Depth maps from RealSense may need masking (invalid pixels).
- Event frames are accumulated from DAVIS346 raw events; adjust the accumulation window to the sequence dynamics (see the sketch after these notes).
- Training can be memory-intensive; reduce the resize dimensions (`--resize_h`/`--resize_w`) or `--batch_size` if you run out of memory.
- For pix2pix-like adversarial training, a PatchGAN discriminator can be added (not enabled by default).
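A minimal sketch of accumulating raw events into temporal bins, matching the `--event_bins` argument used above; the `(t, x, y, polarity)` event layout is an assumption for illustration:

```python
import numpy as np


def events_to_frames(events: np.ndarray, height: int, width: int,
                     n_bins: int = 6) -> np.ndarray:
    """Accumulate raw events into n_bins temporal slices.
    events: (N, 4) array of (t, x, y, polarity) with polarity in {-1, +1}."""
    frames = np.zeros((n_bins, height, width), dtype=np.float32)
    t0, t1 = events[:, 0].min(), events[:, 0].max()
    # assign each event to a bin by its normalized timestamp
    bins = ((events[:, 0] - t0) / max(t1 - t0, 1e-9) * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    np.add.at(frames, (bins, y, x), events[:, 3])  # signed accumulation
    return frames
```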
This project is released under the MIT License.