VAE_lung_lesion_BMVC

Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions

Work in Progress — University project, Spring 2026

Team: Kenmogne Loic, Brasa Franklin, Martins Soares Flavio

Documentation

See the docs/ folder for a full project guide, including:

What we're trying to do and the scientific motivation
How we plan to approach the experiments (GVAE vs DirVAE on NIH DeepLesion)
Environment setup and dataset instructions
Step-by-step implementation plan

Paper: https://arxiv.org/abs/2311.15719 (arXiv) and https://proceedings.bmvc2023.org/699/ (BMVC 2023 proceedings)

Preprocessing

LIDC_DICOM_to_Numpy.ipynb

Exploration of the LIDC-IDRI lung lesion dataset (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254).

Converts the CT scans from DICOM format to NumPy arrays (pixel intensities in Hounsfield Units) and crops each image to a 64x64-pixel region of interest (ROI). Includes the calculations for the bounding-box crop, which is centred on the region annotated in the segmentation masks.
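The conversion and cropping described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the function names, the default slope and intercept, and the clamping behaviour at the image border are assumptions. In a real DICOM file the slope and intercept come from the RescaleSlope and RescaleIntercept tags.

```python
import numpy as np

def to_hounsfield(pixels, slope=1.0, intercept=-1024.0):
    """Map stored pixel values to Hounsfield Units: HU = slope * value + intercept."""
    return pixels.astype(np.float32) * slope + intercept

def crop_roi(image, mask, size=64):
    """Crop a size x size patch centred on the bounding-box centre of the mask.

    Assumes the image is at least size x size in both dimensions.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    centre_r = (rows.min() + rows.max()) // 2
    centre_c = (cols.min() + cols.max()) // 2
    half = size // 2
    # Clamp the window so that it stays inside the image.
    top = int(np.clip(centre_r - half, 0, image.shape[0] - size))
    left = int(np.clip(centre_c - half, 0, image.shape[1] - size))
    return image[top:top + size, left:left + size]

# Synthetic stand-ins for one CT slice and its segmentation mask.
raw = np.zeros((512, 512), dtype=np.int16)
mask = np.zeros((512, 512), dtype=bool)
mask[100:111, 200:221] = True
patch = crop_roi(to_hounsfield(raw), mask)
```

Clamping keeps the patch size fixed for lesions near the image border; padding the image instead would be an equally reasonable choice.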

LIDC_datasplit.ipynb

Views the metadata supplied with the LIDC-IDRI dataset and saves the malignancy labels to a NumPy array. Also removes the labels of slices that were excluded.

mask_size.ipynb

Finds the region of interest (ROI) size for the lesions based on the convex hull and minimum bounding box of the segmentation masks.
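As a rough illustration of how an ROI size can be derived from the masks, the sketch below measures the axis-aligned bounding box of each mask and takes the largest side. It is a simplification: it does not compute the convex hull, and the function names are hypothetical.

```python
import numpy as np

def bounding_box_size(mask):
    """Return (height, width) of the axis-aligned bounding box of a binary mask."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return (0, 0)
    return (int(rows.max() - rows.min() + 1), int(cols.max() - cols.min() + 1))

def smallest_square_roi(masks):
    """Smallest square side length that contains every mask's bounding box."""
    return max(max(bounding_box_size(m)) for m in masks)

# Two synthetic masks standing in for lesion segmentations.
mask_a = np.zeros((128, 128), dtype=bool)
mask_a[10:15, 20:50] = True        # 5 rows by 30 columns
mask_b = np.zeros((128, 128), dtype=bool)
mask_b[40:82, 60:70] = True        # 42 rows by 10 columns
roi_side = smallest_square_roi([mask_a, mask_b])
```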

Train_Test_Split.ipynb

Splits the patients into train/validation/test sets. It produces two files, meta_mal_ben.csv and meta_mal_nonmal.csv, which are also saved in this repository. These files hold the metadata for the two labelling schemes: malignant vs benign (mal_ben), with ambiguous cases excluded, and malignant vs non-malignant (mal_nonmal).
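A patient-level split could look roughly like the sketch below. The split fractions, the seed, and the patient IDs are made up for illustration; the point is only that every slice inherits the split of its patient, so no patient appears in more than one set.

```python
import random

def split_patients(patient_ids, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the unique patient IDs and partition them into train/val/test sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    n_val = round(len(ids) * val_frac)
    test = set(ids[:n_test])
    val = set(ids[n_test:n_test + n_val])
    train = set(ids[n_test + n_val:])
    return train, val, test

# Hypothetical slice-level metadata: one patient ID per slice.
slice_patients = ["P01", "P01", "P02", "P03", "P03", "P03", "P04", "P05",
                  "P06", "P07", "P08", "P09", "P10"]
train, val, test = split_patients(slice_patients)

# Assign each slice to the split of its patient.
slice_split = ["train" if p in train else "val" if p in val else "test"
               for p in slice_patients]
```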

VAE

Extract Latent Vectors and Reconstructions.ipynb

Loads the saved model parameters into the VAE, extracts the latent vectors, and saves them.

RandomSearchVAE.py

Gaussian VAE with a random hyperparameter search, combined with an MLP predictor to assess how well the latent vectors support classification. Note: slices are split at the patient level.
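The general shape of a random hyperparameter search is sketched below. The search space, the parameter names, and the `evaluate` callback are hypothetical; in practice `evaluate` would train the VAE and MLP with the sampled configuration and return a validation metric.

```python
import random

# Hypothetical search space; names and ranges are illustrative only.
SEARCH_SPACE = {
    "latent_dim": [8, 16, 32, 64],
    "batch_size": [32, 64, 128],
    "log10_lr": (-5.0, -2.0),   # learning rate is sampled uniformly in log space
}

def sample_config(rng):
    """Draw one configuration from the search space."""
    return {
        "latent_dim": rng.choice(SEARCH_SPACE["latent_dim"]),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "learning_rate": 10 ** rng.uniform(*SEARCH_SPACE["log10_lr"]),
    }

def random_search(evaluate, n_trials=20, seed=0):
    """Return the (score, config) pair with the highest score."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best

# Dummy objective standing in for "train the model, return a validation score".
best_score, best_config = random_search(lambda c: -abs(c["latent_dim"] - 32))
```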

RandomSearch_Dirichlet_VAE.py

VAE with a Dirichlet latent space. Note: this produces latent vectors with better disentanglement, which may make latent exploration easier because each dimension of the latent vector is encouraged to encode a different feature.
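To make the difference from a Gaussian latent concrete, the sketch below draws Dirichlet samples by normalising independent Gamma draws, a standard construction. It only illustrates that each latent vector is non-negative and sums to one; it is not differentiable, and the sampler used during training in the script may be different. The batch size, latent dimension, and concentration values are assumptions.

```python
import numpy as np

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample per row of alpha by normalising Gamma(alpha, 1) draws."""
    gamma = rng.gamma(shape=alpha, scale=1.0)
    return gamma / gamma.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
alpha = np.full((4, 8), 0.5)   # hypothetical batch of 4 concentration vectors, 8 latent dims
z = sample_dirichlet(alpha, rng)
```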

VAE_MLP_joint_loss_mal_nonmal.py

Gaussian VAE malignant vs non-malignant with joint VAE and classifier loss.

VAE_joint_loss_mal_benign.py

Gaussian VAE malignant vs benign with joint VAE and classifier loss.
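For the two Gaussian scripts above, a joint objective of this kind can be sketched as a weighted sum of a reconstruction term, the KL divergence to a standard normal prior, and a classification loss. The weights `beta` and `gamma`, the squared-error reconstruction term, and the function names are assumptions made for this example, not values taken from the scripts.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)

def binary_cross_entropy(prob, label, eps=1e-7):
    """Per-sample binary cross-entropy between predicted probability and 0/1 label."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))

def joint_loss(x, x_recon, mu, logvar, prob, label, beta=1.0, gamma=1.0):
    """Batch mean of reconstruction + beta * KL + gamma * classifier loss."""
    recon = np.sum((x - x_recon) ** 2, axis=1)   # squared error per sample
    total = recon + beta * gaussian_kl(mu, logvar) + gamma * binary_cross_entropy(prob, label)
    return float(np.mean(total))

# Toy batch: perfect reconstruction, posterior equal to the prior, uninformative classifier.
x = np.zeros((2, 4))
loss = joint_loss(x, x.copy(), mu=np.zeros((2, 3)), logvar=np.zeros((2, 3)),
                  prob=np.array([0.5, 0.5]), label=np.array([1.0, 0.0]))
```

In this toy batch the reconstruction and KL terms vanish, so the loss reduces to the cross-entropy of a 0.5 prediction, ln 2.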

VAE_Dirichlet_joint_loss.py

Dirichlet VAE with joint VAE and classifier loss.

Clustering and MLP

Clustering_inital.ipynb

This file explores clustering of the latent vectors. It covers extracting the latent vectors, exploring them with PCA and t-SNE, and clustering them with k-means.
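For readers unfamiliar with k-means, the sketch below is a plain NumPy version of Lloyd's algorithm applied to synthetic "latent vectors". It is illustrative only; the notebook may well rely on a library implementation, and the data, dimensions, and iteration count here are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centroids never modifies `points`.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:      # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(dists.min(axis=1).sum())
    return labels, centroids, inertia

# Two well-separated synthetic blobs standing in for latent vectors.
rng = np.random.default_rng(1)
latents = np.concatenate([rng.normal(0.0, 0.1, (20, 4)),
                          rng.normal(5.0, 0.1, (20, 4))])
labels, centroids, inertia = kmeans(latents, k=2)
```

The returned inertia (within-cluster sum of squared distances) can be compared across values of `k` when choosing the number of clusters.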

Clustering.ipynb

Grid search for best clustering with K-Means and CLASSIX (https://github.com/nla-group/classix).

Exploration_gaussian.ipynb

Latent space exploration and code to generate latent traversal figures. Five example GIFs of latent traversals are also included on the main branch.
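A latent traversal varies one coordinate of a latent vector while holding the others fixed, then decodes each variant to an image. The sketch below shows only the first step for a Gaussian latent; the latent size, the traversed dimension, and the value range are assumptions, and the decoder call is left out.

```python
import numpy as np

def latent_traversal(z, dim, values):
    """Return copies of z with coordinate `dim` replaced by each entry of `values`."""
    steps = np.tile(z, (len(values), 1))
    steps[:, dim] = values
    return steps

z = np.zeros(16)   # hypothetical 16-dimensional latent vector
frames = latent_traversal(z, dim=3, values=np.linspace(-3.0, 3.0, 7))
# Each row of `frames` would be passed through the decoder to render one GIF frame.
```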

RandomSearchMLP.py

This script runs a larger random hyperparameter search than the random search scripts in VAE. It runs cross-validation on the latent vectors to find the best-performing classifier.
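The cross-validation loop can be pictured with the fold generator below. It is a generic k-fold sketch with an assumed `k` and seed; it does not group samples by patient, and whether the script does so is not stated here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is in validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# 23 is an arbitrary number of latent vectors chosen for the example.
splits = list(k_fold_indices(23, k=5))
```

Each candidate MLP configuration would be trained on `train` and scored on `val` for every fold, with the mean score used to rank configurations.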

Dirichlet_RandomSearchMLP.py

This file does the larger random hyperparameter search for the Dirichlet VAE.



Contents:

Preprocessing

LIDC_DICOM_to_Numpy.ipynb

LIDC_datasplit.ipynb

mask_size.ipynb

Train_Test_Split.ipynb

VAE

Extract Latent Vectors and Reconstructions.ipynb

RandomSearchVAE.py

RandomSearch_Dirichlet_VAE.py

VAE_MLP_joint_loss_mal_nonmal.py

VAE_joint_loss_mal_benign.py

VAE_Dirichlet_joint_loss.py

Clustering and MLP

Clustering_inital.ipynb

Clustering.ipynb

Exploration_gaussian.ipynb

RandomSearchMLP.py

Dirichlet_RandomSearchMLP.py
