- Clinical AI Intelligence Platform
- Healthcare AI Challenges & Solutions
- Brain tumor segmentation using MRI
- Chest X-ray analysis
- Risk prediction models
The Clinical AI Intelligence Platform represents a comprehensive enterprise-grade solution for medical image analysis and clinical risk prediction. As a senior AI engineer, I have architected and implemented this platform to address critical challenges in modern healthcare delivery through advanced machine learning and deep learning methodologies.
This platform integrates three core AI systems that work synergistically to support clinical decision-making:
1. Medical Diagnosis - Advanced computer vision models for automated pathology detection and anatomical segmentation
2. Medical Prognosis - Predictive analytics for patient risk stratification and outcome prediction
3. Medical Treatment Support - Interpretable AI systems that provide actionable insights for treatment planning
The platform leverages state-of-the-art deep learning architectures and rigorous ML engineering practices to deliver production-ready solutions. Each component has been designed with clinical deployment in mind, incorporating model interpretability, robust evaluation metrics, and scalable inference pipelines.
AI algorithms can analyze large amounts of data from electronic health records to support disease prevention and diagnosis. The platform supports healthcare organizations in their operational initiatives, enabling cost optimization, improved patient outcomes, and enhanced clinical workflow efficiency.
This platform demonstrates enterprise-level engineering capabilities in:
Deep Learning & Computer Vision:
- Architecture design and implementation of state-of-the-art models (3D U-Net, DenseNet-121) for medical image analysis
- Production-grade handling of 3D medical imaging data and volumetric segmentation pipelines
- Custom loss function engineering (Soft Dice Loss) to address severe class imbalance in medical datasets
- Advanced transfer learning strategies and domain adaptation for medical imaging applications
- Model interpretability frameworks (GradCAM) for clinical validation and regulatory compliance
Machine Learning & Data Science:
- Enterprise-level feature engineering and data preprocessing pipelines for medical datasets
- Advanced missing data imputation strategies (multivariate feature imputation, iterative imputation)
- Systematic model selection, hyperparameter optimization, and ensemble methods
- Explainable AI implementation using SHAP (SHapley Additive exPlanations) for clinical transparency
- Rigorous performance evaluation using domain-specific metrics (Dice coefficient, C-index, ROC-AUC)
Medical AI Applications:
- Production-ready multi-class semantic segmentation systems for brain tumor detection
- Scalable multi-label classification pipelines for chest X-ray pathology detection
- Clinical risk prediction models with interpretable feature contributions
- Expertise in medical imaging standards (NIfTI, DICOM) and healthcare data formats
- Memory-efficient patch-based processing for large-scale medical volume analysis
To train machine learning models effectively and use AI in healthcare, massive amounts of data must be gathered. Acquiring this data, however, usually comes at the cost of patient privacy and is not well received publicly. For example, a survey conducted in the UK estimated that 63% of the population is uncomfortable with sharing their personal data in order to improve artificial intelligence technology. The scarcity of real, accessible patient data is a hindrance that deters the progress of developing and deploying more artificial intelligence in healthcare.
According to a recent study, AI could replace up to 35% of jobs in the UK within the next 10 to 20 years. So far, however, AI has not eliminated any healthcare jobs. If AI were to automate healthcare-related work, the roles most susceptible would be those dealing with digital information, radiology, and pathology, rather than those centered on doctor-patient interaction. Automation can also provide benefits alongside doctors: doctors and medical establishments that take advantage of AI are expected to deliver higher-quality care than those that do not. AI will likely not replace healthcare workers outright but rather give them more time to attend to their patients, potentially averting burnout and cognitive overload.
Since AI makes decisions solely on the data it receives as input, it is important that this data accurately represents patient demographics. In a hospital setting, patients do not have full knowledge of how predictive algorithms are created or calibrated, so medical establishments could unfairly code their algorithms to discriminate against minorities and prioritize profits over optimal care. Unintended bias in these algorithms can also exacerbate social and healthcare inequities. These biases can be mitigated through careful implementation and a methodical collection of representative data.
This header image represents the intersection of artificial intelligence and medical imaging, showcasing the advanced technology used in this project to assist healthcare professionals in diagnosing and treating patients.
- Objective: Develop a production-grade deep learning system for automated brain tumor segmentation from MRI volumes
- Architecture: Custom 3D U-Net model capable of multi-class segmentation with 4 distinct labels - background, edema, enhancing tumor, and non-enhancing tumor
- Loss Function Engineering: Implemented Soft Dice Loss to address severe class imbalance, significantly outperforming a traditional cross-entropy loss
- Inference Pipeline: Patch-based processing architecture for memory-efficient handling of large volumes, with intelligent patch fusion to reconstruct full MRI scan predictions
Magnetic resonance imaging (MRI) is an advanced imaging technique used to observe a variety of diseases and parts of the body. Neural networks can analyze these images individually (as a radiologist would) or combine them into a single 3D volume to make predictions. At a high level, MRI works by measuring the radio waves emitted by atoms subjected to a magnetic field. We have built a multi-class segmentation model which identifies three abnormalities in an image: edema, non-enhancing tumor, and enhancing tumor.
This MRI visualization demonstrates a cross-sectional view of a brain scan. The image shows the complex anatomical structures that our deep learning model analyzes to identify and segment different types of brain abnormalities. The model processes these multi-dimensional images to distinguish between healthy tissue and various pathological regions with high precision.
- In this project, I used data from a medical imaging challenge dataset
- The dataset is stored in the NIfTI-1 format (Neuroimaging Informatics Technology Initiative), which is the standard format for storing neuroimaging data. We use the NiBabel library to interact with these files. Each training sample is composed of two separate files:
- The first file is an image file containing a 4D array of MR image data with shape (240, 240, 155, 4). The second file in each training example is a label file containing a 3D array with shape (240, 240, 155).
- The integer values in this array indicate the "label" for each voxel in the corresponding image files:
- 0: Background.
- 1: Edema.
- 2: Non-enhancing tumor.
- 3: Enhancing tumor.
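To make these shapes concrete, here is a minimal sketch using synthetic NumPy arrays in place of real files; the `nib.load` calls shown in the comment are the usual NiBabel pattern, but the variable names and fabricated data are illustrative, not the project's actual code.

```python
import numpy as np

# In the real pipeline the arrays come from NiBabel, along the lines of:
#   image = nib.load(image_path).get_fdata()  # shape (240, 240, 155, 4)
#   label = nib.load(label_path).get_fdata()  # shape (240, 240, 155)
# Here we fabricate arrays with the same shapes and the same label alphabet.
rng = np.random.default_rng(0)
image = np.zeros((240, 240, 155, 4))               # 4 MR sequences per voxel
label = rng.integers(0, 4, size=(240, 240, 155))   # 0=background, 1=edema,
                                                   # 2=non-enhancing, 3=enhancing

# Per-class voxel counts make the class imbalance the loss must handle visible.
counts = {c: int((label == c).sum()) for c in range(4)}
```

With real data, the background class dominates these counts by a wide margin, which is what motivates the Soft Dice loss used for training.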
- First we generate "patches" of our data, which you can think of as sub-volumes of the whole MR images.
- We generate patches because a network that processes an entire volume at once simply will not fit in our current environment's memory. We therefore use this common technique to produce spatially consistent sub-volumes of our data, which can be fed into our network.
- Specifically, I generated randomly sampled sub-volumes of shape [160, 160, 16] from the images.
- Given that the values in MR images cover a very wide range, we standardize the sub-volumes to have a mean of zero and a standard deviation of one.
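A minimal sketch of the two steps above, assuming NumPy arrays; the helper names are illustrative, not the project's actual functions.

```python
import numpy as np

def get_random_patch(image, patch_shape=(160, 160, 16), rng=None):
    # Sample a spatially consistent sub-volume at a random corner.
    rng = rng or np.random.default_rng()
    x = int(rng.integers(0, image.shape[0] - patch_shape[0] + 1))
    y = int(rng.integers(0, image.shape[1] - patch_shape[1] + 1))
    z = int(rng.integers(0, image.shape[2] - patch_shape[2] + 1))
    return image[x:x + patch_shape[0],
                 y:y + patch_shape[1],
                 z:z + patch_shape[2], :]

def standardize(patch, eps=1e-8):
    # Zero mean, unit standard deviation per channel over the spatial axes.
    mean = patch.mean(axis=(0, 1, 2), keepdims=True)
    std = patch.std(axis=(0, 1, 2), keepdims=True)
    return (patch - mean) / (std + eps)

image = np.random.default_rng(0).normal(5.0, 3.0, size=(240, 240, 155, 4))
patch = standardize(get_random_patch(image, rng=np.random.default_rng(1)))
```

A real sampler would typically also reject patches that contain almost no tumor voxels; that filtering is omitted here for brevity.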
- The color corresponds to each class:
- Red is edema.
- Blue is enhancing tumor.
- Green is non-enhancing tumor.
This animated GIF showcases a 3D MRI volume being processed through different slices. The animation demonstrates how our model analyzes the entire volumetric structure of the brain, processing each slice to build a comprehensive understanding of the tumor's location and extent. The color-coded segmentation overlay (red for edema, blue for enhancing tumor, green for non-enhancing tumor) shows how the model identifies and classifies different tissue types across the entire 3D volume, enabling precise localization of abnormalities that would be challenging to detect manually.
- U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg. The network is based on the fully convolutional network, and its architecture was modified and extended to work with fewer training images and to yield more precise segmentations.
- U-Net was created by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015 in their paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". It is an improvement and development of the FCN (Fully Convolutional Network) introduced by Jonathan Long, Evan Shelhamer, and Trevor Darrell in 2014.
- The architecture contains two paths. The first path is the contraction path (also called the encoder), which is used to capture the context in the image. The encoder is a traditional stack of convolutional and max pooling layers. The second path is the symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions.
- Thus it is an end-to-end fully convolutional network (FCN): it only contains convolutional layers and no dense layers, which means it can accept images of any size.
This detailed U-Net architecture diagram illustrates the complete network structure used in our brain tumor segmentation model. The U-shaped architecture shows the encoder (left side) that progressively downsamples the input to extract high-level features, and the decoder (right side) that upsamples these features to produce pixel-level predictions. The horizontal skip connections (shown as gray arrows) are crucial - they preserve fine-grained spatial details from the encoder layers and combine them with the upsampled features in the decoder, enabling the model to maintain precise localization while understanding global context. This architecture is specifically designed for biomedical image segmentation tasks where both local detail and global context are essential for accurate diagnosis.
- The U-Net model implemented here has a depth of 4. This means the model has 4 levels in its contracting (analysis) path and 4 in its expanding (synthesis) path.
- In the contracting path, each layer contains two 3×3×3 convolutions, each followed by a ReLU, and then a 2×2×2 max pooling (except in the last layer).
- In the expanding path, each layer consists of an up-convolution of 2×2×2, followed by two 3×3×3 convolutions each followed by a ReLU.
- Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path.
- In the last layer, a 1×1×1 convolution reduces the number of output channels to the number of labels which is 3, followed by a sigmoid activation layer.
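The bullet points above imply a simple bit of shape bookkeeping; the sketch below walks a [160, 160, 16] patch through the four encoder levels. The starting channel count of 32 is an illustrative assumption, not a figure taken from the project.

```python
# Spatial/channel bookkeeping for the depth-4 encoder described above.
def encoder_shapes(spatial=(160, 160, 16), depth=4, base_channels=32):
    shapes, channels = [], base_channels
    for level in range(depth):
        shapes.append((spatial, channels))   # after the two 3x3x3 conv + ReLU
        if level < depth - 1:                # 2x2x2 max pooling, except last
            spatial = tuple(s // 2 for s in spatial)
            channels *= 2
    return shapes

levels = encoder_shapes()
# levels == [((160, 160, 16), 32), ((80, 80, 8), 64),
#            ((40, 40, 4), 128), ((20, 20, 2), 256)]
```

The decoder mirrors these shapes in reverse, with the shortcut connections concatenating each encoder level's output at the matching resolution.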
This 3D U-Net architecture visualization demonstrates how our model processes volumetric MRI data. Unlike traditional 2D CNNs that analyze individual slices, the 3D U-Net processes the entire volume simultaneously, capturing spatial relationships across all three dimensions. The diagram shows the contracting path (encoder) that extracts increasingly abstract features, and the expanding path (decoder) that reconstructs the segmentation mask. The 3D convolutions (3×3×3) allow the model to understand how structures appear across adjacent slices, which is critical for accurately identifying tumor boundaries and distinguishing between different tumor types. This volumetric approach provides more accurate segmentation than processing slices independently.
- Aside from the architecture, one of the most important elements of any deep learning method is the choice of loss function. A natural choice that you may be familiar with is the cross-entropy loss function. However, this loss is not ideal for segmentation tasks due to heavy class imbalance (there are typically not many positive regions).
- A much more common loss for segmentation tasks is the Dice similarity coefficient, which is a measure of how well two contours overlap.
The Dice Similarity Coefficient formula is a critical metric for evaluating segmentation performance in medical imaging. This metric measures the overlap between the predicted segmentation and the ground truth mask, with values ranging from 0 (no overlap) to 1 (perfect overlap). In medical segmentation tasks, the Dice coefficient is particularly valuable because it accounts for both false positives and false negatives, making it more suitable than simple accuracy metrics when dealing with imbalanced classes (where tumor regions are much smaller than background). A Dice score above 0.7 is generally considered good for medical segmentation, and our model achieves strong performance across different tumor types.
- The model outputs probabilities that each pixel is, say, a tumor or not, and we want to be able to backpropagate through those outputs. Therefore, we need an analogue of the Dice loss which takes real valued input. This is where the Soft Dice loss comes in. The formula is:
The Soft Dice Loss formula is a differentiable version of the Dice coefficient that enables gradient-based optimization during training. Traditional Dice coefficient is not differentiable because it uses hard binary predictions, but Soft Dice Loss uses probability predictions, allowing the model to learn through backpropagation. This loss function is particularly effective for medical segmentation because it naturally handles class imbalance - it focuses on correctly identifying the small tumor regions rather than being dominated by the large background class. The formula shown here computes the soft Dice loss for each class, and we average across all classes to get the final loss value. This approach significantly improves training stability and final segmentation accuracy compared to traditional cross-entropy loss.
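A minimal NumPy version of this loss (the trained model uses the equivalent tensor-framework ops; the smoothing constant `eps` is an illustrative choice):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1.0):
    # pred: per-class probabilities, target: one-hot mask, class axis first.
    axes = tuple(range(1, pred.ndim))                 # sum over spatial axes
    numerator = 2.0 * (pred * target).sum(axis=axes) + eps
    denominator = (pred ** 2).sum(axis=axes) + (target ** 2).sum(axis=axes) + eps
    return float(np.mean(1.0 - numerator / denominator))  # average over classes

# A perfect prediction drives the loss to 0; an all-background one does not.
target = np.zeros((3, 4, 4, 4))
target[0, :2] = 1.0
loss_perfect = soft_dice_loss(target.copy(), target)
loss_empty = soft_dice_loss(np.zeros_like(target), target)
```

Because both the numerator and denominator are smooth functions of the predicted probabilities, gradients flow through every voxel, unlike the hard Dice coefficient.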
- The model covers some of the relevant areas, but it's definitely not perfect. To quantify its performance, we can use per-pixel sensitivity and specificity, defined in terms of the true positives, true negatives, false positives, and false negatives.
This metrics visualization shows the formulas for Sensitivity (Recall) and Specificity, two crucial performance metrics in medical diagnosis. Sensitivity measures the model's ability to correctly identify positive cases (tumors) - it answers "Of all the actual tumors, how many did we correctly detect?" High sensitivity is critical in medical applications because missing a tumor (false negative) can have serious consequences. Specificity measures the model's ability to correctly identify negative cases (healthy tissue) - it answers "Of all the healthy regions, how many did we correctly identify as healthy?" These metrics are calculated using True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). In our brain tumor segmentation model, we achieve high sensitivity and specificity across all tumor classes, ensuring both accurate detection and minimal false alarms.
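These definitions translate directly to a few lines of NumPy over binarized voxel masks; the toy arrays below are illustrative.

```python
import numpy as np

def sensitivity_specificity(pred, truth):
    # Per-voxel confusion-matrix counts for one class.
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    sensitivity = tp / (tp + fn)   # of all true tumor voxels, fraction found
    specificity = tn / (tn + fp)   # of all healthy voxels, fraction cleared
    return sensitivity, specificity

truth = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 0, 0, 0, 1])   # one miss, one false alarm
sens, spec = sensitivity_specificity(pred, truth)   # → (0.75, 0.75)
```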
This visualization demonstrates patch-level predictions from our 3D U-Net model. Due to memory constraints, we process the large MRI volumes (240×240×155 voxels) in smaller patches (160×160×16). The left side shows the original MRI slice with ground truth segmentation overlay, while the right side shows our model's predictions. The color coding helps visualize different tumor types: red regions indicate edema (swelling around the tumor), blue regions show enhancing tumor (active, vascularized tumor tissue), and green regions represent non-enhancing tumor (necrotic or less active tissue). The patch-based approach allows us to train and infer on high-resolution volumes while maintaining computational efficiency.
This metrics table shows the quantitative performance of our model at the patch level. The table displays key performance indicators including Dice coefficient, Sensitivity, and Specificity for each tumor class (Edema, Non-enhancing tumor, Enhancing tumor) as well as the overall "Complete" tumor (all classes combined). These metrics validate that our model achieves clinically relevant accuracy, with Dice scores indicating strong overlap between predictions and ground truth. The high sensitivity values ensure we don't miss tumors, while high specificity minimizes false positive detections. This quantitative validation is essential before deploying the model for clinical use.
- As of now, our model only runs on patches, but what we really want to see is our model's result on a whole MRI scan.
- To do this, we generate patches covering the scan, run the model on each patch, and combine the results to obtain a fully labeled MR image.
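A hedged sketch of that stitching step: tile the volume with patch-sized windows (clamped at the borders so every voxel is covered), run the model on each window, and average predictions where windows overlap. The identity `predict` stands in for the trained 3D U-Net, and the project's exact fusion rule may differ.

```python
import numpy as np

def predict_whole_volume(volume, patch=(160, 160, 16), predict=lambda p: p):
    out = np.zeros_like(volume, dtype=float)
    weight = np.zeros_like(volume, dtype=float)
    # Window start positions per axis, clamped so the last window fits.
    starts = [
        sorted({min(s, dim - p) for s in range(0, dim, p)})
        for dim, p in zip(volume.shape, patch)
    ]
    for x in starts[0]:
        for y in starts[1]:
            for z in starts[2]:
                sl = (slice(x, x + patch[0]),
                      slice(y, y + patch[1]),
                      slice(z, z + patch[2]))
                out[sl] += predict(volume[sl])   # accumulate predictions
                weight[sl] += 1.0                # count overlapping windows
    return out / weight                          # average over overlaps

volume = np.random.default_rng(0).random((240, 240, 155))
stitched = predict_whole_volume(volume)
```

With the identity predictor, the stitched output reproduces the input exactly, which is a handy sanity check that every voxel is covered and the averaging is well-defined.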
This visualization shows the complete whole-scan prediction result, where we've combined predictions from multiple patches to reconstruct the full MRI volume segmentation. The process involves: (1) generating overlapping patches across the entire volume, (2) running inference on each patch, (3) combining predictions using techniques like weighted averaging or majority voting at overlapping regions. The result demonstrates how our model successfully segments tumors across the entire brain volume, maintaining spatial consistency and accuracy. The visualization clearly shows the model's ability to identify and classify different tumor regions throughout the 3D structure, which is essential for treatment planning and monitoring tumor progression.
This metrics table presents the performance evaluation for whole-scan predictions, which is the final output that would be used in clinical practice. The metrics show how well our model performs when processing complete MRI volumes rather than individual patches. The whole-scan evaluation is more challenging because it requires maintaining consistency across patch boundaries and handling edge cases. The results demonstrate that our patch combination strategy successfully preserves accuracy at the full volume level, with Dice coefficients and other metrics remaining strong. This validates that our approach can be deployed for real-world clinical applications where complete volume analysis is required.
Multi-Pathology Detection System: Automated diagnosis of 14 pathologies from Chest X-Ray images using advanced deep learning. Integrated GradCAM-based interpretability for clinical validation.
This header image represents the chest X-ray analysis component of our AI healthcare system. Chest X-rays are one of the most common diagnostic imaging procedures, and our deep learning model can simultaneously detect 14 different pathologies from a single X-ray image, assisting radiologists in making faster and more accurate diagnoses.
- Objective: Develop a production-ready multi-label classification system for automated pathology detection from chest X-ray images
- Model Architecture: Fine-tuned DenseNet-121 model capable of simultaneous binary classification for 14 distinct pathologies (Cardiomegaly, Mass, Pneumothorax, Edema, etc.)
- Class Imbalance Mitigation: Implemented sophisticated weight normalization strategies to handle low prevalence of abnormalities in the dataset
- Clinical Interpretability: Integrated GradCAM visualization to provide spatial attention maps, enabling radiologists to understand model reasoning and validate predictions. This interpretability framework is essential for clinical deployment, error analysis, and regulatory compliance.
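One common form of that class-imbalance weighting, sketched here in NumPy under the assumption that each pathology's positive and negative loss terms are scaled by the opposite class's frequency (the project's exact normalization may differ):

```python
import numpy as np

def weighted_bce(y_true, y_pred, eps=1e-7):
    # y_true, y_pred: (num_examples, num_pathologies)
    pos_freq = y_true.mean(axis=0)           # fraction of positives per class
    w_pos, w_neg = 1.0 - pos_freq, pos_freq  # balance the two terms' totals
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())

# 1 positive out of 10 examples for a single pathology.
y_true = np.array([[1.0]] + [[0.0]] * 9)
confident = weighted_bce(y_true, np.where(y_true == 1, 0.9, 0.1))
uninformed = weighted_bce(y_true, np.full_like(y_true, 0.5))
```

With this weighting, the single positive example contributes as much total loss as the nine negatives, so the model cannot minimize the loss by always predicting "absent".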
The project uses chest X-ray images from a publicly available medical imaging dataset. This dataset contains 108,948 frontal-view X-ray images of 32,717 unique patients. Each image in the dataset carries multiple text-mined labels identifying 14 different pathological conditions, which physicians can in turn use to diagnose 8 different diseases. For this project we worked with a subset of roughly 1,400 images:
- 875 images to be used for training.
- 109 images to be used for validation.
- 420 images to be used for testing.
The dataset includes a CSV file that provides the ground truth labels for each X-ray.
With our dataset splits ready, we can now set up our model to consume them. For this we use the ImageDataGenerator class from the Keras framework, which lets us build a "generator" for images specified in a dataframe. This class also provides basic data augmentation, such as random horizontal flipping of images. We also use the generator to transform the values in each batch so that their mean is 0 and their standard deviation is 1, which facilitates model training by standardizing the input distribution. Finally, the generator converts our single-channel X-ray images (gray-scale) to a three-channel format by repeating the values across all channels, because the pre-trained model we use requires three-channel inputs.
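The two transforms the generator applies can be sketched in plain NumPy (in the real pipeline Keras' `ImageDataGenerator` does this per batch; the helper name here is illustrative):

```python
import numpy as np

def preprocess(xray, eps=1e-8):
    # xray: (H, W) gray-scale image
    # 1) Per-image standardization: zero mean, unit standard deviation.
    standardized = (xray - xray.mean()) / (xray.std() + eps)
    # 2) Gray-scale -> three channels by repeating values along a new axis.
    return np.repeat(standardized[..., np.newaxis], 3, axis=-1)

xray = np.random.default_rng(0).random((320, 320)) * 255.0
batch_ready = preprocess(xray)   # shape (320, 320, 3)
```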
DenseNet was introduced in 2017 in an award-winning paper that revolutionized convolutional neural network design. The model was able to outperform previous architectures like ResNet by introducing a novel dense connectivity pattern.
Regardless of their architectural designs, these networks all try to create channels for information to flow between the initial layers and the final layers. DenseNet, with the same objective, creates paths between the layers of the network.
This diagram illustrates the fundamental difference between DenseNet and traditional CNN architectures. In a standard CNN (left), each layer only receives input from the previous layer. In DenseNet (right), each layer receives feature maps from all preceding layers and passes its own feature maps to all subsequent layers. This dense connectivity pattern creates a more efficient information flow, allowing the network to reuse features learned at different levels and achieve better performance with fewer parameters. This architecture is particularly well-suited for medical image analysis where we need to detect both fine-grained details and high-level patterns.
- DenseNet key novelty:
DenseNet is a convolutional network where each layer is connected to all other layers that are deeper in the network.
- The first layer is connected to the 2nd, 3rd, 4th etc.
- The second layer is connected to the 3rd, 4th, 5th etc.
Each layer in a dense block receives feature maps from all preceding layers and passes its output to all subsequent layers. Feature maps received from other layers are fused through concatenation, not through summation (as in ResNets). Extracted feature maps are continuously concatenated with previous ones, which avoids redundant and duplicate work.
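The channel arithmetic implied by this concatenation is easy to sketch. The stem width of 64 matches DenseNet-121's first dense block; the helper itself is illustrative.

```python
# Channel bookkeeping inside one dense block: each layer consumes the
# concatenation of all earlier outputs and contributes `growth_rate` new
# feature maps (k = 32 for DenseNet-121).
def dense_block_channels(num_layers, in_channels, growth_rate=32):
    inputs = []
    channels = in_channels
    for _ in range(num_layers):
        inputs.append(channels)        # what this layer sees (concatenation)
        channels += growth_rate        # it appends k new feature maps
    return inputs, channels

# First dense block of DenseNet-121: 6 layers on top of a 64-channel stem.
seen, out_channels = dense_block_channels(num_layers=6, in_channels=64)
# seen == [64, 96, 128, 160, 192, 224]; the block outputs 256 channels.
```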
This animated visualization demonstrates how dense connections work in a DenseNet block. The animation shows how each layer (represented as a block) receives inputs from all previous layers (shown as incoming arrows) and sends its output to all subsequent layers (shown as outgoing arrows). Unlike ResNet's additive connections, DenseNet uses concatenation, which preserves all feature information from previous layers. This creates an exponential growth in feature reuse - layer 4 has access to features from layers 1, 2, and 3, layer 5 has access to features from layers 1-4, and so on. This dense connectivity pattern enables the network to learn more efficiently and achieve better performance with fewer parameters, making it ideal for medical imaging applications where computational efficiency and accuracy are both critical.
This allows the network to re-use learned information and be more efficient. Such networks require fewer layers: state-of-the-art results are achieved with feature maps as narrow as 12 channels. This also means the network has fewer parameters to learn and is therefore easier to train. Among all variants, DenseNet-121 is the standard one.
This comparison chart shows different DenseNet variants (DenseNet-121, DenseNet-169, DenseNet-201, DenseNet-264) and their architectural differences. The numbers indicate the total number of layers in each network. DenseNet-121, which we use in this project, provides an optimal balance between model complexity and performance. Despite having fewer layers than deeper variants, DenseNet-121 achieves excellent results for medical image classification tasks while being computationally efficient. The chart helps illustrate that deeper networks don't always mean better performance - the dense connectivity pattern allows shallower networks to achieve state-of-the-art results by maximizing feature reuse.
Key contributions of the DenseNet architecture:
- Alleviates the vanishing gradient problem (as networks get deeper, gradients aren't back-propagated sufficiently to the initial layers; they keep getting smaller as they move backwards through the network, so the initial layers lose their capacity to learn basic low-level features)
- Stronger feature propagation
- Feature re-use
- Reduced parameter count
DenseNet is composed of dense blocks. In those blocks the layers are densely connected: each layer receives as input the output feature maps of all previous layers. DenseNet-121 comprises 4 dense blocks, which themselves comprise 6 to 24 dense layers each.
- Dense block: A dense block comprises n dense layers. These dense layers are connected such that each one receives feature maps from all preceding layers and passes its feature maps to all subsequent layers. The dimensions of the features (width, height) stay the same within a dense block.
This diagram provides a detailed view of a Dense Block structure. Within a dense block, all layers maintain the same spatial dimensions (width and height), allowing feature maps to be directly concatenated. The diagram shows how each layer (L0, L1, L2, etc.) receives concatenated feature maps from all previous layers as input. The growth rate parameter (k=32 for DenseNet-121) controls how many new feature maps each layer produces. This design ensures that early layers' features are preserved and reused throughout the network, enabling the model to capture both low-level details (edges, textures) and high-level semantic information (pathological patterns) simultaneously. This is particularly valuable for chest X-ray analysis where we need to detect subtle abnormalities at various scales.
- Dense layer:
Each dense layer consists of 2 convolutional operations.
- 1×1 CONV (bottleneck that brings down the feature depth/channel count)
- 3×3 CONV (conventional convolution operation for extracting features)
This diagram illustrates the internal structure of a single dense layer. Each dense layer uses a bottleneck design with two convolutional operations: first a 1×1 convolution that acts as a bottleneck to reduce the number of input feature maps (this makes the network more efficient), followed by a 3×3 convolution that performs the actual feature extraction. The 1×1 convolution reduces computational cost by compressing the concatenated feature maps before the more expensive 3×3 convolution. This bottleneck design is crucial for efficiency, especially as the number of input feature maps grows with each layer in the dense block. The BatchNorm→ReLU→Conv sequence (pre-activation) ensures stable training and better gradient flow, which is essential for training deep networks on medical imaging data.
The CONV above corresponds to the sequence BatchNorm->ReLU->Conv. Each dense layer repeats this sequence twice: first with a 1×1 convolution bottleneck producing growth rate × 4 feature maps, then with a 3×3 convolution. The authors found that the pre-activation mode (BN and ReLU before the Conv) was more efficient than the usual post-activation mode.
The growth rate (k = 32 for DenseNet-121) defines the number of output feature maps of a layer. Each layer outputs 32 feature maps, which are concatenated with the feature maps accumulated from previous layers; while the total depth increases continuously, each layer's own contribution stays at 32.
This visualization demonstrates how feature map depth grows within a dense block. The growth rate (k=32) means each layer produces exactly 32 new feature maps. However, because each layer concatenates its output with all previous layers' outputs, the total number of feature maps grows linearly: layer 0 produces 32 maps, layer 1 produces 32 more (total 64), layer 2 produces 32 more (total 96), and so on. Despite this growth, the bottleneck design (1×1 convolution) keeps the computational cost manageable. The diagram shows how this controlled growth allows the network to accumulate diverse features while maintaining efficiency. This feature accumulation is particularly powerful for medical image analysis, as it allows the model to simultaneously consider features at multiple levels of abstraction when making diagnostic predictions.
- Transition layer: In between dense blocks sits a transition layer. Because DenseNet concatenates all the feature maps instead of summing residuals like ResNet, transition layers are needed to keep the network compact. A transition layer is made of Batch Normalization -> 1×1 Convolution -> Average pooling. Transition layers between two dense blocks perform the down-sampling (x and y dimensions halved) essential to CNNs, and also compress the feature maps, reducing the number of channels by half.
This diagram illustrates the transition layer that sits between dense blocks. The transition layer serves two critical functions: (1) spatial downsampling - it halves the width and height of feature maps using average pooling, which is essential for building hierarchical representations in CNNs, and (2) channel compression - it reduces the number of feature channels by half using 1×1 convolution, which prevents the network from becoming too wide and computationally expensive. The Batch Normalization ensures stable training. This design maintains the network's efficiency while allowing it to process features at multiple scales. For chest X-ray analysis, this multi-scale processing is crucial - we need to detect both large structures (like enlarged hearts in cardiomegaly) and small abnormalities (like small masses or nodules).
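A toy NumPy rendition of the transition layer's two effects, with a channel slice standing in for the learned 1×1 convolution, BatchNorm omitted, and illustrative input sizes:

```python
import numpy as np

def transition(feature_maps):
    # feature_maps: (H, W, C)
    h, w, c = feature_maps.shape
    # 1) Channel compression: halve C (stand-in for the learned 1x1 conv).
    compressed = feature_maps[:, :, : c // 2]
    # 2) Spatial downsampling: 2x2 average pooling halves H and W.
    pooled = compressed.reshape(h // 2, 2, w // 2, 2, c // 2).mean(axis=(1, 3))
    return pooled

x = np.random.default_rng(0).random((56, 56, 256))
y = transition(x)   # → shape (28, 28, 128)
```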
Although concatenation generates a lot of input channels, DenseNet's convolutions generate a low number of feature maps (the authors recommend 32 for optimal performance, but world-class performance was achieved with only 12 output channels).
Key benefits:
- Compactness: DenseNet-201 with 20M parameters yields a similar validation error to a 101-layer ResNet with 45M parameters.
- Non-redundant features: learned features are shared across the whole network rather than relearned layer by layer.
- Easier training: gradients flow back more easily thanks to the short connections.
In this project, the model takes 320×320 X-ray images as input and outputs predictions for each of the 14 pathologies, as illustrated below on a sample image.
This visualization demonstrates the multi-label classification capability of our DenseNet-121 model. For a single chest X-ray input, the model simultaneously predicts the probability of 14 different pathologies. The image shows a sample X-ray with predictions displayed for each condition, including Cardiomegaly (enlarged heart), Mass, Pneumothorax (collapsed lung), Edema (fluid accumulation), and others. Each prediction is a binary classification (present/absent) with an associated confidence score. This multi-label approach is more efficient and clinically relevant than training separate models for each pathology, as it allows the model to learn relationships between different conditions and leverages shared features across pathologies. The model can detect multiple conditions in a single image, which reflects real-world clinical scenarios where patients often present with multiple findings.
I used a pre-trained model whose performance can be evaluated using the ROC curve shown at the bottom. The best results are achieved for Cardiomegaly (0.90 AUC), Edema (0.86) and Mass (0.82). Ideally we want AUC values significantly closer to 1. Below you can compare against the performance reported in the CheXNeXt paper, for both their model and radiologists on this dataset.
Looking at unseen X-rays, the model correctly predicts the predominant pathology, generating a reasonably accurate diagnostic and highlighting the key region underlying its predictions. In addition to the main diagnostic (the highest prediction), the model also flags secondary findings, similar to what a radiologist would note as part of their analysis. These can be either false positives from noise captured in the X-rays or co-occurring pathologies.
This result visualization demonstrates our model's diagnostic capabilities using GradCAM (Gradient-weighted Class Activation Mapping) heatmaps. The left panel shows the original chest X-ray, while the right panel shows a color-coded heatmap indicating where the model focuses its attention when making predictions. Warm colors (red/yellow) indicate regions of high importance, while cool colors (blue) indicate lower importance. The model correctly predicts Cardiomegaly (enlarged heart) and correctly identifies the absence of mass or edema. The heatmap shows the model is focusing on the cardiac silhouette region, which is exactly where a radiologist would look for signs of cardiomegaly. The probability for mass is higher than expected, and we can see from the heatmap that it may be influenced by anatomical structures in the middle of the chest cavity and around the shoulder, demonstrating how the model's attention can help identify potential false positives or areas requiring closer examination.
This visualization shows another example of our model's performance with GradCAM interpretation. The model successfully detects a mass near the center of the chest cavity on the right side, as indicated by the bright red/yellow regions in the heatmap. The heatmap clearly highlights the specific location where the model identified the abnormality, providing interpretable evidence for the diagnosis. Interestingly, the model also assigns a high score to Edema for this image, though the ground truth doesn't mention it. This could indicate either a false positive or an undetected finding in the ground truth labels. The GradCAM visualization helps clinicians understand the model's reasoning and can be used for quality assurance and error analysis.
This result demonstrates the model's ability to detect Edema (fluid accumulation in the lungs), which typically appears as increased opacity in the lower lung fields. The GradCAM heatmap clearly shows the model focusing on the bottom portion of the chest cavity where edema typically manifests. The bright regions in the heatmap correspond to areas of increased lung density that the model identified as indicative of edema. We can also notice that Cardiomegaly has a high score for this image, though the ground truth doesn't include it. This visualization is particularly valuable for error analysis - by examining where the model is looking, we can verify that it's focusing on anatomically appropriate regions. If the model were making predictions based on irrelevant regions, the heatmap would reveal this, allowing us to improve the model or understand its limitations.
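The Grad-CAM heatmaps shown in these examples reduce to a short computation once the activations of the last convolutional layer and their gradients with respect to the class score are available. A numpy sketch under the assumption that both have already been extracted; random arrays stand in for real values:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM: weight each channel's activation map by the spatial mean
    of its gradient, sum the weighted maps, then clip negatives (ReLU)."""
    # activations, gradients: (channels, height, width)
    weights = gradients.mean(axis=(1, 2))              # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize to [0, 1] for display
    return cam

rng = np.random.default_rng(0)
acts = rng.standard_normal((1024, 10, 10))   # stand-ins for DenseNet-121 features
grads = rng.standard_normal((1024, 10, 10))
heatmap = grad_cam(acts, grads)
# In practice the coarse map is upsampled to 320x320 and overlaid on the X-ray.
```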
This Receiver Operating Characteristic (ROC) curve provides a comprehensive evaluation of our model's performance across all 14 pathologies. The ROC curve plots the True Positive Rate (sensitivity) against the False Positive Rate (1-specificity) at various classification thresholds. The Area Under the Curve (AUC) values shown for each pathology indicate the model's discriminative ability, with 1.0 representing perfect classification and 0.5 representing random guessing. Our model achieves strong performance across multiple pathologies: Cardiomegaly (0.90 AUC), Edema (0.86 AUC), and Mass (0.82 AUC) show particularly good results. The ROC curve is essential for clinical deployment as it helps determine optimal decision thresholds that balance sensitivity (detecting all cases) and specificity (avoiding false alarms) based on clinical priorities. Different pathologies may require different thresholds - for example, we might want higher sensitivity for life-threatening conditions even if it means more false positives.
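Per-pathology AUC values like those above come from computing the ROC curve on each label column independently; scikit-learn provides this directly. A toy sketch with made-up labels and probabilities, including one simple (of several possible) threshold-selection heuristics:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground truth and predicted probabilities for one pathology.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

auc = roc_auc_score(y_true, y_prob)
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Pick the threshold closest to the top-left corner (one common heuristic;
# a clinical deployment might instead fix a minimum sensitivity).
best = np.argmin(fpr ** 2 + (1 - tpr) ** 2)
print(f"AUC = {auc:.3f}, threshold = {thresholds[best]:.2f}")
```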
- Objective: Develop a clinical risk prediction system to estimate 10-year mortality risk based on 18 clinical and demographic factors (age, gender, systolic blood pressure, BMI, etc.)
- Modeling Approach: Systematic comparison of linear regression and random forest ensemble models with rigorous hyperparameter optimization
- Evaluation Framework: Comprehensive model comparison using concordance index (C-index), the gold standard metric for survival analysis and risk prediction models
Technical Skills Demonstrated: This component showcases senior-level expertise in traditional machine learning, advanced feature engineering, sophisticated missing data imputation strategies (multivariate iterative imputation), systematic model selection and comparison, and production-grade explainable AI using SHAP values. The implementation demonstrates proficiency in epidemiological data analysis, survival modeling, and clinical risk stratification systems.
- For this project, I will be using the NHANES I epidemiology dataset.
- Looking at our training and validation data, it is evident that some values are missing: some entries in the output of the previous cell are marked as NaN ("not a number"). Missing data is a common occurrence in data analysis; it can be due to a variety of reasons, such as measuring-instrument malfunction, respondents unwilling or unable to supply information, and errors in the data collection process.
This missing data visualization (also known as a missingness plot) provides a comprehensive view of data quality in our epidemiological dataset. Each row represents a patient record, and each column represents a feature (age, gender, blood pressure, BMI, etc.). Black regions indicate present data, while light/white regions indicate missing values. This visualization reveals important patterns: we can see that many values are missing for systolic blood pressure (Systolic BP), which appears as a prominent white column. The pattern of missingness is crucial - if data is "Missing Completely At Random" (MCAR), simple imputation works well, but if missingness correlates with other variables (e.g., patients with high blood pressure might be less likely to have it measured), we need more sophisticated imputation strategies. This analysis informs our data preprocessing approach and ensures we handle missing data appropriately to avoid introducing bias into our risk prediction model.
- Seeing that our data is not missing completely at random, we can handle the missing values by replacing them with substituted values based on the other values that we have. This is known as imputation.
- The first imputation strategy that we will use is mean substitution: we will replace the missing values for each feature with the mean of the available values.
- Next, we will apply another imputation strategy, known as multivariate feature imputation, using scikit-learn's IterativeImputer class. With this strategy, for each feature that is missing values, a regression model is trained to predict observed values based on all of the other features, and the missing values are inferred using this model. As a single iteration across all features may not be enough to impute all missing values, several iterations may be performed, hence the name of the class IterativeImputer.
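The two strategies can be compared side by side. Note that IterativeImputer is still experimental in scikit-learn and must be enabled explicitly before import; the toy DataFrame below is illustrative, not NHANES data:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required)
from sklearn.impute import IterativeImputer

df = pd.DataFrame({
    "Age":         [34, 61, 45, 70, np.nan, 52],
    "Systolic BP": [120, np.nan, 135, 150, 140, np.nan],
    "BMI":         [22.5, 28.0, np.nan, 31.2, 26.7, 24.3],
})

# Mean substitution: each missing value gets its column's mean.
mean_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df),
                            columns=df.columns)

# Multivariate imputation: each feature with gaps is regressed on the
# others, iterating until the filled-in values stabilize.
iter_imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                            columns=df.columns)
```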
- shap is a library that explains predictions made by machine learning models.
- sklearn is one of the most popular machine learning libraries.
- itertools allows us to conveniently manipulate iterable objects such as lists.
- pydotplus is used together with IPython.display.Image to visualize graph structures such as decision trees.
- numpy is a fundamental package for scientific computing in Python.
- pandas is what we'll use to manipulate our data.
- seaborn is a plotting library which has some convenient functions for visualizing missing data.
- matplotlib is a plotting library.
- Linear regression is an appropriate analysis to use for predicting the risk value using multiple features.
- It is used to find the best fitting model to describe the relationship between a set of features (also referred to as input, independent, predictor, or explanatory variables) and an outcome value (also referred to as an output, dependent, or response variable).
- It is necessary to transform the data so that the distributions are closer to standard normal distributions. First, remove some of the skew from the distribution by applying a log transformation. Then "standardize" the distribution so that it has a mean of zero and a standard deviation of 1.
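The two transformation steps above are a couple of lines per feature. A numpy sketch on toy right-skewed data (the lognormal sample stands in for a skewed clinical measurement):

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.lognormal(mean=3.0, sigma=0.5, size=1000)  # right-skewed toy values

# Log transform pulls in the long right tail...
logged = np.log(feature)

# ...then standardize to zero mean, unit standard deviation.
# In practice, compute the mean and std on the training split only and
# reuse them for validation/test data to avoid leakage.
standardized = (logged - logged.mean()) / logged.std()

print(round(standardized.mean(), 6), round(standardized.std(), 6))  # ~0.0 and 1.0
```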
This process flow diagram illustrates the data preprocessing pipeline for our risk prediction models. The workflow shows how raw epidemiological data is transformed through several stages: (1) Data loading and initial exploration to understand data quality and distributions, (2) Missing data imputation using advanced techniques (mean substitution and multivariate feature imputation), (3) Data transformation including log transformation to handle skewed distributions and standardization to normalize features, (4) Feature engineering including interaction terms to capture non-linear relationships, and (5) Model training and evaluation. This systematic approach ensures data quality and model reliability. The preprocessing steps are critical for risk prediction models because medical data often contains missing values, skewed distributions, and complex interactions between variables that must be properly handled to achieve accurate predictions.
- One possible way to improve the model is by adding interactions of the features.
- An interaction term combines two features by multiplying their values row by row.
- I added all possible pairwise interactions and computed the C-index for each variant to draw conclusions.
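Adding all pairwise interactions is a short loop with `itertools.combinations` (one of the reasons itertools appears in the library list above); the column names here are illustrative:

```python
import itertools
import pandas as pd

X = pd.DataFrame({
    "Age":         [34, 61, 45],
    "Systolic BP": [120, 150, 135],
    "BMI":         [22.5, 31.2, 26.7],
})

def add_interactions(X):
    """Append a column for every pair of features, multiplied row-wise."""
    X = X.copy()
    for a, b in itertools.combinations(X.columns, 2):
        X[f"{a} x {b}"] = X[a] * X[b]
    return X

X_int = add_interactions(X)
print(list(X_int.columns))  # 3 original features + C(3, 2) = 3 interaction columns
```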
- Random forests combine predictions from different decision trees to create a robust classifier.
This decision tree visualization demonstrates how a single tree in our random forest makes predictions. The tree structure shows how the model recursively splits the data based on feature values (e.g., age > 50, systolic BP > 140) to arrive at risk predictions. Each node represents a decision point, and each leaf represents a final prediction. Decision trees are highly interpretable - we can trace the path from root to leaf to understand exactly why a particular risk score was assigned. However, individual trees can be unstable and prone to overfitting. The random forest addresses this by training many trees on different subsets of data and features, then combining their predictions. This ensemble approach provides both accuracy (through aggregation) and robustness (through diversity), making it ideal for risk prediction where we need reliable, generalizable models. The visualization helps explain the model's decision-making process to clinicians and patients.
- The fundamental concept behind random forests is a simple but powerful one: the wisdom of crowds. In data science terms, the reason the random forest model works so well is that a large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.
- It is important to tune (or optimize) the hyperparameters, to find a model that both has good predictive performance and minimizes overfitting. The hyperparameters chosen to adjust would be:
- n_estimators: the number of trees used in the forest.
- max_depth: the maximum depth of each tree.
- min_samples_leaf: the minimum number (if int) or proportion (if float) of samples in a leaf.
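A standard way to tune these three hyperparameters is an exhaustive grid search with cross-validation; a sketch on synthetic data, where the grid values are illustrative rather than the ones used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 18-feature clinical dataset.
X, y = make_classification(n_samples=300, n_features=18, random_state=0)

param_grid = {
    "n_estimators": [50, 100],      # number of trees in the forest
    "max_depth": [3, None],         # cap tree depth to limit overfitting
    "min_samples_leaf": [1, 5],     # larger leaves -> smoother predictions
}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For survival-style data, the scoring function would be replaced by the C-index rather than ROC AUC.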
- The concordance index (C-index) is used to evaluate risk model classifiers. The C-index measures the discriminatory power of a risk score.
- Intuitively, a higher C-index indicates that the model's predictions agree with the actual outcomes across pairs of patients. The formula for the C-index is:

  C-index = (# concordant pairs + 0.5 × # ties) / (# permissible pairs)
The Concordance Index (C-index), also known as the Harrell's C-statistic, is the gold standard metric for evaluating risk prediction models in survival analysis and medical risk assessment. The formula calculates the proportion of "concordant pairs" among all "permissible pairs" of patients. A permissible pair consists of two patients with different outcomes (one died, one survived). A concordant pair is one where the patient with the higher predicted risk actually had the worse outcome. The C-index ranges from 0 to 1, where 0.5 indicates random performance and 1.0 indicates perfect discrimination. In clinical practice, a C-index above 0.7 is considered acceptable, above 0.8 is good, and above 0.9 is excellent. This metric is particularly valuable for risk models because it evaluates the model's ability to correctly rank patients by risk, which is often more important than absolute risk predictions.
- A permissible pair is a pair of patients who have different outcomes.
- A concordant pair is a permissible pair in which the patient with the higher risk score also has the worse outcome.
- A tie is a permissible pair where the patients have the same risk score.
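These definitions translate directly into code. A brute-force sketch over all patient pairs on toy data (for large datasets an optimized implementation would be used instead):

```python
import itertools

def c_index(outcomes, scores):
    """C-index = (concordant + 0.5 * ties) / permissible, over all patient pairs."""
    concordant = ties = permissible = 0
    for i, j in itertools.combinations(range(len(outcomes)), 2):
        if outcomes[i] == outcomes[j]:
            continue                    # not permissible: same outcome
        permissible += 1
        worse = i if outcomes[i] > outcomes[j] else j   # patient with the worse outcome
        better = j if worse == i else i
        if scores[worse] > scores[better]:
            concordant += 1             # higher risk score, worse outcome
        elif scores[worse] == scores[better]:
            ties += 1
    return (concordant + 0.5 * ties) / permissible

# 1 = died within 10 years, 0 = survived; scores are predicted risks.
outcomes = [0, 1, 1, 0]
scores = [0.2, 0.8, 0.4, 0.4]
print(c_index(outcomes, scores))  # (3 + 0.5 * 1) / 4 = 0.875
```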
This visualization compares the performance of different risk prediction models using the C-index metric. The chart shows how various modeling approaches (linear models with different imputation strategies, random forest models with different hyperparameters) perform in terms of their ability to discriminate between high-risk and low-risk patients. Higher C-index values indicate better model performance. The comparison helps us select the best model for deployment. We can see that the random forest model with optimized hyperparameters achieves the highest C-index, demonstrating the value of ensemble methods for complex risk prediction tasks. This model comparison is essential for ensuring we deploy the most accurate and reliable risk prediction system.
- Using a random forest has improved results, but we lose the natural interpretability of individual trees.
- SHAP (SHapley Additive exPlanations), is a cutting edge method that explains predictions made by black-box machine learning models (i.e. models which are too complex to be understandable by humans as is).
- Given a prediction made by a machine learning model, SHAP values explain the prediction by quantifying the additive importance of each feature to the prediction. SHAP values have their roots in cooperative game theory, where Shapley values are used to quantify the contribution of each player to the game.
- Although it is computationally expensive to compute SHAP values for general black-box models, in the case of trees and forests there exists a fast polynomial-time algorithm that makes SHAP analysis practical for random forest models.
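The game-theoretic definition can be computed exactly for a tiny model by enumerating feature coalitions. This is a pedagogical sketch of the Shapley formula itself, not the shap library's fast tree algorithm; the two-feature linear "risk model" is invented for illustration, with absent features replaced by a background mean:

```python
import itertools
import math

def shapley_values(model, x, background, n_features):
    """Exact Shapley values: average each feature's marginal contribution
    over all subsets of the remaining features, with combinatorial weights."""
    def v(subset):
        # Coalition value: evaluate the model with features in `subset`
        # taken from x and the rest from the background (mean) point.
        z = [x[i] if i in subset else background[i] for i in range(n_features)]
        return model(z)

    players = set(range(n_features))
    phi = []
    for i in range(n_features):
        total = 0.0
        for r in range(n_features):
            for S in itertools.combinations(players - {i}, r):
                weight = (math.factorial(len(S)) *
                          math.factorial(n_features - len(S) - 1) /
                          math.factorial(n_features))
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

model = lambda z: 2 * z[0] + 1 * z[1]        # toy linear "risk model"
phi = shapley_values(model, x=[1, 3], background=[0, 0], n_features=2)
print(phi)  # [2.0, 3.0]; the values sum to f(x) - f(background)
```

Enumerating subsets is exponential in the number of features, which is exactly why the polynomial-time tree algorithm mentioned above matters for random forests.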
This SHAP summary plot provides a comprehensive view of feature importance and impact direction across all patients in our dataset. Each dot represents one patient, positioned horizontally by its SHAP value (how much that feature contributed to the prediction for that patient) and colored by the feature value (red = high value, blue = low value). Features are sorted by their mean absolute SHAP value (importance). Dots to the right (positive SHAP values) push the prediction toward higher risk (e.g., high age increases risk), while dots to the left (negative SHAP values) reduce it (e.g., being female reduces risk). This visualization reveals that age is the most important feature, followed by systolic blood pressure and other factors. The spread of dots shows how feature importance varies across patients: some features are consistently important, while others only matter for certain patient subgroups. This interpretability is crucial for clinical deployment, as it helps clinicians understand and trust the model's predictions.
This SHAP bar plot provides a different perspective on feature importance, showing the mean absolute SHAP value for each feature. This gives us an overall ranking of which features are most important for risk prediction across all patients. It can be clearly observed that being a woman (sex = 2.0, as opposed to men for which sex = 1.0) has a negative SHAP value on average, meaning that it generally reduces the risk of dying within 10 years. High age and high systolic blood pressure have positive SHAP values on average, and are therefore related to increased mortality. This bar chart complements the summary plot by providing a clear ranking of feature importance, which is useful for understanding which risk factors the model considers most critical. This information can guide clinical decision-making and help identify modifiable risk factors.
These SHAP dependence plots reveal how features interact with each other in our risk prediction model. The left plot shows how the SHAP value for Age varies across different age values, with points colored by Gender. The right plot shows how the SHAP value for Poverty Index varies, with points colored by Age. These plots reveal important interactions: Age > 50 is generally associated with increased risk (positive SHAP value), but being a woman (red points) generally reduces the impact of age on risk. This makes biological sense since we know that women generally live longer than men, so age has a different risk implication depending on gender. The poverty index plot shows that the impact of poverty on risk drops off quickly for higher-income individuals, and for these individuals, age begins to explain much of the variation in how poverty affects risk. These interaction effects are crucial for understanding the model's behavior and ensuring it makes clinically sensible predictions. The ability to visualize and understand these complex interactions demonstrates the value of SHAP for model interpretability in healthcare applications.