Currently, there's no mechanism to balance the age distribution across the train/test split, so the train and test sets can end up with very different age distributions.
I will implement stratification by age by assigning subjects to 20-percentile (quintile) bins: https://github.com/SIMEXP/prevent-ad-benchmark/blob/7edf8c4c4d07a0524768ab7879fbdcf14b201beb/src/preventad_benchmark/evaluation/pipelines.py#L53C5-L59
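For illustration, here is a minimal sketch of quintile-based age stratification. It assumes pandas and scikit-learn are available and that the split is done with `train_test_split`. The function name `split_stratified_by_age` and its parameters are hypothetical and are not the code at the linked lines. Ages are cut into 5 quantile bins with `pd.qcut`, and the bin label is passed as the `stratify` variable, so each age quintile is represented in roughly the same proportion in train and test.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_stratified_by_age(subject_ids, ages, test_size=0.2, n_bins=5, random_state=0):
    """Hypothetical helper: split subjects so train/test have similar age distributions.

    Ages are binned into ``n_bins`` quantile bins (5 bins = 20-percentile bins)
    and the bin label is used as the stratification variable.
    """
    # labels=False returns integer bin codes; duplicates="drop" avoids an error
    # when many subjects share the same age and some quantile edges coincide.
    age_bins = pd.qcut(ages, q=n_bins, labels=False, duplicates="drop")
    # Stratifying on the bin codes keeps each age bin's share similar in both sets.
    return train_test_split(
        subject_ids,
        test_size=test_size,
        stratify=age_bins,
        random_state=random_state,
    )
```

Note that `train_test_split` raises an error if any bin contains fewer than 2 subjects, so with small samples fewer bins may be needed.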