Currently, the function uses train_test_split() from sklearn twice, but with large datasets it becomes slow and memory-demanding because multiple intermediate DataFrames are created.
The solution would be to instead build a single list of split labels, [train, selection, train, validation ...], and attach it to the original DataFrame.
Referenced code: cobra/cobra/preprocessing/preprocessor.py, line 340 in commit 9141313:

    def train_selection_validation_split(data: pd.DataFrame,
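A minimal sketch of the proposed label-list approach. The function name and its `data` parameter come from the referenced code; the proportion and seed parameters and the `"split"` column name are assumptions for illustration, not the actual preprocessor API:

```python
import numpy as np
import pandas as pd


def train_selection_validation_split(data: pd.DataFrame,
                                     train_prop: float = 0.6,
                                     selection_prop: float = 0.2,
                                     validation_prop: float = 0.2,
                                     seed: int = 42) -> pd.DataFrame:
    """Tag each row as 'train', 'selection', or 'validation' via a single
    shuffled label list, instead of materialising three DataFrames with
    two calls to sklearn's train_test_split()."""
    if not np.isclose(train_prop + selection_prop + validation_prop, 1.0):
        raise ValueError("split proportions must sum to 1")
    nrows = len(data)
    n_train = int(round(train_prop * nrows))
    n_select = int(round(selection_prop * nrows))
    # Build the label list [train, ..., selection, ..., validation, ...]
    labels = (["train"] * n_train
              + ["selection"] * n_select
              + ["validation"] * (nrows - n_train - n_select))
    # Shuffle once so the assignment is random but reproducible
    rng = np.random.default_rng(seed)
    rng.shuffle(labels)
    out = data.copy()
    out["split"] = labels
    return out
```

With this shape, downstream code selects a partition with a boolean mask (e.g. `df[df["split"] == "train"]`) only when it actually needs it, avoiding the up-front copies.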