Closed as not planned
Labels
wontfix: This will not be worked on
Description
Is your feature request related to a problem?
Subtask of #710
Desired solution
Create a method Baseline._clean(table: Table, target_column: str) -> TabularDataset for baseline data cleaning. It should:
- Remove columns with high idness or high stability (either above 90%), excluding the target column
- Remove columns with a high missing value ratio (above 60%)
- Impute missing values in all remaining columns using the column with the highest absolute correlation
- One-hot encode all non-numerical columns with fewer than 20 distinct values and remove all other non-numerical columns
- Remove outliers
- Normalise columns that contain values greater than 100
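To make the thresholds concrete, here is a minimal sketch of the column-filtering and one-hot encoding steps only. It does not use the Safe-DS API: a table is represented by a plain dict of columns, and the names `Columns`, `drop_uninformative`, and `one_hot_encode` are hypothetical. The definitions of idness (distinct values divided by row count) and stability (frequency of the most common value among non-missing values) are assumptions, as is never dropping or encoding the target column. Imputation, outlier removal, and normalisation are left out because the issue does not specify a method for them.

```python
from collections import Counter

# Hypothetical stand-in for a table: column name -> values (None = missing).
Columns = dict[str, list]


def idness(values: list) -> float:
    """Share of distinct values among all rows (assumed definition)."""
    return len(set(values)) / len(values) if values else 0.0


def stability(values: list) -> float:
    """Share of the most common value among non-missing values (assumed definition)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return Counter(present).most_common(1)[0][1] / len(present)


def missing_ratio(values: list) -> float:
    """Share of missing values among all rows."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def drop_uninformative(columns: Columns, target_column: str) -> Columns:
    """Drop columns with idness or stability above 90% or missing ratio above 60%."""
    kept = {}
    for name, values in columns.items():
        if name == target_column:
            # Assumption: the target column is never dropped by any check.
            kept[name] = values
        elif idness(values) > 0.9 or stability(values) > 0.9:
            continue
        elif missing_ratio(values) > 0.6:
            continue
        else:
            kept[name] = values
    return kept


def one_hot_encode(columns: Columns, target_column: str, max_categories: int = 20) -> Columns:
    """One-hot encode non-numerical columns with fewer than max_categories distinct values."""
    result = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if name == target_column or is_numeric:
            # Assumption: the target column is passed through unchanged.
            result[name] = values
            continue
        categories = sorted(set(present), key=str)
        if len(categories) < max_categories:
            for category in categories:
                result[f"{name}__{category}"] = [int(v == category) for v in values]
        # Non-numerical columns with too many distinct values are dropped.
    return result


example = {
    "id": list(range(10)),                               # every value distinct
    "const": ["a"] * 10,                                 # a single repeated value
    "sparse": [1.0] + [None] * 7 + [2.0, 1.0],           # 70% missing
    "color": ["r", "g", "r", "g", "b", "r", "g", "b", "r", "g"],
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
cleaned = one_hot_encode(drop_uninformative(example, "target"), "target")
```

In this example, `id`, `const`, and `sparse` are removed by the filtering step, and `color` is replaced by one indicator column per distinct value.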
Possible alternatives (optional)
No response
Screenshots (optional)
No response
Additional Context (optional)
No response
Status
✔️ Done