Closed
Labels
released (Included in a release)
Description
Describe the bug
The OneHotEncoder uses the schema <old_column_name>_<value> to name the created columns. This can lead to conflicts, however.
To Reproduce
Run this program:
from safeds.data.tabular.containers import Table
from safeds.data.tabular.transformation import OneHotEncoder

if __name__ == '__main__':
    table = Table.from_dict({"a_b": ["c"], "a": ["b_c"]})
    transformed_table = OneHotEncoder().fit_and_transform(table)
    print(transformed_table)

It raises an exception:
ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements
The cause is that two created columns get the same name, a_b_c: column a_b with value c and column a with value b_c both map to it.
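The collision can be shown without the library. The following is a minimal sketch that assumes only the <old_column_name>_<value> naming scheme described above. The helper `one_hot_column_names` is hypothetical and is not part of safeds.

```python
def one_hot_column_names(data: dict[str, list[str]]) -> list[str]:
    """Build names as <old_column_name>_<value> (hypothetical helper, not the safeds API)."""
    names = []
    for column_name, values in data.items():
        # One created column per distinct value, in first-seen order.
        for value in dict.fromkeys(values):
            names.append(f"{column_name}_{value}")
    return names


names = one_hot_column_names({"a_b": ["c"], "a": ["b_c"]})
print(names)  # ['a_b_c', 'a_b_c']
```

Both entries are identical, which matches the duplicate column name reported here.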
Expected behavior
No exception. The names of all created columns should be unique and should not conflict with existing columns in the Table. One way to achieve this is to detect conflicts, either between two created columns or between a created column and an existing, unchanged column, and to append a suffix _<counter> to the names of the affected created columns (e.g. a_b_c_1 vs. a_b_c_2).
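The suffix approach could look roughly like the sketch below. It is an illustration under assumptions, not the fix that was actually released: the function `make_unique` and its parameters are hypothetical, and the handling of a suffixed name that is itself already taken (skip to the next counter) is a choice this issue does not specify.

```python
from collections import Counter


def make_unique(created: list[str], existing: set[str]) -> list[str]:
    """Append _<counter> to created names that clash (hypothetical helper, not the safeds API)."""
    counts = Counter(created)

    def is_conflicting(name: str) -> bool:
        # A created name conflicts if it occurs more than once among the
        # created names or equals an existing, unchanged column name.
        return counts[name] > 1 or name in existing

    # Reserve existing names and all created names that stay unchanged,
    # so a suffixed name can never collide with them.
    taken = set(existing) | {name for name in created if not is_conflicting(name)}

    next_counter: dict[str, int] = {}
    result = []
    for name in created:
        if not is_conflicting(name):
            result.append(name)
            continue
        counter = next_counter.get(name, 1)
        candidate = f"{name}_{counter}"
        # Assumption: skip counters whose suffixed name is already taken.
        while candidate in taken:
            counter += 1
            candidate = f"{name}_{counter}"
        next_counter[name] = counter + 1
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["a_b_c", "a_b_c"], existing=set()))  # ['a_b_c_1', 'a_b_c_2']
```

In the reproduction above both original columns are encoded, so no existing column remains unchanged and `existing` is empty.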
Screenshots (optional)
No response
Additional Context (optional)
No response
Status
✔️ Done