Description
Hello
I'm a newcomer and not quite sure how to use the library. I looked for documentation on this but could not find any.
I have a dataset in a CSV file where one column (let's call it colour) is a string category. I'd like to get integer indices instead of the text values so that I can pass them to an algorithm.
I tried setting column_types in ConvertOptions to
{{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(), arrow::utf8()) }} but this does not seem to be the right API usage, because it fails with a run-time error: NotImplemented: CSV conversion to dictionary<values=string, indices=int32, ordered=0> is not supported
I also found the merged PR #5785, but I am not sure whether it applies to my case.
So my question is: can I get the indices for a category column using only the library API? And if so, what am I doing wrong? :)
In other words, I'd like to do something like this Python pandas code:
df[column] = df[column].cat.codes # if str(column_data_type) == "category"
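To make the desired result concrete, the mapping can be sketched in plain C++ using only the standard library. This is not Arrow API: the `EncodedColumn` struct and the `EncodeCategories` function are hypothetical names introduced for illustration. The sketch also assumes codes are assigned in order of first appearance, which is a choice made here and not something the question specifies.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not part of Arrow): the distinct values plus
// one integer code per input row.
struct EncodedColumn {
  std::vector<std::string> dictionary;  // distinct values, indexed by code
  std::vector<std::int32_t> codes;      // codes[i] refers to dictionary[codes[i]]
};

// Hypothetical helper (not part of Arrow): assigns each distinct string an
// int32 code in order of first appearance and returns the per-row codes.
EncodedColumn EncodeCategories(const std::vector<std::string>& values) {
  EncodedColumn out;
  std::unordered_map<std::string, std::int32_t> seen;
  out.codes.reserve(values.size());
  for (const std::string& v : values) {
    auto it = seen.find(v);
    if (it == seen.end()) {
      // First time this value is seen: its code is the next free index.
      const auto code = static_cast<std::int32_t>(out.dictionary.size());
      it = seen.emplace(v, code).first;
      out.dictionary.push_back(v);
    }
    out.codes.push_back(it->second);
  }
  return out;
}
```

For the input red, green, red, blue, green this produces the codes 0, 1, 0, 2, 1 with the dictionary red, green, blue. Note that pandas sorts the categories by default when a string column is converted to a categorical, so its cat.codes for the same input would be numbered differently (blue = 0, green = 1, red = 2).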
Thank you!