[C++] CSV string column category to dictionary/indices? #5927

@ntfshard

Hello

I'm a newcomer and not quite sure how to use the library. I tried to find documentation on this but couldn't.

I have a dataset in a CSV file where one column (let's call it colour) is a string category. I'd like to get indices instead of the text values so that I can pass them to an algorithm.
I tried setting column_types in ConvertOptions to
{{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(), arrow::utf8())}}
but that doesn't seem to be the right API usage; it fails at run time with: NotImplemented: CSV conversion to dictionary<values=string, indices=int32, ordered=0> is not supported
I also found the merged PR #5785, but I'm not sure whether it applies to my case.

So my question is: can I get the indices for a category column using only the library API? And if so, what am I doing wrong? :)

In other words, I'd like to do something like this Python pandas code:
df[column] = df[column].cat.codes # if str(column_data_type) == "category"

Thank you!
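
For reference, here is a minimal standard-library sketch of what "category to dictionary/indices" means for a column that has already been read as plain strings. It does not use Arrow; `EncodedColumn` and `DictionaryEncodeStrings` are hypothetical names made up for this illustration. Codes are assigned in order of first appearance, which may differ from pandas, where inferred categories are sorted by default. Arrow's compute module also provides a `DictionaryEncode` function, but its exact signature depends on the library version, so check the headers of the release in use.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper type (not part of Arrow): a dictionary-encoded string column.
struct EncodedColumn {
  std::vector<std::string> dictionary;  // unique values, in order of first appearance
  std::vector<std::int32_t> indices;    // one index into `dictionary` per input row
};

// Hypothetical helper (not part of Arrow): replaces each string with an int32 code.
EncodedColumn DictionaryEncodeStrings(const std::vector<std::string>& values) {
  EncodedColumn out;
  out.indices.reserve(values.size());
  std::unordered_map<std::string, std::int32_t> seen;  // value -> assigned code
  for (const auto& value : values) {
    // Try to register the value under the next free code; this is a no-op if already seen.
    auto [it, inserted] =
        seen.emplace(value, static_cast<std::int32_t>(out.dictionary.size()));
    if (inserted) {
      out.dictionary.push_back(value);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

Calling `DictionaryEncodeStrings` on the values of the colour column yields the `indices` vector to pass to the algorithm, and `dictionary` maps each index back to its original string.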
