Describe the enhancement requested
Joining two tables where one of them has a column of type list (even if it is not a join key) raises an exception. For example:
import pyarrow as pa

NUM_ITEMS = 30

# Left table: a binary join key plus a non-key list<int64> column.
t1 = pa.Table.from_pydict({
    'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
    'array_column': [list(range(3)) for _ in range(NUM_ITEMS)],
})
# Right table: the same keys plus a scalar int64 column.
t2 = pa.Table.from_pydict({
    'id': [x.to_bytes(4, 'big') for x in range(NUM_ITEMS)],
    'value': list(range(NUM_ITEMS)),
})
# Inner join on 'id'; raises ArrowInvalid because of the non-key list column.
t1.join(t2, 'id', join_type='inner')
This raises the following exception:
ArrowInvalid: Data type list<item: int64> is not supported in join non-key field
This error message is fairly unintuitive (I spent a few hours today trying to work out what was causing it). It could be made much clearer by including the field name when one is available. I'm new to Arrow, but I believe the name should be accessible through the existing `name()` accessor:
    const std::string* name() const {
      return IsName() ? &std::get<std::string>(impl_) : NULLPTR;
    }
Component(s)
C++