-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13089: [Python] Allow creating RecordBatch from Python dict #10854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyarrow/table.pxi
Outdated
| v = mapping[field.name] | ||
| except KeyError: | ||
| try: | ||
| v = mapping[tobytes(field.name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure allowing bytes keys is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently we do support that in Table.from_pydict (but I agree this doesn't seem needed ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to bytes function was removed.
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kharoc thanks for the PR!
Since the implementation is almost identical between Recordbatch and Table, I think we should try to share the code. Both classes don't share a base class at the moment, so maybe a helper function is the easiest (the helper function can return the arguments to be passed to RecordBatch/Table.from_arrays)
python/pyarrow/table.pxi
Outdated
| v = mapping[field.name] | ||
| except KeyError: | ||
| try: | ||
| v = mapping[tobytes(field.name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently we do support that in Table.from_pydict (but I agree this doesn't seem needed ..)
python/pyarrow/tests/test_table.py
Outdated
| assert result.equals(expected) | ||
|
|
||
|
|
||
| def test_recordbatch_from_pydict(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this next to the existing test for table (to keep the similar tests close to each other). With some parametrization, we might also be able to reduce the duplication (the actual test doesn't seem to rely on anything RecordBatch/Table specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's solved. Parametrization was added.
|
@kharoc can you also take a look at my non-inline comment (#10854 (review)) about avoiding duplication between the Table and RecordBatch implementation? |
… _from_pydict function
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update, looks good!
Create a from_pydict function in RecordBatch class.
Create unit test for from_pydict