# GH-33346: [Python] DataFrame Interchange Protocol for pyarrow Table #14804
Merged: jorisvandenbossche merged 49 commits into `apache:master` from `AlenkaF:ARROW-18152-second` on Jan 13, 2023.
## Commits

Changes from all commits (49 commits, all authored by AlenkaF):

- ca526a7: Produce a `__dataframe__` object (squashed commits from #14613)
- d0ca2b1: Add column convert methods
- c356cd1: Fix linter errors
- 8fb50c5: Add `from_dataframe` method details
- 4fab43b: Add tests for `from_dataframe` / pandas roundtrip
- 47ce2d6: Skip `from_dataframe` tests for older pandas versions
- 1c2955f: Add support for LargeStringArrays in Column class
- 0517a6d: Add test for uint and make changes to `test_offset_of_sliced_array()` a…
- a7313fb: Prefix table metadata with `pyarrow.`
- 9583672: Update `from_dataframe` method
- 9e8733c: Try to add warnings to places where copies of data are being made
- 855ec8a: Update `python/pyarrow/interchange/column.py`
- beec5aa: Expose `from_dataframe` in `interchange/__init__.py`
- b11d84e: Add lost whitespace lines
- 6c9dce4: Revert commented categories in CategoricalDescription, `column.py`
- 8d91b67: Add `_dtype` attribute to `__init__` of the Column class and move all the…
- 4643f9b: Raise an error if `nan_as_null=True`
- a93a46e: Linter corrections
- 0b231ea: Add better test coverage for `test_mixed_dtypes` and `test_dtypes`
- d8ab902: Add better test coverage for `test_pandas_roundtrip` and add large_memo…
- 21af8fb: Add pyarrow roundtrip tests and make additional corrections to the co…
- d6140d4: Correct large string handling and make smaller corrections in convert…
- e0d1e63: Change dict arguments in `protocol_df_chunk_to_pyarrow`
- 6067fb3: Update `dataframe.num_chunks()` method to use `to_batches`
- c6eb5f3: Check for sentinel values in the datetime more efficiently
- 1a67177: Make bigger changes to how masks and arrays are constructed
- 51dcc49: Import from `pandas.api.interchange`
- 4879ef2: Add a check for `use_nan`, correct test using `np.nan` and put back check…
- 1cbd594: Add test coverage for pandas -> pyarrow conversion
- a6b6e54: Rename `test_extra.py` to `test_conversion.py`
- 2e36185: Skip pandas -> pyarrow tests for older versions of pandas
- 4ca948d: Add test coverage for sliced table in pyarrow roundtrip
- 719ab88: Correct the handling of bitpacked booleans
- 91ea335: Small change in slicing parametrization
- c74eb45: Add a RuntimeError for boolean and categorical columns in from_datafr…
- c137337: Optimize datetime handling in `from_dataframe`
- 1e9cef9: Optimize `buffers_to_array` in `from_dataframe.py`
- 0c539a0: Apply suggestions from code review - Joris
- 6399be3: Add string column back to `test_pandas_roundtrip` for pandas versions 2…
- 9f68fe7: Fix linter error
- b926066: Remove pandas-specific comment for `nan_as_null` in `dataframe.py`
- 5c5d25e: Fix typo boolean -> categorical in `categorical_column_to_dictionary`
- f2a65a6: Add a comment for float16 NotImplementedError in validity_buffer_nan_…
- 075e888: Update `validity_buffer_nan_sentinel` in python/pyarrow/interchange/fro…
- efa12d6: Make change to the offset buffers part of `buffers_to_array`
- 858cadb: Linter correction
- e937b4c: Update the handling of `allow_copy` keyword
- 1b5f248: Fix failing nightly test
- 9139444: Fix the fix for the failing test
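Taken together, these commits add the producer half of the protocol to `pa.Table`. A hedged sketch of what that enables (method names follow the published interchange protocol; the table contents are invented):

```python
import pyarrow as pa

# Producer side added by this PR: pa.Table exposes __dataframe__, so any
# protocol-aware library can consume it without going through pandas.
table = pa.table({"n": [1, 2, None], "s": ["a", "b", "c"]})
xdf = table.__dataframe__()

print(xdf.num_columns())             # 2
print(xdf.num_rows())                # 3
print(list(xdf.column_names()))      # ['n', 's']
print(xdf.get_column(0).null_count)  # 1: the None in column "n"
```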
## Files changed
### New file: `python/pyarrow/interchange/__init__.py`

```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# flake8: noqa

from .from_dataframe import from_dataframe
```
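A minimal consumer-side sketch of the entry point this file exposes. The DataFrame below is an invented example, and pandas >= 1.5 is assumed (older versions lack `__dataframe__`, which is why several commits above skip tests on old pandas):

```python
import pandas as pd
import pyarrow.interchange

# Any object implementing __dataframe__ can be converted to a
# pyarrow Table; a pandas DataFrame is the common case.
df = pd.DataFrame({"n": [1, 2, 3], "s": ["a", "b", "c"]})
table = pyarrow.interchange.from_dataframe(df)
print(table.column_names)  # ['n', 's']
```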
### New file: interchange buffer wrapper (path not shown in this view; presumably `python/pyarrow/interchange/buffer.py`)

```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from __future__ import annotations
import enum

import pyarrow as pa


class DlpackDeviceType(enum.IntEnum):
    """Integer enum for device type codes matching DLPack."""

    CPU = 1
    CUDA = 2
    CPU_PINNED = 3
    OPENCL = 4
    VULKAN = 7
    METAL = 8
    VPI = 9
    ROCM = 10


class _PyArrowBuffer:
    """
    Data in the buffer is guaranteed to be contiguous in memory.

    Note that there is no dtype attribute present; a buffer can be thought of
    as simply a block of memory. However, if the column that the buffer is
    attached to has a dtype that's supported by DLPack and ``__dlpack__`` is
    implemented, then that dtype information will be contained in the return
    value from ``__dlpack__``.

    This distinction is useful to support both (a) data exchange via DLPack
    on a buffer and (b) dtypes like variable-length strings which do not
    have a fixed number of bytes per element.
    """

    def __init__(self, x: pa.Buffer, allow_copy: bool = True) -> None:
        """
        Handle PyArrow Buffers.
        """
        self._x = x

    @property
    def bufsize(self) -> int:
        """
        Buffer size in bytes.
        """
        return self._x.size

    @property
    def ptr(self) -> int:
        """
        Pointer to start of the buffer as an integer.
        """
        return self._x.address

    def __dlpack__(self):
        """
        Produce DLPack capsule (see array API standard).

        Raises:
            - TypeError : if the buffer contains unsupported dtypes.
            - NotImplementedError : if DLPack support is not implemented.

        Useful for connecting to array libraries. Support is optional
        because it's not completely trivial to implement for a Python-only
        library.
        """
        raise NotImplementedError("__dlpack__")

    def __dlpack_device__(self) -> tuple[DlpackDeviceType, int | None]:
        """
        Device type and device ID for where the data in the buffer resides.
        Uses device type codes matching DLPack.
        Note: must be implemented even if ``__dlpack__`` is not.
        """
        if self._x.is_cpu:
            return (DlpackDeviceType.CPU, None)
        else:
            raise NotImplementedError("__dlpack_device__")

    def __repr__(self) -> str:
        return (
            "PyArrowBuffer(" +
            str(
                {
                    "bufsize": self.bufsize,
                    "ptr": self.ptr,
                    "device": self.__dlpack_device__()[0].name,
                }
            ) +
            ")"
        )
```
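To make the wrapper concrete, a small usage sketch. `_PyArrowBuffer` is a private class, the module path is assumed from this diff, and the byte values are invented:

```python
import pyarrow as pa
from pyarrow.interchange.buffer import _PyArrowBuffer  # path assumed

buf = pa.py_buffer(b"\x00\x01\x02\x03")  # a 4-byte CPU-resident pa.Buffer
wrapped = _PyArrowBuffer(buf)

print(wrapped.bufsize)              # 4: pa.Buffer.size passed through
print(wrapped.ptr == buf.address)   # True: raw address, no copy
print(wrapped.__dlpack_device__())  # (<DlpackDeviceType.CPU: 1>, None)
```

Since `__dlpack__` is left unimplemented here, protocol consumers are expected to read the memory through `ptr` and `bufsize`, combined with the dtype information exposed on the owning column.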