Closed
Commits
26 commits
48b7ea5  POC: use dataset API in existing parquet tests (jorisvandenbossche, Jan 28, 2020)
7dcd960  support old-style filters (jorisvandenbossche, Feb 6, 2020)
e502735  add ParquetDatasetV2 shim and use in tests (jorisvandenbossche, Mar 23, 2020)
81314f7  parametrize read_table tests (jorisvandenbossche, Mar 23, 2020)
d0e33ec  do not disallow null characters in strings in filters when use_datase… (jorisvandenbossche, Mar 24, 2020)
9d02fde  add pytest.mark.dataset mark (jorisvandenbossche, Mar 24, 2020)
5fd6d9e  non-deterministic cases due to use_threads (jorisvandenbossche, Mar 24, 2020)
599192c  move dataset creation into helper function (jorisvandenbossche, Mar 24, 2020)
8a780d1  add support for use_pandas_metadata + some cleanup of the tests (jorisvandenbossche, Mar 26, 2020)
31a2c8f  rename use_dataset -> use_legacy_dataset (jorisvandenbossche, Mar 26, 2020)
86498a1  consolidate read_table/ParquetDataset code + add errors for unsupport… (jorisvandenbossche, Mar 30, 2020)
22c0e54  fix expression syntax + add docstring (jorisvandenbossche, Mar 30, 2020)
c63d185  fix paths test on Windows (jorisvandenbossche, Mar 30, 2020)
63d5acd  Update python/pyarrow/parquet.py (jorisvandenbossche, Mar 31, 2020)
ce5166c  Update python/pyarrow/parquet.py (jorisvandenbossche, Mar 31, 2020)
cd972ba  Update python/pyarrow/tests/test_parquet.py (jorisvandenbossche, Mar 31, 2020)
be7125b  Update python/pyarrow/tests/test_parquet.py (jorisvandenbossche, Mar 31, 2020)
c5176d7  consolidate filters docstring (jorisvandenbossche, Mar 31, 2020)
9e028be  support memory_map (jorisvandenbossche, Mar 31, 2020)
126e023  feedback (jorisvandenbossche, Mar 31, 2020)
16de776  enable different partitioning schemes (jorisvandenbossche, Apr 2, 2020)
9650f65  remove ARROW:schema removal from metadata in read_table for new API (jorisvandenbossche, Apr 2, 2020)
608b6d4  Apply suggestions from code review (jorisvandenbossche, Apr 9, 2020)
a2c80f8  Apply suggestions from code review (jorisvandenbossche, Apr 9, 2020)
9e721f4  update docstrings (jorisvandenbossche, Apr 9, 2020)
9cbaf3c  deterministic_row_order helper function (jorisvandenbossche, Apr 9, 2020)
10 changes: 7 additions & 3 deletions python/pyarrow/_dataset.pyx
@@ -623,7 +623,8 @@ cdef class ParquetReadOptions:
     buffer_size : int, default 8192
         Size of buffered stream, if enabled. Default is 8KB.
     dictionary_columns : list of string, default None
-        Names of columns which should be read as dictionaries.
+        Names of columns which should be dictionary encoded as
+        they are read.
     """
 
     cdef public:
@@ -632,9 +633,11 @@
         set dictionary_columns
 
     def __init__(self, bint use_buffered_stream=False,
-                 uint32_t buffer_size=8192,
+                 buffer_size=8192,
                  dictionary_columns=None):
         self.use_buffered_stream = use_buffered_stream
+        if buffer_size <= 0:
+            raise ValueError("Buffer size must be larger than zero")
         self.buffer_size = buffer_size
         self.dictionary_columns = set(dictionary_columns or set())
 
@@ -1191,7 +1194,8 @@ cdef class FileSystemDatasetFactory(DatasetFactory):
                 c_options
             )
         else:
-            raise TypeError('Must pass either paths or a FileSelector')
+            raise TypeError('Must pass either paths or a FileSelector, but '
+                            'passed {}'.format(type(paths_or_selector)))
 
         self.init(GetResultValue(result))