Closed
82 commits
d3a57b6
baby steps for converting SEXP to Array through the Converter api
romainfrancois Nov 12, 2020
f4bc2f0
using RConverter = Converter<RObject*, RConversionOptions>
romainfrancois Nov 13, 2020
3967f46
going a little further with an initial conversion from INTSXP -> int32
romainfrancois Nov 13, 2020
1b2cfcc
handle double
romainfrancois Nov 13, 2020
89e77f5
+ raw
romainfrancois Nov 13, 2020
5d9adf5
rename RObject -> RScalar, + handling of bool
romainfrancois Nov 23, 2020
eeb50c7
skeleton for RPrimitiveConverter for cases:
romainfrancois Nov 23, 2020
d78999b
initial handling of strings
romainfrancois Nov 23, 2020
e07795b
intermediate RBytesView struct to deal with both string and binary (l…
romainfrancois Nov 23, 2020
ad26b41
+ binary/fixed binary handling
romainfrancois Nov 23, 2020
d51f08e
dictionary<string>
romainfrancois Nov 24, 2020
1ecab02
+ list
romainfrancois Nov 24, 2020
6c0dad7
integer64
romainfrancois Nov 24, 2020
0164da6
struct converter
romainfrancois Nov 26, 2020
618aa0c
binary
romainfrancois Nov 26, 2020
505f33b
date32 and date64 (only handle REALSXP backed for now)
romainfrancois Nov 26, 2020
8a10ee2
handle null for bool
romainfrancois Nov 27, 2020
63f95ae
improved RValue::Convert<float/double>
romainfrancois Nov 27, 2020
c102f65
enable_if_integer<T, Result<typename T::c_type>> RValue::Convert
romainfrancois Dec 8, 2020
6f41b74
DATE -> DATE_INT + DATE_DBL because dates in R can be either backed b…
romainfrancois Dec 8, 2020
ea97e21
+ RScalar_to_days (handling date int&dbl for now)
romainfrancois Dec 8, 2020
09dd5a9
using this->struct_builder_
romainfrancois Dec 8, 2020
39bdaa4
POSIXct -> date32 and date64
romainfrancois Dec 8, 2020
488a082
time32 + time64
romainfrancois Dec 8, 2020
a674b70
timestamp
romainfrancois Dec 8, 2020
c2a28ba
virtual RConverter::AppendRange(start, size) + custom impl for RStruc…
romainfrancois Dec 9, 2020
772e3be
AppendRange() for RPrimitiveConverter<null>
romainfrancois Dec 9, 2020
2ef6d0b
work in progress to use AppendRange(), currently fails to compile ...
romainfrancois Jan 5, 2021
1464649
minor fixes (still not compiling)
romainfrancois Jan 5, 2021
88ea123
RConvert::Convert() are static
romainfrancois Jan 6, 2021
ffc3ec1
at least this compiles
romainfrancois Jan 6, 2021
a4dff16
only use RVectorVisitor::Visit()
romainfrancois Jan 6, 2021
bb86b02
RListConverter::AppendRange()
romainfrancois Jan 6, 2021
c972526
class RConverter : public Converter<SEXP, RConversionOptions>
romainfrancois Jan 6, 2021
376888e
remove unused Rscalar concept
romainfrancois Jan 6, 2021
d9c84e7
reuse the short circuit code from the previous api, i.e. this does no…
romainfrancois Jan 7, 2021
f302d96
paths that don't use RConvert::Convert in RPrimitiveConverter< floati…
romainfrancois Jan 7, 2021
82bffd8
lint
romainfrancois Jan 7, 2021
c7a6c0d
reuse code from previous approach StringVectorConverter in RPrimitive…
romainfrancois Jan 7, 2021
9fef824
define DATAPTR for versions of R < 3.5
romainfrancois Jan 8, 2021
8d14d25
include Rdynload, ARROW-10803
romainfrancois Jan 8, 2021
2893679
Add Extend and ExtendMasked to the converter interface
kszucs Dec 10, 2020
a984543
vec_to_arrow() calling Extend() from https://github.com/apache/arrow/…
romainfrancois Jan 8, 2021
8c5eb46
replace the various AppendRange() with Extend()
romainfrancois Jan 8, 2021
0951de9
RStructConverter using this->Reserve() so that potential capacity err…
romainfrancois Jan 8, 2021
fb83aba
adapt code so that can call it from previous api
romainfrancois Jan 20, 2021
f75821a
less restrictive when ingesting binary
romainfrancois Jan 20, 2021
dac3708
rebasing
romainfrancois Feb 5, 2021
f336eab
lint
romainfrancois Feb 5, 2021
b15d96d
change of message
romainfrancois Feb 5, 2021
089c932
needs to Append() on each list element
romainfrancois Feb 5, 2021
2b4c1ae
use vctrs::short_vec_size() because value may be a data frame and XLE…
romainfrancois Feb 5, 2021
fede379
look out for degenerated data frames
romainfrancois Feb 5, 2021
aac9079
update error message to match old api
romainfrancois Feb 5, 2021
91afc37
remove tests that no longer fail
romainfrancois Feb 5, 2021
a4ab89f
insert levels first into memo when ingesting factors
romainfrancois Feb 5, 2021
33f8cd7
re-enable the POSIXlt to struct type thing
romainfrancois Feb 5, 2021
f2d4f8a
tweak error message when Extend() fails on a column of a struct converter
romainfrancois Feb 8, 2021
24d3334
handle ordered dictionaries
romainfrancois Feb 8, 2021
86a4d3a
lint
romainfrancois Feb 8, 2021
a25c299
Visit always starts at 0, so remove start parameter
romainfrancois Feb 8, 2021
ef6c910
avoid full specialisation inside the class
romainfrancois Feb 8, 2021
90cd418
merge both AppendRangeSameTypeALTREP impl
romainfrancois Feb 8, 2021
784173d
fix "dereferencing type-punned pointer will break strict-aliasing rules"
romainfrancois Feb 9, 2021
6265989
comparison between signed and unsigned integer expressions
romainfrancois Feb 9, 2021
fddcdc7
simplify AppendRangeSameTypeALTREP by improving RVectorVisitor inner …
romainfrancois Feb 9, 2021
a5313f8
comparison between signed and unsigned integer expressions [-Werror=s…
romainfrancois Feb 9, 2021
60af00c
type_inferred was misused
romainfrancois Feb 9, 2021
2ae1598
rename Array__from_vector_reuse_memory
romainfrancois Feb 9, 2021
1e1463f
switch to call vec_to_arrow() from Array$create()
romainfrancois Feb 9, 2021
5e950f9
Table__from_dots() eventually calls vec_to_arrow() when converting R v…
romainfrancois Feb 9, 2021
c4bc7c6
calling vec_to_arrow() from recordbatch.cpp
romainfrancois Feb 9, 2021
c990cf8
restrict Array__from_vector() to its file
romainfrancois Feb 9, 2021
fc5883d
- R callable Array__from_vector()
romainfrancois Feb 10, 2021
5fd175d
move DictionaryArray__FromArrays() to r_to_arrow.cpp
romainfrancois Feb 10, 2021
1372f7b
rm obsolete MakeFactorArray
romainfrancois Feb 10, 2021
84d1805
rm MakeStructArray
romainfrancois Feb 10, 2021
ea0c381
rm VectorToArrayConverter
romainfrancois Feb 10, 2021
97f9f93
- GetConverter
romainfrancois Feb 10, 2021
3bce51e
-Array__from_vector()
romainfrancois Feb 10, 2021
7c45774
... end remove array_from_vector.cpp altogether
romainfrancois Feb 10, 2021
fe1c774
revert to using Rf_length() for lists that are not data frames, at le…
romainfrancois Mar 1, 2021
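Several early commits, such as `3967f46` (the initial INTSXP -> int32 conversion), are about mapping R vectors onto Arrow arrays. The integer case has one R-specific wrinkle. R stores `NA_integer_` as `INT_MIN`, so the converter must turn that sentinel into a null slot instead of copying it as data.

Below is a minimal sketch of that idea. It uses plain `std::vector` stand-ins in place of `SEXP` and the Arrow builders. `Int32Column` and `ConvertRIntegers` are hypothetical names and are not code from this PR:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output: a values buffer plus an LSB-ordered validity bitmap
// (bit i set => slot i is valid), loosely following Arrow's array layout.
struct Int32Column {
  std::vector<int32_t> values;
  std::vector<uint8_t> validity;
  int64_t null_count = 0;
};

// R's C API defines NA_INTEGER as INT_MIN, so that value cannot pass through as data.
constexpr int kRNaInteger = INT_MIN;

Int32Column ConvertRIntegers(const std::vector<int>& r_ints) {
  Int32Column col;
  const std::size_t n = r_ints.size();
  col.values.assign(n, 0);
  col.validity.assign((n + 7) / 8, 0);  // one bit per slot, all null to start
  for (std::size_t i = 0; i < n; ++i) {
    if (r_ints[i] == kRNaInteger) {
      ++col.null_count;  // bit stays 0; the value slot is left as 0
    } else {
      col.values[i] = static_cast<int32_t>(r_ints[i]);
      col.validity[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return col;
}
```

For example, the R vector `c(1L, NA, 3L)` would give one null and a first bitmap byte of `0b00000101`.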
82 changes: 41 additions & 41 deletions cpp/src/arrow/python/python_to_arrow.cc
@@ -388,36 +388,36 @@ class PyValue {
   }
 };
 
-template <typename T>
-Status Extend(T* converter, PyObject* values, int64_t size) {
-  /// Ensure we've allocated enough space
-  RETURN_NOT_OK(converter->Reserve(size));
-  // Iterate over the items adding each one
-  return internal::VisitSequence(values, [converter](PyObject* item, bool* /* unused */) {
-    return converter->Append(item);
-  });
-}
-
-// Convert and append a sequence of values masked with a numpy array
-template <typename T>
-Status ExtendMasked(T* converter, PyObject* values, PyObject* mask, int64_t size) {
-  /// Ensure we've allocated enough space
-  RETURN_NOT_OK(converter->Reserve(size));
-  // Iterate over the items adding each one
-  return internal::VisitSequenceMasked(
-      values, mask, [converter](PyObject* item, bool is_masked, bool* /* unused */) {
-        if (is_masked) {
-          return converter->AppendNull();
-        } else {
-          // This will also apply the null-checking convention in the event
-          // that the value is not masked
-          return converter->Append(item);  // perhaps use AppendValue instead?
-        }
-      });
-}
-
 // The base Converter class is a mixin with predefined behavior and constructors.
-using PyConverter = Converter<PyObject*, PyConversionOptions>;
+class PyConverter : public Converter<PyObject*, PyConversionOptions> {
+ public:
+  // Iterate over the input values and defer the conversion to the Append method
+  Status Extend(PyObject* values, int64_t size) override {
+    /// Ensure we've allocated enough space
+    RETURN_NOT_OK(this->Reserve(size));
+    // Iterate over the items adding each one
+    return internal::VisitSequence(values, [this](PyObject* item, bool* /* unused */) {
+      return this->Append(item);
+    });
+  }
+
+  // Convert and append a sequence of values masked with a numpy array
+  Status ExtendMasked(PyObject* values, PyObject* mask, int64_t size) override {
+    /// Ensure we've allocated enough space
+    RETURN_NOT_OK(this->Reserve(size));
+    // Iterate over the items adding each one
+    return internal::VisitSequenceMasked(
+        values, mask, [this](PyObject* item, bool is_masked, bool* /* unused */) {
+          if (is_masked) {
+            return this->AppendNull();
+          } else {
+            // This will also apply the null-checking convention in the event
+            // that the value is not masked
+            return this->Append(item);  // perhaps use AppendValue instead?
+          }
+        });
+  }
+};
 
 template <typename T, typename Enable = void>
 class PyPrimitiveConverter;
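This hunk turns the free function templates `Extend()`/`ExtendMasked()` into virtual overrides on `PyConverter`. The effect is on dispatch. Code that holds only a base `Converter*`, such as `Chunker::Extend()` in `converter.h`, can now call `Extend()` and reach the Python-specific loop.

Below is a reduced sketch of that layering (C++17). It makes these assumptions:
- A hypothetical `Status` type replaces `arrow::Status`.
- `std::vector` input replaces `PyObject*` sequences.
- `SequenceConverter` plays the role of `PyConverter`.
- `IntConverter` is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status, reduced to what the sketch needs.
struct Status {
  enum Code { kOk, kNotImplemented } code = kOk;
  std::string message;
  static Status OK() { return Status{}; }
  static Status NotImplemented(std::string msg) {
    return Status{kNotImplemented, std::move(msg)};
  }
  bool ok() const { return code == kOk; }
};

// Base class with a virtual Extend(), as Converter has after this PR.
template <typename InputType>
class Converter {
 public:
  virtual ~Converter() = default;
  virtual Status Reserve(int64_t) { return Status::OK(); }
  virtual Status Append(InputType) { return Status::NotImplemented("Append"); }
  virtual Status Extend(const std::vector<InputType>&) {
    return Status::NotImplemented("Extend");
  }
};

// Same shape as PyConverter::Extend: reserve once, then defer each item to Append().
template <typename InputType>
class SequenceConverter : public Converter<InputType> {
 public:
  Status Extend(const std::vector<InputType>& values) override {
    Status st = this->Reserve(static_cast<int64_t>(values.size()));
    if (!st.ok()) return st;
    for (const InputType& item : values) {
      st = this->Append(item);  // virtual: reaches the concrete converter
      if (!st.ok()) return st;
    }
    return Status::OK();
  }
};

// Concrete converter: only Reserve() and Append() are type specific.
class IntConverter : public SequenceConverter<int> {
 public:
  Status Reserve(int64_t n) override {
    data_.reserve(data_.size() + static_cast<std::size_t>(n));
    return Status::OK();
  }
  Status Append(int value) override {
    data_.push_back(value);
    return Status::OK();
  }
  const std::vector<int>& data() const { return data_; }

 private:
  std::vector<int> data_;
};
```

Calling `Extend()` through a `Converter<int>&` reaches `SequenceConverter::Extend()`, which in turn calls `IntConverter`'s overrides. The old free template functions could not provide this through a base pointer.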
@@ -669,7 +669,7 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> {
   Status AppendSequence(PyObject* value) {
     int64_t size = static_cast<int64_t>(PySequence_Size(value));
     RETURN_NOT_OK(this->list_builder_->ValidateOverflow(size));
-    return Extend(this->value_converter_.get(), value, size);
+    return this->value_converter_->Extend(value, size);
   }
 
   Status AppendNdarray(PyObject* value) {
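`AppendSequence()` first checks the list builder for offset overflow. It then hands the whole sequence to the child converter in a single `Extend()` call.

The toy sketch below shows that order of operations. It assumes 32-bit offsets, as in Arrow's `ListType`. `ToyListConverter` is hypothetical, its child is a plain `std::vector<int>`, and a `false` return stands in for a `CapacityError`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical list<int32>-style converter; values_ plays the child ("value") converter.
class ToyListConverter {
 public:
  // Largest child length a 32-bit offset can address (overridable for testing).
  explicit ToyListConverter(int64_t max_child_length = std::numeric_limits<int32_t>::max())
      : max_child_length_(max_child_length) {}

  // Returns false in place of a CapacityError.
  bool AppendSequence(const std::vector<int>& seq) {
    const auto size = static_cast<int64_t>(seq.size());
    // 1. Like ValidateOverflow(size): refuse before touching any buffer.
    if (static_cast<int64_t>(values_.size()) + size > max_child_length_) return false;
    // 2. Hand the whole sequence to the child in one bulk call,
    //    like value_converter_->Extend(value, size).
    values_.insert(values_.end(), seq.begin(), seq.end());
    // 3. Record where the new list slot ends.
    offsets_.push_back(static_cast<int32_t>(values_.size()));
    return true;
  }

  const std::vector<int32_t>& offsets() const { return offsets_; }

 private:
  int64_t max_child_length_;
  std::vector<int> values_;
  std::vector<int32_t> offsets_{0};  // offsets_[i]..offsets_[i+1] delimit slot i
};
```

Because the check runs before the bulk append, a rejected sequence leaves both the offsets and the child values untouched.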
@@ -684,12 +684,12 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> {
     switch (value_type->id()) {
     // If the value type does not match the expected NumPy dtype, then fall through
     // to a slower PySequence-based path
-#define LIST_FAST_CASE(TYPE_ID, TYPE, NUMPY_TYPE)               \
-  case Type::TYPE_ID: {                                         \
-    if (PyArray_DESCR(ndarray)->type_num != NUMPY_TYPE) {       \
-      return Extend(this->value_converter_.get(), value, size); \
-    }                                                           \
-    return AppendNdarrayTyped<TYPE, NUMPY_TYPE>(ndarray);       \
+#define LIST_FAST_CASE(TYPE_ID, TYPE, NUMPY_TYPE)         \
+  case Type::TYPE_ID: {                                   \
+    if (PyArray_DESCR(ndarray)->type_num != NUMPY_TYPE) { \
+      return this->value_converter_->Extend(value, size); \
+    }                                                     \
+    return AppendNdarrayTyped<TYPE, NUMPY_TYPE>(ndarray); \
   }
       LIST_FAST_CASE(BOOL, BooleanType, NPY_BOOL)
       LIST_FAST_CASE(UINT8, UInt8Type, NPY_UINT8)
@@ -707,7 +707,7 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> {
       LIST_FAST_CASE(DURATION, DurationType, NPY_TIMEDELTA)
 #undef LIST_FAST_CASE
       default: {
-        return Extend(this->value_converter_.get(), value, size);
+        return this->value_converter_->Extend(value, size);
       }
     }
   }
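`LIST_FAST_CASE` picks a typed bulk path when the NumPy dtype already matches the list's value type. Otherwise it falls back to the generic, element-wise `Extend()`.

Below is a self-contained sketch of the same dispatch. It assumes a hypothetical `NdBuffer` that carries a dtype tag and raw bytes. The int64 -> int32 slow path with a range check is invented for illustration and is not pyarrow's conversion logic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical NumPy-like buffer: a dtype tag plus raw values in native byte order.
enum class DType { kInt32, kInt64 };
struct NdBuffer {
  DType dtype;
  std::vector<unsigned char> bytes;
  std::size_t length;
};

// Helper to build an NdBuffer from typed values.
template <typename T>
NdBuffer MakeBuffer(DType dtype, const std::vector<T>& src) {
  NdBuffer buf{dtype, std::vector<unsigned char>(src.size() * sizeof(T)), src.size()};
  if (!src.empty()) std::memcpy(buf.bytes.data(), src.data(), buf.bytes.size());
  return buf;
}

// Hypothetical int32 value converter for a list column.
struct Int32ValueConverter {
  std::vector<int32_t> values;
  int fast_path_hits = 0;
  int slow_path_hits = 0;

  bool AppendNdarray(const NdBuffer& arr) {
    if (arr.dtype == DType::kInt32) {
      // Fast path: dtype matches the target type, so copy the buffer in one go.
      const std::size_t old_size = values.size();
      values.resize(old_size + arr.length);
      if (arr.length > 0) {
        std::memcpy(values.data() + old_size, arr.bytes.data(), arr.length * sizeof(int32_t));
      }
      ++fast_path_hits;
      return true;
    }
    // Slow path (here the only other dtype, int64): convert element by element.
    std::vector<int32_t> converted;
    converted.reserve(arr.length);
    for (std::size_t i = 0; i < arr.length; ++i) {
      int64_t v;
      std::memcpy(&v, arr.bytes.data() + i * sizeof(int64_t), sizeof(int64_t));
      if (v < std::numeric_limits<int32_t>::min() || v > std::numeric_limits<int32_t>::max()) {
        return false;  // out of range; nothing has been appended yet
      }
      converted.push_back(static_cast<int32_t>(v));
    }
    values.insert(values.end(), converted.begin(), converted.end());
    ++slow_path_hits;
    return true;
  }
};
```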
@@ -1041,18 +1041,18 @@ Result<std::shared_ptr<ChunkedArray>> ConvertPySequence(PyObject* obj, PyObject*
     // the overflow and automatically creates new chunks.
     ARROW_ASSIGN_OR_RAISE(auto chunked_converter, MakeChunker(std::move(converter)));
     if (mask != nullptr && mask != Py_None) {
-      RETURN_NOT_OK(ExtendMasked(chunked_converter.get(), seq, mask, size));
+      RETURN_NOT_OK(chunked_converter->ExtendMasked(seq, mask, size));
     } else {
-      RETURN_NOT_OK(Extend(chunked_converter.get(), seq, size));
+      RETURN_NOT_OK(chunked_converter->Extend(seq, size));
     }
     return chunked_converter->ToChunkedArray();
   } else {
     // If the converter can't overflow spare the capacity error checking on the hot-path,
     // this improves the performance roughly by ~10% for primitive types.
     if (mask != nullptr && mask != Py_None) {
-      RETURN_NOT_OK(ExtendMasked(converter.get(), seq, mask, size));
+      RETURN_NOT_OK(converter->ExtendMasked(seq, mask, size));
     } else {
-      RETURN_NOT_OK(Extend(converter.get(), seq, size));
+      RETURN_NOT_OK(converter->Extend(seq, size));
     }
     return converter->ToChunkedArray();
   }
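`ConvertPySequence()` takes the masked path only when a mask was actually supplied. Inside `ExtendMasked()`, a true mask entry produces a null regardless of the value, and unmasked items go through the normal `Append()`.

Below is a minimal sketch of those semantics (C++17). It assumes `std::optional<int>` as the output slot and `std::vector<bool>` as the mask. `NullableIntConverter` and `Convert` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical converter whose output slots are nullable ints (std::nullopt = null).
struct NullableIntConverter {
  std::vector<std::optional<int>> out;

  void AppendNull() { out.push_back(std::nullopt); }
  void Append(int value) { out.push_back(value); }

  void Extend(const std::vector<int>& values) {
    out.reserve(out.size() + values.size());
    for (int v : values) Append(v);
  }

  // A true mask entry forces a null; unmasked items take the regular Append() path.
  void ExtendMasked(const std::vector<int>& values, const std::vector<bool>& mask) {
    assert(mask.size() == values.size());
    out.reserve(out.size() + values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (mask[i]) {
        AppendNull();
      } else {
        Append(values[i]);
      }
    }
  }
};

// Mirrors the branch in ConvertPySequence(): use the masked path only if a mask exists.
void Convert(NullableIntConverter& conv, const std::vector<int>& values,
             const std::vector<bool>* mask) {
  if (mask != nullptr) {
    conv.ExtendMasked(values, *mask);
  } else {
    conv.Extend(values);
  }
}
```

This follows pyarrow's convention for the `mask` argument, where `True` marks a null.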
38 changes: 37 additions & 1 deletion cpp/src/arrow/util/converter.h
@@ -52,7 +52,15 @@ class Converter {
     return Init(pool);
   }
 
-  virtual Status Append(InputType value) = 0;
+  virtual Status Append(InputType value) { return Status::NotImplemented("Append"); }
+
+  virtual Status Extend(InputType values, int64_t size) {
+    return Status::NotImplemented("Extend");
+  }
+
+  virtual Status ExtendMasked(InputType values, InputType mask, int64_t size) {
+    return Status::NotImplemented("ExtendMasked");
+  }
 
   const std::shared_ptr<ArrayBuilder>& builder() const { return builder_; }
 
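This hunk makes `Append()` non-pure and gives `Extend()`/`ExtendMasked()` `NotImplemented` defaults. A subclass can therefore override only the entry points it supports. That suits the R converters from the commit list, which ingest a whole vector at once through `Extend()` (see `8c5eb46`, "replace the various AppendRange() with Extend()").

Below is a sketch (C++17) that assumes a hypothetical `Status` type and a bulk-only `BulkDoubleConverter` over `const double*` input:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status.
struct Status {
  enum Code { kOk, kNotImplemented } code = kOk;
  std::string message;
  static Status OK() { return Status{}; }
  static Status NotImplemented(std::string msg) {
    return Status{kNotImplemented, std::move(msg)};
  }
  bool ok() const { return code == kOk; }
};

// Mirrors the new defaults in converter.h: nothing is pure virtual any more.
template <typename InputType>
class Converter {
 public:
  virtual ~Converter() = default;
  virtual Status Append(InputType) { return Status::NotImplemented("Append"); }
  virtual Status Extend(InputType, int64_t) { return Status::NotImplemented("Extend"); }
  virtual Status ExtendMasked(InputType, InputType, int64_t) {
    return Status::NotImplemented("ExtendMasked");
  }
};

// Bulk-only converter: overrides Extend() and nothing else, yet is instantiable.
class BulkDoubleConverter : public Converter<const double*> {
 public:
  Status Extend(const double* values, int64_t size) override {
    data_.insert(data_.end(), values, values + size);
    return Status::OK();
  }
  const std::vector<double>& data() const { return data_; }

 private:
  std::vector<double> data_;
};
```

With the old `= 0` declaration, `BulkDoubleConverter` would have been abstract until it also implemented `Append()`.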
@@ -294,6 +302,34 @@ class Chunker {
     return status;
   }
 
+  // we could get bit smarter here since the whole batch of appendable values
+  // will be rejected if a capacity error is raised
+  Status Extend(InputType values, int64_t size) {
+    auto status = converter_->Extend(values, size);
+    if (ARROW_PREDICT_FALSE(status.IsCapacityError())) {
+      if (converter_->builder()->length() == 0) {
+        return status;
+      }
+      ARROW_RETURN_NOT_OK(FinishChunk());
+      return Extend(values, size);
+    }
+    length_ += size;
+    return status;
+  }
+
+  Status ExtendMasked(InputType values, InputType mask, int64_t size) {
+    auto status = converter_->ExtendMasked(values, mask, size);
+    if (ARROW_PREDICT_FALSE(status.IsCapacityError())) {
+      if (converter_->builder()->length() == 0) {
+        return status;
+      }
+      ARROW_RETURN_NOT_OK(FinishChunk());
+      return ExtendMasked(values, mask, size);
+    }
+    length_ += size;
+    return status;
+  }
+
   Status FinishChunk() {
     ARROW_ASSIGN_OR_RAISE(auto chunk, converter_->ToArray(length_));
     chunks_.push_back(chunk);
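On a capacity error, `Chunker::Extend()` gives up if the current chunk is empty, because the batch can never fit. Otherwise it seals the chunk with `FinishChunk()` and replays the whole batch once against a fresh chunk.

Below is a toy version of that control flow. It makes these assumptions:
- `CappedConverter` is hypothetical and rejects an oversized batch before appending any of it.
- The real code also tracks `length_`, which `FinishChunk()` passes to `ToArray()`. That bookkeeping is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Code { kOk, kCapacityError };

// Hypothetical converter holding at most `capacity` values per chunk.
class CappedConverter {
 public:
  explicit CappedConverter(int64_t capacity) : capacity_(capacity) {}

  Code Extend(const std::vector<int>& values) {
    const auto new_length = static_cast<int64_t>(current_.size() + values.size());
    if (new_length > capacity_) return Code::kCapacityError;  // nothing appended
    current_.insert(current_.end(), values.begin(), values.end());
    return Code::kOk;
  }

  int64_t length() const { return static_cast<int64_t>(current_.size()); }

  std::vector<int> Finish() {
    std::vector<int> chunk;
    chunk.swap(current_);  // hand over the chunk and start empty
    return chunk;
  }

 private:
  int64_t capacity_;
  std::vector<int> current_;
};

// Same decision structure as Chunker::Extend in the diff.
class ToyChunker {
 public:
  explicit ToyChunker(int64_t capacity) : converter_(capacity) {}

  Code Extend(const std::vector<int>& values) {
    Code status = converter_.Extend(values);
    if (status == Code::kCapacityError) {
      // An empty chunk that still cannot take the batch: retrying would not help.
      if (converter_.length() == 0) return status;
      FinishChunk();
      return Extend(values);  // replay the whole batch against the fresh chunk
    }
    return status;
  }

  std::vector<std::vector<int>> ToChunks() {
    if (converter_.length() > 0) FinishChunk();
    return chunks_;
  }

 private:
  void FinishChunk() { chunks_.push_back(converter_.Finish()); }

  CappedConverter converter_;
  std::vector<std::vector<int>> chunks_;
};
```

After `FinishChunk()` the builder is empty, so the recursive call either succeeds or returns the error. It cannot recurse again.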
2 changes: 1 addition & 1 deletion r/R/array.R
@@ -143,7 +143,7 @@ Array$create <- function(x, type = NULL) {
   if (!is.null(type)) {
     type <- as_type(type)
   }
-  Array__from_vector(x, type)
+  vec_to_arrow(x, type)
 }
 
 #' @rdname array
32 changes: 16 additions & 16 deletions r/R/arrowExports.R
