Conversation

@ZMZ91
Contributor

@ZMZ91 ZMZ91 commented Apr 13, 2021

No description provided.

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

kszucs and others added 27 commits June 29, 2021 14:30
We can restore if we're going to have arm GHA runners again.

Closes apache#10618 from kszucs/ARROW-13211

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…ost release script

Closes apache#9322 from kszucs/python-post-release

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#10583 from ianmcook/ARROW-11675

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
… file

Some Python versions have a bug where `signal.getsignal` creates a reference cycle holding execution frames alive (https://bugs.python.org/issue42248).

This would cause excessive lifetimes of the PyArrow table returned by `read_csv`.

Closes apache#10609 from pitrou/ARROW-13187-signal-refcycle

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#10586 from lidavidm/arrow-12716

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#10615 from pachadotdev/arrow12967v3

Lead-authored-by: Mauricio Vargas <mavargas11@uc.cl>
Co-authored-by: Pachá <mvargas@dcc.uchile.cl>
Signed-off-by: Ian Cook <ianmcook@gmail.com>
Closes apache#10620 from pitrou/ARROW-13134

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#10596 from pitrou/ARROW-13104-unsafe-cast

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#10530 from lidavidm/arrow-13072

Lead-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Add a bytes_read() to the StreamingReader interface so the progress of the stream can be determined easily and accurately by a user.
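
To illustrate how a byte counter lets a caller report progress, here is a minimal Python sketch. It is not the Arrow `StreamingReader`: `CountingReader`, `read_next` and `block_size` are hypothetical names chosen for this example, and only the idea of a `bytes_read()` accessor comes from the change described above.

```python
import io


class CountingReader:
    """Hypothetical block reader that tracks how many bytes it has consumed."""

    def __init__(self, stream, block_size=4):
        self._stream = stream
        self._block_size = block_size
        self._bytes_read = 0

    def bytes_read(self):
        """Return the number of bytes consumed from the stream so far."""
        return self._bytes_read

    def read_next(self):
        """Read the next block; an empty bytes object signals end of stream."""
        block = self._stream.read(self._block_size)
        self._bytes_read += len(block)
        return block
```

With the total input size known, a caller could compute progress as `reader.bytes_read() / total_size` after each block.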

Closes apache#10509 from n3world/ARROW-12996-stream_progress

Lead-authored-by: Nate Clark <nate@neworld.us>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…kernels

This change adds a `Bitmap::VisitWordsAndWrite` method that outputs the values of the visitor lambda function to a provided bitmap.
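
As a rough illustration of the idea (visit the input bitmaps word by word and write what the visitor returns into an output bitmap), here is a simplified Python sketch. It is not the C++ implementation: `visit_words_and_write` and its parameters are hypothetical, it ignores bit offsets, and it returns a new bitmap rather than writing into a caller-provided one.

```python
from typing import Callable, Sequence


def visit_words_and_write(
    inputs: Sequence[bytes],
    visitor: Callable[..., int],
    word_bytes: int = 8,
) -> bytes:
    """Apply `visitor` to aligned words of equally sized bitmaps.

    Each call receives one integer word per input bitmap; the returned
    integer is truncated to the word width and written to the output.
    """
    if not inputs:
        raise ValueError("at least one input bitmap is required")
    length = len(inputs[0])
    if any(len(bitmap) != length for bitmap in inputs):
        raise ValueError("all bitmaps must have the same length")

    out = bytearray(length)
    for start in range(0, length, word_bytes):
        end = min(start + word_bytes, length)  # the last word may be shorter
        words = [int.from_bytes(b[start:end], "little") for b in inputs]
        mask = (1 << (8 * (end - start))) - 1
        result = visitor(*words) & mask
        out[start:end] = result.to_bytes(end - start, "little")
    return bytes(out)
```

For example, passing `lambda x, y: x & y` computes the bitwise AND of two bitmaps one word at a time.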

Closes apache#10487 from nirandaperera/ARROW-13010

Authored-by: niranda perera <niranda.perera@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…rAsync

WriteFooterAsync is private, so the example doesn't compile.
This method was probably public in an earlier version of the library.
WriteEndAsync seems to be the proper replacement.

Closes apache#10399 from royalstream/patch-1

Authored-by: Steven Burns <royalstream@hotmail.com>
Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
Adds sin/cos/tan and their inverses. Checked variants check for what would be domain errors (this does not apply to atan/atan2).
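
The difference between a checked and an unchecked variant can be sketched in plain Python. This is only an illustration using the standard `math` module, not the Arrow kernels: the function names are hypothetical, and the assumption that the unchecked variant yields NaN for out-of-domain input is mine.

```python
import math


def asin_unchecked(x: float) -> float:
    """Return asin(x), or NaN when x lies outside [-1, 1] (assumed behavior)."""
    return math.asin(x) if -1.0 <= x <= 1.0 else math.nan


def asin_checked(x: float) -> float:
    """Return asin(x), raising ValueError when x lies outside [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError(f"domain error: asin({x})")
    return math.asin(x)


def atan_any(x: float) -> float:
    """atan accepts every real input, so no domain check is needed."""
    return math.atan(x)
```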

Closes apache#10544 from lidavidm/arrow-13095

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…lize

This is a draft of adding more utility methods to FunctionOptions. It's not fully implemented (it needs rebasing + serialization isn't implemented for most options, plus there are various TODOs scattered). But before I proceed further, I wanted to get some feedback.

Some concerns I have:
- I don't like adding protected methods to a struct, and it's inconsistent with how equality is implemented for other structs (via a visitor or otherwise centralized in a single location). However ARROW-8891 will require that we be able to define kernels - and presumably their options - in a separate shared library, so I don't think we can do much better than this.
- But for (de)serialization, we'll still need some way to dynamically register the mapping between a type_name and the actual struct, so maybe this is a moot point.
- I've exposed the fact that serialization uses StructScalars to support Expression - but maybe this is too much to commit to in the API?

Closes apache#10511 from lidavidm/arrow-13025

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…is wrong

Closes apache#10561 from 0x0L/0x0L-patch-1

Authored-by: nullptr <3621629+0x0L@users.noreply.github.com>
Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
Closes apache#10619 from bkietz/BindFunction-cython-utility

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
So far this involved a lot of refactoring of Expressions to be compatible with ExecBatches. The next step is to add a ScanNode wrapping a ScannerBuilder.

Closes apache#10397 from bkietz/11930-Refactor-Dataset-scans-to

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Also ensure that the llvm-symbolizer path is correctly set, for useful tracebacks.

Closes apache#10632 from pitrou/ARROW-13223-tsan-failures

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
… differently than other regions

Added special case for us-east-1 in CreateBucket.
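
A sketch of what such a special case typically looks like, assuming it is the commonly documented S3 behavior where `us-east-1` must not be sent as a location constraint. The helper `create_bucket_params` is hypothetical and only builds a request dictionary (the key names follow boto3's `create_bucket` parameters); it is not the C++ change itself.

```python
from typing import Any, Dict


def create_bucket_params(bucket: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for an S3 CreateBucket request (illustrative)."""
    params: Dict[str, Any] = {"Bucket": bucket}
    # Assumption: us-east-1 is the default region and the request must omit
    # the location constraint for it; every other region names itself.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params
```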

Note: I'm not sure how to go about testing this.  I don't think minio is going to have the same quirk.

Closes apache#10637 from westonpace/bugfix/ARROW-13228--c-s3-createbucket-fails-because-aws-treats-us-

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#10639 from lidavidm/arrow-13234

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Generate a signature for compute functions that better reflects the accepted arguments.

Example before:
```python
>>> pc.sum?
Signature: pc.sum(array, *, options=None, memory_pool=None, **kwargs)
Docstring:
Compute the sum of a numeric array.
[...]
```

Same example after:
```python
>>> ?pc.sum
Signature:
pc.sum(
    array,
    *,
    memory_pool=None,
    options=None,
    skip_nulls=True,
    min_count=1,
)
Docstring:
Compute the sum of a numeric array.
[...]
```

One caveat is that the individual options are not explicitly documented (yet):
```
Parameters
----------
array : Array-like
    Argument to compute function
memory_pool : pyarrow.MemoryPool, optional
    If not passed, will allocate memory from the default memory pool.
options : pyarrow.compute.ScalarAggregateOptions, optional
    Parameters altering compute function semantics
**kwargs : optional
    Parameters for ScalarAggregateOptions constructor. Either `options`
    or `**kwargs` can be passed, but not both at the same time.
```
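
One way to produce such a signature is with the standard `inspect` module. This is a minimal sketch and not necessarily how the PR does it: `make_signature` and `wrapped_sum` are hypothetical, and the `ScalarAggregateOptions` class below is a stand-in, not the pyarrow class.

```python
import inspect


class ScalarAggregateOptions:
    """Hypothetical stand-in for an options class (not the pyarrow one)."""

    def __init__(self, skip_nulls=True, min_count=1):
        self.skip_nulls = skip_nulls
        self.min_count = min_count


def make_signature(arg_names, options_class):
    """Build a signature that lists each option as a keyword-only parameter."""
    P = inspect.Parameter
    params = [P(name, P.POSITIONAL_OR_KEYWORD) for name in arg_names]
    params.append(P("memory_pool", P.KEYWORD_ONLY, default=None))
    params.append(P("options", P.KEYWORD_ONLY, default=None))
    # inspect.signature on a class reflects __init__ without `self`.
    for param in inspect.signature(options_class).parameters.values():
        params.append(param.replace(kind=P.KEYWORD_ONLY))
    return inspect.Signature(params)


def wrapped_sum(array, *, memory_pool=None, options=None, **kwargs):
    """Toy compute function that forwards **kwargs to the options class."""
    if options is None:
        options = ScalarAggregateOptions(**kwargs)
    values = [x for x in array if x is not None]
    if not options.skip_nulls and len(values) != len(array):
        return None
    return sum(values)


# Tools such as help() and IPython's `?` read this attribute.
wrapped_sum.__signature__ = make_signature(["array"], ScalarAggregateOptions)
```

The function still accepts `**kwargs` at runtime; only the advertised signature changes, which is why introspection shows `skip_nulls` and `min_count` explicitly.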

Closes apache#10581 from pitrou/ARROW-10316-wrapped-compute-func

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Also fixes ArithmeticOptions being unbound.

Closes apache#10640 from lidavidm/arrow-13235

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…n instead of yml

Closes apache#10572 from kszucs/ARROW-6513

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
The JNI build gets stopped due to a build timeout. It seems the docker cache isn't valid anymore, so it must build the docker image as well, but doesn't have the opportunity to push at the end of the build.

Closes apache#10631 from kszucs/jni-build-timeout

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#10641 from lidavidm/arrow-13236

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…heels

With [configuration](https://github.com/ursacomputing/crossbow/blob/master/.github/workflows/cache_vcpkg.yml) on crossbow's main branch. Posting the results once the builds are finished.

Closes apache#10635 from kszucs/gha-vcpkg-cache

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Closes apache#10626 from kou/cpp-pc-libs-private

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
pitrou and others added 11 commits August 4, 2021 14:26
Remove APIs that have been deprecated for long enough.

Closes apache#10868 from pitrou/ARROW-13552-cpp-deprecated-apis

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
This PR adds support for both scalar and group-by aggregation via dplyr::summarize(). Only the functions sum, any, and all are wired up. Followup issues (both bugs and features):

* [C++] Aggregation nodes seem not to respect FunctionOptions, or else I'm not passing them in correctly (ARROW-13497)
* [C++] ScanNode takes filter but doesn't filter (ARROW-13498)
* [R] Aggregation on expression doesn't NSE correctly (ARROW-13499)
* [R] Bindings for mean, var, sd aggregation (ARROW-13528)
* [R] Bindings for count aggregation (ARROW-13501)
* [R] Bindings for min/max aggregation (ARROW-13502)
* [R] Handle summarize() with 0 arguments or no aggregate functions (ARROW-13543)
* [R] Support .groups argument to summarize() (ARROW-13550)
* [C++] MakeScalarAggregateNode and MakeGroupByNode have quite different function signatures, which makes working with the API confusing; GroupBy doesn't let you specify the names of the output columns (ARROW-13482)
* [C++] Grouped aggregation functions all have to be invoked with a `hash_` prefix to the name, which seems unnecessary because you can't call a non-hash-aggregation function in GroupBy and you can't call a hash_ function in ScalarAggregate (ARROW-13451)

Closes apache#10722 from nealrichardson/scalar-aggregate-node

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
A test to see if we can (for now) build r-debug before using it

Closes apache#10849 from jonkeane/ARROW-13507-r-lto

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#10851 from thisisnic/ARROW-13519_noisy_docs

Lead-authored-by: Nic <thisisnic@gmail.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Various updates to dataset.Rmd including:
* separating out dense text chunks
* rephrasing based on suggestions by Grammarly to simplify phrasing
* rephrasing "we" to "you"

Closes apache#10765 from thisisnic/ARROW_13399_dataset_vignette

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Nic <thisisnic@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Create a from_pydict function in the RecordBatch class.
Create a unit test for from_pydict.

Closes apache#10854 from kharoc/ARROW-13089

Authored-by: kharoc <kharoly.cs@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Also request the correct version of duckdb now that it's been released.

Closes apache#10861 from jonkeane/ARROW-13538-gate-duckdb-tests

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Closes apache#10873 from n3world/ARROW-13556_link_protobuf

Authored-by: Nate Clark <nate@neworld.us>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Adds styling tasks to the Makefile (for 🦖  like me; I found that the styling-on-save from vscode was not reliable). Also makes codegen.R generate styled R code.

Closes apache#10879 from nealrichardson/styler2

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
It reached EOL.

Closes apache#10881 from kou/linux-ubuntu-drop-20.10

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ents

Update shared_ptr<Scalar> and shared_ptr<Array> to Datum in CheckScalar* functions.

Closes apache#10878 from diegodfrf/ARROW-12953-Refactor-CheckScalar-to-take-Datum-argum

Authored-by: Fernando Rodriguez <diegodfrf@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@ZMZ91
Contributor Author

ZMZ91 commented Aug 5, 2021

Hi there, this pr's been open for quite a while. Could someone help check it? Thanks a lot.

@ZMZ91
Contributor Author

ZMZ91 commented Aug 5, 2021

Closed this one and created a new one: #10884
