Closed
94 commits
59dcbde
working implementation but lacks case-insensitivity and more unit tests
wmalpica Sep 1, 2021
4cb862b
different algorithm. Added more tests and benchmarks
wmalpica Sep 2, 2021
d03b7eb
uncommented tests
wmalpica Sep 2, 2021
69972dd
ARROW-13792 [Java]: The toString representation is incorrect for unsi…
liyafan82 Sep 2, 2021
b76caf4
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 2, 2021
111f0c7
ARROW-13823 [Java]: Exclude .factorypath
laurentgo Sep 2, 2021
09497a9
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 2, 2021
e380c1a
ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test
lidavidm Sep 2, 2021
bbecb6a
ARROW-13067: [C++][Compute] Implement integer to decimal cast
cyb70289 Sep 2, 2021
495c734
ARROW-13846: [C++] Fix crashes on invalid IPC file
pitrou Sep 2, 2021
425b1cb
ARROW-13850: [C++] Fix crashes on invalid Parquet data
pitrou Sep 2, 2021
f0879a5
ARROW-13164: [R] altrep vectors from Array with nulls
romainfrancois Sep 2, 2021
8c70a5f
ARROW-13459: [C++][Docs]Missing param docs for RecordBatch::SetColumn
zhjwpku Sep 2, 2021
a1d207e
ARROW-13831: [GLib][Ruby] Add support for writing by Arrow Dataset
kou Sep 2, 2021
1440d5a
ARROW-13768: [R] Allow JSON to be an optional component
karldw Sep 3, 2021
a45fc3f
ARROW-13782: [C++] Add skip_nulls/min_count to tdigest/mode/quantile
lidavidm Sep 3, 2021
5ead375
ARROW-13855: [C++][Python] Implement C data interface support for ext…
pitrou Sep 3, 2021
e9251b0
ARROW-13740: [R] summarize() should not eagerly evaluate
nealrichardson Sep 3, 2021
858ac57
ARROW-13874: [R] Implement TrimOptions
thisisnic Sep 3, 2021
a49048b
ARROW-13543: [R] Handle summarize() with 0 arguments or no aggregate …
nealrichardson Sep 3, 2021
f12c18e
ARROW-13899: [Ruby] Implement slicer by compute kernels
kou Sep 4, 2021
882e8b4
MINOR: [Doc][Python] Fix a typo (#11085)
jjyao Sep 4, 2021
5d38723
ARROW-13909: [GLib] Add GArrowVarianceOptions
kou Sep 5, 2021
2588e17
ARROW-13909: [GLib] Add tests for GArrowVarianceOptions
kou Sep 5, 2021
c83db7e
ARROW-13793: [C++] Migrate ORCFileReader to Result<T>
zhjwpku Sep 6, 2021
5c5af6c
ARROW-13871: [C++] JSON reader can fail if a list array key is presen…
westonpace Sep 6, 2021
a8953de
ARROW-13845: [C++] Reconcile RandomArrayGenerator::ArrayOf implementa…
pitrou Sep 6, 2021
4390a64
ARROW-13857: [R][CI] Remove checkbashisms download
nealrichardson Sep 6, 2021
cf0e5e4
ARROW-13803: [C++] Don't read past end of buffer in BitUtil::SetBitmap
cyb70289 Sep 6, 2021
b1cfa7d
ARROW-13912: [R] TrimOptions implementation breaks test-r-minimal-bui…
nealrichardson Sep 6, 2021
303b7f4
ARROW-13915: [R][CI] R UCRT C++ bundles are incomplete
nealrichardson Sep 6, 2021
fd47183
ARROW-13913: [C++] Don't segfault if IndexOptions omitted
lidavidm Sep 6, 2021
02343c8
ARROW-13684: [C++][Compute] Strftime kernel follow-up
rok Sep 6, 2021
5876e3f
ARROW-13403: [R] Update developing.Rmd vignette
thisisnic Sep 6, 2021
4cb77a2
ARROW-13910: [Ruby] Arrow::Table#[]/Arrow::RecordBatch#[] accepts Ran…
kou Sep 6, 2021
67b5bd2
ARROW-13743: [CI] OSX job fails due to incompatible git and libcurl
kszucs Sep 7, 2021
6dc272a
ARROW-13810: [C++][Compute] Predicate IsAsciiCharacter allows invalid…
edponce Sep 7, 2021
6c7c4f0
ARROW-13671: [Dev] Fix conda recipe on Arm 64k page system
cyb70289 Sep 7, 2021
9064fa0
ARROW-12981: [R] Install source package from CRAN alone
karldw Sep 7, 2021
080a86b
Implemented review feedback and added more unit tests
wmalpica Sep 7, 2021
f40856a
ARROW-13925: [R] Remove system installation devdocs jobs
jonkeane Sep 7, 2021
85d8175
ARROW-13919: [GLib] Add GArrowFunctionDoc
kou Sep 7, 2021
e396d4f
ARROW-13872: [Java] ExtensionTypeVector does not work with RangeEqual…
BryanCutler Sep 8, 2021
57e76e8
ARROW-13921: [Python][Packaging] Pin minimum setuptools version for t…
kszucs Sep 8, 2021
97135bc
Docs + lintr fix (#11107)
jonkeane Sep 8, 2021
a081a05
checked for empty hex falues. added scalar tests
wmalpica Sep 8, 2021
170a24f
ARROW-13820: [R] Rename na.min_count to min_count and na.rm to skip_n…
nealrichardson Sep 8, 2021
7a23a07
fixed style with clang-format
wmalpica Sep 8, 2021
e5db0fc
MINOR: [R] Fix broken doc example (#11110)
nealrichardson Sep 8, 2021
9dd8b6a
implemented some improvements
wmalpica Sep 8, 2021
31f80e5
fixed clang format
wmalpica Sep 8, 2021
4666073
fixed unit test
wmalpica Sep 8, 2021
b0d89db
ARROW-13680: [C++] Create an asynchronous nursery to simplify capture…
westonpace Sep 9, 2021
4b5ed4e
ARROW-13138: [C++][R] Implement extract temporal components (year, mo…
aucahuasi Sep 9, 2021
bb1ef85
ARROW-13033: [C++] Kernel to localize naive timestamps to a timezone …
rok Sep 9, 2021
9aee524
ARROW-11885: [R] Turn off some capabilities when LIBARROW_MINIMAL=true
nealrichardson Sep 9, 2021
0c41e0b
ARROW-13842: [C++] Bump vendored date library
pitrou Sep 9, 2021
946bdcf
ARROW-13963: [Go] Minor: Add bitmap reader/writer impl from go Parque…
Sep 9, 2021
4fe6fae
ARROW-13961: [C++] Fix use of non-const references, declaration witho…
lidavidm Sep 9, 2021
66d7dd4
ARROW-13962: [R] Catch up on the NEWS
nealrichardson Sep 9, 2021
04515de
MINOR: [R] Exclude some paths from the cpp rsync
nealrichardson Sep 9, 2021
56411f5
ARROW-13940: [R] Turn on multithreading with Arrow engine queries
nealrichardson Sep 9, 2021
42d10c3
ARROW-13964: MINOR: [Go][Parquet] remove base bitmap reader/writer fr…
Sep 9, 2021
3bbec3f
ARROW-13942: [Dev] Update cmake_format usage in autotune comment bot
kou Sep 10, 2021
3db4854
ARROW-13778: [R] Handle complex summarize expressions
nealrichardson Sep 10, 2021
fa7cff6
ARROW-1565: [C++] Implement TopK/BottomK
aocsa Sep 10, 2021
bae7e2b
MINOR: [Doc][Python] Fix typo ParquetFileForma (#11137)
domoritz Sep 11, 2021
db5b848
ARROW-13979: [Go] Enable -race for go tests
Sep 12, 2021
c091e6d
ARROW-13859: [Java] Add code coverage support
laurentgo Sep 12, 2021
e8ab3ae
ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots
liyafan82 Sep 12, 2021
1049dde
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 12, 2021
74f020d
ARROW-13974: [C++] Resolve follow-up reviews for TopK/BottomK
aocsa Sep 13, 2021
293f856
ARROW-13966: [C++] Support decimals in comparisons
lidavidm Sep 13, 2021
9122149
ARROW-13937: [C++][Compute] Add explicit output values to sign functi…
edponce Sep 13, 2021
f2cb977
ARROW-13646: [Go][Parquet] adding the parquet metadata package
Sep 13, 2021
dfaa415
ARROW-13983: [C++] Avoid raising error if fadvise() isn't supported
pitrou Sep 13, 2021
0610998
ARROW-13978: [C++] Bump gtest to 1.11 to unbreak builds with recent c…
pitrou Sep 13, 2021
52904d6
ARROW-13958: [Python] Migrate Python ORC bindings to use new Result-b…
jorisvandenbossche Sep 13, 2021
376cb45
ARROW-12744: [C++][Compute] Add rounding kernel
edponce Sep 13, 2021
87b2fcd
ARROW-12087: [C++] Allow sorting durations, timestamps with timezones
lidavidm Sep 13, 2021
1cbc4a2
ARROW-13904: [R] Implement ModeOptions
thisisnic Sep 13, 2021
f3d3c68
ARROW-13905: [R] Implement ReplaceSliceOptions
thisisnic Sep 14, 2021
0b6f531
ARROW-13906: [R] Implement PartitionNthOptions
thisisnic Sep 14, 2021
672149b
ARROW-13869: [R] Implement options for non-bound MatchSubstringOption…
thisisnic Sep 14, 2021
8875d5c
ARROW-13908: [R] Implement ExtractRegexOptions
thisisnic Sep 14, 2021
f1d6811
working implementation but lacks case-insensitivity and more unit tests
wmalpica Sep 1, 2021
925b2a7
different algorithm. Added more tests and benchmarks
wmalpica Sep 2, 2021
68ec4db
Implemented review feedback and added more unit tests
wmalpica Sep 7, 2021
a538072
checked for empty hex falues. added scalar tests
wmalpica Sep 8, 2021
400d886
fixed style with clang-format
wmalpica Sep 8, 2021
9cac060
implemented some improvements
wmalpica Sep 8, 2021
6053936
fixed clang format
wmalpica Sep 8, 2021
68a6844
fixed unit test
wmalpica Sep 8, 2021
7aee4f0
Merge branch 'wmalpica/ARROW-12657' of github.com:wmalpica/arrow into…
wmalpica Sep 14, 2021
4 changes: 2 additions & 2 deletions .github/workflows/comment_bot.yml
@@ -95,8 +95,8 @@ jobs:
set -ex
export PATH=/home/runner/.local/bin:$PATH
python3 -m pip install --upgrade pip setuptools wheel
-python3 -m pip install -r dev/archery/requirements-lint.txt
-python3 run-cmake-format.py
+python3 -m pip install -e dev/archery[lint]
+archery lint --cmake-format --fix
- name: Run clang-format on cpp
if: env.CLANG_FORMAT_CPP == 'true' || endsWith(github.event.comment.body, 'everything')
run: |
1 change: 1 addition & 0 deletions .github/workflows/r.yml
@@ -247,6 +247,7 @@ jobs:
Sys.setenv(
RWINLIB_LOCAL = file.path(Sys.getenv("GITHUB_WORKSPACE"), "libarrow.zip"),
MAKEFLAGS = paste0("-j", parallel::detectCores()),
ARROW_R_DEV = TRUE,
"_R_CHECK_FORCE_SUGGESTS_" = FALSE
)
rcmdcheck::rcmdcheck("r",
2 changes: 2 additions & 0 deletions c_glib/arrow-dataset-glib/arrow-dataset-glib.h
@@ -23,6 +23,8 @@

#include <arrow-dataset-glib/dataset-factory.h>
#include <arrow-dataset-glib/dataset.h>
#include <arrow-dataset-glib/enums.h>
#include <arrow-dataset-glib/file-format.h>
#include <arrow-dataset-glib/fragment.h>
#include <arrow-dataset-glib/partitioning.h>
#include <arrow-dataset-glib/scanner.h>
1 change: 1 addition & 0 deletions c_glib/arrow-dataset-glib/arrow-dataset-glib.hpp
@@ -25,4 +25,5 @@
#include <arrow-dataset-glib/dataset.hpp>
#include <arrow-dataset-glib/file-format.hpp>
#include <arrow-dataset-glib/fragment.hpp>
#include <arrow-dataset-glib/partitioning.hpp>
#include <arrow-dataset-glib/scanner.hpp>
68 changes: 68 additions & 0 deletions c_glib/arrow-dataset-glib/dataset-factory.cpp
@@ -23,6 +23,7 @@
#include <arrow-dataset-glib/dataset-factory.hpp>
#include <arrow-dataset-glib/dataset.hpp>
#include <arrow-dataset-glib/file-format.hpp>
#include <arrow-dataset-glib/partitioning.hpp>

G_BEGIN_DECLS

@@ -142,13 +143,16 @@ gadataset_dataset_factory_finish(GADatasetDatasetFactory *factory,
typedef struct GADatasetFileSystemDatasetFactoryPrivate_ {
  GADatasetFileFormat *format;
  GArrowFileSystem *file_system;
  GADatasetPartitioning *partitioning;
  GList *files;
  arrow::dataset::FileSystemFactoryOptions options;
} GADatasetFileSystemDatasetFactoryPrivate;

enum {
  PROP_FORMAT = 1,
  PROP_FILE_SYSTEM,
  PROP_PARTITIONING,
  PROP_PARTITION_BASE_DIR,
};

G_DEFINE_TYPE_WITH_PRIVATE(GADatasetFileSystemDatasetFactory,
@@ -175,6 +179,11 @@ gadataset_file_system_dataset_factory_dispose(GObject *object)
    priv->file_system = NULL;
  }

  if (priv->partitioning) {
    g_object_unref(priv->partitioning);
    priv->partitioning = NULL;
  }

  if (priv->files) {
    g_list_free_full(priv->files, g_object_unref);
    priv->files = NULL;
@@ -205,6 +214,29 @@ gadataset_file_system_dataset_factory_set_property(GObject *object,
  case PROP_FORMAT:
    priv->format = GADATASET_FILE_FORMAT(g_value_dup_object(value));
    break;
  case PROP_PARTITIONING:
    {
      auto partitioning = g_value_get_object(value);
      if (partitioning == priv->partitioning) {
        break;
      }
      auto old_partitioning = priv->partitioning;
      if (partitioning) {
        g_object_ref(partitioning);
        priv->partitioning = GADATASET_PARTITIONING(partitioning);
        priv->options.partitioning =
          gadataset_partitioning_get_raw(priv->partitioning);
      } else {
        priv->options.partitioning = arrow::dataset::Partitioning::Default();
      }
      if (old_partitioning) {
        g_object_unref(old_partitioning);
      }
    }
    break;
  case PROP_PARTITION_BASE_DIR:
    priv->options.partition_base_dir = g_value_get_string(value);
    break;
  default:
    G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
    break;
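The `PROP_PARTITIONING` case above follows a common reference-counted setter pattern: take a reference on the new object, store it, then release the old one, falling back to a default when the new value is NULL. A minimal, GLib-free sketch of that pattern is below. `RefCounted`, `FactoryPrivate`, `ref`, `unref`, `set_partitioning`, and `uses_default_partitioning` are all hypothetical stand-ins invented for illustration; none of them is part of Arrow GLib or GObject. Unlike the hunk as shown, the sketch also clears the stored pointer in the NULL branch. That is an assumption about the intended behavior, not something the diff confirms.

```cpp
#include <cassert>

// Hypothetical stand-in for a GObject: a manually reference-counted value.
struct RefCounted {
  int ref_count = 1;
};

// Take one reference and return the same pointer.
inline RefCounted *ref(RefCounted *object) {
  ++object->ref_count;
  return object;
}

// Drop one reference. A real implementation would free the object at zero;
// the sketch only decrements so the count stays observable.
inline void unref(RefCounted *object) {
  assert(object->ref_count > 0);
  --object->ref_count;
}

// Hypothetical stand-in for the factory's private struct.
struct FactoryPrivate {
  RefCounted *partitioning = nullptr;
  bool uses_default_partitioning = true;
};

// Replace the stored object: reference the new value first, then release
// the old one. A null value resets the factory to its default.
void set_partitioning(FactoryPrivate *priv, RefCounted *partitioning) {
  if (partitioning == priv->partitioning) {
    return;  // same object (or both null): nothing to do
  }
  RefCounted *old_partitioning = priv->partitioning;
  if (partitioning) {
    priv->partitioning = ref(partitioning);
    priv->uses_default_partitioning = false;
  } else {
    // Assumption: do not keep a pointer whose reference is about to be dropped.
    priv->partitioning = nullptr;
    priv->uses_default_partitioning = true;
  }
  if (old_partitioning) {
    unref(old_partitioning);
  }
}
```

Referencing the new value before releasing the old one keeps the swap safe even if both pointers share ownership through some other path, and the early return makes setting the same object twice a no-op.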
@@ -226,6 +258,12 @@ gadataset_file_system_dataset_factory_get_property(GObject *object,
  case PROP_FILE_SYSTEM:
    g_value_set_object(value, priv->file_system);
    break;
  case PROP_PARTITIONING:
    g_value_set_object(value, priv->partitioning);
    break;
  case PROP_PARTITION_BASE_DIR:
    g_value_set_string(value, priv->options.partition_base_dir.c_str());
    break;
  default:
    G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
    break;
@@ -279,6 +317,35 @@ gadataset_file_system_dataset_factory_class_init(
                             GARROW_TYPE_FILE_SYSTEM,
                             static_cast<GParamFlags>(G_PARAM_READABLE));
  g_object_class_install_property(gobject_class, PROP_FILE_SYSTEM, spec);

  /**
   * GADatasetFileSystemDatasetFactory:partitioning:
   *
   * Partitioning used by #GADatasetFileSystemDataset.
   *
   * Since: 6.0.0
   */
  spec = g_param_spec_object("partitioning",
                             "Partitioning",
                             "Partitioning used by GADatasetFileSystemDataset",
                             GADATASET_TYPE_PARTITIONING,
                             static_cast<GParamFlags>(G_PARAM_READWRITE));
  g_object_class_install_property(gobject_class, PROP_PARTITIONING, spec);

  /**
   * GADatasetFileSystemDatasetFactory:partition-base-dir:
   *
   * Partition base directory used by #GADatasetFileSystemDataset.
   *
   * Since: 6.0.0
   */
  spec = g_param_spec_string("partition-base-dir",
                             "Partition base directory",
                             "Partition base directory "
                             "used by GADatasetFileSystemDataset",
                             NULL,
                             static_cast<GParamFlags>(G_PARAM_READWRITE));
  g_object_class_install_property(gobject_class, PROP_PARTITION_BASE_DIR, spec);
}

/**
@@ -454,6 +521,7 @@ gadataset_file_system_dataset_factory_finish(
"dataset", &arrow_dataset,
"file-system", priv->file_system,
"format", priv->format,
"partitioning", priv->partitioning,
NULL));
}
