Closed
1303 commits
d0b0b74
ARROW-1529: [GLib] Use Xcode 8.3 on Travis CI
kou Sep 13, 2017
96d451f
ARROW-1537: [C++] Support building with full path install_name on macOS
kou Sep 15, 2017
72ad07e
ARROW-1542: [C++] Install packages in temporary directory in MSVC bui…
wesm Sep 15, 2017
59b24ba
ARROW-559: Add release verification script for Linux
wesm Sep 15, 2017
bf73d27
ARROW-1545: Remove deprecated args of builder
rvernica Sep 17, 2017
bfe6579
ARROW-1546: [GLib] Support GLib 2.40 again
kou Sep 17, 2017
e093772
ARROW-1544: [JS] Export Vector types
trxcllnt Sep 17, 2017
b635d4c
Remove garbage ")"
kou Sep 17, 2017
63e7966
ARROW-1464: [GLib] Add "Common build problems" section into the READM…
wagavulin Sep 18, 2017
4a65fea
ARROW-1548: [GLib] Support bulk append in builder
kou Sep 18, 2017
e1d9c7f
ARROW-1550: [Python] Explicitly close owned file handles in ParquetWr…
wesm Sep 19, 2017
0d5e699
ARROW-1551: [Website] Website updates, blog post for 0.7.0
wesm Sep 19, 2017
b448f66
ARROW-1550: [Python] Followup: fix flake8 warning
wesm Sep 19, 2017
aebc412
ARROW-1551: [Website] Add 0.7.0 changelog
wesm Sep 19, 2017
2706b7f
ARROW-1533: [JAVA] realloc should consider the existing buffer capaci…
siddharthteotia Sep 19, 2017
d4685f4
ARROW-1547: [JAVA] Fix 8x memory over-allocation in BitVector
siddharthteotia Sep 19, 2017
c4f5a12
ARROW-1536:[C++] Do not transitively depend on libboost_system
Sep 20, 2017
2551050
ARROW-1554: [Python] Update Sphinx install page to note that VC14 run…
wesm Sep 20, 2017
903d03b
ARROW-1553: [JAVA] Implement setInitialCapacity for MapWriter
siddharthteotia Sep 20, 2017
9997a1a
ARROW-1557 [Python] Validate names length in Table.from_arrays
TomAugspurger Sep 20, 2017
975f32b
ARROW-1497: [Java] Fix JsonReader to initialize count correctly
icexelloss Sep 20, 2017
203fb63
ARROW-1500: [C++] Do not ignore return value from truncate in MemoryMa…
amirma Sep 21, 2017
d154c10
ARROW-1578: [C++] Run lint checks in Travis CI much earlier at before…
wesm Sep 21, 2017
c470c9c
ARROW-1591: C++: Xcode 9 is not correctly detected
xhochy Sep 21, 2017
cfcee74
ARROW-1347: [JAVA] Return consistent child field name for List Vectors
BryanCutler Sep 21, 2017
8fd73b4
ARROW-1595: [Python] Fix package dependency resolution issue causing …
wesm Sep 21, 2017
8996a4f
ARROW-1590: [JS] Flow TS Table method generics
trxcllnt Sep 22, 2017
c0a5019
ARROW-1592: [GLib] Add GArrowUIntArrayBuilder
kou Sep 22, 2017
b41a4ee
ARROW-1598: [C++] Fix diverged code comment in plasma tutorial
kenhys Sep 22, 2017
096b877
ARROW-1601: [C++] Do not read extra byte from validity bitmap, add in…
wesm Sep 25, 2017
39e487c
ARROW-1608: Support Release verification script on macOS
xhochy Sep 26, 2017
b640cc0
ARROW-1610: C++/Python: Only call python-prefix if the default PYTHON…
xhochy Sep 26, 2017
5da6b87
ARROW-1606: [Python] Copy .lib files in addition to .dll when bundlin…
wesm Sep 26, 2017
f9d1e1b
ARROW-1611: [C++] Add BitmapWriter, do not perform out of bounds read…
renesugar Sep 27, 2017
808a143
ARROW-1612:[GLib] Update readme for mac os
m-nakamura145 Sep 27, 2017
3a53f93
[Release] Update version to 0.7.1-SNAPSHOT
wesm Sep 27, 2017
fbabde5
[Release] Update CHANGELOG.md for 0.7.1
wesm Sep 27, 2017
0e21f84
[maven-release-plugin] prepare release apache-arrow-0.7.1
wesm Sep 27, 2017
686a8f7
ARROW-1607: [C++] Implement DictionaryBuilder for Decimals
cpcloud Sep 28, 2017
c358154
ARROW-1609: [Plasma] Xcode 9 compilation workaround
pcmoritz Sep 28, 2017
bdfa65e
ARROW-1620: Python: Download Boost in manylinux1 build from bintray
xhochy Sep 28, 2017
ac997fb
ARROW-1618: [JAVA] Reduce Heap Usage (Phase 1)
siddharthteotia Sep 29, 2017
545496c
ARROW-1619: [Java] Set lastSet in JsonFileReader
BryanCutler Sep 29, 2017
7045b42
ARROW-1615 Added BUILD_WARNING_LEVEL and BUILD_WARNING_FLAGS to Setup…
renesugar Sep 29, 2017
a03e093
ARROW-1600: [C++] Add Buffer constructor that wraps std::string
wesm Sep 30, 2017
ccbf644
ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not o…
wesm Sep 30, 2017
cc3b27c
ARROW-1626 Add make targets to run the inter-procedural static analys…
renesugar Sep 30, 2017
9aa6eb5
ARROW-1624: [C++] Fix build on LLVM 4.0, remove some clang warning su…
wesm Sep 30, 2017
811e668
ARROW-1629: [C++] Add miscellaneous DCHECKs and minor changes based o…
wesm Oct 2, 2017
af167fd
[Python] Update README.md to reflect that wheels are available on all…
ofek Sep 26, 2017
c905783
ARROW-1625: [Serialization] Support OrderedDict and defaultdict seria…
pcmoritz Oct 2, 2017
82eea49
[Java] Update pom versions to 0.8.0-SNAPSHOT
wesm Oct 3, 2017
988338c
ARROW-1634: [Website] Add release page for 0.7.1, update front page
wesm Oct 3, 2017
ff39cb5
ARROW-1640: Fix HTTPS failures in cmake / libcurl caused by ca-certif…
wesm Oct 4, 2017
8ceee56
ARROW-1543: [C++] Correct C++ tutorial to use std::unique_ptr instead…
wesm Oct 4, 2017
67c6317
ARROW-950: [Website] Add Google Analytics tag to site
wesm Oct 4, 2017
592c4e8
[Website] jekyll must be run with JEKYLL_ENV=production
wesm Oct 4, 2017
87fc577
ARROW-1584: [C++/Python] Support Null type in IPC round trips, fix se…
wesm Oct 4, 2017
31d33e0
[Website] Update website with new committers
wesm Oct 4, 2017
dc129d6
ARROW-1627: New class to handle collection of BufferLedger(s) within …
Oct 5, 2017
81319d9
ARROW-1647: [Plasma] Make sure to read length header as int64_t inste…
robertnishihara Oct 5, 2017
8b5b22b
ARROW-1525: [C++] New compare functions that return boolean instead o…
amirma Oct 5, 2017
0f819fa
ARROW-1603: [C++] Add BinaryArray::GetString helper method
wesm Oct 5, 2017
7511cfd
ARROW-226: [C++] If opening an HDFS file fails and it does not exist,…
wesm Oct 5, 2017
f0873a9
ARROW-942: Support running integration tests with both Python 2.7 and…
wesm Oct 5, 2017
9805ada
ARROW-1633: [Python] Support NumPy string and unicode types in pyarro…
wesm Oct 5, 2017
ab6aa9a
ARROW-1486: [C++] Make Column, RecordBatch, and Table non-copyable
wesm Oct 5, 2017
909a6f6
ARROW-1616: [Python] Add unit test for RecordBatchWriter.write dispat…
wesm Oct 5, 2017
bd73166
ARROW-1526: [Python] Add unit test for fix in PARQUET-1100
wesm Oct 5, 2017
51905e5
ARROW-1498: Add CONTRIBUTING.md to .github special directory
wesm Oct 5, 2017
bea3495
ARROW-1539: [C++] Remove APIs deprecated as of 0.7.0 or prior releases
wesm Oct 5, 2017
eaa9538
ARROW-1649: C++: Print number of nulls in PrettyPrint for NullArray
xhochy Oct 5, 2017
3ae4355
ARROW-1653: [Plasma] Use static cast to avoid compiler warning.
robertnishihara Oct 5, 2017
f8cdafa
ARROW-1541: [C++] Fix race conditions in arrow_gpu with generated Fla…
wesm Oct 6, 2017
ac1b66d
ARROW-1540: Add NO_VALGRIND option to ADD_ARROW_TEST and disable valg…
wesm Oct 6, 2017
898f5e2
ARROW-1602: [C++] Add IsValid method to pair with IsNull
wesm Oct 6, 2017
0a4c5b1
ARROW-1226: [C++] Docs cleaning in arrow/ipc. Doxyfile fixes, move ip…
wesm Oct 6, 2017
8309556
ARROW-1556: [C++] Move verbose AssertArraysEqual function used in PAR…
wesm Oct 6, 2017
b29b065
ARROW-1641: [C++] Hide std::mutex from public headers
wesm Oct 7, 2017
eaeb5d4
ARROW-1250: [Python] Add pyarrow.types module with useful type checki…
wesm Oct 7, 2017
e31c2e3
ARROW-1585/ARROW-1586: [PYTHON] serialize_pandas roundtrip loses colu…
cpcloud Oct 7, 2017
208e798
ARROW-1594: [Python] Multithreaded conversions to Arrow in from_pandas
wesm Oct 8, 2017
33d446d
ARROW-1656: [C++] Endianness Macro is Incorrect on Windows And Mac
cpcloud Oct 8, 2017
81a0e67
ARROW-1657: [C++] Multithreaded Read Test Failing on Arch Linux
cpcloud Oct 8, 2017
a0555c0
ARROW-1535: [Python] Enable sdist tarballs to be installed
wesm Oct 9, 2017
bf2e3ab
ARROW-1593: [Python] Pass through preserve_index to RecordBatch.from_…
wesm Oct 10, 2017
166f0a8
ARROW-1635: Add release management guide
wesm Oct 10, 2017
ee78cdc
ARROW-1503: [Python] Add default serialization context, callbacks for…
wesm Oct 10, 2017
4cb3e97
ARROW-1662: Move to using Homebrew/bundle and Brewfile
stephengroat Oct 11, 2017
60cb1c3
ARROW-905 [Docs] Dockerize document generation
heimir-sverrisson Oct 12, 2017
0d1e69c
ARROW-1630: [Serialization] Support Python datetime objects
pcmoritz Oct 12, 2017
434df8a
ARROW-1488: [C++] Implement ArrayBuilder::Finish in terms of FinishIn…
wesm Oct 12, 2017
47e6ff6
ARROW-1665: [Serialization] Support more custom datatypes in the defa…
pcmoritz Oct 12, 2017
dc53321
ARROW-1648: C++: Add cast from Dictionary[NullType] to NullType
xhochy Oct 13, 2017
894f740
ARROW-1670: [Serialization] Speed up deserialization by getting rid o…
pcmoritz Oct 14, 2017
a6a97a9
ARROW-1631 [C++] Add GRPC to ThirdpartyToolchain
maxhora Oct 14, 2017
e39b479
ARROW-1667: [GLib] Support Meson
kou Oct 16, 2017
1571fb4
ARROW-1677: [Blog] Post on ray and arrow serialization
pcmoritz Oct 16, 2017
2f2a0c1
ARROW-1676: [C++] Only pad null bitmap up to a factor of 8 bytes in F…
wesm Oct 16, 2017
1926bdc
ARROW-1613: [Java] Alternative ArrowReader close to free resources bu…
BryanCutler Oct 16, 2017
8eb2b0e
ARROW-1679: [GLib] Add garrow_record_batch_reader_read_next()
kou Oct 17, 2017
a043018
ARROW-1678: [Python] Implement numpy.float16 SerDe
Licht-T Oct 17, 2017
a4813bd
ARROW-1685: [GLib] Add GArrowTableBatchReader
kou Oct 18, 2017
298e343
ARROW-1673: [Python] Add support for numpy 'bool' type
pcmoritz Oct 18, 2017
d7bf5f2
ARROW-1690: [GLib] Add garrow_array_is_valid()
kou Oct 19, 2017
a8f5185
ARROW-1666: [GLib] Enable gtk-doc on Travis CI Mac environment
wagavulin Oct 20, 2017
971e99d
ARROW-1695: [Serialization] Fix reference counting of numpy arrays cr…
pcmoritz Oct 20, 2017
deaa0cf
ARROW-1686: [Docs] rsync contents of apidocs directory into site java…
wesm Oct 21, 2017
989aba6
ARROW-1698: [JS] File reader attempts to load the same dictionary bat…
Oct 21, 2017
2ee900c
ARROW-1702: Update jemalloc in manylinux1 build
xhochy Oct 21, 2017
3549fa8
ARROW-1683: [Python] Restore TimestampType to pyarrow namespace
wesm Oct 21, 2017
9d12c7c
ARROW-1697: [GitHub] Add ISSUE_TEMPLATE.md
wesm Oct 21, 2017
05788d0
ARROW-1701: [Serialization] Support zero copy PyTorch Tensor serializ…
pcmoritz Oct 21, 2017
fbeaeea
ARROW-1704: [GLib] Fix Go example failure
kou Oct 22, 2017
9ee5508
ARROW-1522: [Python] Zero copy buffer deserialization
pcmoritz Oct 22, 2017
61d8a76
ARROW-641: [C++] Do not build io-hdfs-test if ARROW_HDFS is off
wesm Oct 22, 2017
53dd0c8
ARROW-1087: [Python] Add pyarrow.get_include function. Bundle include…
wesm Oct 23, 2017
f40618d
ARROW-1671: [C++] Deprecate arrow::MakeArray that returns Status, ref…
wesm Oct 23, 2017
6209489
ARROW-1708: [JS] Fix linter error
xhochy Oct 23, 2017
8eb2a1b
ARROW-1707: Update dev README after movement to GitBox
xhochy Oct 23, 2017
4eb38a2
ARROW-571: [Python] Add unit test for incremental Parquet file buildi…
wesm Oct 23, 2017
2b77b7c
ARROW-507: [C++] Complete ListArray::FromArrays implementation, add u…
wesm Oct 23, 2017
8e00ee9
ARROW-1114: [C++] Add simple RecordBatchBuilder class
wesm Oct 23, 2017
935a3cf
ARROW-1654: [Python] Implement pickling for DataType, Field, Schema
wesm Oct 23, 2017
e876e17
ARROW-1720: [Python] Implement bounds check in chunk getter
Licht-T Oct 24, 2017
b08f7e3
ARROW-1711: [Python] Fix flake8 calls to lint the right directories
wesm Oct 24, 2017
ecb7605
ARROW-1134: [C++] Support for C++/CLI compilation, add NULLPTR define…
wesm Oct 25, 2017
b2596f6
ARROW-1588: [C++/Format] Harden Decimal Format
cpcloud Oct 25, 2017
8148b6d
ARROW-1726: [GLib] Add setup description to verify C GLib build
kou Oct 25, 2017
54d5c81
ARROW-1484: [C++/Python] Implement casts between date, time, timestam…
wesm Oct 26, 2017
48a6ff8
ARROW-1721: [Python] Implement null-mask check in places where it isn…
Licht-T Oct 26, 2017
238881f
ARROW-1675: [Python] Use RecordBatch.from_pandas in Feather write path
wesm Oct 26, 2017
c30a7e3
ARROW-1732: [Python] Permit creating record batches with no columns, …
wesm Oct 26, 2017
6b16cca
ARROW-1689: [Python] Allow user to request no data copies
njwhite Oct 26, 2017
7abaa00
ARROW-587: Add fix version to PR merge tool
wesm Oct 26, 2017
a385e2b
ARROW-1739: [Python] Fix broken build due to using unittest.TestCase …
wesm Oct 26, 2017
3596a43
ARROW-1737: [GLib] Use G_DECLARE_DERIVABLE_TYPE
kou Oct 26, 2017
2e04089
ARROW-1736: [GLib] Add GArrowCastOptions:allow-time-truncate
kou Oct 26, 2017
59030fe
ARROW-1730, ARROW-1738: [Python] Fix wrong datetime conversion
Licht-T Oct 27, 2017
2ed886e
ARROW-1723: [C++] add ARROW_STATIC to mark static libs on Windows
jjenkins278 Oct 27, 2017
4db0046
ARROW-1555 [Python] Implement Dask exists function
benjigoldberg Oct 27, 2017
2eb78b0
ARROW-1728: [C++] Run clang-format checks in Travis CI
wesm Oct 27, 2017
cc03a45
ARROW-1745: [Plasma] Include gtest after plasma/compat.h in tests.
robertnishihara Oct 28, 2017
74a934a
ARROW-1689: [Python] Implement zero-copy conversions for DictionaryArray
Licht-T Oct 28, 2017
b221a2c
ARROW-1751: [Python] Pandas 0.21.0 introduces a breaking API change f…
cpcloud Oct 30, 2017
f257b00
ARROW-1746: [Python] Add build dependencies for Arch Linux
xhochy Oct 30, 2017
ec22228
ARROW-1747: [C++] Don't export symbols of statically linked libraries
xhochy Oct 30, 2017
1d36dd2
ARROW-1748: [GLib] Add GArrowRecordBatchBuilder
kou Oct 30, 2017
30158ad
ARROW-1718: [C++/Python] Implement casts from timestamp to date32/64,…
wesm Oct 30, 2017
39243ff
ARROW-1409: [Format] Remove page id from Buffer metadata, increment m…
wesm Oct 30, 2017
72b50bc
[C++] Fix clang-format failure from ARROW-1409
wesm Oct 31, 2017
0880550
ARROW-1754: [Python] Fix buggy Parquet roundtrip when an index name i…
cpcloud Oct 31, 2017
eca9924
ARROW-1658: [Python] Add boundschecking of dictionary indices when cr…
wesm Oct 31, 2017
9dc4c58
ARROW-1753: [Python] Provide for matching subclasses with register_ty…
Oct 31, 2017
142e6ee
ARROW-1455 [Python] Add Dockerfile for validating Dask integration
heimir-sverrisson Nov 1, 2017
0373541
ARROW-1766: [GLib] Fix failing builds on OSX
cpcloud Nov 3, 2017
527af63
ARROW-1652: [JS] housekeeping, vector cleanup
trxcllnt Nov 3, 2017
82cd6e5
ARROW-1764: [Python] Add -c conda-forge for Windows dev installation …
Nov 3, 2017
5d66576
ARROW-1727: [Format] Expand Arrow streaming format to permit deltas /…
Nov 4, 2017
b9a2ce9
ARROW-1765: [Doc] Use dependencies from conda in C++ docker build
xhochy Nov 4, 2017
fc7104f
ARROW-1742: C++: clang-format is not detected correct on OSX anymore
xhochy Nov 4, 2017
62190d7
ARROW-1756: [Python] Fix large file read/write error
Licht-T Nov 4, 2017
b513c8d
ARROW-1762: [C++] Add note to readme about need to set LC_ALL on some…
wesm Nov 5, 2017
ea4a8f5
ARROW-1714: [Python] Fix invalid serialization/deserialization None n…
Licht-T Nov 5, 2017
1ee73ef
ARROW-1770: [GLib] Fix GLib compiler warning
cpcloud Nov 5, 2017
d7f1398
ARROW-1749: [C++] Handle range of Decimal128 values that require 39 d…
cpcloud Nov 5, 2017
b25b243
ARROW-1663: [Java] use consistent name for null and not-null in Fixed…
Nov 5, 2017
9721930
ARROW-480: [Python] Implement RowGroupMetaData.ColumnChunk
Licht-T Nov 6, 2017
0106f53
ARROW-1750: [C++] Remove the need for arrow/util/random.h
cpcloud Nov 6, 2017
99ea353
ARROW-1771: [C++] ARROW-1749 Breaks Public API test in parquet-cpp
cpcloud Nov 7, 2017
3995eb3
ARROW-1768: [Python] Fix suppressed exception in ParquetWriter.__del__
wesm Nov 7, 2017
e631119
[Format] Fix link to Flatbuffers project in IPC.md
vmuriart Nov 7, 2017
3188d70
ARROW-1716: [Format/JSON] Use string integer value for Decimals in JSON
cpcloud Nov 7, 2017
bfc0f24
ARROW-1776: [C++] Define arrow::gpu::CudaContext::bytes_allocated()
kou Nov 8, 2017
252a2a5
[GLib] Fix a typo in document
kou Nov 8, 2017
78872a1
ARROW-1775: Ability to abort created but unsealed Plasma objects
stephanie-wang Nov 8, 2017
dffa486
ARROW-1709: [C++] Decimal.ToString is incorrect for negative scale
cpcloud Nov 9, 2017
65a9055
ARROW-972: UnionArray in pyarrow
pcmoritz Nov 9, 2017
ed8aef2
ARROW-1793: fix a typo for README.md
luchy0120 Nov 10, 2017
2d34f34
ARROW-1788 Fix Plasma store abort bug on client disconnection
stephanie-wang Nov 10, 2017
7c205b0
ARROW-1787: [Python] Support reading parquet files into DataFrames in…
cpcloud Nov 11, 2017
21112f8
ARROW-1800: [C++] Fix and simplify random_decimals
cpcloud Nov 11, 2017
357eedc
ARROW-1781: Don't use brew when using the toolchain
xhochy Nov 12, 2017
550a39f
ARROW-1801: [Docs] Update install instructions to use red-data-tools …
rvernica Nov 12, 2017
7adadd8
ARROW-1763: [Python] Implement __hash__ for DataType
wesm Nov 12, 2017
e8331f4
ARROW-1794: [C++/Python] Rename DecimalArray to Decimal128Array
cpcloud Nov 13, 2017
4a33bad
ARROW-1767: [C++] Support file reads and writes over 2GB on Windows
Licht-T Nov 13, 2017
6f8e287
ARROW-1743: [Python] Avoid non-array writeable-flag check
Licht-T Nov 14, 2017
8f2d152
ARROW-1802: [GLib] Support arrow-gpu
kou Nov 14, 2017
b18bbeb
ARROW-1371: [Website] Add "Powered By" page to the website
xhochy Nov 14, 2017
e3db5da
ARROW-1806: [GLib] Add garrow_record_batch_writer_write_table()
kou Nov 14, 2017
9fb806c
ARROW-1811: [C++/Python] Rename all Decimal based APIs to Decimal128
cpcloud Nov 15, 2017
7255460
ARROW-1810: [Plasma] Remove unused Plasma test shell scripts
pcmoritz Nov 15, 2017
1d951b5
ARROW-1809: [GLib] Use .xml instead of .sgml for GTK-Doc main file
kou Nov 15, 2017
9812aea
ARROW-1812: [C++] Plasma store modifies hash table while iterating du…
stephanie-wang Nov 15, 2017
42353ba
ARROW-1473: ValueVector new hierarchy prototype (implementation phase 1)
siddharthteotia Oct 14, 2017
9ee838a
ARROW-1474:[JAVA] ValueVector hierarchy (Implementation Phase 2)
siddharthteotia Oct 16, 2017
5bea983
ARROW-1717: [Java] Refactor JsonReader for new class hierarchy and fix
icexelloss Nov 7, 2017
837150e
ARROW-1476: [JAVA] Implement Final ValueVector Updates
siddharthteotia Nov 14, 2017
ca3acdc
ARROW-1821: [INTEGRATION] Add integration test case for when Field ha…
BryanCutler Nov 16, 2017
ac26eb7
ARROW-1829: [Plasma] Fixes to eviction policy.
robertnishihara Nov 17, 2017
cacbacd
ARROW-1795: [Plasma] Create flag to make Plasma store use a single me…
robertnishihara Nov 17, 2017
f2806fa
ARROW-1559: [C++] Add Unique kernel and refactor DictionaryBuilder to…
wesm Nov 17, 2017
eb7be48
ARROW-1805: [Python] Ignore special private files when traversing Par…
manuvaldes Nov 18, 2017
202e650
ARROW-1791: Limit generated data range to physical limits for tempora…
wesm Nov 18, 2017
952ec05
ARROW-1773: [C++] Add casts from date/time types to compatible signed…
Licht-T Nov 18, 2017
37214ef
ARROW-1827: [Java] Add checkstyle file and license template
icexelloss Nov 18, 2017
9f9dc5b
ARROW-1575: [Python] Add tests for pyarrow.column factory function
wesm Nov 18, 2017
d92735e
ARROW-1834: [Doc] Build documentation in separate build folders
xhochy Nov 20, 2017
b3a3a74
ARROW-1693: [JS] Expand JavaScript implementation, build system, fix …
trxcllnt Nov 20, 2017
cb5da9c
ARROW-1778: [Python] Link parquet-cpp statically, privately in manyli…
xhochy Nov 20, 2017
284e6c9
ARROW-1826: [JAVA] Avoid branching in copyFrom for fixed width scalars
siddharthteotia Nov 20, 2017
d887d91
ARROW-1830: [Python] Relax restriction that Parquet files in a datase…
wesm Nov 21, 2017
e98adc3
ARROW-1840: [Website] The installation command failed on Windows10 an…
ksdevlife Nov 21, 2017
c436376
ARROW-1838: [C++] Conform kernel API to use Datum for input and output
wesm Nov 21, 2017
cac0912
ARROW-1841: [JS] Update text-encoding-utf-8 and tslib for node ESModu…
trxcllnt Nov 21, 2017
15ed080
ARROW-1703: [C++] Vendor exact version of jemalloc we depend on
xhochy Nov 21, 2017
3fb1491
ARROW-1268: [SITE][FOLLOWUP] Update Spark Post to Reflect Conf Change
BryanCutler Nov 21, 2017
fc4e2c3
ARROW-1808: [C++] Make RecordBatch, Table virtual interfaces for colu…
wesm Nov 22, 2017
1516306
ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
BryanCutler Nov 22, 2017
9b2dc77
ARROW-1845: [Python] Expose Decimal128Type
xhochy Nov 22, 2017
dda2d34
ARROW-1828: [C++] Hash kernel specialization for BooleanType
wesm Nov 23, 2017
1524ed7
ARROW-1782: [Python] Add pyarrow.compress, decompress APIs
wesm Nov 23, 2017
ea0fb37
ARROW-1577: [JS] add ASF release scripts
trxcllnt Nov 23, 2017
6ec4f34
ARROW-1047: [Java] [FollowUp] Change ArrowMagic to be non-public class
icexelloss Nov 23, 2017
ac4bb69
ARROW-1852: [C++] Make retrieval of Plasma manager fd a const operation
mavam Nov 24, 2017
05bfb26
ARROW-1849: [GLib] Add input checks to GArrowRecordBatch
kou Nov 24, 2017
aaa0443
ARROW-1855: [GLib] Add workaround for build failure on macOS
kou Nov 25, 2017
82e42c5
ARROW-1777: [C++] Add ArrayData::Make static ctor for more convenient…
wesm Nov 25, 2017
b20beff
ARROW-1836: [C++] Remove deprecated static_visitor struct to avoid ms…
maxhora Nov 25, 2017
ad82c9a
ARROW-1853: [Plasma] Fix off-by-one error in retry processing
mavam Nov 25, 2017
bf1cf3b
[Python] Add more detail to development docs (#1356)
mrandrewandrade Nov 26, 2017
ebb6c76
ARROW-1859: [GLib] Add GArrowDictionaryDataType
kou Nov 26, 2017
85e2d89
ARROW-1758: [Python] Remove pickle=True option for object serialization
Licht-T Nov 26, 2017
42fc57b
ARROW-1178: [C++/Python] Add option to set chunksize in TableBatchRea…
wesm Nov 27, 2017
6176350
[Release] Apache Arrow JavaScript 0.2.0
wesm Nov 27, 2017
682e248
ARROW-1850: [C++] Use void* / const void* for buffers in file APIs
wesm Nov 27, 2017
b19e183
ARROW-1783: [Python] Provide a "component" dict representation of a s…
wesm Nov 27, 2017
a75325a
ARROW-1710: [Java] Remove Non-Nullable Vectors
BryanCutler Nov 28, 2017
ffb37db
ARROW-1735: [C++] Test CastKernel writing into output array with non-…
wesm Nov 28, 2017
155bf07
ARROW-1854: [Python] Use pickle to serialize numpy arrays of objects.
wesm Nov 29, 2017
b92c435
ARROW-1684: [Python] Support selecting nested Parquet fields by any p…
wesm Nov 29, 2017
bbbbbfb
ARROW-1844: [C++] Add initial Unique benchmarks for int64, variable-l…
wesm Nov 29, 2017
ff8efbf
ARROW-1869: [JAVA] Fix LowCostIdentityHashMap name
Nov 29, 2017
1fd3457
ARROW-1862: [GLib] Add GArrowDictionaryArray
kou Nov 29, 2017
705d842
ARROW-1874: [GLib] Add garrow_array_unique()
kou Dec 1, 2017
ad9105e
ARROW-1817: [Java] Configure JsonReader to read floating point NaN va…
BryanCutler Dec 1, 2017
d859441
ENH: Implement Array.IsValid/IsNull which return Array
Licht-T Dec 1, 2017
646b049
TST: Add tests for Array.IsValid/IsNull
Licht-T Dec 1, 2017
73a0328
ENH: Implement Array.isnull/notnull in Python
Licht-T Dec 1, 2017
63 changes: 63 additions & 0 deletions cpp/src/arrow/array-test.cc
@@ -33,6 +33,7 @@
#include "arrow/test-util.h"
#include "arrow/type.h"
#include "arrow/type_traits.h"
#include "arrow/util/bit-util.h"
#include "arrow/util/decimal.h"

namespace arrow {
@@ -150,30 +151,92 @@ TEST_F(TestArray, TestIsNullIsValid) {
1, 0, 1, 1, 0, 1, 0, 0,
1, 0, 0, 1};
// clang-format on
vector<uint8_t> valid_bitmap;
int64_t null_count = 0;
for (uint8_t x : null_bitmap) {
if (x == 0) {
++null_count;
valid_bitmap.push_back(1);
} else {
valid_bitmap.push_back(0);
}
}

std::shared_ptr<Buffer> null_buf;
std::shared_ptr<Buffer> null_arr_buf;
std::shared_ptr<Buffer> valid_arr_buf;
ASSERT_OK(BitUtil::BytesToBits(null_bitmap, default_memory_pool(), &null_buf));
ASSERT_OK(null_buf->Copy(0, null_buf->size(), &valid_arr_buf));
ASSERT_OK(BitUtil::BytesToBits(valid_bitmap, default_memory_pool(), &null_arr_buf));

std::unique_ptr<Array> arr;
arr.reset(new Int32Array(null_bitmap.size(), nullptr, null_buf, null_count));

std::unique_ptr<Array> null_arr;
std::unique_ptr<Array> valid_arr;
null_arr.reset(new BooleanArray(valid_bitmap.size(), null_arr_buf, nullptr, 0));
valid_arr.reset(new BooleanArray(null_bitmap.size(), valid_arr_buf, nullptr, 0));

ASSERT_EQ(null_count, arr->null_count());
ASSERT_EQ(5, null_buf->size());

ASSERT_TRUE(arr->null_bitmap()->Equals(*null_buf.get()));

EXPECT_TRUE(arr->IsNull()->Equals(*null_arr.get()));
EXPECT_TRUE(arr->IsValid()->Equals(*valid_arr.get()));
for (size_t i = 0; i < null_bitmap.size(); ++i) {
EXPECT_EQ(null_bitmap[i] != 0, !arr->IsNull(i)) << i;
EXPECT_EQ(null_bitmap[i] != 0, arr->IsValid(i)) << i;
}
}

TEST_F(TestArray, TestIsNullIsValidLarge) {
// clang-format off
vector<uint8_t> null_bitmap = {1, 0, 1, 1, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 0, 0,
1, 0, 0, 1};
// clang-format on
const size_t initial_size = null_bitmap.size();
const size_t generate_size = (3 * BitUtil::kSimdWidth * 8) / null_bitmap.size() + 1;

for (size_t i = 1; i < generate_size; i++) {
for (size_t j = 0; j < initial_size; j++) {
null_bitmap.push_back(null_bitmap[j]);
}
}

vector<uint8_t> valid_bitmap;
int64_t null_count = 0;
for (uint8_t x : null_bitmap) {
if (x == 0) {
++null_count;
valid_bitmap.push_back(1);
} else {
valid_bitmap.push_back(0);
}
}

std::shared_ptr<Buffer> null_buf;
std::shared_ptr<Buffer> null_arr_buf;
std::shared_ptr<Buffer> valid_arr_buf;
ASSERT_OK(BitUtil::BytesToBits(null_bitmap, default_memory_pool(), &null_buf));
ASSERT_OK(null_buf->Copy(0, null_buf->size(), &valid_arr_buf));
ASSERT_OK(BitUtil::BytesToBits(valid_bitmap, default_memory_pool(), &null_arr_buf));

std::unique_ptr<Array> arr;
arr.reset(new Int32Array(null_bitmap.size(), nullptr, null_buf, null_count));

std::unique_ptr<Array> null_arr;
std::unique_ptr<Array> valid_arr;
null_arr.reset(new BooleanArray(valid_bitmap.size(), null_arr_buf, nullptr, 0));
valid_arr.reset(new BooleanArray(null_bitmap.size(), valid_arr_buf, nullptr, 0));

EXPECT_TRUE(arr->IsNull()->Equals(*null_arr.get()));
EXPECT_TRUE(arr->IsValid()->Equals(*valid_arr.get()));
}
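
Both tests build their fixtures the same way: one flag byte per slot is packed into a bitmap, and the expected `IsNull` result is the bitwise complement of the validity flags. The sketch below illustrates that setup in plain C++ without Arrow. `PackFlags` and `CountNulls` are hypothetical names, and the LSB-first packing is an assumption taken from the "packed LSB-ordered bitmap" wording in `bit-util.h`; this is not the actual `BitUtil::BytesToBits` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitUtil::BytesToBits: packs one flag per input
// byte into a bitmap, least-significant bit first (assumed ordering).
std::vector<uint8_t> PackFlags(const std::vector<uint8_t>& flags) {
  std::vector<uint8_t> bits((flags.size() + 7) / 8, 0);
  for (size_t i = 0; i < flags.size(); ++i) {
    if (flags[i] != 0) {
      bits[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  // Sanity check: the packed buffer has room for every flag.
  assert(bits.size() * 8 >= flags.size());
  return bits;
}

// Counts the zero flags, which the tests treat as null slots.
int64_t CountNulls(const std::vector<uint8_t>& flags) {
  int64_t nulls = 0;
  for (uint8_t flag : flags) {
    if (flag == 0) ++nulls;
  }
  return nulls;
}
```

Under these assumptions the 36-entry literal in the large test packs into 5 bytes, which is the same size the first test asserts for `null_buf`.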

TEST_F(TestArray, BuildLargeInMemoryArray) {
const int64_t length = static_cast<int64_t>(std::numeric_limits<int32_t>::max()) + 1;

34 changes: 34 additions & 0 deletions cpp/src/arrow/array.cc
@@ -24,6 +24,7 @@

#include "arrow/buffer.h"
#include "arrow/compare.h"
#include "arrow/memory_pool.h"
#include "arrow/pretty_print.h"
#include "arrow/status.h"
#include "arrow/type_traits.h"
@@ -52,6 +53,39 @@ std::shared_ptr<ArrayData> ArrayData::Make(const std::shared_ptr<DataType>& type

// ----------------------------------------------------------------------
// Base array class
std::shared_ptr<Array> Array::IsNull() const {
auto pool = default_memory_pool();

const auto bitmap_buffer = null_bitmap();
std::shared_ptr<Buffer> new_bitmap_buffer;

if (bitmap_buffer == NULLPTR) {
ARROW_CHECK_OK(GetEmptyBitmap(pool, length(), &new_bitmap_buffer));
} else {
ARROW_CHECK_OK(CopyFlipedBitmap(pool, bitmap_buffer->data(), length(),
&new_bitmap_buffer));
}

auto boolean_array =
std::make_shared<BooleanArray>(length(), new_bitmap_buffer, NULLPTR, 0);
return std::dynamic_pointer_cast<Array>(boolean_array);
}

std::shared_ptr<Array> Array::IsValid() const {
auto bitmap_buffer = null_bitmap();
std::shared_ptr<Buffer> new_bitmap_buffer;

if (bitmap_buffer == NULLPTR) {
auto pool = default_memory_pool();
ARROW_CHECK_OK(GetFullBitmap(pool, length(), &new_bitmap_buffer));
} else {
ARROW_CHECK_OK(bitmap_buffer->Copy(0, bitmap_buffer->size(), &new_bitmap_buffer));
}

auto boolean_array =
std::make_shared<BooleanArray>(length(), new_bitmap_buffer, NULLPTR, 0);
return std::dynamic_pointer_cast<Array>(boolean_array);
}
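
The two methods above differ mainly in how they treat a missing validity bitmap: `IsNull()` yields all-false and `IsValid()` yields all-true. A minimal model of that behaviour, using `std::vector<bool>` instead of Arrow buffers, might look like the following. `ValidMask` and `NullMask` are hypothetical names. The `offset` parameter is an assumption of this sketch about how a sliced array could be handled; the hunk above copies the bitmap starting at bit 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: `bitmap` may be nullptr, meaning "no nulls recorded".
// Bits are read LSB-first, starting `offset` bits into the bitmap.
std::vector<bool> ValidMask(const std::vector<uint8_t>* bitmap, int64_t length,
                            int64_t offset) {
  assert(length >= 0 && offset >= 0);
  // No bitmap: every slot is valid.
  std::vector<bool> out(static_cast<size_t>(length), true);
  if (bitmap == nullptr) return out;
  for (int64_t i = 0; i < length; ++i) {
    const size_t bit = static_cast<size_t>(i + offset);
    out[static_cast<size_t>(i)] = (((*bitmap)[bit / 8] >> (bit % 8)) & 1) != 0;
  }
  return out;
}

// The null mask is the element-wise complement of the valid mask.
std::vector<bool> NullMask(const std::vector<uint8_t>* bitmap, int64_t length,
                           int64_t offset) {
  std::vector<bool> out = ValidMask(bitmap, length, offset);
  out.flip();
  return out;
}
```

Deriving one mask from the other keeps the two results complementary by construction, including in the no-bitmap case.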

int64_t Array::null_count() const {
if (ARROW_PREDICT_FALSE(data_->null_count < 0)) {
4 changes: 4 additions & 0 deletions cpp/src/arrow/array.h
@@ -203,13 +203,17 @@ class ARROW_EXPORT Array {
BitUtil::BitNotSet(null_bitmap_data_, i + data_->offset);
}

std::shared_ptr<Array> IsNull() const;

/// \brief Return true if value at index is valid (not null). Does not
/// boundscheck
bool IsValid(int64_t i) const {
return null_bitmap_data_ != NULLPTR &&
BitUtil::GetBit(null_bitmap_data_, i + data_->offset);
}

std::shared_ptr<Array> IsValid() const;

/// Size in the number of elements this array contains.
int64_t length() const { return data_->length; }

45 changes: 45 additions & 0 deletions cpp/src/arrow/util/bit-util.cc
@@ -21,6 +21,8 @@
#define __builtin_popcount __popcnt
#include <nmmintrin.h>
#define __builtin_popcountll _mm_popcnt_u64
#else
#include <x86intrin.h>
#endif

#include <algorithm>
@@ -104,6 +106,12 @@ Status GetEmptyBitmap(MemoryPool* pool, int64_t length, std::shared_ptr<Buffer>*
return Status::OK();
}

Status GetFullBitmap(MemoryPool* pool, int64_t length, std::shared_ptr<Buffer>* result) {
RETURN_NOT_OK(AllocateBuffer(pool, BitUtil::BytesForBits(length), result));
memset((*result)->mutable_data(), 0xff, static_cast<size_t>((*result)->size()));
return Status::OK();
}
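
`GetFullBitmap` is the all-ones counterpart of `GetEmptyBitmap`. As a rough illustration without Arrow's allocator, the sketch below builds such a bitmap in a `std::vector`. `MakeFullBitmap` is a hypothetical name. Note that `memset` converts its fill argument to `unsigned char`, so only the low byte of the value is ever written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for GetFullBitmap: returns ceil(length / 8) bytes
// with every bit set, including any padding bits in the last byte.
std::vector<uint8_t> MakeFullBitmap(int64_t length) {
  assert(length >= 0);
  const size_t nbytes = static_cast<size_t>((length + 7) / 8);
  std::vector<uint8_t> bitmap(nbytes);
  if (nbytes > 0) {
    // The fill value is converted to unsigned char, so 0xFF sets all 8 bits.
    std::memset(bitmap.data(), 0xFF, nbytes);
  }
  return bitmap;
}
```

A caller that compares whole bytes should keep in mind that the bits past `length` in the final byte are set as well.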

Status CopyBitmap(MemoryPool* pool, const uint8_t* data, int64_t offset, int64_t length,
std::shared_ptr<Buffer>* out) {
std::shared_ptr<Buffer> buffer;
@@ -116,6 +124,43 @@ Status CopyBitmap(MemoryPool* pool, const uint8_t* data, int64_t offset, int64_t
return Status::OK();
}

Status CopyFlipedBitmap(MemoryPool* pool, const uint8_t* data, int64_t length,
std::shared_ptr<Buffer>* out) {
std::shared_ptr<Buffer> buffer;
RETURN_NOT_OK(GetEmptyBitmap(pool, length, &buffer));
uint8_t* dest = buffer->mutable_data();

// Flip bits 16 bytes at a time with SSE2; the tail is handled byte by byte
// TODO: use AVX instructions if available
size_t size = BitUtil::BytesForBits(length);
size_t num_blocks = size / BitUtil::kSimdWidth;
size_t remainder = size % BitUtil::kSimdWidth;

if (remainder != 0) {
// Tail bytes that do not fill a whole SIMD block
size_t align = BitUtil::kSimdWidth * num_blocks;

for (size_t i = 0; i < remainder; i++) {
size_t position = align + i;
dest[position] = ~data[position];
}
}

// XOR with an all-ones mask inverts every bit
const __m128i mask = _mm_set1_epi32(-1);
for (size_t i = 0; i < num_blocks; i++) {
size_t position = i * BitUtil::kSimdWidth;
const __m128i* data_in = reinterpret_cast<const __m128i*>(&data[position]);
__m128i* data_out = reinterpret_cast<__m128i*>(&dest[position]);

// Unaligned load: `data` is not guaranteed to be 16-byte aligned
__m128i loaded_data = _mm_loadu_si128(data_in);
__m128i result = _mm_xor_si128(loaded_data, mask);

_mm_stream_si128(data_out, result);
}

*out = buffer;
return Status::OK();
}
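
The flip above combines a 16-byte SIMD loop with a byte-wise tail. A self-contained sketch of the same idea follows. `FlipBitmapBytes` is a hypothetical name. As conservative assumptions, it uses unaligned loads and stores and falls back to the scalar loop when SSE2 is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper, not part of Arrow: returns a copy of the first
// `nbytes` bytes of `data` with every bit inverted.
std::vector<uint8_t> FlipBitmapBytes(const uint8_t* data, size_t nbytes) {
  assert(data != nullptr || nbytes == 0);
  std::vector<uint8_t> out(nbytes);
  size_t i = 0;
#if defined(__SSE2__)
  // XOR with an all-ones register inverts 16 bytes per iteration.
  const __m128i all_ones = _mm_set1_epi8(-1);
  for (; i + 16 <= nbytes; i += 16) {
    const __m128i v =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data() + i),
                     _mm_xor_si128(v, all_ones));
  }
#endif
  // Scalar tail (or the whole buffer when SSE2 is unavailable).
  for (; i < nbytes; ++i) {
    out[i] = static_cast<uint8_t>(~data[i]);
  }
  return out;
}
```

Because whole bytes are inverted, the padding bits past the logical length in the last byte are flipped too, which is worth keeping in mind if a consumer compares buffers byte for byte.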

bool BitmapEquals(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, int64_t bit_length) {
if (left_offset % 8 == 0 && right_offset % 8 == 0) {
9 changes: 9 additions & 0 deletions cpp/src/arrow/util/bit-util.h
@@ -106,6 +106,8 @@ class Status;

namespace BitUtil {

static constexpr size_t kSimdWidth = 16;

static constexpr uint8_t kBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128};

// the ~i byte version of kBitmask
@@ -535,6 +537,9 @@ class BitmapWriter {
ARROW_EXPORT
Status GetEmptyBitmap(MemoryPool* pool, int64_t length, std::shared_ptr<Buffer>* result);

ARROW_EXPORT
Status GetFullBitmap(MemoryPool* pool, int64_t length, std::shared_ptr<Buffer>* result);

/// Copy a bit range of an existing bitmap
///
/// \param[in] pool memory pool to allocate memory from
@@ -548,6 +553,10 @@ ARROW_EXPORT
Status CopyBitmap(MemoryPool* pool, const uint8_t* bitmap, int64_t offset, int64_t length,
std::shared_ptr<Buffer>* out);

ARROW_EXPORT
Status CopyFlipedBitmap(MemoryPool* pool, const uint8_t* bitmap, int64_t length,
std::shared_ptr<Buffer>* out);

/// Compute the number of 1's in the given data array
///
/// \param[in] data a packed LSB-ordered bitmap as a byte array
9 changes: 8 additions & 1 deletion python/pyarrow/array.pxi
@@ -361,7 +361,14 @@ cdef class Array:
             return 0

     def isnull(self):
-        raise NotImplemented
+        null_arr = Array()
+        null_arr.init(self.sp_array.get().IsNull())
+        return null_arr
+
+    def notnull(self):
+        notnull_arr = Array()
+        notnull_arr.init(self.sp_array.get().IsValid())
+        return notnull_arr

     def __getitem__(self, key):
         cdef Py_ssize_t n = len(self)
3 changes: 3 additions & 0 deletions python/pyarrow/includes/libarrow.pxd
@@ -109,6 +109,9 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:

c_bool Equals(const CArray& arr)
c_bool IsNull(int i)
shared_ptr[CArray] IsNull()
c_bool IsValid(int i)
shared_ptr[CArray] IsValid()

shared_ptr[CArrayData] data()
