Closed
964 commits
0e680f0
ARROW-1109: [JAVA] transferOwnership fails when readerIndex is not 0
julienledem Apr 11, 2017
c6cf124
ARROW-1110: [JAVA] make union vector naming consistent
julienledem Apr 14, 2017
11deee6
ARROW-1111: [JAVA] Make aligning buffers optional, and allow -1 for u…
StevenMPhillips Jun 6, 2017
ac64853
ARROW-1112: [JAVA] Set lastSet for VarLength and List vectors when lo…
StevenMPhillips Jun 6, 2017
2a2b109
ARROW-742: [C++] std::wstring_convert exceptions handling
maxhora Jun 13, 2017
25ba44c
ARROW-460: [C++] JSON read/write for dictionaries
wesm Jun 13, 2017
d25ea63
ARROW-1115: [C++] use CCACHE_FOUND value for ccache path
Jun 13, 2017
697df1b
ARROW-1117: [Docs] Minor issues in GLib README
sekikn Jun 14, 2017
d1de66b
ARROW-1118: [Site] Website updates for 0.4.1
wesm Jun 15, 2017
5b66c25
ARROW-1122: [Website] Add turbodbc + arrow blog post
MathMagique Jun 15, 2017
3f26dfa
ARROW-1096: [C++] CreateFileMapping maximum size calculation issue
maxhora Jun 16, 2017
d54bf48
ARROW-1122: [Website] Change timestamp to yield correct Jekyll date
wesm Jun 16, 2017
1a23419
ARROW-1124: Increase numpy dependency to >=1.10.x
xhochy Jun 17, 2017
5be05ac
ARROW-742: [C++] Use gflags from toolchain; Resolve cmake FindGFlags …
maxhora Jun 18, 2017
d874d4e
ARROW-1081: Fill null_bitmap correctly in TestBase
xhochy Jun 19, 2017
b5e8a48
ARROW-1128: [Docs] command to build a wheel is not properly rendered
sekikn Jun 19, 2017
86c67d0
ARROW-1129: [C++] Fix gflags issue in Linux/macOS toolchain builds
wesm Jun 20, 2017
f0f1ca6
ARROW-1138: Travis: Use OpenJDK7 instead of OracleJDK7
xhochy Jun 22, 2017
ef579ca
ARROW-1123: Make jemalloc the default allocator
xhochy Jun 22, 2017
5e34309
ARROW-1104: Integrate in-memory object store into arrow
pcmoritz Jun 22, 2017
222628c
ARROW-1140: [C++] Allow optional build of plasma
cpcloud Jun 22, 2017
608b89e
ARROW-1073: C++: Adapative integer builder
xhochy Jun 22, 2017
a16c124
ARROW-1137: Python: Ensure Pandas roundtrip of all-None column
xhochy Jun 22, 2017
c1ec0c7
ARROW-1039: Python: Remove duplicate column
xhochy Jun 23, 2017
074dde4
ARROW-1143: C++: Fix comparison of NullArray
xhochy Jun 23, 2017
e209e58
ARROW-1144: [C++] Remove unused variable
cpcloud Jun 23, 2017
6768f52
ARROW-1139: Silence dlmalloc warning on clang-4.0
pcmoritz Jun 23, 2017
8bf567e
ARROW-1136: [C++] Add null checks for invalid streams
wesm Jun 23, 2017
b7befeb
ARROW-1132: [Python] Unable to write pandas DataFrame w/MultiIndex co…
cpcloud Jun 23, 2017
1514016
ARROW-1146: Add .gitignore for *_generated.h files in src/plasma/format
cpcloud Jun 23, 2017
98f7cac
ARROW-1142: [C++] Port over compression toolchain and interfaces from…
wesm Jun 23, 2017
73007de
ARROW-1147: [C++] Allow optional vendoring of flatbuffers in plasma
cpcloud Jun 23, 2017
41524d6
ARROW-1135: [C++] Use clang 4.0 in one of the Linux builds
wesm Jun 24, 2017
f3bcf76
ARROW-1145: [GLib] Add get_values()
kou Jun 24, 2017
bea30d6
ARROW-1113: [C++] Upgrade to gflags 2.2.0, use tarball instead of git…
wesm Jun 24, 2017
fc3f8c2
ARROW-1131: [Python] Enable the Parquet unit tests by default if the …
wesm Jun 26, 2017
5de6eb5
ARROW-978: [Python] - Change python documentation sphinx theme to boo…
jeffknupp Jun 26, 2017
ec6e183
ARROW-1151: [C++] Add branch prediction to RETURN_NOT_OK
wesm Jun 26, 2017
bfe15db
ARROW-1152: [Cython] read_tensor should work with a readable file
pcmoritz Jun 26, 2017
3e754a0
ARROW-1155: [Python] Add null check when user improperly instantiates…
wesm Jun 26, 2017
cb5f2b9
ARROW-1157: C++/Python: Decimal templates are not correctly exported …
xhochy Jun 27, 2017
b065228
ARROW-1154: [C++] Import miscellaneous computational utility code fro…
wesm Jun 27, 2017
a588938
ARROW-1159: [C++] Use dllimport for visibility when not building Arro…
wesm Jun 27, 2017
bddb219
ARROW-834: Python Support creating from iterables
holdenk Jun 28, 2017
65558db
ARROW-1162: Empty data vector transfer between list vectors should no…
sudheeshkatkam Jun 29, 2017
6958252
ARROW-1165: [C++] Refactor PythonDecimalToArrowDecimal to not use tem…
cpcloud Jun 29, 2017
af83c45
ARROW-1166: Fix errors in example and missing reference in Layout.md
fangzheng Jun 29, 2017
9f500af
ARROW-1170: C++: Link to pthread on ARROW_JEMALLOC=OFF
xhochy Jun 30, 2017
456330f
ARROW-599: CMake support of LZ4 compression lib
maxhora Jul 1, 2017
930db87
ARROW-1169: [C++] jemalloc externalproject doesn't build with CMake's…
xhochy Jun 29, 2017
c294ec3
ARROW-1125: partial schemas for Table.from_pandas
Jul 1, 2017
96e7e99
ARROW-960: Add section on how to develop with pip
xhochy Jun 27, 2017
e268ce8
ARROW-915: [Python] Struct Array reads limited support
itaiin Jul 2, 2017
9e4906f
ARROW-1160: C++: Implement DictionaryBuilder
xhochy Jul 2, 2017
2e5ddfe
ARROW-1179: C++: Add missing virtual destructors
xhochy Jul 3, 2017
2c3e8b0
ARROW-692: Integration test data generator for dictionary types
wesm Jul 3, 2017
a6d0c26
ARROW-1180: [GLib] Fix a returning invalid address bug in garrow_tens…
kou Jul 3, 2017
e18abac
ARROW-1181: [Python] Parquet multiindex test should be optional
BryanCutler Jul 3, 2017
cdee23c
ARROW-600: ZSTD compression lib support
maxhora Jul 3, 2017
681479d
ARROW-1182: C++: Specify BUILD_BYPRODUCTS for zlib and zstd
xhochy Jul 3, 2017
e5a08dd
ARROW-1098. [Format] modify document mistake
Jul 4, 2017
7c18ddd
ARROW-966: [Python] Also accept Field instance in pyarrow.list_
wesm Jul 4, 2017
edcded3
ARROW-1148: [C++] Raise minimum CMake version to 3.2
wesm Jul 4, 2017
cbbd04b
ARROW-1172: [C++] Refactor to use unique_ptr for builders
kou Jul 4, 2017
7d86c28
ARROW-693: [Java] Add dictionary support to JSON reader and writer
BryanCutler Jul 4, 2017
00a7d55
ARROW-1185: [C++] Status class cleanup, warn_unused_result attribute …
wesm Jul 5, 2017
83a4405
ARROW-599: [C++] Lz4 compression codec support
maxhora Jul 6, 2017
c398fda
ARROW-462: [C++] Implement in-memory conversions between non-nested p…
wesm Jul 7, 2017
3309d12
ARROW-1174: [GLib] Fix ListArray test failure
kou Jul 7, 2017
b6b876c
ARROW-1193: [C++] Support pkg-config for arrow_python.so
kou Jul 9, 2017
e894532
ARROW-1197: [GLib] Fix a bug that record batch related functions for …
kou Jul 9, 2017
7870804
ARROW-1074: Support lists and arrays in pandas DataFrames without exp…
Jul 10, 2017
f73c1c3
ARROW-1201: [Python] Incomplete Python types cause a core dump when r…
cpcloud Jul 10, 2017
cab07c2
ARROW-1202: [C++] Remove semicolons from status macros
cpcloud Jul 10, 2017
bc16e0e
ARROW-1196: [C++] Release, Debug, Toolchain, NMake Generator Appveyor…
maxhora Jul 10, 2017
471a85f
ARROW-1168: [Python] pandas metadata may contain "mixed" data types
cpcloud Jul 10, 2017
ad57ea8
ARROW-1125: Python: Add public C++ API to unwrap PyArrow object
xhochy Jul 11, 2017
8452071
ARROW-1199: [C++] Implement mutable POD struct for Array data
wesm Jul 11, 2017
dbedc8d
ARROW-1186: [C++] Add support to build only Parquet dependencies
Jul 11, 2017
e8c09c6
ARROW-1205: C++: Reference to type objects in ArrayLoader may cause s…
xhochy Jul 11, 2017
afb1928
ARROW-1206: [C++] Add finer grained control of compression library su…
wesm Jul 11, 2017
f0ecc06
ARROW-1208: [C++] Temporary remove conda's build of zstd from Toolcha…
maxhora Jul 12, 2017
28e06d8
ARROW-1194: [Python] Expose MockOutputStream in pyarrow.
robertnishihara Jul 13, 2017
74bc873
ARROW-1150: Silence AdaptiveIntBuilder compiler warning on MSVC
xhochy Jul 13, 2017
85892a2
ARROW-1187: Python: Feather: Serialize a DataFrame with None column
xhochy Jul 13, 2017
248a9d8
ARROW-1212: [GLib] Add garrow_binary_array_get_offsets_buffer()
kou Jul 13, 2017
c7e0995
ARROW-1208: [C++] Install zstd from conda for Toolchain Appveyor buil…
maxhora Jul 14, 2017
8cad26e
ARROW-1200: C++: Switch DictionaryBuilder to signed integers
xhochy Jul 14, 2017
cb31b8b
ARROW-1215: [Python] Generate documentation for class members in API …
pcmoritz Jul 14, 2017
bfe3959
ARROW-962: [Python] Add schema attribute to RecordBatchFileReader
wesm Jul 15, 2017
f62db83
ARROW-1100: [Python] Add mode property to NativeFile
wesm Jul 15, 2017
d46b7ea
ARROW-992: [Python] Try to set a __version__ in in-place local builds
wesm Jul 15, 2017
9ff39f3
ARROW-1216: [Python] Fix creating numpy array from arrow buffers on p…
pcmoritz Jul 15, 2017
099f61c
ARROW-1218: [C++] Fix arrow build if no compression library is used
pcmoritz Jul 15, 2017
bb0a758
ARROW-1214: [Python/C++] Add C++ functionality to more easily handle …
wesm Jul 15, 2017
dc4216f
ARROW-575: Python: Auto-detect nested lists and nested numpy arrays i…
xhochy Jul 15, 2017
e438e15
ARROW-1217: [GLib] Add GInputStream based arrow::io::RandomAccessFile
kou Jul 16, 2017
f266f17
ARROW-1220: [C++] Cmake script errors out if lib is not found under *…
maxhora Jul 16, 2017
bf01966
[Python] Correct function name in use with pandas documentation
MarkLavrynenko Jul 16, 2017
50b518a
ARROW-1183: [Python] Implement pandas conversions between Time32, Tim…
wesm Jul 16, 2017
cdf7db9
ARROW-1223: [GLib] Fix function name that returns wrapped object
kou Jul 16, 2017
d538426
ARROW-1228: [GLib] Fix test file name
kou Jul 17, 2017
8644ee1
ARROW-1227: [GLib] Support GOutputStream
kou Jul 17, 2017
e370174
ARROW-1222: [Python] Raise exception when passing unsupported Python …
wesm Jul 17, 2017
5fbfd8e
ARROW-597: [Python] Add read_pandas convenience to stream and file re…
wesm Jul 17, 2017
b474cac
ARROW-1221: [C++] Add run_clang_format.py script, exclusions file. Pi…
wesm Jul 17, 2017
ea9bc83
ARROW-1229: [GLib] Use "read" instead of "get" for reading record batch
kou Jul 17, 2017
0396240
ARROW-1190: [JAVA] Fixing VectorLoader for duplicate field names
antonymayi Jul 17, 2017
1541a08
ARROW-1177: [C++] Check for int32 offset overflow in ListBuilder, Bin…
wesm Jul 17, 2017
b4d34f8
ARROW-1191: [JAVA] Implement getField() method for complex readers
StevenMPhillips Jul 17, 2017
a1c8b83
ARROW-1079: [Python] Filter out private directories when building Par…
wesm Jul 18, 2017
6035d9b
ARROW-1233: [C++] Validate libs availability in conda toolchain
maxhora Jul 18, 2017
8152433
ARROW-1188: [Python] Handle Feather case where category values are nu…
wesm Jul 18, 2017
a73252d
ARROW-1235: [C++] Make operator<< for Array/Status and std::ostream i…
wesm Jul 18, 2017
362e754
ARROW-1103: [Python] Support read_pandas (with index metadata) on dir…
wesm Jul 18, 2017
c5a89b7
ARROW-1120: Support for writing timestamp(ns) to Int96
xhochy Jul 18, 2017
fe9c7ef
ARROW-1236: Fix lib path in pkg-config file
Zaharid Jul 19, 2017
6999dbd
ARROW-935: [Java] Build Javadoc and site with OpenJDK8 in Java CI build
wesm Jul 19, 2017
2c5b412
ARROW-1167: [Python] Support chunking string columns in Table.from_pa…
wesm Jul 19, 2017
5aa0809
[GLib] Update rat_exclusion_files.txt
wesm Jul 19, 2017
db181d1
ARROW-1244: Exclude C++ Plasma source tree when creating source release
wesm Jul 20, 2017
62ef2cd
[C++] Remove Plasma source tree for 0.5.0 release pending IP Clearance
wesm Jul 20, 2017
e9f76e1
[maven-release-plugin] prepare release apache-arrow-0.5.0
wesm Jul 20, 2017
9b26ed8
[maven-release-plugin] prepare for next development iteration
wesm Jul 20, 2017
2c81015
[C++] Restore Plasma source tree after 0.5.0 release
wesm Jul 23, 2017
fabf7fb
ARROW-1241: [C++] Appveyor build matrix extended with Visual Studio 2…
maxhora Jul 24, 2017
e1b098e
ARROW-1240: [JAVA] security: upgrade slf4j to 1.7.25 and logback to 1…
Jul 24, 2017
457bb07
ARROW-1237: [JAVA] expose the ability to set lastSet
siddharthteotia Jul 24, 2017
05f7058
ARROW-1239: [JAVA] upgrading git-commit-id-plugin
antonymayi Jul 24, 2017
a94f471
ARROW-1149: [Plasma] Create Cython client library for Plasma
pcmoritz Jul 24, 2017
6042c48
ARROW-1195: [C++] CpuInfo init with cores number, frequency and cache…
maxhora Jul 24, 2017
ecdc86b
ARROW-1249: [JAVA] expose fillEmpties from Nullable variable length v…
siddharthteotia Jul 25, 2017
886e2af
ARROW-1259: [Plasma] Speed up plasma tests
pcmoritz Jul 25, 2017
9e692af
ARROW-1245: [Integration] Enable JavaTester in Integration tests
BryanCutler Jul 25, 2017
11c92bf
ARROW-1246: [Format] Draft Flatbuffer metadata description for Map
wesm Jul 25, 2017
204f148
ARROW-1260: [Plasma] Use factory method to create Python PlasmaClient
pcmoritz Jul 25, 2017
07b89bf
ARROW-1219: [C++] Use Google C++ code formatting
wesm Jul 25, 2017
08cec90
ARROW-1252: [Website] Updates for 0.5.0 and short blog post summarizi…
wesm Jul 25, 2017
ed54dce
ARROW-1253: [C++/Python] Speed up C++ / Python builds by using conda-…
wesm Jul 25, 2017
f90fa49
[Website] Fix link to 0.5.0 post on install page
wesm Jul 25, 2017
e9e17b5
ARROW-1258: [C++] Suppress Clang dlmalloc compiler warnings
wesm Jul 26, 2017
2eeaa95
ARROW-1248: [Python] Suppress return-type-c-linkage warning in Cython…
wesm Jul 26, 2017
676a4a9
ARROW-1255: [Plasma] Fix typo in plasma protocol; add DCHECK for Read…
Yeolar Jul 26, 2017
5708cd1
[Java] Fix some typos in code comments and exception messages
rendel Jul 26, 2017
dca5d96
ARROW-1275: [C++] Deafult Snappy static lib suffix updated to "_static"
maxhora Jul 26, 2017
d76e43e
ARROW-1268: [WEBSITE] Added blog post for Spark integration toPandas()
BryanCutler Jul 27, 2017
cae3510
ARROW-1274: [C++] Fix CMake >= 3.3 warning. Also add option to suppre…
wesm Jul 27, 2017
7b3378f
ARROW-1204: [C++] Remove WholeProgramOptimization(/GL) compilation fl…
maxhora Jul 27, 2017
f72279b
ARROW-1288: Fix many license headers to use proper ASF one
wesm Jul 27, 2017
b7639c1
ARROW-1285: [Python] Delete any incomplete file when attempt to write…
wesm Jul 28, 2017
ff6c6e0
ARROW-1276: enable parquet serialization of empty DataFrames
crepererum Jul 28, 2017
8841bc0
ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ …
wesm Jul 28, 2017
33c85cd
[Java] Fix letter case in rat plugin config
Jul 27, 2017
4df2a0b
ARROW-1290: [C++] Double buffer size when exceeding capacity in arrow…
wesm Jul 28, 2017
44855bb
ARROW-1273: [Python] Add Parquet read_metadata, read_schema convenien…
wesm Jul 28, 2017
3b14765
ARROW-1289: [Python] Add PYARROW_BUILD_PLASMA CMake option, follow se…
wesm Jul 28, 2017
1dd0f5f
ARROW-1267: [Java] Handle zero length case in BitVector.splitAndTransfer
StevenMPhillips Jul 29, 2017
05af640
ARROW-276: [JAVA] Nullable Vectors should extend BaseValueVector and …
siddharthteotia Jul 29, 2017
ec32617
ARROW-1192: [JAVA] Use buffer slice for splitAndTransfer in List and …
siddharthteotia Jul 29, 2017
5aea3a3
ARROW-1287: [Python] Implement whence argument for pyarrow.NativeFile…
wesm Jul 29, 2017
b4e9ba1
ARROW-968: [Python] Support slices in RecordBatch.__getitem__
wesm Jul 29, 2017
ea1b67c
ARROW-1294: [C++] Pin cmake=3.8.0 in MSVC toolchain build
wesm Jul 29, 2017
4108bda
ARROW-1291: [Python] Cast non-string DataFrame columns to strings in …
wesm Jul 29, 2017
2288bfc
ARROW-1264: [Python] Raise exception in Python instead of aborting if…
wesm Jul 30, 2017
b4eec62
ARROW-932: [Python] Fix MSVC compiler warnings, build Python with /WX…
wesm Jul 30, 2017
af2aeaf
ARROW-1213: [Python] Support s3fs filesystem for Amazon S3 in Parquet…
wesm Jul 31, 2017
900105a
ARROW-187: [C++] Add development style notes to C++ README, note abou…
wesm Jul 31, 2017
b5ff2f6
ARROW-1251: [C++] Update C++ README to account for toolchain evolution
wesm Aug 1, 2017
3a84653
ARROW-1265: [Plasma] Clean up all resources on SIGTERM to keep valgri…
wesm Aug 1, 2017
e1d574c
ARROW-1301: [C++/Python] More complete filesystem API for HDFS
wesm Aug 1, 2017
b8754eb
ARROW-884: [C++] Exclude internal namespaces from generated Doxygen docs
wesm Aug 1, 2017
aa1d753
ARROW-573: [C++/Python] Implement IPC metadata handling for ordered d…
wesm Aug 1, 2017
e5ed31f
ARROW-1093: [Python] Run flake8 in Travis CI. Add note about developm…
wesm Aug 2, 2017
7e7861c
ARROW-1257: Plasma documentation
pcmoritz Aug 2, 2017
e50b6ae
ARROW-1308: [C++] Link utility executables to Arrow shared library if…
wesm Aug 2, 2017
b95bed0
ARROW-1303: [C++] Support downloading Boost
kou Aug 2, 2017
5917e07
ARROW-1305: [GLib] Add GArrowIntArrayBuilder
kou Aug 2, 2017
ee928d2
ARROW-1211: [C++] Enable builder classes to automatically use the def…
wesm Aug 2, 2017
93b51a0
ARROW-1315: [GLib] Add missing status check for arrow::ArrayBuilder::…
kou Aug 2, 2017
21a0191
ARROW-1323: [GLib] Add garrow_boolean_array_get_values()
kou Aug 2, 2017
84b7a0d
ARROW-1312: [C++] Make ARROW_JEMALLOC OFF by default until ARROW-1282…
wesm Aug 2, 2017
1874a8b
ARROW-1310: [JAVA] revert changes made in ARROW-886
siddharthteotia Aug 3, 2017
3732324
ARROW-1224: [Format] Clarify language around buffer padding and align…
siddharthteotia Aug 3, 2017
f775af7
ARROW-1312: [Python] Follow-up: do not use jemalloc in manylinux1 builds
wesm Aug 3, 2017
a388ddf
ARROW-1330: [Plasma] Turn on plasma tests on manylinux1
pcmoritz Aug 4, 2017
aa5d417
ARROW-1326: [Python] Fix Sphinx Build in Travis CI, treat Sphinx warn…
wesm Aug 4, 2017
717bed0
ARROW-1328: [Python] Set correct Arrow type when coercing to millisec…
wesm Aug 4, 2017
3bc7d46
ARROW-1296: [Java] Fix allocationSizeInBytes in FixedValueVectors.res…
icexelloss Aug 4, 2017
25439e7
ARROW-1300: [JAVA] Fix Tests for ListVector
siddharthteotia Aug 4, 2017
3200e91
ARROW-1327: [Python] Always release GIL before calling check_status i…
wesm Aug 7, 2017
619472e
ARROW-1225: [Python] Decode bytes to utf8 unicode if possible when pa…
wesm Aug 7, 2017
c0acb86
ARROW-1333: [Plasma] Example code for using Plasma to sort a DataFrame
robertnishihara Aug 7, 2017
f9d9833
ARROW-1283: [JAVA] Allow VectorSchemaRoot to close more than once
BryanCutler Aug 7, 2017
7a4026a
ARROW-1304: [Java] Fix Indentation, WhitespaceAround and EmptyLineSep…
icexelloss Aug 7, 2017
0b91cad
ARROW-622: [Python] Add coerce_timestamps option to parquet.write_tab…
wesm Aug 7, 2017
2015198
ARROW-1263: [C++] Get CPU info on Windows; Resolve patching whitespac…
maxhora Aug 7, 2017
02ab748
ARROW-1336: [C++] Add arrow::schema factory function, simply some awk…
wesm Aug 8, 2017
66ab6b2
ARROW-1309: [Python] Handle nested lists with all None values in Arra…
wesm Aug 8, 2017
03dcce4
ARROW-1173: [Plasma] Add blog post describing Plasma object store
robertnishihara Aug 8, 2017
939957f
ARROW-1335: [C++] Add offset to PrimitiveArray::raw_values to make co…
wesm Aug 8, 2017
5281a82
ARROW-1334: [C++] Add alternate Table constructor that takes vector o…
wesm Aug 8, 2017
20cee70
ARROW-1338: [Python] Do not close RecordBatchWriter on dealloc in cas…
wesm Aug 8, 2017
2615b47
ARROW-1306: [C++] Use UTF8 filenames in local file error messages
wesm Aug 8, 2017
6e26701
ARROW-439: [Python] Add option in "to_pandas" conversions to yield Ca…
Aug 8, 2017
a9c2f19
ARROW-1242: [JAVA] - upgrade jackson to mitigate security vulnerabili…
Aug 8, 2017
7fdbcc6
ARROW-1243: [JAVA] update all libs to latest versions
Aug 9, 2017
86154f0
ARROW-1340: [Java] Fix NullableMapVector field metadata
elahrvivaz Aug 9, 2017
e44ede8
ARROW-1343: [Java] Aligning serialized schema, end of buffers in Reco…
elahrvivaz Aug 9, 2017
2972c9d
ARROW-1342: [Python] Support strided ndarrays in pandas conversion fr…
wesm Aug 9, 2017
b795e5c
ARROW-1240: [JAVA] security: upgrade logback to address CVE-2017-5929…
Aug 11, 2017
2143349
ARROW-1242: [JAVA] - upgrade jackson to mitigate security vulnerabili…
Aug 11, 2017
63954c0
ARROW-1350: [C++] Do not exclude Plasma source tree from source release
wesm Aug 11, 2017
b173334
[maven-release-plugin] prepare release apache-arrow-0.6.0
wesm Aug 11, 2017
4db732c
[maven-release-plugin] prepare for next development iteration
wesm Aug 11, 2017
6135958
ARROW-1348: [C++/Python] Release verification script for Windows
wesm Aug 14, 2017
142f74e
ARROW-1331: [JAVA] Refactor unit tests
siddharthteotia Aug 14, 2017
a2f4323
ARROW-1352: [Integration] Added specific formatting for producer cons…
BryanCutler Aug 14, 2017
94b7cfa
ARROW-1339: [C++] Use of boost::filesystem::path to handle file paths
maxhora Aug 14, 2017
31457ae
ARROW-801: Provide direct access to underlying buffer memory addresses
siddharthteotia Aug 16, 2017
c2fb9cb
ARROW-1356: [Website] Add new committers
kou Aug 16, 2017
b78e2ef
ARROW-1353: [Website] Update website for 0.6.0 release and add short …
wesm Aug 16, 2017
4471dc9
[C++] DOC: Fix a typo in plasma.md
Aug 16, 2017
c0fa8e0
[Python] DOC: Fix Parquet docs to use pyarrow.parquet namespace for w…
rgbkrk Aug 16, 2017
3c5290a
ARROW-1365: [Python] Remove outdated pyarrow.jemalloc_memory_pool exa…
wesm Aug 17, 2017
4ef7c89
ARROW-1355: [Java] Make Arrow buildable with jdk9
laurentgo Aug 17, 2017
c9805d6
ARROW-1373: Implement getBuffer() methods for ValueVector
siddharthteotia Aug 19, 2017
e1bad9f
[C++] Fix a typo in in plasma.md
Aug 19, 2017
652fd36
ARROW-1366: [Plasma] Define entry point for the plasma store
pcmoritz Aug 19, 2017
10f7158
ARROW-1372: [Plasma] enable HUGETLB support on Linux to improve plasm…
pcmoritz Aug 20, 2017
b50f235
ARROW-759: [Python] Serializing large class of Python objects in Apac…
pcmoritz Aug 20, 2017
de7c671
ARROW-1357: [Python] Account for chunked arrays when converting lists…
wesm Aug 20, 2017
6ad976e
ARROW-1375: [C++] Remove dependency on msvc version for Snappy build
maxhora Aug 20, 2017
4e0aa3c
ARROW-1387: [C++] Set up GPU leaf library, add unit test module for C…
wesm Aug 21, 2017
5303594
ARROW-1395: [C++/Python] Remove APIs deprecated from 0.5.0 onward
wesm Aug 22, 2017
3c70ff1
ARROW-1384: [C++] Add SerializeRecordBatch API for writing a record b…
wesm Aug 22, 2017
2c3a5f4
ARROW-1392: [C++] Add GPU IO interfaces for CUDA
wesm Aug 23, 2017
c70f8bc
wip: UnionVector test passed
icexelloss Aug 21, 2017
6485759
All Java tests passed
icexelloss Aug 21, 2017
6910fd6
Add TODO
icexelloss Aug 21, 2017
083c114
Remove generate_union_case
icexelloss Aug 21, 2017
14b30f3
wip
icexelloss Aug 22, 2017
8f1150f
Test passed
icexelloss Aug 23, 2017
798ab66
enable cppTester
icexelloss Aug 23, 2017
4bec7b7
Add example json files for integration
icexelloss Aug 23, 2017
7ecd950
Add generator for union json data in integration_test.py
icexelloss Aug 25, 2017
12cc897
Revert test changes in integration_test.py
icexelloss Aug 25, 2017
4a4ff39
Make C++ JSON reader robust to union data with empty list of type ids
wesm Aug 27, 2017
4b8de83
Fix JsonReader to set valueCount correctly
icexelloss Aug 28, 2017
88befb2
Minor style fix
icexelloss Aug 28, 2017
2d6d41d
Remove empty type id case
icexelloss Aug 28, 2017
36 changes: 24 additions & 12 deletions cpp/src/arrow/ipc/json-internal.cc
@@ -269,10 +269,10 @@ class SchemaWriter {
     writer_->Key("mode");
     switch (type.mode()) {
       case UnionMode::SPARSE:
-        writer_->String("SPARSE");
+        writer_->String("Sparse");
         break;
       case UnionMode::DENSE:
-        writer_->String("DENSE");
+        writer_->String("Dense");
         break;
     }

@@ -569,7 +569,7 @@ class ArrayWriter {
     WriteValidityField(array);
     const auto& type = static_cast<const UnionType&>(*array.type());

-    WriteIntegerField("TYPE_ID", array.raw_type_ids(), array.length());
+    WriteIntegerField("TYPE", array.raw_type_ids(), array.length());
     if (type.mode() == UnionMode::DENSE) {
       WriteIntegerField("OFFSET", array.raw_value_offsets(), array.length());
     }
@@ -763,9 +763,9 @@ static Status GetUnion(const RjObject& json_type,
   std::string mode_str = it_mode->value.GetString();
   UnionMode mode;

-  if (mode_str == "SPARSE") {
+  if (mode_str == "Sparse") {
     mode = UnionMode::SPARSE;
-  } else if (mode_str == "DENSE") {
+  } else if (mode_str == "Dense") {
     mode = UnionMode::DENSE;
   } else {
     std::stringstream ss;
@@ -774,13 +774,25 @@ static Status GetUnion(const RjObject& json_type,
   }

   const auto& it_type_codes = json_type.FindMember("typeIds");
-  RETURN_NOT_ARRAY("typeIds", it_type_codes, json_type);

   std::vector<uint8_t> type_codes;
-  const auto& id_array = it_type_codes->value.GetArray();
-  for (const rj::Value& val : id_array) {
-    DCHECK(val.IsUint());
-    type_codes.push_back(static_cast<uint8_t>(val.GetUint()));
+  if (it_type_codes == json_type.MemberEnd()) {
+    for (uint8_t code = 0; code < static_cast<uint8_t>(children.size()); ++code) {
+      type_codes.push_back(code);
+    }
+  } else {
+    RETURN_NOT_ARRAY("typeIds", it_type_codes, json_type);
+    const auto& id_array = it_type_codes->value.GetArray();
+    if (id_array.Size() == 0) {
+      for (uint8_t code = 0; code < static_cast<uint8_t>(children.size()); ++code) {
+        type_codes.push_back(code);
+      }
+    } else {
+      for (const rj::Value& val : id_array) {
+        DCHECK(val.IsUint());
+        type_codes.push_back(static_cast<uint8_t>(val.GetUint()));
+      }
+    }
   }

   *type = union_(children, type_codes, mode);
@@ -1142,8 +1154,8 @@ class ArrayReader {

     RETURN_NOT_OK(GetValidityBuffer(is_valid_, &null_count, &validity_buffer));

-    const auto& json_type_ids = obj_->FindMember("TYPE_ID");
-    RETURN_NOT_ARRAY("TYPE_ID", json_type_ids, *obj_);
+    const auto& json_type_ids = obj_->FindMember("TYPE");
+    RETURN_NOT_ARRAY("TYPE", json_type_ids, *obj_);
     RETURN_NOT_OK(
         GetIntArray<uint8_t>(json_type_ids->value.GetArray(), length_, &type_id_buffer));

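The `GetUnion` hunk in `json-internal.cc` makes the C++ JSON reader tolerate a union type whose `typeIds` member is missing or is an empty array: in both cases it falls back to one type code per child, numbered from zero. A minimal Python paraphrase of that fallback follows. The helper name `resolve_type_codes` is hypothetical and is not part of Arrow, and raising `ValueError` is an assumption of this sketch (the C++ code returns a `Status` and only `DCHECK`s that each id is unsigned).

```python
def resolve_type_codes(json_type, num_children):
    """Return the union type codes described by a JSON type object.

    Hypothetical helper mirroring the fallback in the C++ hunk:
    a missing or empty "typeIds" yields 0 .. num_children - 1.
    """
    if "typeIds" not in json_type:
        # Member absent: one code per child, starting at zero.
        return list(range(num_children))

    type_ids = json_type["typeIds"]
    if not isinstance(type_ids, list):
        # Counterpart of RETURN_NOT_ARRAY in the C++ reader.
        raise ValueError("typeIds must be an array")

    if len(type_ids) == 0:
        # Empty array: same default as the missing-member case.
        return list(range(num_children))

    codes = []
    for value in type_ids:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("type id must be an unsigned integer")
        codes.append(value)
    return codes


default_codes = resolve_type_codes({"name": "union", "mode": "Sparse"}, 2)
explicit_codes = resolve_type_codes({"name": "union", "typeIds": [4, 5]}, 2)
```

Both default branches produce the same list, which is why the C++ hunk repeats the same loop in two places.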
82 changes: 82 additions & 0 deletions integration/data/union.json
@@ -0,0 +1,82 @@
+{
+  "schema" : {
+    "fields" : [{
+      "name" : "union",
+      "nullable" : true,
+      "type" : {
+        "name" : "union",
+        "mode" : "Sparse",
+        "typeIds" : [4,5]
+      },
+      "children" : [{
+        "name" : "int",
+        "nullable" : true,
+        "type" : {
+          "name" : "int",
+          "bitWidth" : 32,
+          "isSigned" : true
+        },
+        "children" : [ ],
+        "typeLayout" : {
+          "vectors" : [{
+            "type" : "VALIDITY",
+            "typeBitWidth" : 1
+          },{
+            "type" : "DATA",
+            "typeBitWidth" : 32
+          }]
+        }
+      },{
+        "name" : "bigint",
+        "nullable" : true,
+        "type" : {
+          "name" : "int",
+          "bitWidth" : 64,
+          "isSigned" : true
+        },
+        "children" : [ ],
+        "typeLayout" : {
+          "vectors" : [{
+            "type" : "VALIDITY",
+            "typeBitWidth" : 1
+          },{
+            "type" : "DATA",
+            "typeBitWidth" : 64
+          }]
+        }
+      }],
+      "typeLayout" : {
+        "vectors" : [
+          {
+            "type": "VALIDITY",
+            "typeBitWidth": 1
+          },
+          {
+            "type" : "TYPE",
+            "typeBitWidth" : 8
+          }
+        ]
+      }
+    }]
+  },
+  "batches" : [{
+    "count" : 10,
+    "columns" : [{
+      "name" : "union",
+      "count" : 10,
+      "VALIDITY" : [1,1,1,1,1,1,1,1,1,1],
+      "TYPE" : [4,5,4,5,4,5,4,5,4,5],
+      "children" : [{
+        "name" : "int",
+        "count" : 10,
+        "VALIDITY" : [1,0,1,0,1,0,1,0,1,0],
+        "DATA" : [0,0,2,0,4,0,6,0,8,0]
+      },{
+        "name" : "bigint",
+        "count" : 10,
+        "VALIDITY" : [0,1,0,1,0,1,0,1,0,1],
+        "DATA" : [0,1,0,3,0,5,0,7,0,9]
+      }]
+    }]
+  }]
+}
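To make the layout of `union.json` concrete, here is a small Python sketch that resolves each row of the sparse union column to the child that holds its value. It assumes the usual Arrow convention that the i-th entry of `typeIds` names the i-th child, and that in a sparse union every child has the same length as the parent. The function name `sparse_union_values` is hypothetical and the column literal is copied from the batch in the file.

```python
import json

# The single column of the record batch in union.json.
UNION_COLUMN = json.loads("""
{
  "name": "union",
  "count": 10,
  "VALIDITY": [1,1,1,1,1,1,1,1,1,1],
  "TYPE": [4,5,4,5,4,5,4,5,4,5],
  "children": [
    {"name": "int", "count": 10,
     "VALIDITY": [1,0,1,0,1,0,1,0,1,0],
     "DATA": [0,0,2,0,4,0,6,0,8,0]},
    {"name": "bigint", "count": 10,
     "VALIDITY": [0,1,0,1,0,1,0,1,0,1],
     "DATA": [0,1,0,3,0,5,0,7,0,9]}
  ]
}
""")


def sparse_union_values(column, type_ids):
    """Return (child_name, value) per row, or None for a null slot."""
    # Assumption: type_ids[i] is the type id of column["children"][i].
    child_index = {type_id: i for i, type_id in enumerate(type_ids)}
    values = []
    for row in range(column["count"]):
        if not column["VALIDITY"][row]:
            values.append(None)
            continue
        child = column["children"][child_index[column["TYPE"][row]]]
        # Sparse mode: the child is indexed by the same row number.
        if child["VALIDITY"][row]:
            values.append((child["name"], child["DATA"][row]))
        else:
            values.append(None)
    return values


values = sparse_union_values(UNION_COLUMN, [4, 5])
```

With the data above, even rows resolve to the `int` child and odd rows to the `bigint` child, and the slots that are not selected by `TYPE` are exactly the ones marked invalid in each child.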
86 changes: 85 additions & 1 deletion integration/integration_test.py
@@ -533,6 +533,55 @@ def generate_column(self, size, name=None):
         return StructColumn(name, size, is_valid, field_values)


+class UnionType(DataType):
+
+    def __init__(self, name, mode, type_ids, field_types, nullable=True):
+        DataType.__init__(self, name, nullable=nullable)
+        self.mode = mode
+        self.type_ids = type_ids
+        self.field_types = field_types
+
+    def _get_type(self):
+        type_ids = self.type_ids if self.type_ids is not None else []
+
+        attrs = [
+            ('name', 'union'),
+            ('mode', self.mode),
+            ('typeIds', type_ids)
+        ]
+
+        return OrderedDict(attrs)
+
+    def _get_children(self):
+        return [type_.get_json() for type_ in self.field_types]
+
+    def _get_type_layout(self):
+        return OrderedDict([
+            ('vectors',
+             [OrderedDict([('type', 'VALIDITY'),
+                           ('typeBitWidth', 1)]),
+              OrderedDict([('type', 'TYPE'),
+                           ('typeBitWidth', 8)])])])
+
+    def _make_type(self, size):
+        if self.type_ids is not None:
+            type_ids = self.type_ids
+        else:
+            type_ids = np.arange(len(self.field_types))
+
+        return np.random.choice(type_ids, size)
+
+    def generate_column(self, size, name=None):
+        is_valid = self._make_is_valid(size)
+        types = self._make_type(size)
+
+        field_values = [type_.generate_column(size)
+                        for type_ in self.field_types]
+        if name is None:
+            name = self.name
+        return UnionColumn(name, size, is_valid, types, field_values)
+
+
 class Dictionary(object):

     def __init__(self, id_, field, values, ordered=False):
@@ -603,6 +652,23 @@ def _get_children(self):
         return [field.get_json() for field in self.field_values]


+class UnionColumn(Column):
+    def __init__(self, name, count, is_valid, types, field_values):
+        Column.__init__(self, name, count)
+        self.is_valid = is_valid
+        self.types = types
+        self.field_values = field_values
+
+    def _get_buffers(self):
+        return [
+            ('VALIDITY', [int(v) for v in self.is_valid]),
+            ('TYPE', [int(v) for v in self.types])
+        ]
+
+    def _get_children(self):
+        return [field.get_json() for field in self.field_values]
+
+
 class JsonRecordBatch(object):

     def __init__(self, count, columns):
@@ -747,6 +813,22 @@ def generate_dictionary_case():
                           dictionaries=[dict1, dict2])


+def _generate_union_field(type_ids=None):
+    return UnionType('union_nullable', "Sparse", type_ids,
+                     [get_field('f1', 'int64'),
+                      get_field('f2', 'float64'),
+                      get_field('f3', 'utf8'),
+                      get_field('f4', 'binary'),
+                      StructType('f5', [get_field('f1', 'int32'),
+                                        get_field('f2', 'utf8')]),])
+
+
+def generate_union_case():
+    type_ids = np.random.choice(range(128), 5, replace=False).tolist()
+    fields = [_generate_union_field(type_ids)]
+    return _generate_file("union", fields, [10])
+
+
 def get_generated_json_files():
     temp_dir = tempfile.mkdtemp()

@@ -758,7 +840,8 @@ def _temp_path():
         generate_primitive_case([0, 0, 0]),
         generate_datetime_case(),
         generate_nested_case(),
-        generate_dictionary_case()
+        generate_dictionary_case(),
+        generate_union_case(),
     ]

     generated_paths = []
@@ -948,6 +1031,7 @@ def get_static_json_files():

 def run_all_tests(debug=False):
     testers = [CPPTester(debug=debug), JavaTester(debug=debug)]
+
     static_json_files = get_static_json_files()
     generated_json_files = get_generated_json_files()
     json_files = static_json_files + generated_json_files
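The generator added to `integration_test.py` picks five distinct type ids below 128 and then draws one of them for every row to fill the `TYPE` buffer. The sketch below reproduces that idea with the standard library only: `random.sample` and `random.Random.choice` stand in for the `np.random.choice` calls in the diff, so it is not the code from the PR. The helper names `make_union_type_buffer` and `make_union_column_json` are hypothetical, and the `name`/`count` keys emitted ahead of the buffers are an assumption based on the layout of `union.json` (the base `Column` class is not shown in this diff).

```python
import random
from collections import OrderedDict


def make_union_type_buffer(num_children, size, type_ids=None, rng=None):
    """Draw one type id per row, as UnionType._make_type does with numpy."""
    rng = rng or random.Random()
    if type_ids is None:
        # Same default as the diff: ids 0 .. num_children - 1.
        type_ids = list(range(num_children))
    return [rng.choice(type_ids) for _ in range(size)]


def make_union_column_json(name, is_valid, types):
    """Build the parent-level JSON entries of a union column (children omitted)."""
    return OrderedDict([
        ('name', name),
        ('count', len(types)),
        ('VALIDITY', [int(v) for v in is_valid]),
        ('TYPE', [int(v) for v in types]),
    ])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
type_ids = rng.sample(range(128), 5)  # five distinct ids, like replace=False
types = make_union_type_buffer(5, 10, type_ids, rng)
column = make_union_column_json('union_nullable', [True] * 10, types)
```

Drawing the ids from `range(128)` rather than using `0..4` exercises readers that must map arbitrary type ids back to child positions.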
Original file line number Diff line number Diff line change
@@ -187,7 +187,7 @@ public void execute(File arrowFile, File jsonFile) throws IOException {
       LOGGER.debug("ARROW schema: " + arrowSchema);
       LOGGER.debug("JSON Input file size: " + jsonFile.length());
       LOGGER.debug("JSON schema: " + jsonSchema);
-      Validator.compareSchemas(jsonSchema, arrowSchema);
+      Validator.compareSchemas(arrowSchema, jsonSchema);

       List<ArrowBlock> recordBatches = arrowReader.getRecordBlocks();
       Iterator<ArrowBlock> iterator = recordBatches.iterator();