Changes from all commits (53 commits)
dc92b83
fix typo
trxcllnt Feb 26, 2018
ef1acc7
read union buffers in the correct order
trxcllnt Feb 26, 2018
6522cb0
fix Data generics for FixedSizeList
trxcllnt Feb 26, 2018
43c671f
[WIP] add Binary writer
trxcllnt Feb 26, 2018
c8ba1fe
don't write an empty buffer for NullVectors
trxcllnt Feb 26, 2018
ae1f481
align to 64-byte boundaries
trxcllnt Feb 26, 2018
efb840f
fix typo
trxcllnt Feb 27, 2018
18b9dd2
Fix a typo
lsb Apr 26, 2018
aaec76b
fix @std/esm options for node10
trxcllnt May 11, 2018
d98e178
add option to run gulp cmds with `-t src` to run jest against the `sr…
trxcllnt May 11, 2018
7fff99e
move IPC magic into its own module
trxcllnt May 11, 2018
4333e54
FileBlock constructor should accept Long | number, have public number…
trxcllnt May 11, 2018
85eb7ee
fix erroneous footer length check in reader
trxcllnt May 11, 2018
a9d773d
move ValidityView into its own module, like ChunkedView is
trxcllnt May 11, 2018
508f4f8
add getChildAt(n) methods to List and FixedSizeList Vectors to be mor…
trxcllnt May 11, 2018
da0f457
first pass at a working binary writer, only arrow stream format teste…
trxcllnt May 11, 2018
a242da8
Add `Table.prototype.serialize` method to make ArrayBuffers from Tables
trxcllnt May 11, 2018
db02c1c
Add an integration test for binary writer
trxcllnt May 11, 2018
402187e
add apache license headers
trxcllnt May 11, 2018
304e75d
fix magic string alignment in file reader, add file reader tests
trxcllnt May 11, 2018
73a2fa9
fix stream -> file, file -> stream, add tests
trxcllnt May 11, 2018
4e80851
write correct recordBatch length
trxcllnt May 12, 2018
e75da13
add support for reading streaming format via node streams
trxcllnt May 12, 2018
78cba38
arrow2csv: support reading arrow streams from stdin
trxcllnt May 12, 2018
263d06d
clean up js integration script
trxcllnt May 12, 2018
832cc30
add more js integration scripts for creating/converting arrow formats
trxcllnt May 12, 2018
de81ac1
Update JSTester to be an Arrow producer now too
trxcllnt May 12, 2018
af9f4a8
run integration tests in node 10.1
trxcllnt May 13, 2018
3187732
set bitmap alignment to 8 bytes if < 64 values
trxcllnt May 13, 2018
b52af25
cleanup
trxcllnt May 13, 2018
d4b8637
add license headers
trxcllnt May 13, 2018
0be6de3
use node v10.1.0 in travis
trxcllnt May 13, 2018
c0b88c2
always write flatbuffer vectors
trxcllnt May 13, 2018
ccaf489
remove stream-to-iterator
trxcllnt May 13, 2018
081fefc
remove bin from ts package.json
trxcllnt May 13, 2018
a79334d
fix typo again after rebase
trxcllnt May 13, 2018
a6a7ab9
put test tables into hoisted functions so it's easier to set breakpoints
trxcllnt May 14, 2018
2df1a4a
update google-closure-compiler, remove gcc-specific workarounds in th…
trxcllnt May 14, 2018
ed85572
fix instanceof ArrayBuffer in jest/node 10
trxcllnt May 14, 2018
a06180b
don't run JS integration tests in src-only mode when --debug=true
trxcllnt May 15, 2018
efc7225
fix perf tests
trxcllnt May 15, 2018
7924e67
rename readNodeStream -> readStream, fromNodeStream -> fromReadableSt…
trxcllnt May 15, 2018
df43bc5
make arrow2csv support streaming files from stdin, add rowsToString()…
trxcllnt May 15, 2018
14e6b38
cleanup: remove dead code
trxcllnt May 15, 2018
f497f7a
measure maxColumnWidths across all recordBatches when printing a table
trxcllnt May 15, 2018
4ed6554
Merge branch 'master' of https://github.com/apache/arrow into js-buff…
trxcllnt May 16, 2018
b765b12
speed up integration_test.py by only testing the JS source, not every…
trxcllnt May 16, 2018
e34afaa
export the RecordBatchSerializer
trxcllnt May 17, 2018
1a9864c
read message bodyLength from flatbuffer object
trxcllnt May 17, 2018
4594fe3
align to 8-byte boundaries only
trxcllnt May 17, 2018
7a346dc
add a handy script for printing the alignment of buffers in a table
trxcllnt May 17, 2018
917c2fc
test the ES5/UMD bundle in the integration tests
trxcllnt May 17, 2018
261a864
Merge branch 'master' into js-buffer-writer
trxcllnt May 20, 2018
2 changes: 2 additions & 0 deletions ci/travis_env_common.sh
@@ -17,6 +17,8 @@
 # specific language governing permissions and limitations
 # under the License.
 
+# hide nodejs experimental-feature warnings
+export NODE_NO_WARNINGS=1
 export MINICONDA=$HOME/miniconda
 export PATH="$MINICONDA/bin:$PATH"
 export CONDA_PKGS_DIRS=$HOME/.conda_packages
33 changes: 25 additions & 8 deletions integration/integration_test.py
@@ -1092,35 +1092,52 @@ def file_to_stream(self, file_path, stream_path):
         os.system(cmd)
 
 class JSTester(Tester):
-    PRODUCER = False
+    PRODUCER = True
     CONSUMER = True
 
-    INTEGRATION_EXE = os.path.join(ARROW_HOME, 'js/bin/integration.js')
+    EXE_PATH = os.path.join(ARROW_HOME, 'js/bin')
+    VALIDATE = os.path.join(EXE_PATH, 'integration.js')
+    JSON_TO_ARROW = os.path.join(EXE_PATH, 'json-to-arrow.js')
+    STREAM_TO_FILE = os.path.join(EXE_PATH, 'stream-to-file.js')
+    FILE_TO_STREAM = os.path.join(EXE_PATH, 'file-to-stream.js')
 
     name = 'JS'
 
-    def _run(self, arrow_path=None, json_path=None, command='VALIDATE'):
-        cmd = [self.INTEGRATION_EXE]
+    def _run(self, exe_cmd, arrow_path=None, json_path=None, command='VALIDATE'):
+        cmd = [exe_cmd]
 
         if arrow_path is not None:
             cmd.extend(['-a', arrow_path])
 
         if json_path is not None:
             cmd.extend(['-j', json_path])
 
-        cmd.extend(['--mode', command])
+        cmd.extend(['--mode', command, '-t', 'es5', '-m', 'umd'])
 
         if self.debug:
             print(' '.join(cmd))
 
         run_cmd(cmd)
 
     def validate(self, json_path, arrow_path):
-        return self._run(arrow_path, json_path, 'VALIDATE')
+        return self._run(self.VALIDATE, arrow_path, json_path, 'VALIDATE')
 
+    def json_to_file(self, json_path, arrow_path):
+        cmd = ['node', self.JSON_TO_ARROW, '-a', arrow_path, '-j', json_path]
+        cmd = ' '.join(cmd)
+        if self.debug:
+            print(cmd)
+        os.system(cmd)
+
     def stream_to_file(self, stream_path, file_path):
-        # Just copy stream to file, we can read the stream directly
-        cmd = ['cp', stream_path, file_path]
+        cmd = ['cat', stream_path, '|', 'node', self.STREAM_TO_FILE, '>', file_path]
         cmd = ' '.join(cmd)
         if self.debug:
             print(cmd)
         os.system(cmd)
 
+    def file_to_stream(self, file_path, stream_path):
+        cmd = ['cat', file_path, '|', 'node', self.FILE_TO_STREAM, '>', stream_path]
+        cmd = ' '.join(cmd)
+        if self.debug:
+            print(cmd)
196 changes: 2 additions & 194 deletions js/DEVELOP.md
@@ -64,13 +64,11 @@ This argument configuration also applies to `clean` and `test` scripts.

* `npm run deploy`

-Uses [learna](https://github.com/lerna/lerna) to publish each build target to npm with [conventional](https://conventionalcommits.org/) [changelogs](https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/conventional-changelog-cli).
+Uses [lerna](https://github.com/lerna/lerna) to publish each build target to npm with [conventional](https://conventionalcommits.org/) [changelogs](https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/conventional-changelog-cli).

# Updating the Arrow format flatbuffers generated code

-Once generated, the flatbuffers format code needs to be adjusted for our TS and JS build environments.
-
-## TypeScript
+Once generated, the flatbuffers format code needs to be adjusted for our build scripts.

1. Generate the flatbuffers TypeScript source from the Arrow project root directory:
```sh
@@ -101,193 +99,3 @@ Once generated, the flatbuffers format code needs to be adjusted for our TS and
```
1. Add `/* tslint:disable:class-name */` to the top of `Schema.ts`
1. Execute `npm run lint` to fix all the linting errors
## JavaScript (for Google Closure Compiler builds)
wesm (Member):

Is this fixed now in upstream Flatbuffers?

trxcllnt (Contributor, Author), May 23, 2018:

@wesm I was able to update the version of google-closure-compiler we use in commit 2df1a4a, which allowed me to remove a number of hacks we had in the build to create the ES5/UMD bundle. One of those hacks was generating two versions of the FlatBuffers code: a legacy ES5 JS version that closure could use, while everything else used the TypeScript version. With the new update, closure-compiler doesn't choke on the compiled TS output anymore.

1. Generate the flatbuffers JS source from the Arrow project root directory
```sh
cd $ARROW_HOME
flatc --js --no-js-exports -o ./js/src/format ./format/*.fbs
cd ./js/src/format
# Delete Tensor_generated.js (skip this when we support Tensors)
rm Tensor_generated.js
# append an ES6 export to Schema_generated.js
echo "$(cat Schema_generated.js)
export { org };
" > Schema_generated.js
# import Schema's "org" namespace and
# append an ES6 export to File_generated.js
echo "import { org } from './Schema';
$(cat File_generated.js)
export { org };
" > File_generated.js
# import Schema's "org" namespace and
# append an ES6 export to Message_generated.js
echo "import { org } from './Schema';
$(cat Message_generated.js)
export { org };
" > Message_generated.js
```
1. Fixup the generated JS enums with the reverse value-to-key mappings to match TypeScript
`Message_generated.js`
```js
// Replace this
org.apache.arrow.flatbuf.MessageHeader = {
NONE: 0,
Schema: 1,
DictionaryBatch: 2,
RecordBatch: 3,
Tensor: 4
};
// With this
org.apache.arrow.flatbuf.MessageHeader = {
NONE: 0, 0: 'NONE',
Schema: 1, 1: 'Schema',
DictionaryBatch: 2, 2: 'DictionaryBatch',
RecordBatch: 3, 3: 'RecordBatch',
Tensor: 4, 4: 'Tensor'
};
```
`Schema_generated.js`
```js
/**
* @enum
*/
org.apache.arrow.flatbuf.MetadataVersion = {
/**
* 0.1.0
*/
V1: 0, 0: 'V1',
/**
* 0.2.0
*/
V2: 1, 1: 'V2',
/**
* 0.3.0 -> 0.7.1
*/
V3: 2, 2: 'V3',
/**
* >= 0.8.0
*/
V4: 3, 3: 'V4'
};
/**
* @enum
*/
org.apache.arrow.flatbuf.UnionMode = {
Sparse: 0, 0: 'Sparse',
Dense: 1, 1: 'Dense',
};
/**
* @enum
*/
org.apache.arrow.flatbuf.Precision = {
HALF: 0, 0: 'HALF',
SINGLE: 1, 1: 'SINGLE',
DOUBLE: 2, 2: 'DOUBLE',
};
/**
* @enum
*/
org.apache.arrow.flatbuf.DateUnit = {
DAY: 0, 0: 'DAY',
MILLISECOND: 1, 1: 'MILLISECOND',
};
/**
* @enum
*/
org.apache.arrow.flatbuf.TimeUnit = {
SECOND: 0, 0: 'SECOND',
MILLISECOND: 1, 1: 'MILLISECOND',
MICROSECOND: 2, 2: 'MICROSECOND',
NANOSECOND: 3, 3: 'NANOSECOND',
};
/**
* @enum
*/
org.apache.arrow.flatbuf.IntervalUnit = {
YEAR_MONTH: 0, 0: 'YEAR_MONTH',
DAY_TIME: 1, 1: 'DAY_TIME',
};
/**
* ----------------------------------------------------------------------
* Top-level Type value, enabling extensible type-specific metadata. We can
* add new logical types to Type without breaking backwards compatibility
*
* @enum
*/
org.apache.arrow.flatbuf.Type = {
NONE: 0, 0: 'NONE',
Null: 1, 1: 'Null',
Int: 2, 2: 'Int',
FloatingPoint: 3, 3: 'FloatingPoint',
Binary: 4, 4: 'Binary',
Utf8: 5, 5: 'Utf8',
Bool: 6, 6: 'Bool',
Decimal: 7, 7: 'Decimal',
Date: 8, 8: 'Date',
Time: 9, 9: 'Time',
Timestamp: 10, 10: 'Timestamp',
Interval: 11, 11: 'Interval',
List: 12, 12: 'List',
Struct_: 13, 13: 'Struct_',
Union: 14, 14: 'Union',
FixedSizeBinary: 15, 15: 'FixedSizeBinary',
FixedSizeList: 16, 16: 'FixedSizeList',
Map: 17, 17: 'Map'
};
/**
* ----------------------------------------------------------------------
* The possible types of a vector
*
* @enum
*/
org.apache.arrow.flatbuf.VectorType = {
/**
* used in List type, Dense Union and variable length primitive types (String, Binary)
*/
OFFSET: 0, 0: 'OFFSET',
/**
* actual data, either fixed width primitive types in slots or variable width delimited by an OFFSET vector
*/
DATA: 1, 1: 'DATA',
/**
* Bit vector indicating if each value is null
*/
VALIDITY: 2, 2: 'VALIDITY',
/**
* Type vector used in Union type
*/
TYPE: 3, 3: 'TYPE'
};
/**
* ----------------------------------------------------------------------
* Endianness of the platform producing the data
*
* @enum
*/
org.apache.arrow.flatbuf.Endianness = {
Little: 0, 0: 'Little',
Big: 1, 1: 'Big',
};
```
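The reverse value-to-key entries above mirror what TypeScript itself emits for numeric enums, so the plain-JS output behaves the same way. A minimal illustration:

```ts
// TypeScript numeric enums compile to objects that map in both
// directions: key-to-value and value-to-key.
enum Precision { HALF = 0, SINGLE = 1, DOUBLE = 2 }

console.log(Precision.SINGLE); // 1
console.log(Precision[1]);     // 'SINGLE'
```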
37 changes: 37 additions & 0 deletions js/bin/file-to-stream.js
@@ -0,0 +1,37 @@
#! /usr/bin/env node

// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

const fs = require('fs');
const path = require('path');

const encoding = 'binary';
const ext = process.env.ARROW_JS_DEBUG === 'src' ? '.ts' : '';
const { util: { PipeIterator } } = require(`../index${ext}`);
const { Table, serializeStream, fromReadableStream } = require(`../index${ext}`);

(async () => {
    // Todo (ptaylor): implement `serializeStreamAsync` that accepts an
    // AsyncIterable<Buffer>, rather than aggregating into a Table first
    const in_ = process.argv.length < 3
        ? process.stdin : fs.createReadStream(path.resolve(process.argv[2]));
    const out = process.argv.length < 4
        ? process.stdout : fs.createWriteStream(path.resolve(process.argv[3]));
    new PipeIterator(serializeStream(await Table.fromAsync(fromReadableStream(in_))), encoding).pipe(out);
TheNeuralBit (Member):
Would it be possible to wrap this logic into a method on Table? Something like

public serializeTo (out) {
    new PipeIterator(serializeStream(this, 'binary')).pipe(out)
}

trxcllnt (Contributor, Author):

@TheNeuralBit yeah, that's a bit of what the comment above is about.

Ideally we wouldn't have to aggregate the input into a Table; instead we'd just stream the fromReadableStream() values through a hypothetical writeStreamAsync method:

writeStreamAsync: (input: AsyncIterable<Schema | RecordBatch>) => AsyncIterable<Uint8Array>;

That said, the table should also have a serialize() method that returns a PipeIterator instead of a concatenated buffer, like we do with rowsToString()

TheNeuralBit (Member), May 18, 2018:

> That said, the table should also have a serialize() method that returns a PipeIterator instead of a concatenated buffer, like we do with rowsToString()

Yeah, that's all I was getting at.
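
A minimal sketch of that hypothetical writeStreamAsync, assuming a per-message serializer exists (serializeMessage below is a placeholder, not an actual export of this library):

```ts
import { Schema, RecordBatch } from 'apache-arrow';

// Placeholder for a per-message serializer; assumed here, not a real export.
declare function serializeMessage(message: Schema | RecordBatch): Uint8Array;

// Serialize each Schema or RecordBatch as it arrives, so no
// intermediate Table aggregation is needed.
async function* writeStreamAsync(
    input: AsyncIterable<Schema | RecordBatch>
): AsyncIterable<Uint8Array> {
    for await (const message of input) {
        yield serializeMessage(message);
    }
}
```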


})().catch((e) => { console.error(e); process.exit(1); });