heads up: Arrow API updates and breaking changes #38

@trxcllnt

Hey guys, we're getting ready to release Arrow JS 0.3.0 soon and want to give you a heads up on breaking changes. We refactored things a bit to align better with the C++ implementation, and make it easier to implement the Arrow Writer APIs (coming soon).

I'm just looking at perspective.js#L281, but haven't searched the rest of the perspective codebase so there may be more places to update.

  • Table now exposes its Schema

  • Removed the eagerly-allocated columns list from the Table, in favor of lazily allocating columns via getColumnAt(i). You can map the schema's fields to get the columns:

    table.schema.fields.map((field, idx) => table.getColumnAt(idx))

  • Removed the vector.name property; the name is now accessible via the schema. (Vectors can be combined in ways that diverge from the original schema, in which case any tie to the original field metadata isn't necessarily valid anymore.)

  • Added DataType classes, so all the vector type information from the schema is now available and strongly typed at runtime. The vector.type field now refers to the vector's DataType instance. We also export the DataType classes and enums on the Arrow.type namespace, so you can do enum or instanceof comparisons.

  • Added TypeVisitor and VectorVisitor classes, to make it easier to walk the schema and vector trees:

    import { Table, Vector, visitor, type } from 'apache-arrow';
    // Visitor to convert Vector<Date | Timestamp> to Vector<Int> of epoch ms
    class DateTimeVisitor extends visitor.VectorVisitor {
        visitDateVector(vec: Vector<type.Date_>) {
            return vec.asEpochMilliseconds();
        }
        visitTimestampVector(vec: Vector<type.Timestamp>) {
            return vec.asEpochMilliseconds();
        }
        visitNullVector(vec: Vector<type.Null>) { return vec; }
        visitBoolVector(vec: Vector<type.Bool>) { return vec; }
        visitIntVector(vec: Vector<type.Int>) { return vec; }
        visitFloatVector(vec: Vector<type.Float>) { return vec; }
        visitUtf8Vector(vec: Vector<type.Utf8>) { return vec; }
        visitBinaryVector(vec: Vector<type.Binary>) { return vec; }
        visitFixedSizeBinaryVector(vec: Vector<type.FixedSizeBinary>) { return vec; }
        visitTimeVector(vec: Vector<type.Time>) { return vec; }
        visitDecimalVector(vec: Vector<type.Decimal>) { return vec; }
        visitListVector(vec: Vector<type.List>) { return vec; }
        visitStructVector(vec: Vector<type.Struct>) { return vec; }
        visitUnionVector(vec: Vector<type.Union>) { return vec; }
        visitDictionaryVector(vec: Vector<type.Dictionary>) { return vec; }
        visitIntervalVector(vec: Vector<type.Interval>) { return vec; }
        visitFixedSizeListVector(vec: Vector<type.FixedSizeList>) { return vec; }
        visitMapVector(vec: Vector<type.Map_>) { return vec; }
    }
    const table = Table.from(buf);
    // Named dateTimeVisitor so it doesn't shadow the imported `visitor` namespace
    const dateTimeVisitor = new DateTimeVisitor();
    const cols = table.schema.fields.map((field, idx) => {
        return dateTimeVisitor.visit(table.getColumnAt(idx));
    });
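
Since vector.name is gone, code that used to read the name off each column will probably want to pair schema fields with columns instead. Here's a minimal sketch of that, not the library's own API: the `FieldLike`, `SchemaLike` and `TableLike` interfaces and the `columnsByName` helper are hypothetical stand-ins that only assume the `schema.fields`, `field.name` and `getColumnAt(i)` shapes described above, and plain arrays stand in for Vectors so the snippet runs on its own.

```typescript
// Structural stand-ins for the parts of the API mentioned above.
// These are assumptions for illustration, not the apache-arrow typings.
interface FieldLike { name: string; }
interface SchemaLike { fields: FieldLike[]; }
interface TableLike<V> {
    schema: SchemaLike;
    getColumnAt(index: number): V;
}

// Pair each schema field name with its lazily allocated column,
// replacing code that previously relied on `vector.name`.
function columnsByName<V>(table: TableLike<V>): Map<string, V> {
    const columns = new Map<string, V>();
    table.schema.fields.forEach((field, idx) => {
        columns.set(field.name, table.getColumnAt(idx));
    });
    return columns;
}

// Usage with a hypothetical in-memory table (arrays stand in for Vectors).
const fakeData: number[][] = [[1.5, 2.5], [10, 20]];
const fakeTable: TableLike<number[]> = {
    schema: { fields: [{ name: 'price' }, { name: 'qty' }] },
    getColumnAt: (i) => fakeData[i] ?? [],
};
const byName = columnsByName(fakeTable);
```

With a real Table you would pass the table itself, provided its schema and getColumnAt match the shapes assumed here. Keeping the name lookup on the schema side means the helper still works if the columns are later transformed, for example by a visitor like the one above.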

That's all I can think of for now. We have a repo where we push new releases for testing and use ahead of the apache release vote, so feel free to link to this in your package.json if you want to test it out: https://github.com/graphistry/arrow

Best,
Paul
