
Type instability in getcolumn #499

@baumgold


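In `Arrow.Table`, all columns are stored in a `Vector{AbstractVector}` and looked up by name through a `Dict{Symbol, AbstractVector}`, so the container's element type is abstract. As a result, `Tables.getcolumn` can only be inferred to return `AbstractVector`, which causes type instability downstream and hurts performance when iterating over a single column.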

```julia
julia> using Arrow, Tables

julia> buf = Arrow.tobuffer((a=[1,2,3], b=[4,5,6]));

julia> tt = Arrow.Table(buf)
Arrow.Table with 3 rows, 2 columns, and schema:
 :a  Int64
 :b  Int64

julia> @code_warntype Tables.getcolumn(tt, :a)
MethodInstance for Tables.getcolumn(::Arrow.Table, ::Symbol)
  from getcolumn(t::Arrow.Table, nm::Symbol) @ Arrow ~/.julia/packages/Arrow/ID4np/src/table.jl:369
Arguments
  #self#::Core.Const(Tables.getcolumn)
  t::Arrow.Table
  nm::Symbol
Body::AbstractVector
1 ─ %1 = Arrow.lookup(t)::Dict{Symbol, AbstractVector}
│   %2 = Base.getindex(%1, nm)::AbstractVector
└──      return %2
```
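The same inference behavior can be reproduced without Arrow at all. The sketch below uses a hypothetical stand-in (`COLS`, `getcol`, and `sumcol` are illustrative names, not Arrow.jl code) that mirrors the `Dict{Symbol, AbstractVector}` returned by `Arrow.lookup` in the `@code_warntype` output. Inference can only promise `AbstractVector` for the lookup, while a function barrier lets the compiler specialize on the concrete column type once it is known at run time.

```julia
# Hypothetical stand-in for Arrow.Table's column lookup: the Dict's value type
# is abstract, like the Dict{Symbol, AbstractVector} shown by @code_warntype.
const COLS = Dict{Symbol, AbstractVector}(:a => [1, 2, 3], :b => [4, 5, 6])

# Mirrors getcolumn: the return type can only be inferred as AbstractVector.
getcol(nm::Symbol) = COLS[nm]

# Function barrier: when called with the concrete column, Julia compiles a
# method specialized for that column's type, so this loop is type-stable.
function sumcol(col::AbstractVector)
    s = zero(eltype(col))
    for x in col
        s += x
    end
    return s
end

# Inference only sees the abstract value type of the Dict here...
inferred = only(Base.return_types(getcol, (Symbol,)))
@show inferred

# ...but one dynamic dispatch at the barrier hands sumcol a concrete Vector{Int64}.
println(sumcol(getcol(:a)))  # 6
```

The cost of the abstract container is then paid once per column (one dynamic dispatch into `sumcol`) instead of once per element.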

This uses Julia v1.10 and Arrow v2.7.1.

```
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 5 on 48 virtual cores
Environment:
  JULIA_NUM_THREADS = 4
```
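Until the column storage is concretely typed, one possible user-side workaround (a sketch, not a fix endorsed by the Arrow.jl maintainers) is to convert the table once with `Tables.columntable`, which returns a `NamedTuple` of columns whose field types are concrete. The return type of the conversion call itself still cannot be inferred, but any function that receives the resulting `NamedTuple` is compiled for its concrete column types. `sum_a` is a hypothetical helper used only for illustration.

```julia
using Arrow, Tables

buf = Arrow.tobuffer((a=[1, 2, 3], b=[4, 5, 6]))
tt = Arrow.Table(buf)

# Convert once to a NamedTuple of columns; each field has a concrete type.
ct = Tables.columntable(tt)

# Hypothetical helper: inside it, `ct.a` is inferred with a concrete type,
# so the reduction does not go through AbstractVector dispatch per element.
sum_a(ct) = sum(ct.a)

println(sum_a(ct))  # 6
```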
