-
Notifications
You must be signed in to change notification settings - Fork 70
Closed
Description
this is an example to demonstrate what "early returning" mean following a discussion on Slack with Alexander Plavin
tl;dr
The idea is that you have N columns you filter on, however, maybe 20% of the N columns are enough to early fail 80% of the rows. We need an ergonomic and 0-overhead interface to delay the allocation (or worse, when Feather is compressed) as much as possible.
Setup
julia> using Arrow, BenchmarkTools
julia> function gendata()
x = [rand(rand(0:10)) for _ = 1:10^5]
y = [randn(rand(0:10)) for _ = 1:10^5]
(;x, y)
end
julia> foreach(1:10) do _
Arrow.append("./out.feather", gendata())
endBenchmark
julia> const tbl = Arrow.Table("out.feather");
julia> function kernel1(xs, ys)
s1 = maximum(ys; init=0.0)
s1 < 0.95 && return false
maximum(xs; init=0.0) < 0.7 && return false
return true
end
julia> @benchmark map(kernel1, tbl.x, tbl.y)
BenchmarkTools.Trial: 19 samples with 1 evaluation.
Range (min … max): 264.955 ms … 271.889 ms ┊ GC (min … max): 3.83% … 4.93%
Time (median): 267.430 ms ┊ GC (median): 3.82%
Time (mean ± σ): 267.704 ms ± 2.096 ms ┊ GC (mean ± σ): 4.17% ± 0.47%
▁ ▁ ██ ▁▁ ▁ ▁ █▁ ▁ ▁ ▁ ▁ ▁ ▁
█▁▁▁█▁██▁▁▁▁██▁▁▁▁▁█▁█▁▁▁▁▁██▁▁█▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁█ ▁
265 ms Histogram: frequency by time 272 ms <
Memory estimate: 192.34 MiB, allocs estimate: 2000004.
julia> function method2(xss, yss)
map(axes(yss, 1)) do i
ys = yss[i]
s1 = maximum(ys; init=0.0)
s1 < 0.95 && return false
xs = xss[i]
maximum(xs; init=0.0) < 0.7 && return false
return true
end
end
julia> @benchmark method2(tbl.x, tbl.y)
BenchmarkTools.Trial: 25 samples with 1 evaluation.
Range (min … max): 197.487 ms … 219.503 ms ┊ GC (min … max): 2.89% … 2.82%
Time (median): 200.121 ms ┊ GC (median): 2.95%
Time (mean ± σ): 202.677 ms ± 6.567 ms ┊ GC (mean ± σ): 3.21% ± 0.48%
█▁ █ ▁ ▄
██▁█▁▆▆█▆▁▁▁▁▁▁▆█▆▁▁▁▁▁▆▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆▁▁▆ ▁
197 ms Histogram: frequency by time 220 ms <
Memory estimate: 148.03 MiB, allocs estimate: 1536913.
julia> map(kernel1, tbl.x, tbl.y) == method2(tbl.x, tbl.y)
trueMetadata
Metadata
Assignees
Labels
No labels