
@tanmaykm
Contributor

@tanmaykm tanmaykm commented Apr 1, 2021

This adds a method to `append` partitions to existing arrow files. The partitions to append are supplied as any [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible table.

One record batch is written for each partition returned by `Tables.partitions(tbl)`.

Each partition being appended must have the same `Tables.Schema` as the destination arrow file.

The other keyword arguments that `append` accepts are similar to those accepted by `write`.
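A minimal usage sketch, assuming the `Arrow.append(path, tbl)` method as it later shipped in Arrow.jl, and assuming the destination was written in Arrow IPC *stream* format (`file=false`). The file name and data are hypothetical, and details in this PR may differ. `Tables.partitioner` is used here only to build a table with two partitions, so two record batches are appended.

```julia
using Arrow, Tables

# Hypothetical destination path, used only for illustration.
path = joinpath(mktempdir(), "data.arrow")

# Write the initial table in IPC stream format (file=false).
# Assumption: append targets stream-formatted data, not the footer-terminated file format.
Arrow.write(path, (a = [1, 2, 3], b = ["x", "y", "z"]); file=false)

# Two partitions with the same Tables.Schema as the destination (a::Int64, b::String).
part1 = (a = [4, 5], b = ["p", "q"])
part2 = (a = [6], b = ["r"])

# Tables.partitions on this wrapper yields part1 and part2;
# each partition is expected to become its own record batch.
Arrow.append(path, Tables.partitioner([part1, part2]))

# Read the whole stream back: the columns span all record batches.
tbl = Arrow.Table(path)
println(collect(tbl.a))  # [1, 2, 3, 4, 5, 6]
```

Writing the destination with `file=false` matters because the Arrow file format ends with a footer that indexes its record batches, whereas a stream can be extended by writing further batches after the existing ones.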

@tanmaykm
Contributor Author

tanmaykm commented Apr 1, 2021

Also ref #105, which seems related.

@codecov

codecov bot commented Apr 1, 2021

Codecov Report

Merging #160 (0f0ab4b) into main (6be335c) will increase coverage by 0.62%.
The diff coverage is 91.39%.


@@            Coverage Diff             @@
##             main     #160      +/-   ##
==========================================
+ Coverage   81.32%   81.94%   +0.62%     
==========================================
  Files          25       26       +1     
  Lines        3015     3119     +104     
==========================================
+ Hits         2452     2556     +104     
  Misses        563      563              
| Impacted Files | Coverage Δ |
|---|---|
| src/Arrow.jl | 54.54% <ø> (ø) |
| src/table.jl | 95.98% <85.71%> (+0.22%) ⬆️ |
| src/write.jl | 95.69% <85.71%> (ø) |
| src/append.jl | 93.05% <93.05%> (ø) |
| src/eltypes.jl | 87.01% <0.00%> (-0.26%) ⬇️ |
| src/arrowtypes.jl | 0.00% <0.00%> (ø) |
| src/ArrowTypes/src/ArrowTypes.jl | 97.27% <0.00%> (+0.01%) ⬆️ |
| src/arraytypes/list.jl | 91.59% <0.00%> (+0.84%) ⬆️ |

... and 7 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6be335c...0f0ab4b.

@tanmaykm tanmaykm force-pushed the tan/append branch 2 times, most recently from 57d13c6 to ce68fd0 on April 7, 2021 08:51
@tanmaykm
Contributor Author

Hi @quinnj, does this look okay, or is anything more needed here?

@quinnj
Member

quinnj commented Apr 14, 2021

Sorry, I've been trying to catch up on a bunch of stuff since coming back from vacation; planning on reviewing this more in-depth in the next 24 hours.

store a few additional stream properties in the `Stream` data type and avoid duplicating code for append functionality
@tanmaykm
Contributor Author

@quinnj I have now added commits that allow appending to an `IO`, and added a few members to `Stream` instead of duplicating code for append.

@tanmaykm
Contributor Author

bump!

Member

@quinnj quinnj left a comment


Thanks for all the work on this @tanmaykm!
