RFC: Implement initial support for COPY ... TO ... statement #6313
alamb wants to merge 11 commits into apache:main
Note that this is the mechanism used by `CREATE TABLE AS SELECT` (aka `LogicalPlan::CreateMemTable`). It is different from the mechanism used by `INSERT INTO ...` (added in #5520 by @metesynnada), which uses an `ExecutionPlan`.
The difference bothers me, but I can see the benefits of both approaches.
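To make the contrast concrete, here is a toy model of the two mechanisms. Every name in it (`Session`, `LogicalStatement`, `PhysicalOp`, `Scan`, `WriteExec`) is a hypothetical stand-in, far simpler than DataFusion's real `LogicalPlan` and `ExecutionPlan`; the sketch only shows where the write happens in each approach.

```rust
use std::collections::HashMap;

/// Toy row type (hypothetical); real engines use Arrow record batches.
pub type Row = Vec<i64>;

/// Approach 1 (toy model of the CREATE TABLE AS SELECT path): the session
/// recognizes a logical statement and performs the write itself.
pub enum LogicalStatement {
    CreateMemTable { name: String, input: Vec<Row> },
}

pub struct Session {
    pub tables: HashMap<String, Vec<Row>>,
}

impl Session {
    /// Materializes the input, registers it as a table, returns rows stored.
    pub fn run(&mut self, stmt: LogicalStatement) -> usize {
        match stmt {
            LogicalStatement::CreateMemTable { name, input } => {
                let n = input.len();
                self.tables.insert(name, input);
                n
            }
        }
    }
}

/// Approach 2 (toy model of the INSERT INTO path): the write is itself an
/// operator in the physical plan. `PhysicalOp` is a hypothetical trait.
pub trait PhysicalOp {
    fn execute(&mut self) -> Vec<Row>;
}

/// Leaf operator that yields its rows once.
pub struct Scan {
    pub rows: Vec<Row>,
}

impl PhysicalOp for Scan {
    fn execute(&mut self) -> Vec<Row> {
        std::mem::take(&mut self.rows)
    }
}

/// Writing operator: drains its input into `sink`.
pub struct WriteExec<P: PhysicalOp> {
    pub input: P,
    pub sink: Vec<Row>,
}

impl<P: PhysicalOp> PhysicalOp for WriteExec<P> {
    /// Outputs a single row holding the number of rows written.
    fn execute(&mut self) -> Vec<Row> {
        let rows = self.input.execute();
        let count = rows.len() as i64;
        self.sink.extend(rows);
        vec![vec![count]]
    }
}
```

In the first model only the session knows how to run the statement; in the second, the write is an ordinary plan node that can sit on top of any other operator.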
I was playing with this more this evening, and I think I came up with something halfway between the two that I like even better. I will keep iterating and report back.
I think we have something you may be able to leverage; check out this PR (https://github.com/synnada-ai/arrow-datafusion/pull/89). It extends the ExecutionPlan approach to writing files, and I think you can leverage that work here too. With that change, COPY TO and INSERT INTO will use the same ExecutionPlan-based approach; the only difference would be appending vs. overwriting.
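As a sketch of the "one write path, two modes" idea: assuming, as the comment implies, that COPY TO overwrites its destination while INSERT INTO appends, a shared sink could take the mode as a parameter. `WriteMode` and `write_rows` are hypothetical names, rows are plain strings rather than Arrow batches, and only Rust's standard library is used.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical mode flag: the one place COPY TO and INSERT INTO would
/// differ if both used the same file-writing plan.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum WriteMode {
    /// INSERT INTO style: keep existing contents and add the new rows.
    Append,
    /// COPY TO style: replace whatever is at the destination.
    Overwrite,
}

/// Writes each row as one line to `path`, creating the file if needed,
/// and returns the number of rows written.
pub fn write_rows(path: &Path, rows: &[&str], mode: WriteMode) -> io::Result<u64> {
    let mut opts = OpenOptions::new();
    opts.create(true);
    match mode {
        // `append(true)` implies write access.
        WriteMode::Append => {
            opts.append(true);
        }
        // Open for writing and discard any existing contents.
        WriteMode::Overwrite => {
            opts.write(true).truncate(true);
        }
    }
    let mut file = opts.open(path)?;
    for row in rows {
        writeln!(file, "{row}")?;
    }
    Ok(rows.len() as u64)
}
```

Everything upstream of the sink (planning, execution, row counting) stays shared; only the open mode changes.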
FYI, in case you are curious about timing: we plan to finalize it and submit it upstream in a week or so.
Thanks @ozankabak, I have reviewed https://github.com/synnada-ai/arrow-datafusion/pull/89 and thought about how to incorporate the same structure.
I really like your idea to use the same plans for COPY TO and INSERT INTO. After some more thought, I have an idea of how to plan COPY statements using the same plans as INSERT.
Here is a proposal for a simplified API: #6339
I'll try to bang out a PR shortly.
Thank you for the heads-up. I will study the PR you mentioned.
On Wed, May 10, 2023 at 5:02 PM Mehmet Ozan Kabak wrote, in datafusion/core/src/execution/context.rs:
> @@ -450,6 +457,36 @@ impl SessionContext {
> self.read_batch(record_batch)
> }
> + // Execute a COPY TO statement, returning the number of rows
I think we have something you may be able to leverage; check out this PR (https://github.com/synnada-ai/arrow-datafusion/pull/89). It extends the ExecutionPlan approach to writing files, and I think you can leverage that work here too.
I have enough feedback for now, and I am working on adding this functionality in pieces, so there is no need for this PR now.
Which issue does this PR close?
Closes #5654
Closes #5988
Rationale for this change
What changes are included in this PR?
(I'll try to break this up into smaller pieces for easier review, but I want to show it all working together.)
- `COPY ... TO ...` statements
- `LogicalPlan::CopyTo` variant
- [ ] Parser tests
- [ ] Add end-user documentation
- [ ] Properly support writing single Parquet files
- [ ] sqllogictests
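As a rough illustration of what the parser tests might exercise, here is a hand-rolled toy parser for the simplest form, `COPY <table> TO '<path>'`, producing a hypothetical `CopyTo` struct that stands in for the plan node. This is an assumption-laden sketch: DataFusion's real SQL parsing goes through the `sqlparser` crate, and the fields of the actual `LogicalPlan::CopyTo` variant in this PR are not shown here.

```rust
/// Hypothetical node for `COPY <source> TO '<target>'`; the real
/// `LogicalPlan::CopyTo` variant may carry different fields.
#[derive(Debug, PartialEq)]
pub struct CopyTo {
    /// Name of the table to read from.
    pub source: String,
    /// Destination path, with the surrounding single quotes removed.
    pub target: String,
}

/// Parses `COPY <table> TO '<path>'` (keywords are case-insensitive; an
/// optional trailing `;` is allowed). Toy limitations: the source must be a
/// bare table name, the path must contain no whitespace, no options.
pub fn parse_copy(sql: &str) -> Result<CopyTo, String> {
    let mut tokens = sql.trim().trim_end_matches(';').split_whitespace();

    match tokens.next() {
        Some(t) if t.eq_ignore_ascii_case("COPY") => {}
        _ => return Err("expected COPY".to_string()),
    }
    let source = tokens.next().ok_or("expected a source table")?.to_string();

    match tokens.next() {
        Some(t) if t.eq_ignore_ascii_case("TO") => {}
        _ => return Err("expected TO".to_string()),
    }
    let quoted = tokens.next().ok_or("expected a target path")?;
    let target = quoted
        .strip_prefix('\'')
        .and_then(|s| s.strip_suffix('\''))
        .ok_or("target path must be single-quoted")?
        .to_string();

    if tokens.next().is_some() {
        return Err("unexpected tokens after the target path".to_string());
    }
    Ok(CopyTo { source, target })
}
```

Subqueries as the source and format options are deliberately out of scope for this toy.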
I also plan to file follow-on tasks (such as support for other file formats, options, etc.).
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, there is a new (documented) statement.