-
Notifications
You must be signed in to change notification settings - Fork 4
WIP: extend jplyr interface #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Also -- and this is just a suggestion -- would you be amenable to changing the name of the package to |
|
I like the direction you're going with this PR, and think it should precede the work in #1. I also agree the package should be renamed to It remains unclear to me what needs to be done on my end, so I'll wait for clearer directives from you after the dust settles. |
|
That sounds good. One thing I think you and I need to clear up is how to support manipulation verbs that come after a @query iris |>
groupby(species,
having(mean(petal_length) > 1.5))
@query iris |>
groupby(species,
aggregate(avg_petal_length = mean(petal_length)))whereas I was leaning towards representing such qualifiers as their own manipulation verbs with @query iris |>
groupby(species) |>
having(mean(petal_length) > 1.5)
@query iris |>
groupby(species) |>
summarize(avg_petal_length = mean(petal_length))Note that EDIT: I should give my arguments in favor of at least supporting the latter syntax in the present package:
@query iris |>
groupby(species) |>
summarize(avg_petal_length = mean(petal_length))right out of the box. I think it would be strange not to support this syntax for SQL backends, especially when dplyr does support it (and hence users coming from dplyr may expect it).
qry = @query iris |>
groupby(species)
qrya = @query qry |>
having(mean(petal_length) > 1.5)
qryb = @query qry |>
summarize(avg_petal_width = mean(petal_width)) |
|
When we were discussing it some weeks back, there wasn't very good support for the notion of a |
|
All of those TO-DO items may be more appropriate for future PRs. @yeesian, should the return type for collecting against a |
Let's make it a DataFrame then. |
|
@yeesian So, here's my current thinking about the For in-memory column-indexable Julia tables ("Julia tables" for short), I have a However, collecting |
|
I'm okay with the decision. Just some questions for my understanding:
What do we mean by grouping information is lost?
Is it meant to be an eventual replacement for GroupedDataFrames?
I was under the impression a |
This PR reframes the SQL translation regime implemented in SQLQuery as an extension of the jplyr querying framework. The translation logic is left essentially untouched.
This PR introduces the following broad changes:
SQLTable <: AbstractTableand the concreteSQLiteTable <: SQLTable. The latter is a thin wrapper around aSQLite.DBconnection and a field that names the table.jplyrto provide the graph generation@querymacro. The intended user-facing interface is that users use@query src |> ..., wheresrcis an object of typeT <: SQLTableto generate aQuery{T}, which they can thencollectagainstsrc. (Note thatjplyrdoes not support all of theQueryNodeleaf subtype objects that the present package does, in particularDistinctNode,LimitNode, andOffsetNode. Thus, the present package introduces these types and illustrates (or will illustrate) jplyr'sQueryNoderegistration mechanism (which still needs some work).)Base.collect(tbl::SQLTable, graph::jplyr.QueryNode), which translates thegraphinto a SQL string viatranslatesqland runs the query againsttbl. By default, the results set is streamed into aTables.Table.QueryNodeobjects,SQLTables orSQLiteTables). In thesqltableandsqlitetablefolders, aqueryfolder houses the code relevant to extending the jplyrcollectmachinery.translate.jllives insqltable/query.TO-DO (not exhaustive):
jplyr.QueryNoderegistration frameworkQueryNodeleaf subtypes, forSQLTables/SQLiteTablesDataStreamsinterfacingAbstractTables/Tables/jplyrHere's a teaser (relying on new work in
Tables/AbtractTables/jplyrthat I haven't yet pushed):