Codecov Report
Base: 95.12% // Head: 96.55% // Increases project coverage by +1.42%
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   95.12%   96.55%   +1.42%
==========================================
  Files           1        1
  Lines          41       58     +17
==========================================
+ Hits           39       56     +17
  Misses          2        2
quinnj
left a comment
Sorry for the slow review here (getting ramped up in the new job and trying to get caught up on a lot of Julia stuff + prioritize for the future). I like the name change to just metadata and getting this merged. I can make the required changes in Arrow.jl.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
nalimilan
left a comment
Looks good! Though we may want to hold this until we have an implementation ready in DataFrames.jl and/or in Tables.jl just in case we discover unanticipated needs.
|
I will do the implementation in DataFrames.jl then, and I understand that @quinnj will do the implementation in Arrow.jl - right? @nalimilan - what would you want implemented in Tables.jl? |
|
I have started the implementation and see one problem. I propose to discuss it here (although it is DataFrames.jl specific). Assume that I want to add metadata to some data frame that does not have metadata yet. I run @nalimilan - is this OK for you? |
|
Yeah it sounds fine to create the dict on the first call to avoid returning BTW, instead of recommending users mutate the |
|
I started implementing it and it seems that we need The where
Side note: to be able to add metadata to @nalimilan - do you have any additional thoughts on these two points? |
|
In JuliaData/DataFrames.jl#3055 I have drafted the implementation so you can see the important aspects of the design. |
|
Indeed I also realized that. :-/ Now that you list the requirements, One alternative would be to have an Cc: @Tokazama |
|
Thank you for the feedback. I think we have settled the design. If there are no additional comments by tomorrow I will start updating JuliaData/DataFrames.jl#3055. |
|
OK - I am starting to implement the API in DataFrames.jl 😄. |
|
I have started working on DataFrames.jl and I already have a decision to be made. In this PR we have
|
|
@nalimilan - I have added deletion to the API so that we can evaluate if we like it. |
|
Adding this to DataAPI sounds better than having a separate package, anyway empty definitions are cheap. The |
|
I am OK with But then for columns we will have:
? |
src/DataAPI.jl
Outdated
| One of the uses of the metadata `style` is decision
| how the metadata should be propagated when `x` is transformed. This interface
| defines the `:none` style that indicates that metadata should not be propagated
| under transformations. All types supporting metadata allow at least this style.
@nalimilan - maybe we should make this description more precise? Currently in DataFrames.jl I needed to decide when :none metadata is kept, and it is kept only in two cases:
- DataFrame constructor;
- copy;
All other operations drop all :none metadata. So, essentially, both table-level and column-level :none metadata are attached to a concrete instance of a table or its copies (this is the safest approach, i.e. it makes sure that metadata is dropped whenever it could be invalidated). Are we in agreement here?
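A minimal sketch of the rule described above, using a toy type rather than the DataFrames.jl code. The names `ToyTable`, `setmeta!` and `keep_rows` are hypothetical, and the assumption that any style other than :none survives a transformation is mine, not something settled in this thread.

```julia
# Toy stand-in for a table; NOT the DataFrames.jl implementation.
struct ToyTable
    data::Vector{Int}
    meta::Dict{String,Tuple{Any,Symbol}}   # key => (value, style)
end

ToyTable(data::Vector{Int}) = ToyTable(data, Dict{String,Tuple{Any,Symbol}}())

# Hypothetical setter: store a value together with its style.
function setmeta!(t::ToyTable, key::String, value; style::Symbol)
    t.meta[key] = (value, style)
    return t
end

# copy keeps every metadata entry, including :none ones.
Base.copy(t::ToyTable) = ToyTable(copy(t.data), copy(t.meta))

# Any other operation (here a hypothetical row filter) drops :none entries.
# Assumption: entries with any other style are carried over unchanged.
function keep_rows(t::ToyTable, pred)
    kept = filter(kv -> kv.second[2] !== :none, t.meta)
    return ToyTable(filter(pred, t.data), kept)
end
```

With this sketch, `copy(t)` still has an entry stored with `style=:none`, while `keep_rows(t, isodd)` does not.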
Given no response I will make the definition more precise.
| metadatakeys(::Any) = ()
|
| """
| metadata!(x, key::AbstractString, value; style)
Thinking about it, maybe the syntax would be more natural as metadata!(x, key => value)? That would allow extending this in the future to pass multiple pairs if it appears to be convenient.
A counter-argument is that setindex! doesn't use that syntax, but it's almost never called that way since x[key] = value is nicer. Of course both syntaxes could be allowed as they are not ambiguous (we will probably never allow keys to be pairs).
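To make the non-ambiguity point concrete, here is a rough sketch on a toy container, assuming the pair form simply forwards to the positional one. `ToyMeta` and `setmeta!` are hypothetical names for illustration, not the functions defined in this PR.

```julia
# Toy container; NOT part of DataAPI.jl.
struct ToyMeta
    entries::Dict{String,Tuple{Any,Symbol}}   # key => (value, style)
end

ToyMeta() = ToyMeta(Dict{String,Tuple{Any,Symbol}}())

# Positional form, shaped like the signature under review.
function setmeta!(x::ToyMeta, key::AbstractString, value; style::Symbol)
    x.entries[String(key)] = (value, style)
    return x
end

# Pair form: two positional arguments, so it cannot clash with the
# three-argument method above. It forwards to the positional form.
setmeta!(x::ToyMeta, kv::Pair{<:AbstractString}; style::Symbol) =
    setmeta!(x, kv.first, kv.second; style=style)
```

Both `setmeta!(m, "caption", "demo"; style=:note)` and `setmeta!(m, "caption" => "demo"; style=:note)` then store the same entry.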
As a first reaction it makes sense.
My only reservation was that in DataFrames.jl => is used for operation specification language, so we would have yet a third way to interpret => there.
The question is how would it look for colmetadata!?
colmetadata!(x, col, key => value; style=style)
(which does not look that nice)
Also note that metadata!(x, key => value) would not be allowed, you would need to write metadata!(x, key => value; style=style).
In summary - I would keep things as they are here and consider what we design here a low-level API.
I assume that the extra new package planned (tentatively named TableMetadataTools.jl) will provide convenient high-level functions. In practice I even expect that if we define there:
caption!(table, str) = metadata!(table, "caption", str, style=:note)
caption(table) = metadata(table, "caption")
label!(table, col, str) = colmetadata!(table, col, "label", str, style=:note)
label(table, col) = colmetadata(table, col, "label")
this will cover 95% of use cases of metadata in practice.
In summary - I propose to discuss a convenience high-level API in TableMetadataTools.jl, as I expect that in that package we will drop the requirement to specify style that we have in the low-level API, since in the high-level API all styles will be :note.
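A self-contained sketch of that layering, assuming a toy store in place of a real table. Everything here (`ToyStore`, `set_meta!`, `get_meta`, `set_colmeta!`, `get_colmeta`) is hypothetical and only illustrates how a high-level layer could fix the style to :note; it is not the planned TableMetadataTools.jl code.

```julia
# Toy store standing in for a table with metadata; NOT a real table type.
const Entry = Tuple{Any,Symbol}   # (value, style)

struct ToyStore
    meta::Dict{String,Entry}
    colmeta::Dict{Symbol,Dict{String,Entry}}
end

ToyStore() = ToyStore(Dict{String,Entry}(), Dict{Symbol,Dict{String,Entry}}())

# Low-level layer: the caller must always pass a style.
set_meta!(t::ToyStore, key, value; style::Symbol) = (t.meta[key] = (value, style); t)
get_meta(t::ToyStore, key) = t.meta[key][1]

function set_colmeta!(t::ToyStore, col::Symbol, key, value; style::Symbol)
    # Create the per-column dictionary on first use.
    d = get!(() -> Dict{String,Entry}(), t.colmeta, col)
    d[key] = (value, style)
    return t
end
get_colmeta(t::ToyStore, col::Symbol, key) = t.colmeta[col][key][1]

# High-level layer: the style is fixed to :note, so callers never pass it.
caption!(t, str) = set_meta!(t, "caption", str; style=:note)
caption(t) = get_meta(t, "caption")
label!(t, col, str) = set_colmeta!(t, col, "label", str; style=:note)
label(t, col) = get_colmeta(t, col, "label")
```

A user of the high-level layer would only write `caption!(t, "Survey results")` or `label!(t, :age, "Age in years")` and never touch the style argument.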
|
I changed @nalimilan - can you please recheck and comment if it can be merged (I guess in JuliaData/DataFrames.jl#3055 we have converged with the implementation). Thank you! |
|
Thank you! The ball starts rolling (the metadata discussion is the most complex addition we have ever made to the ecosystem)
Following the discussion in JuliaData/DataFrames.jl#2961 I propose to have getmetadata be a function defined on DataAPI.jl level. In this way in particular:
- getmetadata write appropriate metadata to arrow file;
- DataFrame constructor taking a table can check if it supports metadata and if it does automatically attach this metadata to a DataFrame;