Closed
Description
I've started implementing `metadata` and `colmetadata` for Parquet2.jl. I have a few thoughts; sorry for not bringing them up while this was being discussed, but there was a lot of conversation and I tuned out at some point.
- There is currently no way, as part of the API, to fetch with a default as in `Base.get`. In many situations this means there is no way of fetching data without at least two lookups.
- There isn't a clean way in the API to fetch all metadata. One would have to do something like `Dict(k => metadata(x, k) for k in metadatakeys(x))`, which seems a bit awkward, especially considering that in many cases the object is probably just sitting there in the first place and shouldn't have to be reconstructed.
- I'm not sure if this is a problem, but I thought I'd point out that Tables.jl supports cases where the relationship between an object and its `colmetadata` is more complicated than this API suggests. For example, in Parquet2 a `Dataset` is a table whose columns are concatenations of sub-columns belonging to sub-tables (which are also Tables.jl tables) called `RowGroup`. It's therefore not possible to define `colmetadata` on `Dataset`, because it would be ambiguous which column metadata should be used (or whether it would be appropriate to merge them). This is surely not a typical case, but it seems worth pointing out that Tables.jl isn't enough to specify what `colmetadata` should do.
- Defining `ArgumentError` fallbacks seems a bit dubious. These clearly should be `MethodError` if there is no reasonable fallback. The most obvious consequence is that error handling routines might catch the wrong error here. Nothing else immediately comes to mind, though I do vaguely remember somebody writing a blog post at some point describing why this pattern leads to trouble... I'd also be a little worried about it making method ambiguity cases worse.
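To make the first two points concrete, here is a minimal sketch. It uses a hypothetical `ToyTable` type and locally defined stand-ins for `metadata` and `metadatakeys`; none of it is taken from DataAPI.jl or Parquet2.jl, and the three-argument and one-argument `metadata` methods are assumptions about what such accessors might look like.

```julia
# Hypothetical toy type; not part of DataAPI.jl or Parquet2.jl.
struct ToyTable
    meta::Dict{String,Any}
end

# Local stand-ins with the same shape as the key-based accessors discussed above.
metadatakeys(t::ToyTable) = keys(t.meta)
metadata(t::ToyTable, k::AbstractString) = t.meta[k]

# Without a default: one lookup to check the key, a second one to fetch it.
function fetch_two_lookups(t::ToyTable, k, default)
    return k in metadatakeys(t) ? metadata(t, k) : default
end

# Assumed default-taking method, as in Base.get: a single lookup.
metadata(t::ToyTable, k::AbstractString, default) = get(t.meta, k, default)

# Assumed "all metadata" method: return the stored dictionary as-is.
metadata(t::ToyTable) = t.meta

t = ToyTable(Dict{String,Any}("source" => "sensor-a"))

# The awkward alternative: rebuild a dictionary key by key.
rebuilt = Dict(k => metadata(t, k) for k in metadatakeys(t))
```

Here `rebuilt` has the same contents as `metadata(t)` but is a freshly allocated copy, whereas `metadata(t)` simply hands back the object that already exists.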
I realize that opening this issue might seem like more of an annoyance than anything else, since the ship has sailed and any change now would mean dealing with breakage. However, there might still be room to add a few methods such as, perhaps:
metadata(x)
metadata(x, k, default)
metadata!(x, k, default)
metadata!(f, x, k)

Perhaps it's already fine for packages to include these, but in that case perhaps it should be documented.
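One possible reading of these four signatures is by analogy with `Base.get` and `Base.get!`. The sketch below follows that reading for a hypothetical `ToyDataset` type; the semantics shown are an assumption made for illustration, not something DataAPI.jl specifies, and the functions are defined locally rather than extending any package.

```julia
# Hypothetical container; not a type from DataAPI.jl or Parquet2.jl.
struct ToyDataset
    meta::Dict{String,Any}
end

# All metadata at once: hand back the stored dictionary, no reconstruction.
metadata(d::ToyDataset) = d.meta

# Fetch with a default, analogous to Base.get(dict, key, default).
metadata(d::ToyDataset, k, default) = get(d.meta, k, default)

# Fetch, storing the default if the key is absent, analogous to Base.get!.
metadata!(d::ToyDataset, k, default) = get!(d.meta, k, default)

# Same, but the default is computed by `f` only when the key is absent.
metadata!(f::Function, d::ToyDataset, k) = get!(f, d.meta, k)

d = ToyDataset(Dict{String,Any}("created_by" => "writer-1"))

a = metadata(d, "units", "unknown")   # key absent, nothing is stored
b = metadata!(d, "units", "meters")   # stores the default and returns it
c = metadata!(d, "rows") do           # do-block form of metadata!(f, x, k)
    100
end
```

Under this reading, each call does a single dictionary lookup, and the `do`-block form avoids computing a default that is never needed.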