[Variant] Improve write API in Variant::Object #7741
Hi, I wanted to get people's thoughts on the current Right now, calling From a user standpoint, I'd prefer the latter approach. However, it's a relatively expensive operation. Since each Worst-case, we'd rewrite our entire |
I also agree updating the existing value is preferable. My reading of the Variant spec is that it doesn't require all bytes in the variant's value to be used. So what I am saying is that I think it would be correct for the VariantBuilder to just update the key and leave the old value there (but not referenced) 🤔 That would result in a larger final variant, but I think as long as we documented this behavior it would be ok from the user perspective (I am envisioning many different possible desired optimizations for variant creation) |
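To make the trade-off above concrete, here is a minimal sketch of "repoint the key, leave the old bytes behind". It is a toy model only: `ToyObjectBuilder`, its fields, and `dead_bytes` are hypothetical names invented for illustration and are not the parquet-variant API or its real encoding.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an object builder (hypothetical, not the real API).
struct ToyObjectBuilder {
    /// Append-only value bytes; overwritten values remain here, unreferenced.
    buffer: Vec<u8>,
    /// Field name -> (offset, length) of the live value inside `buffer`.
    fields: BTreeMap<String, (usize, usize)>,
}

impl ToyObjectBuilder {
    fn new() -> Self {
        Self { buffer: Vec::new(), fields: BTreeMap::new() }
    }

    /// Appends `value` and points `key` at it.
    /// Returns true if `key` was already present (its old bytes are now dead).
    fn insert(&mut self, key: &str, value: &[u8]) -> bool {
        let offset = self.buffer.len();
        self.buffer.extend_from_slice(value);
        self.fields.insert(key.to_string(), (offset, value.len())).is_some()
    }

    /// Reads the live value for `key`, if any.
    fn get(&self, key: &str) -> Option<&[u8]> {
        let &(offset, len) = self.fields.get(key)?;
        Some(&self.buffer[offset..offset + len])
    }

    /// Bytes in `buffer` that no field points at any more.
    fn dead_bytes(&self) -> usize {
        let live: usize = self.fields.values().map(|&(_, len)| len).sum();
        self.buffer.len() - live
    }
}

/// Inserts the same key twice and reports the live value and the wasted bytes.
fn example() -> (Vec<u8>, usize) {
    let mut obj = ToyObjectBuilder::new();
    obj.insert("name", b"alice");
    obj.insert("name", b"bob"); // duplicate key: "name" now points at the new bytes
    (obj.get("name").unwrap().to_vec(), obj.dead_bytes())
}
```

In this model the second insert costs only an append and a map update, but the five bytes of the first value stay in the buffer, which is the "larger final variant" cost mentioned above.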
I wonder if we can add a |
The above would certainly work, in the sense of producing a valid variant object. My only concern would be that the scenario almost certainly arises due to user error (which is quite different from a generic map or set), and silently tolerating that error isn't necessarily doing the user any favors in the long run. They'll just discover at read time that they lost data, instead of fast-failing at write time. We can probably get away with either approach -- silently replacing or loudly complaining -- I just want to be sure we make the choice intentionally. |
I think maybe we could just show how to do this with an example (rather than having to make a function) prior to anyone actually encountering the error. I think it could be done by recursively rewriting the entire variant.
Maybe we could have some flag that controls the validation behavior? Something like:
let mut builder = VariantBuilder::new();
let mut obj = builder.new_object()
    // specify that an error should be returned on repeated fields
    .with_validate_unique_fields()
    ...
obj.finish()?; // this returns an error if there were repeated fields
That way people could check for errors programmatically if they wanted to and could disable the checking if they didn't care 🤔 This is all for a follow-on PR I think |
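As a rough illustration of the opt-in validation idea above, the sketch below uses a toy builder whose `finish` either fails on a repeated field name or lets the last write win. `ToyCheckedBuilder` and `DuplicateField` are hypothetical names, and `with_validate_unique_fields` is only the name floated in the comment; none of this is an existing parquet-variant API.

```rust
use std::collections::HashSet;

/// Error carrying the first repeated field name (hypothetical type).
#[derive(Debug, PartialEq)]
struct DuplicateField(String);

/// Toy builder that records every insert and resolves duplicates in `finish`.
#[derive(Default)]
struct ToyCheckedBuilder {
    fields: Vec<(String, i64)>,
    validate_unique_fields: bool,
}

impl ToyCheckedBuilder {
    fn new() -> Self {
        Self::default()
    }

    /// Opt in to failing at `finish` when a field name repeats.
    fn with_validate_unique_fields(mut self) -> Self {
        self.validate_unique_fields = true;
        self
    }

    fn insert(&mut self, key: &str, value: i64) {
        self.fields.push((key.to_string(), value));
    }

    /// With validation on, returns an error for the first repeated name.
    /// With validation off, a later insert replaces the earlier value.
    fn finish(self) -> Result<Vec<(String, i64)>, DuplicateField> {
        let Self { fields, validate_unique_fields } = self;
        let mut seen = HashSet::new();
        let mut out: Vec<(String, i64)> = Vec::new();
        for (key, value) in fields {
            if seen.insert(key.clone()) {
                // First time this name appears: keep it in insertion order.
                out.push((key, value));
            } else if validate_unique_fields {
                return Err(DuplicateField(key));
            } else if let Some(slot) = out.iter_mut().find(|entry| entry.0 == key) {
                // Repeated name, validation off: last write wins.
                slot.1 = value;
            }
        }
        Ok(out)
    }
}

/// Builds the same duplicate-bearing object in strict mode and returns the outcome.
fn example() -> Result<Vec<(String, i64)>, DuplicateField> {
    let mut obj = ToyCheckedBuilder::new().with_validate_unique_fields();
    obj.insert("a", 1);
    obj.insert("a", 2);
    obj.finish()
}
```

Deferring the check to `finish` keeps `insert` infallible, at the cost of reporting the mistake later than a fail-fast `insert` would; that matches the shape of the snippet in the comment rather than any settled design.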
@friendlymatthew -- I think this PR has a logical merge conflict now. Here is a proposed fix: |
close/reopen to rerun CI |
Fix logical conflict in Variant write API PR
Opened #7777 |
@alamb this has been rebased onto main |
|
Thanks @friendlymatthew and @scovich |
Which issue does this PR close?
Variant::Object can contain two fields with the same field name #7730
Rationale for this change
This commit changes the function name ObjectBuilder::append_value to ObjectBuilder::insert.
Right now, calling insert() with a duplicate key results in two fields with the same key in the object, which deviates from the Variant spec. This PR updates the logic such that the second insert() with a duplicate key will update the value. The old value still exists in the backing buffer, but is unreferenced. One side effect of this approach is a larger variant size.
The Parquet Variant spec states: