Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 32 additions & 3 deletions parquet-variant/src/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -450,6 +450,7 @@ impl MetadataBuilder {
/// obj.insert("a", 1);
/// obj.insert("a", 2); // duplicate field
///
/// // When validation is enabled, finish will return an error
/// let result = obj.finish(); // returns Err
/// assert!(result.is_err());
/// ```
Expand Down Expand Up @@ -495,10 +496,20 @@ impl VariantBuilder {
.with_validate_unique_fields(self.validate_unique_fields)
}

/// Append a non-nested value to the builder.
///
/// # Example
/// ```
/// # use parquet_variant::{Variant, VariantBuilder};
/// let mut builder = VariantBuilder::new();
/// // most primitive types can be appended directly as they implement `Into<Variant>`
/// builder.append_value(42i8);
/// ```
pub fn append_value<'m, 'd, T: Into<Variant<'m, 'd>>>(&mut self, value: T) {
self.buffer.append_non_nested_value(value);
}

/// Finish the builder and return the metadata and value buffers.
pub fn finish(self) -> (Vec<u8>, Vec<u8>) {
(self.metadata_builder.finish(), self.buffer.into_inner())
}
Expand Down Expand Up @@ -577,6 +588,7 @@ impl<'a> ListBuilder<'a> {
self.offsets.push(element_end);
}

/// Finish the list, writing it to the parent buffer and consuming self.
pub fn finish(mut self) {
self.check_new_offset();

Expand All @@ -603,6 +615,14 @@ impl<'a> ListBuilder<'a> {
}
}

/// Drop implementation for ListBuilder does nothing
/// as the `finish` method must be called to finalize the list.
/// This is to ensure that the list is always finalized before its parent builder
/// is finalized.
impl Drop for ListBuilder<'_> {
fn drop(&mut self) {}
}

/// A builder for creating [`Variant::Object`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
Expand Down Expand Up @@ -694,9 +714,7 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
list_builder
}

/// Finalize object
///
/// This consumes self and writes the object to the parent buffer.
/// Finalize the object, writing it to the parent buffer and consuming self.
pub fn finish(mut self) -> Result<(), ArrowError> {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't actually remove finish as there are still examples where the builders are in same scope with top-level builder. So explicit call of finish is still necessary for that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's said, and, for the case these builders are in the same scope as the top-level builder, after this PR, if such a builder is not called finish, it will be caught by the compiler now, e.g.,

error[E0505]: cannot move out of `builder` because it is borrowed
    --> parquet-variant/src/builder.rs:1510:9
     |
1469 |         let mut builder = VariantBuilder::new().with_validate_unique_fields(true);
     |             ----------- binding `builder` declared here
...
1485 |         let mut outer_list = builder.new_list();
     |                              ------- borrow of `builder` occurs here
...
1510 |         builder.finish();
     |         ^^^^^^^ move out of `builder` occurs here
1511 |     }
     |     - borrow might be used here, when `outer_list` is dropped and runs the `Drop` code for type `ListBuilder`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds like a great idea

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! The compiler is normally willing to end the lifetime of the orphaned outer_list just before the builder.finish call, but providing an impl Drop somehow force-extends the lifetime?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler is normally willing to end the lifetime of the orphaned outer_list just before the builder.finish call

No, outer_list's lifetime normally will be ended (if no finish call) after the scope as usual. That's why the compiler complains because it borrows builder. So builder.finish call cannot be happened before the end of outer_list lifetime.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That only happens if there's an impl Drop tho:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=fe049aba262d590948887ee08f43618c

Somehow, the compiler is able to shorten the lifetime of outer_list if impl Drop doesn't get in the way.

Copy link
Copy Markdown
Member Author

@viirya viirya Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That only happens if there's an impl Drop tho:

Isn't what this PR does?

self.check_pending_field();

Expand Down Expand Up @@ -756,6 +774,14 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
}
}

/// Drop implementation for ObjectBuilder does nothing
/// as the `finish` method must be called to finalize the object.
/// This is to ensure that the object is always finalized before its parent builder
/// is finalized.
impl Drop for ObjectBuilder<'_, '_> {
fn drop(&mut self) {}
}

/// Trait that abstracts functionality from Variant fconstruction implementations, namely
/// `VariantBuilder`, `ListBuilder` and `ObjectFieldBuilder` to minimize code duplication.
pub(crate) trait VariantBuilderExt<'m, 'v> {
Expand Down Expand Up @@ -1490,6 +1516,9 @@ mod tests {
"Invalid argument error: Duplicate field keys detected: [x]"
);

inner_list.finish();
outer_list.finish();

Comment on lines +1519 to +1521
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bug. Caught by the change. These builders were not called finish.

Copy link
Copy Markdown
Contributor

@alamb alamb Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome -- i also think it is a great example of how this change improves the variant builders

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the test is still broken/useless, if it lacks any checking to verify the builder actually produced what it was asked to build?

Copy link
Copy Markdown
Member Author

@viirya viirya Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test just wants to test field validation for object builders, one failure case, one success case. It doesn't care about what it actually builds.

I said "bug" doesn't mean that the test is useless but it lacks the finish calls making it looks not following the insert - finish pattern that is the what the issue proposed to worry about.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, it can achieve the effort to address the original issue without the worries.

it does seem like the empty drop helps this one usecase

I will try and write a test that shows what happens when a builder isn't finishd and we can evaluate the issues

// Valid object should succeed
let mut list = builder.new_list();
let mut valid_obj = list.new_object();
Expand Down
Loading