Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Mar 15, 2021

This adds a vaguely Hypothesis-esque helper to generate a random record batch from a list of fields, whose metadata is inspected and used to set generation parameters (e.g. min/max value).

@github-actions
Copy link

@lidavidm lidavidm force-pushed the arrow-11745 branch 5 times, most recently from 7f74aef to d5c40ec Compare March 16, 2021 18:01
@lidavidm lidavidm requested a review from bkietz March 16, 2021 21:43
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge improvement, thanks!

I think it'd be worthwhile to add an overload of arrow::field which takes metadata but not nullable:

diff --git a/cpp/src/arrow/type_fwd.h b/cpp/src/arrow/type_fwd.h
index 230c1ff6c..578aa2d63 100644
--- a/cpp/src/arrow/type_fwd.h
+++ b/cpp/src/arrow/type_fwd.h
@@ -629,6 +629,18 @@ std::shared_ptr<Field> ARROW_EXPORT
 field(std::string name, std::shared_ptr<DataType> type, bool nullable = true,
       std::shared_ptr<const KeyValueMetadata> metadata = NULLPTR);
 
+/// \brief Create a Field instance
+///
+/// The field will be nullable.
+///
+/// \param name the field name
+/// \param type the field value type
+/// \param metadata any custom key-value metadata
+inline std::shared_ptr<Field> field(std::string name, std::shared_ptr<DataType> type,
+                                    std::shared_ptr<const KeyValueMetadata> metadata) {
+  return field(std::move(name), std::move(type), /*nullable=*/true, std::move(metadata));
+}
+
 /// \brief Create a Schema instance
 ///
 /// \param fields the schema's fields

@lidavidm lidavidm changed the title ARROW-11745: [C++] Add helper to generate random record batches by schema ARROW-11745: [C++] WIP: Add helper to generate random record batches by schema Mar 17, 2021
@lidavidm lidavidm changed the title ARROW-11745: [C++] WIP: Add helper to generate random record batches by schema ARROW-11745: [C++] Add helper to generate random record batches by schema Mar 17, 2021
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is basically ready to go, just a few more comments:

@@ -358,6 +362,59 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator {
std::default_random_engine seed_rng_;
};

/// Generate a record batch with random data of the specified length.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lidavidm lidavidm marked this pull request as ready for review March 19, 2021 11:50
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, merging

@bkietz bkietz closed this in c7b8abf Mar 19, 2021
@emkornfield
Copy link
Contributor

Any plans of applying this to parquet roundtrip tests?

@lidavidm
Copy link
Member Author

@emkornfield could be done, did you want to file a JIRA for that? (Looks like the current tests all use hardcoded data?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants