ARROW-11745: [C++] Add helper to generate random record batches by schema #9715
Branch updated from 7f74aef to d5c40ec
bkietz left a comment
This is a huge improvement, thanks!
I think it'd be worthwhile to add an overload of `arrow::field` that takes metadata without requiring the `nullable` argument:
diff --git a/cpp/src/arrow/type_fwd.h b/cpp/src/arrow/type_fwd.h
index 230c1ff6c..578aa2d63 100644
--- a/cpp/src/arrow/type_fwd.h
+++ b/cpp/src/arrow/type_fwd.h
@@ -629,6 +629,18 @@ std::shared_ptr<Field> ARROW_EXPORT
 field(std::string name, std::shared_ptr<DataType> type, bool nullable = true,
       std::shared_ptr<const KeyValueMetadata> metadata = NULLPTR);
 
+/// \brief Create a Field instance
+///
+/// The field will be nullable.
+///
+/// \param name the field name
+/// \param type the field value type
+/// \param metadata any custom key-value metadata
+inline std::shared_ptr<Field> field(std::string name, std::shared_ptr<DataType> type,
+                                    std::shared_ptr<const KeyValueMetadata> metadata) {
+  return field(std::move(name), std::move(type), /*nullable=*/true, std::move(metadata));
+}
+
 /// \brief Create a Schema instance
 ///
 /// \param fields the schema's fields
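A rough, self-contained sketch of what the suggested overload changes at call sites. `Field` and `KeyValueMetadata` below are simplified stand-ins (a type-name string instead of `std::shared_ptr<DataType>`, a `std::map` instead of Arrow's metadata class), not Arrow's real definitions; only the shape of the two `field` signatures mirrors the diff.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for arrow::KeyValueMetadata (assumption for illustration).
using KeyValueMetadata = std::map<std::string, std::string>;

// Simplified stand-in for arrow::Field (assumption for illustration).
struct Field {
  std::string name;
  std::string type;  // a type name instead of std::shared_ptr<DataType>
  bool nullable;
  std::shared_ptr<const KeyValueMetadata> metadata;
};

// Mirrors the existing factory: nullable defaults to true, metadata to null.
std::shared_ptr<Field> field(std::string name, std::string type, bool nullable = true,
                             std::shared_ptr<const KeyValueMetadata> metadata = nullptr) {
  return std::make_shared<Field>(
      Field{std::move(name), std::move(type), nullable, std::move(metadata)});
}

// Mirrors the suggested overload: pass metadata without spelling out nullable.
inline std::shared_ptr<Field> field(std::string name, std::string type,
                                    std::shared_ptr<const KeyValueMetadata> metadata) {
  return field(std::move(name), std::move(type), /*nullable=*/true, std::move(metadata));
}
```

With the overload, `field("x", int32(), md)` can be written instead of `field("x", int32(), true, md)`. Resolution stays unambiguous because `std::shared_ptr`'s conversion to `bool` is explicit, so a metadata argument cannot bind to the `bool nullable` parameter of the original factory.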
bkietz left a comment
I think this is basically ready to go, just a few more comments:
cpp/src/arrow/testing/random.h
@@ -358,6 +362,59 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator {
  std::default_random_engine seed_rng_;
};

/// Generate a record batch with random data of the specified length.
👍
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
bkietz left a comment
LGTM, merging
Any plans to apply this to the Parquet round-trip tests?
@emkornfield That could be done; did you want to file a JIRA for it? (It looks like the current tests all use hard-coded data?)
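One possible shape for such a randomized round-trip test, sketched without Arrow or Parquet: `GenerateColumn`, `Encode`, `Decode` and `RoundTrips` are hypothetical names, and the codec is a trivial stand-in (a validity byte plus the raw value bytes), not Parquet's encoding. The point is the pattern: seeded random input, write, read back, compare.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <random>
#include <string>
#include <vector>

// A nullable int32 column; std::nullopt marks a null slot.
using Column = std::vector<std::optional<int32_t>>;

// Hypothetical random generator, seeded so that a failing case is reproducible.
Column GenerateColumn(int64_t length, uint32_t seed, double null_probability) {
  std::mt19937 rng(seed);
  std::bernoulli_distribution is_null(null_probability);
  std::uniform_int_distribution<int32_t> value(std::numeric_limits<int32_t>::min(),
                                               std::numeric_limits<int32_t>::max());
  Column out;
  out.reserve(static_cast<size_t>(length));
  for (int64_t i = 0; i < length; ++i) {
    if (is_null(rng)) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(value(rng));
    }
  }
  return out;
}

// Stand-in codec (NOT Parquet): one validity byte per slot, followed by the
// 4 raw bytes of the value when the slot is valid.
std::string Encode(const Column& column) {
  std::string buf;
  for (const auto& v : column) {
    buf.push_back(static_cast<char>(v.has_value() ? 1 : 0));
    if (v) {
      char bytes[sizeof(int32_t)];
      std::memcpy(bytes, &*v, sizeof(int32_t));
      buf.append(bytes, sizeof(int32_t));
    }
  }
  return buf;
}

// Inverse of Encode; assumes well-formed input produced by Encode.
Column Decode(const std::string& buf) {
  Column out;
  size_t pos = 0;
  while (pos < buf.size()) {
    const bool valid = buf[pos++] != 0;
    if (!valid) {
      out.push_back(std::nullopt);
      continue;
    }
    int32_t v;
    std::memcpy(&v, buf.data() + pos, sizeof(int32_t));
    pos += sizeof(int32_t);
    out.push_back(v);
  }
  return out;
}

// The property under test: decoding an encoding returns the original data.
bool RoundTrips(int64_t length, uint32_t seed, double null_probability) {
  const Column original = GenerateColumn(length, seed, null_probability);
  return Decode(Encode(original)) == original;
}
```

In a real suite the seed would typically be logged or fixed per test case, so that any failing input can be regenerated exactly.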
This adds a vaguely Hypothesis-esque helper that generates a random record batch from a list of fields; each field's metadata is inspected and used to set generation parameters (e.g. min/max values).
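A minimal sketch of the idea described above, assuming simplified stand-in types (every field holds `int64_t` values) and hypothetical metadata keys `min`, `max` and `null_probability`. `GenerateRandomBatch`, these key names and the 0.01 default null probability are illustrative choices, not necessarily what the helper in this PR reads. Each field's metadata is parsed into generation parameters, and non-nullable fields never receive nulls.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <optional>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a schema field (assumption: all fields are int64).
struct Field {
  std::string name;
  bool nullable = true;
  std::map<std::string, std::string> metadata;  // hypothetical generation options
};

using Column = std::vector<std::optional<int64_t>>;

struct Batch {
  std::vector<Field> fields;
  std::vector<Column> columns;
};

// Read an integer option from field metadata, falling back to a default.
int64_t GetInt(const Field& f, const std::string& key, int64_t fallback) {
  auto it = f.metadata.find(key);
  return it == f.metadata.end() ? fallback : std::stoll(it->second);
}

// Read a floating-point option from field metadata, falling back to a default.
double GetDouble(const Field& f, const std::string& key, double fallback) {
  auto it = f.metadata.find(key);
  return it == f.metadata.end() ? fallback : std::stod(it->second);
}

// Hypothetical helper: one random column per field, parameterized by its metadata.
// Precondition: for each field, min <= max and 0 <= null_probability <= 1.
Batch GenerateRandomBatch(const std::vector<Field>& fields, int64_t length, uint32_t seed) {
  std::mt19937_64 rng(seed);
  Batch batch{fields, {}};
  for (const Field& f : fields) {
    const int64_t lo = GetInt(f, "min", std::numeric_limits<int64_t>::min());
    const int64_t hi = GetInt(f, "max", std::numeric_limits<int64_t>::max());
    // Non-nullable fields never get nulls; 0.01 is an arbitrary default here.
    const double p_null = f.nullable ? GetDouble(f, "null_probability", 0.01) : 0.0;
    std::uniform_int_distribution<int64_t> value(lo, hi);
    std::bernoulli_distribution is_null(p_null);
    Column col;
    col.reserve(static_cast<size_t>(length));
    for (int64_t i = 0; i < length; ++i) {
      if (is_null(rng)) {
        col.push_back(std::nullopt);
      } else {
        col.push_back(value(rng));
      }
    }
    batch.columns.push_back(std::move(col));
  }
  return batch;
}
```

Seeding the generator keeps failures reproducible: with this sketch, the same fields, length and seed yield the same batch.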