ARROW-10259: [Rust] Add custom metadata to Field #9025
3623c7e to 0ec54e3
nevi-me
left a comment
Thanks for working on this @mqy. Please enable the relevant integration tests (see 18b9281#diff-16170ebfd75c15cd0ef92a3e4f35dba353edcb11352101c363fc35cf3729267d). They should be marked with a TODO :)
rust/arrow/src/datatypes.rs (outdated)
@@ -1903,9 +1929,20 @@ mod tests {

    #[test]
    fn serde_struct_type() {
        let kv_array = [("k".to_string(), "v".to_string())];
We should serialise the metadata to a JSON struct in the form of:
{
  "metadata": [
    {"key": "k", "value": "v"}
  ]
}
There are integration tests that we need to enable, I'll find the relevant file in the repository.
Fixed in impl From<&Field> for ArrowJsonField, thanks!
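For readers following the thread, here is a minimal sketch of the key/value array form requested above. It uses only the standard library; the names `escape_json` and `metadata_to_json` are hypothetical and are not taken from the PR, which does this work in `impl From<&Field> for ArrowJsonField`.

```rust
use std::collections::BTreeMap;

/// Escape the characters that must be escaped inside a JSON string.
/// (Hypothetical helper for this sketch, not part of the arrow crate.)
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render metadata as {"metadata": [{"key": ..., "value": ...}, ...]}.
/// (Hypothetical function name; entries come out in BTreeMap key order.)
fn metadata_to_json(metadata: &BTreeMap<String, String>) -> String {
    let entries: Vec<String> = metadata
        .iter()
        .map(|(k, v)| {
            format!(
                "{{\"key\": \"{}\", \"value\": \"{}\"}}",
                escape_json(k),
                escape_json(v)
            )
        })
        .collect();
    format!("{{\"metadata\": [{}]}}", entries.join(", "))
}
```

In a real implementation a JSON library would handle the escaping; the manual version here only keeps the sketch dependency-free.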
@nevi-me I just saw the comment "The JSON can either be an Object or an Array of Objects" from ARROW-8883: [Rust] [Integration] Enable more tests
That's interesting! But I haven't seen the Object example from testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_custom_metadata.json.gz. So would you please give some pointers?
I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.
> I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.
@nevi-me After commenting that portion out, datatypes::tests::schema_json failed, but its test data is manually constructed rather than taken from the testing module. Files with "metadata" in the file name can only be found in these directories: 1.0.0-bigendian and 1.0.0-littleendian/. The Object format for Schema metadata came from #5907 by @grundprinzip
Anyway, I think there's no harm in supporting parsing Field metadata from a JSON Object (Map), just as Schema does.
What do you think?
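To make the two accepted shapes concrete, here is a rough sketch of a parser that takes either an Object (`{"k": "v"}`) or an Array of `{"key": ..., "value": ...}` objects. The `Json` enum is a hypothetical, minimal stand-in for a parsed JSON value so the example needs no external crates, and `parse_metadata` is an illustrative name rather than the function used in the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal stand-in for a parsed JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Accept either {"k": "v", ...} or [{"key": "k", "value": "v"}, ...].
fn parse_metadata(json: &Json) -> Result<BTreeMap<String, String>, String> {
    let mut out = BTreeMap::new();
    match json {
        // Object form: every member is one metadata entry.
        Json::Object(map) => {
            for (k, v) in map {
                match v {
                    Json::Str(s) => {
                        out.insert(k.clone(), s.clone());
                    }
                    _ => return Err(format!("metadata value for `{}` must be a string", k)),
                }
            }
        }
        // Array form: every item is an object with string `key` and `value`.
        Json::Array(items) => {
            for item in items {
                let entry = match item {
                    Json::Object(entry) => entry,
                    _ => return Err("metadata array items must be objects".to_string()),
                };
                match (entry.get("key"), entry.get("value")) {
                    (Some(Json::Str(k)), Some(Json::Str(v))) => {
                        out.insert(k.clone(), v.clone());
                    }
                    _ => return Err("metadata item needs string `key` and `value`".to_string()),
                }
            }
        }
        Json::Str(_) => return Err("metadata must be an object or an array".to_string()),
    }
    Ok(out)
}
```

Both shapes normalise to the same map, which is why accepting the Object form in addition to the Array form costs little.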
Sorry, forgot to reply earlier. Yeah, there's no harm in keeping the behaviour as is in this PR
@nevi-me thanks!
Codecov Report
@@            Coverage Diff             @@
##           master    #9025      +/-   ##
==========================================
- Coverage   82.57%   82.56%   -0.02%
==========================================
  Files         204      204
  Lines       50330    50487     +157
==========================================
+ Hits        41561    41684     +123
- Misses       8769     8803      +34
Hey @mqy, I'm back in the city, and have access to my desktop; so I'll be able to review this PR and help you enable integration tests during the week.
Welcome back!
Please enable
@nevi-me the integration test passed at commit "ipc: build field metadata", please take some time to review, thanks!
nevi-me
left a comment
Thanks for patiently working on this @mqy. I'm happy with the implementation.
I have one change on with_metadata(&mut self, ...) -> Self;, where we should change the signature to not return the cloned Self.
I'll think about whether we should keep the {"k": "v"} format in the long run, or whether to standardise with {"key": "k", "value": "v"}. I might encounter more examples of this in the coming weeks as I work on testing the Rust Parquet work with other languages.
@alamb @jorgecarleitao may I please have a concurring review, more to check code style.
@carols10cents this might also be of interest to you, not sure if we have Flight tests that require Field to have custom metadata.
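As an illustration of the signature change requested above, the sketch below swaps a `with_metadata(&mut self, ...) -> Self` style method for a plain mutating setter plus a read-only accessor. The pared-down `Field` struct and the method names `set_metadata` and `metadata` are assumptions made for this example; the thread does not state the final names.

```rust
use std::collections::BTreeMap;

/// Hypothetical, pared-down field type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: Option<BTreeMap<String, String>>,
}

impl Field {
    fn new(name: &str) -> Self {
        Field {
            name: name.to_string(),
            metadata: None,
        }
    }

    /// Mutating setter: updates in place and returns nothing,
    /// so the caller never pays for a clone of `Self`.
    fn set_metadata(&mut self, metadata: Option<BTreeMap<String, String>>) {
        self.metadata = metadata;
    }

    /// Read-only accessor for the custom metadata.
    fn metadata(&self) -> &Option<BTreeMap<String, String>> {
        &self.metadata
    }
}
```

A method that takes `&mut self` and also returns `Self` has to clone to produce the return value, which is the cost the review comment wants to avoid.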
    }

    /// Merge field into self if it is compatible. Struct will be merged recursively.
    /// NOTE: `self` may be updated to unexpected state in case of merge failure.
I think this is fine, I'm happy with someone who encounters an issue here in future, providing an alternative implementation that doesn't partially mutate &mut self if there's a failure.
@houqp any opinion here, as you contributed this function?
I think this "good now, better later" strategy is a good one. I agree with @nevi-me 's suggestion
yeah, i agree as well. would be better to have an atomic merge implementation done as a separate PR in the future.
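One possible shape for the atomic merge discussed above is "merge into a clone, commit only on success". The sketch below is an assumption about how such a follow-up could look, not code from this PR: the `Field` struct is a hypothetical simplification (no nested struct types), and `try_merge_atomic` is an invented name.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified field type; the real one also carries a data type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

impl Field {
    /// In-place merge: may leave `self` partially updated when it fails.
    fn try_merge(&mut self, from: &Field) -> Result<(), String> {
        if self.name != from.name {
            return Err(format!("cannot merge `{}` into `{}`", from.name, self.name));
        }
        self.nullable |= from.nullable;
        for (k, v) in &from.metadata {
            // A key present on both sides must carry the same value.
            let conflict = matches!(self.metadata.get(k), Some(existing) if existing != v);
            if conflict {
                return Err(format!("conflicting metadata for key `{}`", k));
            }
            self.metadata.insert(k.clone(), v.clone());
        }
        Ok(())
    }

    /// Atomic variant: merge into a clone and commit only on success,
    /// so a failed merge leaves `self` untouched.
    fn try_merge_atomic(&mut self, from: &Field) -> Result<(), String> {
        let mut merged = self.clone();
        merged.try_merge(from)?;
        *self = merged;
        Ok(())
    }
}
```

The trade-off is one extra clone per merge, which may matter for deeply nested schemas; that cost is part of why deferring it to a separate PR seems reasonable.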
alamb
left a comment
This PR adds the missing custom metadata to the data type Field; the requirement is specified by ARROW-10259. To adapt existing tests for custom metadata, I updated Field's display to print the struct with debug; this will be improved in later PRs.
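The "print the struct with debug" approach mentioned above can be sketched as a `Display` impl that delegates to the derived `Debug` output. The `Field` struct here is a hypothetical reduction used only for illustration.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical, reduced field type for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
    metadata: Option<BTreeMap<String, String>>,
}

impl fmt::Display for Field {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reuse the derived Debug output, so newly added members
        // such as `metadata` show up without further changes.
        write!(f, "{:?}", self)
    }
}
```

With this in place, `format!("{}", field)` and `format!("{:?}", field)` produce the same text, which keeps tests that compare displayed fields simple until a dedicated format is designed.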