Skip to content

Conversation

@mqy
Copy link
Contributor

@mqy mqy commented Dec 28, 2020

This PR adds the missing custom metadata to data type Field -- the requirement is specified by ARROW-10259.

To adapt existing tests for custom metadata, I updated Field's display: print the struct with debug, this will be improved in later PRs.

@github-actions
Copy link

@mqy mqy marked this pull request as draft December 28, 2020 08:06
@mqy mqy changed the title ARROW-10259: [Rust] Add custom_metadata to Field ARROW-10259: [Rust] Add custom metadata to Field Dec 28, 2020
@mqy mqy force-pushed the field-metadata branch 2 times, most recently from 3623c7e to 0ec54e3 Compare December 28, 2020 10:12
@mqy mqy marked this pull request as ready for review December 28, 2020 10:14
@mqy mqy marked this pull request as draft December 28, 2020 11:44
@nevi-me nevi-me self-requested a review December 29, 2020 07:02
Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @mqy. Please enable the relevant integration tests (see 18b9281#diff-16170ebfd75c15cd0ef92a3e4f35dba353edcb11352101c363fc35cf3729267d). They should be marked with a TODO :)

@@ -1903,9 +1929,20 @@ mod tests {

#[test]
fn serde_struct_type() {
let kv_array = [("k".to_string(), "v".to_string())];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should serialise the metadata to a JSON struct in the form of:

{
  "metadata": [
    {"key": "k", "value": "v"}
  ]
}

There are integration tests that we need to enable, I'll find the relevant file in the repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in impl From<&Field> for ArrowJsonField, thanks!

Copy link
Contributor Author

@mqy mqy Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nevi-me I just saw the comment "The JSON can either be an Object or an Array of Objects" from ARROW-8883: [Rust] [Integration] Enable more tests

That's interesting! But I haven't seen the Object example from testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_custom_metadata.json.gz. So would you please give some pointers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.

Copy link
Contributor Author

@mqy mqy Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.

@nevi-me After comment out, datatypes::tests::schema_json failed, but the test data are not from testing module, instead are manually constructed. Files with "metadata" in the file name can only be found in these directories: 1.0.0-bigendian and 1.0.0-littleendian/. The Object format for Schema metadata came from #5907 by @grundprinzip

Anyway, I think it's no harm to support parsing Field metadata from JSON Object(Map), just as what Schema did.
What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to reply earlier. Yeah, there's no harm in keeping the behaviour as is in this PR

@mqy
Copy link
Contributor Author

mqy commented Dec 29, 2020

Thanks for working on this @mqy. Please enable the relevant integration tests (see 18b9281#diff-16170ebfd75c15cd0ef92a3e4f35dba353edcb11352101c363fc35cf3729267d). They should be marked with a TODO :)

@nevi-me thanks!
I had looked at recent PRs for reader/writer, tried to figure out what to do, finally got lost :( Your feed back helps a lot!.

@codecov-io
Copy link

codecov-io commented Jan 1, 2021

Codecov Report

Merging #9025 (d70f6bd) into master (5f2495d) will decrease coverage by 0.01%.
The diff coverage is 82.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9025      +/-   ##
==========================================
- Coverage   82.57%   82.56%   -0.02%     
==========================================
  Files         204      204              
  Lines       50330    50487     +157     
==========================================
+ Hits        41561    41684     +123     
- Misses       8769     8803      +34     
Impacted Files Coverage Δ
rust/arrow/src/array/builder.rs 85.89% <ø> (ø)
rust/datafusion/src/physical_plan/planner.rs 77.89% <ø> (ø)
rust/datafusion/src/sql/planner.rs 84.05% <33.33%> (+0.08%) ⬆️
rust/arrow/src/ipc/convert.rs 89.77% <47.61%> (-2.14%) ⬇️
rust/arrow/src/util/integration_util.rs 66.59% <81.96%> (-0.89%) ⬇️
rust/arrow/src/datatypes.rs 78.69% <86.99%> (+1.02%) ⬆️
rust/datafusion/src/logical_plan/dfschema.rs 85.52% <100.00%> (+0.06%) ⬆️
rust/datafusion/src/logical_plan/display.rs 94.93% <100.00%> (ø)
rust/datafusion/src/logical_plan/plan.rs 82.48% <100.00%> (ø)
rust/datafusion/src/physical_plan/expressions.rs 84.48% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a911e67...d70f6bd. Read the comment docs.

@mqy mqy requested a review from nevi-me January 4, 2021 09:17
@mqy mqy marked this pull request as ready for review January 4, 2021 09:17
@nevi-me
Copy link
Contributor

nevi-me commented Jan 4, 2021

Hey @mqy, I'm back in the city, and have access to my desktop; so I'll be able to review this PR and help you enable integration tests during the week.

@mqy
Copy link
Contributor Author

mqy commented Jan 5, 2021

Hey @mqy, I'm back in the city, and have access to my desktop; so I'll be able to review this PR and help you enable integration tests during the week.

Welcome back!

@nevi-me
Copy link
Contributor

nevi-me commented Jan 5, 2021

Please enable generate_custom_metadata_case()in dev/archery/archery/integration/datagen.py. You'll see the Rust case disabled.
That integration test fails with

Converting /tmp/arrow-integration-kj0llr2i/generated_custom_metadata.json to /tmp/tmpecqxfv9u/70af42de_generated_custom_metadata.json_as_file
Error: ParseError("Field \'metadata\' must have exact one json map for a key-value pair")

@mqy mqy marked this pull request as draft January 6, 2021 10:15
@mqy mqy marked this pull request as ready for review January 6, 2021 11:02
@mqy
Copy link
Contributor Author

mqy commented Jan 6, 2021

@nevi-me the integration test passed at commit "ipc: build field metadata", please take some time to review, thanks!

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patiently working on this @mqy. I'm happy with the implementation.

I have one change on with_metadata(&mut self, ...) -> Self;, where we should change the signature to not return the cloned Self.

I'll think about whether we should keep the {"k": "v"} format in the long run, or whether to standardise with {"key": "k", "value": "v"}. I might enounter more examples of this in the coming weeks as I work on testing the Rust Parquet work with other languages.

@alamb @jorgecarleitao may I please have a concurring review, more to check code style.

@carols10cents this might also be of interest to you, not sure if we have Flight tests that require Field to have custom metadata.

}

/// Merge field into self if it is compatible. Struct will be merged recursively.
/// NOTE: `self` may be updated to unexpected state in case of merge failure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, I'm happy with someone who encounters an issue here in future, providing an alternative implementation that doesn't partially mutate &mut self if there's a failure.

@houqp any opinion here, as you contributed this function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this "good now, better later" strategy is a good one. I agree with @nevi-me 's suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i agree as well. would be better to have an atomic merge implementation done as a separate PR in the future.

@@ -1903,9 +1929,20 @@ mod tests {

#[test]
fn serde_struct_type() {
let kv_array = [("k".to_string(), "v".to_string())];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to reply earlier. Yeah, there's no harm in keeping the behaviour as is in this PR

@nevi-me nevi-me closed this in 98159f1 Jan 7, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice change -- thank you @mqy and @nevi-me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants