Skip to content

ARROW-7960: [C++][Parquet][WIP] Add schema translation for missing logicalTypes#6758

Closed
igorcalabria wants to merge 1 commit intoapache:masterfrom
igorcalabria:parquet-map-support
Closed

ARROW-7960: [C++][Parquet][WIP] Add schema translation for missing logicalTypes#6758
igorcalabria wants to merge 1 commit intoapache:masterfrom
igorcalabria:parquet-map-support

Conversation

@igorcalabria
Copy link

WIP PR for adding missing LogicalTypes listed in https://issues.apache.org/jira/browse/ARROW-7960.
Adding direct translation to map broke the reader because there's no special case for maps. It should be straight forward to reuse the struct reader for it

@github-actions
Copy link

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add schema tests to confirm this code?

Also, for the different list types in the original JIRA I mentioned determining them empirically, but if the schema is written to parquet (via metadata) we should be able to determine that exactly. Do you think you could look into that under a separate issue?


const auto& key_val_group = static_cast<const GroupNode&>(key_val_node);

if (key_val_group.field_count() != 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps it seems that value can be omitted in this case I think we should probably use Null arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm thoughts on omitted value columns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems that omitted value could be used when a "Map" represents a "Set" value. Using Null for the value type sounds right to me

RETURN_NOT_OK(arrow::GroupToStruct(key_val_group, max_def_level, max_rep_level, ctx,
out, child_field));

const auto key_field = child_field->children[0].field;
Copy link
Contributor

@emkornfield emkornfield Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use auto here (spell out the type) and below.

const SchemaField* parent, SchemaField* out) {
if (node.logical_type()->is_list()) {
return ListToSchemaField(node, max_def_level, max_rep_level, ctx, parent, out);
} else if (node.logical_type()->is_map()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like this handles the backward compatibility case of "MAP_KEY_VALUE" listed in the spec


if (!key_val_node.is_repeated()) {
return Status::NotImplemented(
"Non-repeated nodes in MAP-annotated group are not supported.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Non-repeated nodes in MAP-annotated group are not supported.");
"Non-repeated key_value node in MAP-annotated group is not supported.");

const Node& key_val_node = *group.field(0);

if (!key_val_node.is_repeated()) {
return Status::NotImplemented(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think NotImplemented here is correct, this should be considered invalid input. Consider providing the offending field name (and additional metadata if possible) in all these error messages, it will make determining errors easier.

@emkornfield
Copy link
Contributor

Adding direct translation to map broke the reader because there's no special case for maps.

What do you mean broke? I would have expected more tests to fail in CI if things were actually broken.


if (key_val_group.field_count() != 2) {
return Status::NotImplemented(
"Only groups with 2 fields are supported on MAP-annoted key val group");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: annotated

@wesm
Copy link
Member

wesm commented Mar 31, 2020

What do you mean broke? I would have expected more tests to fail in CI if things were actually broken.

I'm guessing that he has a file containing Map data and it failed to instantiate a column reader.

This patch will need some unit tests for the different Map cases

@wesm
Copy link
Member

wesm commented Jul 11, 2020

I'm closing this until it can be picked up again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants