[Bug]Parquet map/list/struct structure recognize error #4969

@xinghuayu007

Description

Describe the bug
When a Parquet file contains Map/List/Struct structures, Doris cannot recognize those columns correctly and throws the exception 'Invalid column: xxxx', meaning Doris cannot find the column by name. A Map column is instead recognized as two separate columns: key and value.
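The behavior described above is consistent with a reader that matches requested names against Parquet leaf (primitive) columns only. The following is a minimal sketch of that idea in plain Python, not Doris's actual code: `SCHEMA`, `leaf_paths`, and `find_by_leaf_name` are hypothetical names, and the intermediate group names (`key_value`, `list`, `element`) follow the Parquet logical-type layout, which the Hive writer may not use exactly.

```python
# Hypothetical, simplified model of a Parquet schema:
# a leaf is a type string, a group is a dict of child name -> node.
SCHEMA = {
    "id": "int32",
    "str": "binary",
    # MAP layout: group -> repeated key_value group -> key / value leaves
    "mp": {"key_value": {"key": "binary", "value": "binary"}},
    # LIST layout: group -> repeated list group -> element leaf
    "lst": {"list": {"element": "binary"}},
    # STRUCT layout: group -> one leaf per field
    "strct": {"A": "binary", "B": "binary"},
}


def leaf_paths(node, prefix=()):
    """Yield the full path of every primitive (leaf) column, depth first."""
    for name, child in node.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path


paths = list(leaf_paths(SCHEMA))
# Only the last path component survives if the reader keeps leaf names.
leaf_names = [p[-1] for p in paths]


def find_by_leaf_name(name):
    """Look a name up among leaf names only; return -1 when nothing matches."""
    return leaf_names.index(name) if name in leaf_names else -1


if __name__ == "__main__":
    for p in paths:
        print(".".join(p))
    for wanted in ("id", "mp", "lst", "strct", "key", "value"):
        print(wanted, find_by_leaf_name(wanted))
```

Under these assumptions the top-level names `mp`, `lst`, and `strct` never match a leaf, while `key` and `value` do, which would explain both the "Invalid Column Name" warning and the Map showing up as two columns.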

To Reproduce
Steps to reproduce the behavior:

  1. create a hive table with parquet format:
    CREATE TABLE parquet_test (
    id int,
    str string,
    mp MAP<STRING,STRING>,
    lst ARRAY<STRING>,
    strct STRUCT<A:STRING,B:STRING>)
    PARTITIONED BY (part string)
    STORED AS PARQUET;
  2. insert some data into the table:
    insert into parquet_test partition(part="12345") select 123, NULL, str_to_map("key1:1234,key2:3456"), array('123', '345'), named_struct('A','123','B','345') from binlog_test;
  3. load the hive table into a Doris table:
    LOAD LABEL test_label ( DATA INFILE("hdfs://path/parquet_file") INTO TABLE doris_table COLUMNS TERMINATED BY "\x01" FORMAT AS "parquet" (id, str, mp, lst, strct) ) ...........
  4. See error
    W1126 10:34:36.547367 9618 parquet_reader.cpp:141] Invalid Column Name: mp
