@hubgeter hubgeter commented Jul 5, 2023

Proposed changes

Issue Number: close #xxx

  1. All Hive base types are supported except for binary.
    Create the table and insert data in Hive:
CREATE TABLE ywtest.test (
  a ARRAY<TINYINT>,
  b ARRAY<SMALLINT>,
  c ARRAY<INT>,
  d ARRAY<BIGINT>,
  e ARRAY<BOOLEAN>,
  f ARRAY<FLOAT>,
  g ARRAY<DOUBLE>,
  h ARRAY<STRING>,
  i ARRAY<TIMESTAMP>,
  j ARRAY<DATE>
)
stored as textfile;

INSERT INTO ywtest.test VALUES
(
    ARRAY(tinyint(1), tinyint(2), tinyint(3)), 
    ARRAY(smallint(10), smallint(20), smallint(30), smallint(40)), 
    ARRAY(100, 200, 300), 
    ARRAY(bigint(100000000000000), bigint(20000000000000), bigint(30000000000000), bigint(40000000000000)), 
    ARRAY(true, false, true),
    ARRAY(float(1.23), float(4.56), float(7.89)), 
    ARRAY(double(10.1), double(20.2), double(30.3)), 
    ARRAY("apple", "banana", "orange"),
    ARRAY(TIMESTAMP("2023-07-04 12:00:00"), TIMESTAMP("2023-07-05 12:00:00"), TIMESTAMP("2023-07-06 12:00:00")),
    ARRAY(date("2023-07-04"), date("2023-07-05"), date("2023-07-06"))
),
(
    ARRAY(tinyint(10), tinyint(20), tinyint(30)), 
    ARRAY(smallint(100), smallint(200), smallint(300), smallint(400)), 
    ARRAY(1000, 2000, 3000), 
    ARRAY(bigint(1000000000000000), bigint(200000000000000), bigint(300000000000000), bigint(400000000000000)), 
    ARRAY(true, true, true, true),
    ARRAY(float(12.3), float(45.6), float(78.9)), 
    ARRAY(double(100.1), double(200.2), double(300.3)), 
    ARRAY("abc", "eeee", "sdads"),
    ARRAY(TIMESTAMP("2023-07-02 12:00:00"), TIMESTAMP("2023-07-02 12:00:00"), TIMESTAMP("2023-07-02 12:00:00")),
    ARRAY(date("2021-07-04"), date("2021-07-05"), date("2021-07-06"))
),
(
    ARRAY(tinyint(1), tinyint(2), tinyint(3)), 
    ARRAY(smallint(10), smallint(20), smallint(30), smallint(40)), 
    ARRAY(100, 200, 300), 
    ARRAY(bigint(100000000000000), bigint(20000000000000), bigint(30000000000000), bigint(40000000000000)), 
    ARRAY(true, false, true),
    ARRAY(float(12.3), float(45.6), float(78.9)), 
    ARRAY(double(100.1), double(200.2), double(300.3)), 
    ARRAY("abc", "eeee", "sdads"),
    ARRAY(TIMESTAMP("2023-07-02 12:00:00"), TIMESTAMP("2023-07-02 12:00:00"), TIMESTAMP("2023-07-02 12:00:00")),
    ARRAY(date("2021-07-04"), date("2021-07-05"), date("2021-07-06"))
);

Query the results in Doris:

mysql> select * from ywtest.test\G;
*************************** 1. row ***************************
a: [1, 2, 3]
b: [10, 20, 30, 40]
c: [100, 200, 300]
d: [100000000000000, 20000000000000, 30000000000000, 40000000000000]
e: [1, 0, 1]
f: [1.23, 4.56, 7.89]
g: [10.1, 20.2, 30.3]
h: ["apple", "banana", "orange"]
i: [2023-07-04 12:00:00.000000, 2023-07-05 12:00:00.000000, 2023-07-06 12:00:00.000000]
j: [2023-07-04, 2023-07-05, 2023-07-06]
*************************** 2. row ***************************
a: [10, 20, 30]
b: [100, 200, 300, 400]
c: [1000, 2000, 3000]
d: [1000000000000000, 200000000000000, 300000000000000, 400000000000000]
e: [1, 1, 1, 1]
f: [12.3, 45.6, 78.9]
g: [100.1, 200.2, 300.3]
h: ["abc", "eeee", "sdads"]
i: [2023-07-02 12:00:00.000000, 2023-07-02 12:00:00.000000, 2023-07-02 12:00:00.000000]
j: [2021-07-04, 2021-07-05, 2021-07-06]
*************************** 3. row ***************************
a: [1, 2, 3]
b: [10, 20, 30, 40]
c: [100, 200, 300]
d: [100000000000000, 20000000000000, 30000000000000, 40000000000000]
e: [1, 0, 1]
f: [12.3, 45.6, 78.9]
g: [100.1, 200.2, 300.3]
h: ["abc", "eeee", "sdads"]
i: [2023-07-02 12:00:00.000000, 2023-07-02 12:00:00.000000, 2023-07-02 12:00:00.000000]
j: [2021-07-04, 2021-07-05, 2021-07-06]
3 rows in set (0.02 sec)
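
When comparing expected and actual values for a table like this, it can help to render test data in the same textual form as the result above. The following is a minimal Python sketch, not part of this PR. It only mirrors the formatting visible in this output (booleans as 1/0, timestamps with six fractional digits, strings in double quotes), and `render` is a hypothetical helper name.

```python
from datetime import date, datetime


def render(value) -> str:
    """Render a Python value the way array values appear in the result above."""
    if isinstance(value, list):
        return "[" + ", ".join(render(v) for v in value) + "]"
    if isinstance(value, bool):      # bool must be checked before int-like values
        return "1" if value else "0"
    if isinstance(value, datetime):  # datetime is a subclass of date, so check it first
        return value.strftime("%Y-%m-%d %H:%M:%S.%f")
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


print(render([True, False, True]))            # [1, 0, 1]
print(render([datetime(2023, 7, 4, 12)]))     # [2023-07-04 12:00:00.000000]
print(render(["apple", "banana", "orange"]))  # ["apple", "banana", "orange"]
```

The rendered strings can then be compared against the column values in the `\G` output.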
  2. Custom delimiters are supported.
hive> show create table ywtest.cyw14;
OK
CREATE TABLE `ywtest.cyw14`(
  `id` int, 
  `arr` array<array<int>>, 
  `info` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'collection.delim'=',.', 
  'field.delim'='\t', 
  'line.delim'='\n', 
  'serialization.format'='\t') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/ywtest.db/cyw14'
TBLPROPERTIES (
  'transient_lastDdlTime'='1688454532')
Time taken: 0.095 seconds, Fetched: 19 row(s)

Query the results in Doris:

mysql> select * from ywtest.cyw14;
+------+-----------------------------------+------+
| id   | arr                               | info |
+------+-----------------------------------+------+
|    1 | [[1, 2, 3], [4, 5, 6], [8, 9, 0]] |   11 |
|    2 | [[11, 22, 33], [44, 55, 66]]      |   22 |
+------+-----------------------------------+------+
2 rows in set (0.02 sec)
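
As an illustration of how per-level delimiters turn one text row into nested arrays, the following is a minimal Python sketch, not the code in this PR. It assumes the outer array elements are separated by `,` and the inner ones by a hypothetical `\x03`. The raw row is made up to match the first result row above, and `parse_nested_array` is an illustrative name.

```python
from typing import Callable, Sequence


def parse_nested_array(text: str, delims: Sequence[str],
                       leaf: Callable[[str], object] = int) -> list:
    """Split on delims[0], then split each piece on delims[1], and so on."""
    if not delims:
        raise ValueError("need one delimiter per nesting level")
    pieces = text.split(delims[0]) if text else []
    if len(delims) == 1:
        return [leaf(p) for p in pieces]
    return [parse_nested_array(p, delims[1:], leaf) for p in pieces]


# Assumed delimiters for illustration only; check the table's SERDEPROPERTIES.
OUTER, INNER = ",", "\x03"

# Hypothetical raw row: id <TAB> arr <TAB> info
arr_text = OUTER.join(INNER.join(map(str, inner))
                      for inner in [[1, 2, 3], [4, 5, 6], [8, 9, 0]])
row = "\t".join(["1", arr_text, "11"])

id_text, arr_field, info_text = row.split("\t")
print(int(id_text), parse_nested_array(arr_field, [OUTER, INNER]), int(info_text))
# 1 [[1, 2, 3], [4, 5, 6], [8, 9, 0]] 11
```

Passing the delimiters as an explicit list keeps the sketch independent of any particular SerDe defaults.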
  3. Multi-level nested arrays are supported.
    Query the results in Doris:
mysql> select * from ywtest.cyw3;
+-----------------------------------------------------------------+----------------------------------------------------+
| info                                                            | str                                                |
+-----------------------------------------------------------------+----------------------------------------------------+
| [[[1], [2, 4]], [[12222, 12313, 123131, 4211], [1, 2], [2, 3]]] | [["hello", "world"], ["hive", "hive", "hivetext"]] |
+-----------------------------------------------------------------+----------------------------------------------------+
1 row in set (0.01 sec)
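
The table definition and data file for `ywtest.cyw3` are not shown here. To reproduce a similar row, one option is to generate the text line directly. The following is a minimal Python sketch under stated assumptions: the field delimiter `\x01` and the per-level delimiters `\x02`, `\x03`, `\x04` are assumed values that should be checked against the actual table, and `encode_nested` is a hypothetical helper, not something from this PR.

```python
from typing import Sequence


def encode_nested(value, delims: Sequence[str]) -> str:
    """Join one nesting level per delimiter; delims[0] is the outermost level."""
    if not isinstance(value, list):
        return str(value)
    if not delims:
        raise ValueError("value is nested deeper than the delimiter list")
    return delims[0].join(encode_nested(v, delims[1:]) for v in value)


# Assumed delimiters for illustration only; verify against the table definition.
FIELD_DELIM = "\x01"
LEVEL_DELIMS = ["\x02", "\x03", "\x04"]

# Values copied from the result row above.
info = [[[1], [2, 4]], [[12222, 12313, 123131, 4211], [1, 2], [2, 3]]]
strs = [["hello", "world"], ["hive", "hive", "hivetext"]]

line = FIELD_DELIM.join([encode_nested(info, LEVEL_DELIMS),
                         encode_nested(strs, LEVEL_DELIMS)])
print(repr(line))
```

Each additional nesting level consumes the next delimiter in the list, so a three-level array needs three distinct delimiters.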

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Jul 5, 2023
@hubgeter hubgeter marked this pull request as draft July 5, 2023 08:53
@hubgeter hubgeter force-pushed the hive-textfile-array branch from 264eedf to 4dcc5ba Compare July 6, 2023 06:19
github-actions bot commented Jul 6, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hubgeter hubgeter marked this pull request as ready for review July 6, 2023 06:44
github-actions bot commented Jul 6, 2023

clang-tidy review says "All clean, LGTM! 👍"

@morningman morningman added the dev/2.0.0 2.0.0 release label Jul 7, 2023
@morningman

run buildall

github-actions bot commented Jul 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

hubgeter commented Jul 9, 2023

run buildall

@hubgeter hubgeter force-pushed the hive-textfile-array branch from 2c8f336 to 2e047a2 Compare July 10, 2023 02:11
@hubgeter

run buildall

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@kaka11chen kaka11chen left a comment

LGTM

@github-actions

PR approved by anyone and no changes requested.

@hubgeter hubgeter force-pushed the hive-textfile-array branch from 2e047a2 to 651de6c Compare July 10, 2023 07:53
@hubgeter

run buildall

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@kaka11chen kaka11chen left a comment

LGTM

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 11, 2023
@github-actions

PR approved by at least one committer and no changes requested.

Labels

approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner dev/2.0.0-merged kind/test reviewed
