Skip to content

Conversation

@morningman
Copy link
Contributor

@morningman morningman commented Nov 17, 2022

Proposed changes

Issue Number: close #xxx

Problem summary

When executing create table as select stmt,
the varchar/char/string type of column in created table will be unified to string type.

Because when select from external table (mysql/pg, etc), the length of varchar in external database
is calculated by "char" length, not "byte" length.
So if there is a column with varchar(10) in external table, then there will be a same varchar(10)
in created table. But the byte length of data in external table may be larger than 10, causing failure of CTAS.

Change to string will not impact performance of the capacity of disk storage.
And notice that if a string type column is the first column, it will be changed to varchar(65535),
because we do not allow string type column as sort key column.

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the kind/docs Categorizes issue or PR as related to documentation. label Nov 17, 2022
@hello-stephen
Copy link
Contributor

hello-stephen commented Nov 17, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.46 seconds
load time: 463 seconds
storage size: 17171858026 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221118021354_clickbench_pr_48377.html

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit 2c4236f into apache:master Nov 18, 2022
FreeOnePlus pushed a commit to FreeOnePlus/doris that referenced this pull request Nov 27, 2022
…4382)

When executing create table as select stmt,
the varchar/char/string type of column in created table will be unified to string type.

Because when select from external table (mysql/pg, etc), the length of varchar in external database
is calculated by "char" length, not "byte" length.
So if there is a column with varchar(10) in external table, then there will be a same varchar(10)
in created table. But the byte length of data in external table may be larger than 10, causing failure of CTAS.

Change to string will not impact performance of the capacity of disk storage.
And notice that if a string type column is the first column, it will be changed to varchar(65535),
because we do not allow string type column as sort key column.
@morningman morningman mentioned this pull request Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/behavior-changed kind/docs Categorizes issue or PR as related to documentation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants