Skip to content

Conversation

@eldenmoon
Copy link
Member

  1. introduce a new type VARIANT to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
  2. introduce a new expression SchemaChangeExpr for doing schema change for extensibility

Proposed changes

Issue Number: close #17493

Problem summary

Describe your changes.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner area/vectorization kind/test labels Mar 7, 2023
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@eldenmoon
Copy link
Member Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@eldenmoon
Copy link
Member Author

run buildall

29: optional bool is_nullable

30: optional TJsonLiteral json_literal
31: optional TSchemaChangeInfo schema_change_info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It't somewhat strange that Expr has schema_change_info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to TSchemaChangeExpr

msg.node_type = TExprNodeType.SCHEMA_CHANGE_EXPR;
// set type info
TTypeDesc desc = new TTypeDesc();
desc.setTypes(new ArrayList<TTypeNode>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the function of setTypes to an emply list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variant.toThrift(desc) will add TTypeNode to the empty list

}
size_t operator()(const Int64& x) {
// Only support Int32 and Int64
// Only Int64 at present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete support for Int32

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since int32 is not enough to hold values when user first load int value then load bigint value, the bigint values will be overflow

column_object.finalize();
// flatten object columns for the purpose of extracting static columns and
// fill default values missing in static columns
schema_util::unfold_object(columns.size() - 1, columns, _slot_desc_index, slot_descs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no flatten_object before, why add unfold_object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flatten renamed to unfold

1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
@eldenmoon
Copy link
Member Author

run buildall

@eldenmoon
Copy link
Member Author

run p0

Copy link
Contributor

@xiaokang xiaokang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@qidaye qidaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qidaye qidaye merged commit 9b7596f into apache:master Mar 13, 2023
luwei16 pushed a commit to luwei16/Doris that referenced this pull request Apr 7, 2023
* [WIP](dynamic-table) support dynamic schema table (apache#16335)

Issue Number: close apache#16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.

* [improve](dynamic-table) change `addColumns` RPC interface fields from `required` to `optional` and and config doc (apache#16632)

* [doc](dynamic-table) Add docs for dynamic-table (apache#16669)

* [improve](dynamic table) refine SegmentWriter columns writer generate (apache#16816)

* [improve](dynamic table) refine SegmentWriter columns writer generate

```
Dynamic Block consists of two parts, dynamic part of columns and static part of columns
static   dynamic
| ----- | ------- |
the static ones are original _tablet_schame columns
the dynamic ones are auto generated and extended from file scan.
```
**We should only consisder to use Block info to generte columns when it's a dynamic table load procudure.**
And seperate the static ones and dynamic ones

* test

* [typo](docs)fix dynamic Table version label (apache#16895)

* [Feature](Dynamic schema table) step1 support schema change expression (apache#17494)

1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility

* Fix compile

* [Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null

Before this PR when encountering null values with some columns which is specified as `NOT NULL`, null values will not be filtered,thi behavior does not match with the original load behavior.
Second column alignment logic has bug :

```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
                                    ColumnInserterFn inserter) {
    CHECK(dst.is_finalized() && src.is_finalized());
    // Use rows() here instead of size(), since size() will check_consistency
    // but we could not check_consistency since num_rows will be upgraded even
    // if src and dst is empty, we just increase the num_rows of dst and fill
    // num_rows of default values when meet new data
    size_t num_rows = dst.rows();
```

---------

Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
apache#17494)

1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner area/vectorization kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] (Dynamic table) expression support

3 participants