Skip to content

Conversation

@gitccl
Copy link
Contributor

@gitccl gitccl commented Mar 3, 2023

Proposed changes

Issue Number: close #17308

Problem summary

I tested the local tvf with tpch queries. First, generate lineitem datasets with 6001215 rows, and load it into lineitem table by:

insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 
from local(
        "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", 
        "backend_id" = "10003", 
        "format" = "csv", 
        "column_separator" = "|"
);

Then, run q1 and q16 tpch queries, the query result is correct.

It can also analyze the BE's log directly like:

mysql> select * from local(
        "file_path" = "log/be.out",
        "backend_id" = "10006",
        "format" = "csv")
       where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1                                                     |
+--------------------------------------------------------+
| start time: 2023年 08月 07日 星期一 23:20:32 CST       |
| start time: 2023年 08月 07日 星期一 23:32:10 CST       |
| start time: 2023年 08月 08日 星期二 00:20:50 CST       |
| start time: 2023年 08月 08日 星期二 00:29:15 CST       |
+--------------------------------------------------------+

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Mar 3, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

@morningman morningman self-assigned this Mar 24, 2023
@gitccl gitccl marked this pull request as ready for review May 21, 2023 03:26
@github-actions github-actions bot added the kind/docs Categorizes issue or PR as related to documentation. label May 21, 2023
@gitccl
Copy link
Contributor Author

gitccl commented May 21, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

2 similar comments
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl
Copy link
Contributor Author

gitccl commented May 21, 2023

run buildall

@gitccl gitccl requested a review from morningman May 24, 2023 01:28
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

morningman
morningman previously approved these changes May 28, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman
Copy link
Contributor

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 28, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label May 28, 2023
@gitccl
Copy link
Contributor Author

gitccl commented May 28, 2023

run buildall

morningman
morningman previously approved these changes May 28, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 28, 2023
@morningman
Copy link
Contributor

run buildall

@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.34 seconds
stream load tsv: 516 seconds loaded 74807831229 Bytes, about 138 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162135191 Bytes

morningman
morningman previously approved these changes Aug 8, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 8, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

PR approved by at least one committer and no changes requested.

glob_t glob_result;
memset(&glob_result, 0, sizeof(glob_result));

int rc = glob(pattern.c_str(), GLOB_TILDE, NULL, &glob_result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int rc = glob(pattern.c_str(), GLOB_TILDE, NULL, &glob_result);
int rc = glob(pattern.c_str(), GLOB_TILDE, nullptr, &glob_result);

Comment on lines +83 to 86

protected String getFsName(FileSplit split) {
return tableValuedFunction.getFsName();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected String getFsName(FileSplit split) {
return tableValuedFunction.getFsName();
}
@Override
protected String getFsName(FileSplit split) {
return tableValuedFunction.getFsName();
}

@morningman
Copy link
Contributor

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Aug 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.16 seconds
stream load tsv: 510 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17162481757 Bytes

@morningman
Copy link
Contributor

run buildall

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.58 seconds
stream load tsv: 510 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162138241 Bytes

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 10, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

@BePPPower BePPPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner area/vectorization dev/2.0.1-merged kind/docs Categorizes issue or PR as related to documentation. merge_conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Table value function support reading local file

5 participants