
mrhhsg (Member) commented Oct 12, 2023

Proposed changes

By using the segment's zone map index, we can determine whether a predicate is always true for that segment. For example, if the segment's maximum value is 100 and the predicate is col < 101, the predicate holds for every row in the segment and can be removed when scanning it.
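A minimal sketch of the idea, assuming an integer column and simple comparison predicates. The ZoneMap struct, CompareOp enum and is_always_true function are hypothetical simplifications, not Doris's actual ColumnPredicate interface. The has_null guard is an added assumption: a comparison such as col < 101 is never true for a NULL row, so a segment containing NULLs proves nothing.

```cpp
#include <cstdint>

// Hypothetical, simplified zone map for one segment of an integer column
// (illustrative only; not Doris's actual zone map types).
struct ZoneMap {
    int64_t min;
    int64_t max;
    bool has_null;  // true if at least one row in the segment is NULL
};

enum class CompareOp { LT, LE, GT, GE, EQ };

// True only if `col <op> value` holds for every row in the segment, so the
// predicate could be dropped when scanning it. A comparison is never true
// for a NULL row, so any NULL means nothing can be proven.
constexpr bool is_always_true(const ZoneMap& zm, CompareOp op, int64_t value) {
    if (zm.has_null) {
        return false;
    }
    switch (op) {
        case CompareOp::LT: return zm.max < value;   // every v <= max < value
        case CompareOp::LE: return zm.max <= value;  // every v <= max <= value
        case CompareOp::GT: return zm.min > value;   // every v >= min > value
        case CompareOp::GE: return zm.min >= value;  // every v >= min >= value
        case CompareOp::EQ: return zm.min == value && zm.max == value;
    }
    return false;
}

// The example from the description: max is 100 (the min of 1 is arbitrary).
static_assert(is_always_true(ZoneMap{1, 100, false}, CompareOp::LT, 101),
              "col < 101 holds for every row when max is 100");
```

Note that col < 100 would not be provable here, because a row equal to the maximum (100) fails it.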


mrhhsg (Member Author) commented Oct 12, 2023

run buildall

github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.39 seconds
stream load tsv: 555 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162260022 Bytes
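The "about N MB/s" figures appear consistent with bytes divided by seconds and then by 2^20 (1,048,576), truncated to an integer, i.e. MiB/s; decimal megabytes would give about 134.8 for the TSV load. This is an inference from the reported numbers, not the bot's documented formula, and the helper names below are hypothetical.

```cpp
#include <cstdint>

// Integer MiB/s: bytes / seconds / 2^20, truncated toward zero.
// Mirrors how the "about N MB/s" figures appear to be derived.
constexpr int64_t mib_per_second(int64_t bytes, int64_t seconds) {
    return bytes / seconds / (1024 * 1024);
}

// Thousands of rows per second, truncated (as in "about 343K ops/s").
constexpr int64_t kilo_rows_per_second(int64_t rows, double seconds) {
    return static_cast<int64_t>(rows / seconds / 1000.0);
}

// TSV load from the run above: 74807831229 bytes in 555 seconds.
static_assert(mib_per_second(74807831229, 555) == 128, "matches 'about 128 MB/s'");
```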

}

auto pruned_predicates = read_options.column_predicates;
auto pruned = false;
Contributor

Add some regression tests.
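The two lines above copy read_options.column_predicates into a per-segment pruned_predicates list and track whether anything was removed. A minimal sketch of that copy-then-erase pattern, assuming a simplified, hypothetical Predicate type (not Doris's ColumnPredicate) whose callback stands in for the zone map check:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified predicate (not Doris's ColumnPredicate): a column
// id plus a callback that consults the current segment's zone map.
struct Predicate {
    int column_id;
    std::function<bool()> always_true_for_segment;
};

// Copy-then-erase: the shared input list is never modified; the per-segment
// copy loses every predicate the zone map proves redundant.
// Returns true if at least one predicate was removed.
bool prune_always_true(const std::vector<Predicate>& predicates,
                       std::vector<Predicate>& pruned_predicates) {
    pruned_predicates = predicates;
    auto new_end = std::remove_if(
            pruned_predicates.begin(), pruned_predicates.end(), [](const Predicate& p) {
                assert(p.always_true_for_segment);  // every predicate needs a check
                return p.always_true_for_segment();
            });
    const bool pruned = new_end != pruned_predicates.end();
    pruned_predicates.erase(new_end, pruned_predicates.end());
    return pruned;
}
```

Working on a copy keeps the original list intact, which matters if the same read options are reused for other segments whose zone maps prove nothing (an assumption about the caller, not something this thread states).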

}

bool is_always_true(const std::pair<WrapperField*, WrapperField*>& statistic) const override {
if (statistic.first->is_null()) {
Contributor

Should this also check || statistic.second->is_null()?
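A sketch of the reviewer's suggestion for a "col < bound" predicate, guarding both ends of the (min, max) statistic. Field is a hypothetical stand-in for WrapperField and less_than_is_always_true is an illustrative name; the assumption, following the snippet, is that a NULL bound means the zone map cannot bound every row, so the predicate must stay.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a zone-map bound (not Doris's WrapperField).
struct Field {
    bool null;
    int64_t value;
    bool is_null() const { return null; }
};

// "col < bound" check over a (min, max) statistic, guarding BOTH bounds as
// the reviewer suggests: if either is NULL, the zone map proves nothing and
// the predicate has to stay.
bool less_than_is_always_true(const std::pair<const Field*, const Field*>& statistic,
                              int64_t bound) {
    assert(statistic.first != nullptr && statistic.second != nullptr);
    if (statistic.first->is_null() || statistic.second->is_null()) {
        return false;
    }
    return statistic.second->value < bound;  // every v <= max < bound
}
```

Checking both bounds is the defensive form; it keeps the function correct even if only the max side of the statistic is NULL.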

col_predicates);
}

bool ColumnReader::prune_predicates_by_zone_map(std::vector<ColumnPredicate*>& predicates,
Contributor

Add a config to disable this. Also, the pruning should only be used in queries, not in compaction.
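One way to express both review requests as a single gate, sketched under assumptions: ReaderKind, PruneConfig and enable_zone_map_predicate_pruning are hypothetical names, not Doris's actual reader-type enum or an existing BE config.

```cpp
// Hypothetical reader kinds (illustrative; not Doris's actual enum).
enum class ReaderKind { QUERY, COMPACTION };

// Hypothetical switch for the config the reviewer asks for.
struct PruneConfig {
    bool enable_zone_map_predicate_pruning = true;
};

// Prune only for query reads, and only when the switch is on; any other
// reader (such as compaction) skips zone-map pruning entirely.
constexpr bool should_prune_by_zone_map(const PruneConfig& config, ReaderKind kind) {
    return config.enable_zone_map_predicate_pruning && kind == ReaderKind::QUERY;
}
```

Keeping the gate in one predicate means the pruning call site needs a single check, and turning the config off restores the previous behavior for every reader.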

}
}

if (pruned) {
Contributor

Add a config to disable this. Also, the pruning should only be used in queries, not in compaction.

yiguolei added the usercase and dev/2.0.3 labels Oct 12, 2023
mrhhsg (Member Author) commented Oct 13, 2023

run buildall

github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

yiguolei (Contributor) left a comment

LGTM

github-actions bot added the approved label Oct 13, 2023
github-actions (Contributor)

PR approved by at least one committer and no changes requested.

github-actions (Contributor)

PR approved by anyone and no changes requested.

doris-robot

TeamCity BE UT coverage result:
Function Coverage: 36.29% (8147/22449)
Line Coverage: 28.42% (65276/229650)
Region Coverage: 27.10% (33814/124768)
Branch Coverage: 23.89% (17255/72228)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2a832ba4da34cedf8bf38ffa0737136ba28a4d8f_2a832ba4da34cedf8bf38ffa0737136ba28a4d8f/report/index.html

doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.73 seconds
stream load tsv: 575 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162378192 Bytes

jacktengg (Contributor) left a comment

LGTM

yiguolei merged commit 283bd59 into apache:master Oct 13, 2023
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Oct 13, 2023
…he segment (apache#25366)

xiaokang pushed a commit that referenced this pull request Oct 13, 2023
…he segment (#25366) (#25427)

xiaokang added a commit that referenced this pull request Oct 14, 2023
yiguolei pushed a commit that referenced this pull request Oct 14, 2023
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Oct 18, 2023
…he segment (apache#25366) (apache#25427)

yiguolei pushed a commit that referenced this pull request Oct 18, 2023
…he segment (#25582)

* [improvement](scanner) Remove the predicate that is always true for the segment (#25366) (#25427)

* [fix](scanner) coredump caused by 'prune_predicates_by_zone_map' (#25555)
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
…he segment (apache#25366)

xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…he segment (apache#25366)


Labels

approved, dev/2.0.3-merged, merge_conflict, reviewed, usercase

5 participants