@wangbo (Contributor) commented Nov 15, 2023

BePPPower and others added 30 commits October 31, 2023 15:43
Since the default column separator used by the TVF when reading CSV files has changed, these test cases need to be fixed.
Fix tablet selection in a query not taking effect when the table is distributed by random.
If a tablet ID specified in the query does not exist in a partition, skip scanning that partition.
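The pruning rule in the commit above can be sketched as follows. This is a minimal illustration, not Doris FE code (which is Java): it assumes each partition's tablets are available as a set of IDs, and `prune_partitions` and the dict layout are hypothetical names.

```python
from typing import Dict, List, Set


def prune_partitions(partition_tablets: Dict[str, Set[int]],
                     requested_tablets: Set[int]) -> Dict[str, List[int]]:
    """Return, per partition, the requested tablet IDs it actually contains.

    Hypothetical helper: partitions that contain none of the requested
    tablets are dropped entirely, so they are never scanned.
    """
    selected: Dict[str, List[int]] = {}
    for partition, tablets in partition_tablets.items():
        hits = sorted(tablets & requested_tablets)
        if hits:  # skip partitions where no requested tablet exists
            selected[partition] = hits
    return selected


if __name__ == "__main__":
    layout = {"p1": {10001, 10002}, "p2": {20001, 20002}}
    print(prune_partitions(layout, {10002}))  # {'p1': [10002]}
```

Keeping only partitions with at least one matching tablet means a requested ID that lives elsewhere never triggers an empty scan of an unrelated partition.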
…hema (apache#25844)

There is an FE config `infodb_support_ext_catalog`, which defaults to false.
This means that the tables in the `information_schema` database do not return info about external catalogs.
This is because if there are many external catalogs in Doris with lots of databases/tables (as when running p0 regression tests),
querying the `information_schema` database takes a long time and may cause an RPC timeout.

There is also an unresolved issue: if a thrift RPC times out, the BE may crash in ASAN mode.
To avoid this issue (not fixed yet), this PR mainly makes the following changes:

If `infodb_support_ext_catalog` is false:
1. Querying info of external catalogs through the `information_schema` database is not allowed, for example:

	show databases like "external_catalog";
	show tables like "xxx";

2. `select * from information_schema.tbl` will not contain external catalogs' info.

3. For the external p0 regression test pipeline, `infodb_support_ext_catalog` is set to true so that the tests related to external catalogs can run.
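The effect of the flag on `information_schema` results can be sketched like this. It is a simplified model, not the FE implementation: `Catalog`, `schemata_rows`, and the assumption that the internal catalog is named `internal` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: name of Doris' built-in (non-external) catalog.
INTERNAL_CATALOG = "internal"


@dataclass
class Catalog:
    name: str
    databases: List[str]


def schemata_rows(catalogs: List[Catalog],
                  infodb_support_ext_catalog: bool) -> List[Tuple[str, str]]:
    """Hypothetical model of the (catalog, database) rows returned for
    information_schema: external catalogs are skipped when the flag is off."""
    rows: List[Tuple[str, str]] = []
    for catalog in catalogs:
        # With the flag off, external catalogs are skipped entirely, so no
        # metadata lookups (and no slow RPCs) are issued for them.
        if not infodb_support_ext_catalog and catalog.name != INTERNAL_CATALOG:
            continue
        for db in catalog.databases:
            rows.append((catalog.name, db))
    return rows
```

Skipping external catalogs before touching their databases is what keeps the query cheap when many external catalogs exist.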
…timeunit is const (apache#25824)

PR apache#22602 already added a check function, so only `date_trunc(column, const)` is supported: the second argument must be a constant literal,
and there is no need to check the time unit for every row.
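Because the unit is a constant, it can be validated and resolved once per call instead of once per row. The sketch below illustrates that idea in Python; it covers only a subset of units, and `date_trunc_column` and `_TRUNCATORS` are hypothetical names, not the BE's C++ code.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Subset of units, for illustration only.
_TRUNCATORS: Dict[str, Callable[[datetime], datetime]] = {
    "year": lambda t: t.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": lambda t: t.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "second": lambda t: t.replace(microsecond=0),
}


def date_trunc_column(column: List[datetime], unit: str) -> List[datetime]:
    # The unit is a constant literal, so validate and resolve it once ...
    truncate = _TRUNCATORS.get(unit.lower())
    if truncate is None:
        raise ValueError(f"unsupported time unit: {unit}")
    # ... then apply the resolved function to every row without re-checking.
    return [truncate(t) for t in column]
```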
…pache#25938) (apache#26222)

We mistakenly put a bound expression into the unbound group-by list.
This causes some expressions to be bound twice.
Since binding is not idempotent, the following exception is thrown for this SQL:

```sql
select k5 / k5 as nu, sum(k1) from test group by nu order by nu nulls first
```

```
Caused by: org.apache.doris.nereids.exceptions.AnalysisException: Input slot(s) not in child's output: k5#5 in plan: LogicalProject[176] ( distinct=false, projects=[(cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#14, sum(k1)#15], excepts=[] ), child output is: [nu#16, sum(k1)#15]
plan tree:
LogicalProject[176] ( distinct=false, projects=[(cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#14, sum(k1)#15], excepts=[] )
+--LogicalAggregate[168] ( groupByExpr=[nu#16], outputExpr=[nu#16, sum(k1#1) AS `sum(k1)`#15], hasRepeat=false )
   +--LogicalProject[156] ( distinct=false, projects=[k1#1, (cast(k5#5 as DECIMALV3(16, 10)) / k5#5) AS `nu`#16], excepts=[] )
      +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.test, indexName=test, selectedIndexId=503229, preAgg=OFF, Aggregate function sum(k1) contains key column k1. )
    at org.apache.doris.nereids.rules.analysis.CheckAfterRewrite.checkAllSlotReferenceFromChildren(CheckAfterRewrite.java:108) ~[classes/:?]
```
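Why binding twice breaks the plan can be shown with a toy binder. This is a hypothetical model, not the Nereids analyzer: here every `bind` call allocates a fresh expression ID, so binding an already-bound alias yields a different ID. The log above shows the same pattern, where the parent project carries `nu#14` while the aggregate outputs `nu#16`.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alias:
    name: str
    expr_id: Optional[int] = None  # None means "unbound"


class Binder:
    """Toy binder (hypothetical): allocates a new ExprId on every bind."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def bind(self, alias: Alias) -> Alias:
        # A fresh ID is allocated even if the alias is already bound, so
        # bind(bind(x)) != bind(x): binding is not idempotent.
        return Alias(alias.name, next(self._ids))


if __name__ == "__main__":
    binder = Binder()
    output_alias = binder.bind(Alias("nu"))     # referenced by the parent
    group_by_alias = binder.bind(output_alias)  # bound again by mistake
    # The child now produces a different ID than the parent expects,
    # the same shape as "Input slot(s) not in child's output".
    print(output_alias.expr_id, group_by_alias.expr_id)
```

Keeping only unbound expressions in the unbound group-by list avoids the second `bind` call altogether.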
ByteYue and others added 29 commits November 12, 2023 21:55
…apache#26890)

backport apache#26435
Improve the accuracy of sampled statistics collection. For non-distribution columns, use
`n*d / (n - f1 + f1*n/N)`

where `f1` is the number of distinct values that occurred exactly once in our sample of n rows (from a total of N),
and `d` is the total number of distinct values in the sample.
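The NDV formula above (this is the form of the Duj1 estimator) can be checked with a small worked example. The Python function below is an illustrative sketch; the name `estimate_ndv` is hypothetical and is not the FE's Java code.

```python
def estimate_ndv(n: int, d: int, f1: int, N: int) -> float:
    """Estimate the number of distinct values in a column of N rows.

    n  -- number of sampled rows
    d  -- distinct values seen in the sample
    f1 -- values that occurred exactly once in the sample
    N  -- total number of rows
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    # n*d / (n - f1 + f1*n/N); the denominator is positive because f1 <= n.
    return n * d / (n - f1 + f1 * n / N)
```

For example, with N = 100000, n = 1000, d = 600 and f1 = 400, the denominator is 1000 - 400 + 4 = 604, giving about 993 distinct values. When the sample is the whole table (n = N), the estimate reduces to d, as expected.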

For distribution columns, use `ndv(n) * fraction of tablets sampled` for NDV.

For very large tablets, use LIMIT to cap the total number of rows scanned (for non-key columns only: key columns are stored sorted, so a LIMIT would return only the smallest values and make the sample inaccurate).
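The reason LIMIT is avoided for key columns can be illustrated with a toy model. It assumes a LIMIT scan returns the first k rows in storage order; `limit_scan` and `random_sample` are hypothetical helpers, not Doris APIs.

```python
import random
from typing import List


def limit_scan(rows: List[int], k: int) -> List[int]:
    """Model a LIMIT k scan: the first k rows in storage order."""
    return rows[:k]


def random_sample(rows: List[int], k: int, seed: int = 0) -> List[int]:
    """Model an unbiased row sample of size k."""
    return random.Random(seed).sample(rows, k)


if __name__ == "__main__":
    key_column = list(range(1_000_000))  # key columns are stored sorted
    prefix = limit_scan(key_column, 1_000)
    uniform = random_sample(key_column, 1_000)
    # The LIMIT prefix only sees the smallest values 0..999, so any
    # range-dependent statistic derived from it (max, histogram) is skewed.
    print(max(prefix), max(uniform))
```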
…pache#26926)

Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
…roovy (apache#26925)

Co-authored-by: stephen <hello-stephen@qq.com>
@wangbo closed this Nov 15, 2023