Skip to content

Conversation

@Jibing-Li
Copy link
Contributor

show column cached stats sometimes show wrong min/max value:

mysql> show column cached stats hive.tpch100.region;
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | updated_time |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| r_regionkey | 5.0   | 5.0  | 0.0      | 24.0      | 4.0           | N/A  | N/A  | null         |
| r_comment   | 5.0   | 5.0  | 0.0      | 396.0     | 66.0          | N/A  | N/A  | null         |
| r_name      | 5.0   | 5.0  | 0.0      | 40.8      | 6.8           | N/A  | N/A  | null         |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+

This pr is to fix this bug. It is because while transferring ColumnStatistic object to JSON, it doesn't contain the minExpr and maxExpr attribute.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@Jibing-Li Jibing-Li marked this pull request as ready for review August 16, 2023 08:24
@Jibing-Li
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.72 seconds
stream load tsv: 539 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 97.4 seconds inserted 10000000 Rows, about 102K ops/s
storage size: 17162251743 Bytes

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 17, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 3fe419e into apache:master Aug 17, 2023
@Jibing-Li Jibing-Li deleted the min branch August 17, 2023 08:49
xiaokang pushed a commit that referenced this pull request Aug 17, 2023
`show column cached stats` sometimes show wrong min/max value:
```
mysql> show column cached stats hive.tpch100.region;
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | updated_time |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| r_regionkey | 5.0   | 5.0  | 0.0      | 24.0      | 4.0           | N/A  | N/A  | null         |
| r_comment   | 5.0   | 5.0  | 0.0      | 396.0     | 66.0          | N/A  | N/A  | null         |
| r_name      | 5.0   | 5.0  | 0.0      | 40.8      | 6.8           | N/A  | N/A  | null         |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
```
This pr is to fix this bug. It is because while transferring ColumnStatistic object to JSON, it doesn't contain the minExpr and maxExpr attribute.
airborne12 pushed a commit to airborne12/apache-doris that referenced this pull request Aug 21, 2023
`show column cached stats` sometimes show wrong min/max value:
```
mysql> show column cached stats hive.tpch100.region;
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| column_name | count | ndv  | num_null | data_size | avg_size_byte | min  | max  | updated_time |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
| r_regionkey | 5.0   | 5.0  | 0.0      | 24.0      | 4.0           | N/A  | N/A  | null         |
| r_comment   | 5.0   | 5.0  | 0.0      | 396.0     | 66.0          | N/A  | N/A  | null         |
| r_name      | 5.0   | 5.0  | 0.0      | 40.8      | 6.8           | N/A  | N/A  | null         |
+-------------+-------+------+----------+-----------+---------------+------+------+--------------+
```
This pr is to fix this bug. It is because while transferring ColumnStatistic object to JSON, it doesn't contain the minExpr and maxExpr attribute.
@xiaokang xiaokang mentioned this pull request Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.1-merged p0_w reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants