[Doc] Add docs to OLAP_SCAN_NODE query profile #3808
yangzhg
left a comment
LGTM
```
OLAP_SCAN_NODE (id=0): (Active: 4.050ms, non-child: 35.68%)
  -BitmapIndexFilterCount: 0  # Number of rows filtered by bitmap index
```
You already changed this parameter name above.
ok
```
OLAP_SCAN_NODE (id=0):(Active: 4.050ms, non-child: 35.68%)
  - BitmapIndexFilterCount: 0  # Number of rows filtered out by the bitmap index.
```
You already changed this parameter name.
ok
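From the metrics above, you can roughly work out how many rows the storage engine processed and how many rows remain after filtering. The `Rows***Filtered` group of metrics also shows whether query conditions were pushed down to the storage engine and how effective each index was at filtering.

If `RawRowsRead` and `RowsRead` differ widely, a large number of rows were aggregated, and aggregation can be time-consuming. If `RowsRead` and `RowsReturned` differ widely, many rows were filtered in the Scanner. This means that many highly selective predicates were not pushed down to the storage engine, and filtering in the Scanner is less efficient than filtering in the storage engine.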
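Shouldn't this say that some highly selective predicates were not filtered through indexes in the storage engine? Filtering in the Scanner is less efficient than index-based filtering in the storage engine.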
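The row-count comparison discussed above can be sketched as a small check. This is only an illustration: the function name `diagnose_scan` and the 10x `ratio` threshold are assumptions made for the example, not values taken from Doris. Only the counter names `RawRowsRead`, `RowsRead`, and `RowsReturned` come from the profile.

```python
def diagnose_scan(raw_rows_read, rows_read, rows_returned, ratio=10.0):
    """Return hints based on the gaps between the three row counters.

    `ratio` is an arbitrary illustrative threshold, not a Doris default.
    """
    hints = []
    # RawRowsRead much larger than RowsRead: many rows were aggregated.
    if raw_rows_read >= ratio * max(rows_read, 1):
        hints.append("many rows aggregated")
    # RowsRead much larger than RowsReturned: many rows were dropped in
    # the Scanner rather than by indexes in the storage engine.
    if rows_read >= ratio * max(rows_returned, 1):
        hints.append("many rows filtered in the Scanner")
    return hints


# Hypothetical counter values, chosen only to trigger both hints.
print(diagnose_scan(raw_rows_read=1_000_000, rows_read=50_000, rows_returned=100))
```

A gap flagged by the second check suggests looking at which predicates could instead be served by an index in the storage engine.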
* RowsBloomFilterFiltered
* RowsStatsFiltered
* RowsDelFiltered
* RawRowsRead
V1 has these metrics too, right?
V1 has them too, but V1 is rather messy, so I won't document it.
* RawRowsRead
* RowsRead
* RowsReturned
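Shouldn't this start with an overview: the storage engine first filters data using indexes, and then the scan filters it again? After that, cover V1 and V2 separately in the storage engine part.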
I'll add an explanation. I won't cover V1 here.
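The OlapScanNode profile is typically used to analyze the efficiency of data scanning. Besides inferring predicate pushdown and index usage from the row-count information described earlier, you can also do a simple analysis along the following lines.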
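* First, many metrics, such as `IOTimer` and `BlockFetchTime`, are sums over all Scanner threads, so their values can be large. Because Scanner threads read data asynchronously, these accumulated metrics reflect only the total working time of the Scanners and do not directly represent the time spent by the ScanNode. The share of time the ScanNode takes in the whole query plan is the value recorded in the `Active` field. Sometimes `IOTimer` is tens of seconds while `Active` is only a few seconds. This usually happens because: 1. `IOTimer` is the accumulated time of multiple Scanners, and there are many Scanners. 2. The upper node is slow. For example, if the upper node takes 100 seconds and the underlying ScanNode needs only 10 seconds, the `Active` field may show only a few milliseconds. While the upper node is processing data, the ScanNode has already scanned and prepared data asynchronously, so when the upper node fetches data from the ScanNode the data is already available, and the `Active` time is very short.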
Is the formatting of this line off?
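To illustrate why an accumulated counter such as `IOTimer` can be far larger than `Active`, here is a rough sketch. The helper name `avg_per_scanner` and the sample numbers are hypothetical, and the Scanner count is assumed to be known from elsewhere.

```python
def avg_per_scanner(accumulated_seconds, num_scanners):
    """Average time one Scanner thread spent, given a summed counter."""
    if num_scanners <= 0:
        raise ValueError("num_scanners must be positive")
    return accumulated_seconds / num_scanners


# Hypothetical numbers: IOTimer sums to 40 s over 16 Scanner threads.
io_timer_total = 40.0
per_scanner = avg_per_scanner(io_timer_total, 16)
print(f"average IO time per Scanner: {per_scanner:.1f}s")
```

Even this average does not equal `Active`, because the Scanner threads run asynchronously while the upper nodes process data.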
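From the metrics above, you can roughly work out how many rows the storage engine processed and how many rows remain after filtering. The `Rows***Filtered` group of metrics also shows whether query conditions were pushed down to the storage engine and how effective each index was at filtering.

If `RawRowsRead` and `RowsRead` differ widely, a large number of rows were aggregated, and aggregation can be time-consuming. If `RowsRead` and `RowsReturned` differ widely, many rows were filtered in the Scanner. This means that many highly selective predicates were not pushed down to the storage engine, and filtering in the Scanner is less efficient than filtering in the storage engine.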
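This paragraph also reads like simple analysis. How about finishing the metric definitions and then going straight to the simple analysis?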
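The section above specifically describes the metrics related to filtered row counts.
The simple analysis section looks at things as a whole.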
OK then.
EmmyMiao87
left a comment
Let's go with this for now.
ISSUE: #3365