From 0ec3a05ceb77753490fe0441a5e748652457da45 Mon Sep 17 00:00:00 2001 From: HappenLee Date: Thu, 28 May 2020 20:57:55 +0800 Subject: [PATCH 1/2] [Bug] Fix bug that runningprofile show time problem in FE web page. [Doc] Add runningprofile doc. --- docs/en/administrator-guide/runing-profile.md | 147 +++++++++++++++++ .../administrator-guide/running-profile.md | 151 ++++++++++++++++++ .../apache/doris/common/util/DebugUtil.java | 2 +- 3 files changed, 299 insertions(+), 1 deletion(-) create mode 100644 docs/en/administrator-guide/runing-profile.md create mode 100644 docs/zh-CN/administrator-guide/running-profile.md diff --git a/docs/en/administrator-guide/runing-profile.md b/docs/en/administrator-guide/runing-profile.md new file mode 100644 index 00000000000000..e411bf59659963 --- /dev/null +++ b/docs/en/administrator-guide/runing-profile.md @@ -0,0 +1,147 @@ +--- +{ + "title": "Statistics of query execution", + "language": "en" +} +--- + + + +# Statistics of query execution + +This document focuses on introducing the **RuningProfle** which recorded runtime status of Doris in query execution. Using these statistical information, we can understand the execution of frgment to become a expert of Doris's **debugging and tuning**. + +## Noun Interpretation + +* **FE**: Frontend, frontend node of Doris. Responsible for metadata management and request access. + +* **BE**: Backend, backend node of Doris. Responsible for query execution and data storage. + +* **Fragment**: FE will convert the execution of specific SQL statements into corresponding fragments and distribute them to BE for execution. BE will execute corresponding fragments and gather the result of RunningProfile to send back FE. + +## Basic concepts + +FE splits the query plan into fragments and distributes them to BE for task execution. BE records the statistics of **Running State** when executing fragment. BE print the outputs statistics of fragment execution into the log. FE can also collect these statistics recorded by each fragment and print the results on FE's web page. +## Specific operation + +Turn on the report switch on FE through MySQL command + +``` +mysql> set is_report_success=true; +``` + +After executing the corresponding SQL statement, we can see the report information of the corresponding SQL statement on the FE web page like the picture below. +![image.png](https://upload-images.jianshu.io/upload_images/8552201-f5308be377dc4d90.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) + +The latest **100 statements** executed will be listed here. We can view detailed statistics of RunningProfile. +``` +Query: + Summary: + Query ID: 9664061c57e84404-85ae111b8ba7e83a + Start Time: 2020-05-02 10:34:57 + End Time: 2020-05-02 10:35:08 + Total: 10s323ms + Query Type: Query + Query State: EOF + Doris Version: trunk + User: root + Default Db: default_cluster:test + Sql Statement: select max(Bid_Price) from quotes group by Symbol +``` +Here is a detailed list of ```query ID, execution time, execution statement``` and other summary information. The next step is to print the details of each fragment collected from be. + ``` + Fragment 0: + Instance 9664061c57e84404-85ae111b8ba7e83d (host=TNetworkAddress(hostname:10.144.192.47, port:9060)):(Active: 10s270ms, % non-child: 0.14%) + - MemoryLimit: 2.00 GB + - BytesReceived: 168.08 KB + - PeakUsedReservation: 0.00 + - SendersBlockedTimer: 0ns + - DeserializeRowBatchTimer: 501.975us + - PeakMemoryUsage: 577.04 KB + - RowsProduced: 8.322K (8322) + EXCHANGE_NODE (id=4):(Active: 10s256ms, % non-child: 99.35%) + - ConvertRowBatchTime: 180.171us + - PeakMemoryUsage: 0.00 + - RowsReturned: 8.322K (8322) + - MemoryUsed: 0.00 + - RowsReturnedRate: 811 +``` +The fragment ID is listed here; ``` hostname ``` show the be node executing the fragment; ```active: 10s270ms```show the total execution time of the node; ```non child: 0.14%``` show the execution time of the node self except the execution time of the subchild node. Subsequently, the statistics of the child nodes will be printed in turn. **here you can distinguish the parent-child relationship by intent**. + +## Profile statistic analysis + +There are many statistical information collected at BE. so we list the corresponding meanings of profile are below: + +#### Fragment + - AverageThreadTokens: Number of threads used to execute fragment, excluding the usage of thread pool + - PeakReservation: Peak memory used by buffer pool + - MemoryLimit: Memory limit at query + - PeakMemoryUsage: Peak memory usage + - RowsProduced: Number of rows that process + - BytesReceived: Size of bytes received by network + - DeserializeRowBatchTimer: Time consuming to receive data deserialization + +#### DataStreamSender + - BytesSent: Total bytes data sent + - IgnoreRows: Rows filtered + - OverallThroughput: Total throughput = BytesSent / Time + - SerializeBatchTime: Sending data serialization time + - UncompressedRowBatchSize: Size of rowbatch before sending data compression + +#### SORT_NODE + - InMemorySortTime: In memory sort time + - InitialRunsCreated: Number of initialize sort run + - MergeGetNext: Time cost of MergeSort from multiple sort_run to get the next batch (only show spilled disk) + - MergeGetNextBatch: Time cost MergeSort one sort_run to get the next batch (only show spilled disk) + - SortDataSize: Total sorted data + - TotalMergesPerformed: Number of external sort merges + +#### AGGREGATION_NODE: + - PartitionsCreated: Number of partition split by aggregate + - GetResultsTime: Time to get aggregate results from each partition + - HTResizeTime: Time spent in resizing hashtable + - HTResize: Number of times hashtable resizes + - HashBuckets: Number of buckets in hashtable + - HashBucketsWithDuplicate: Number of buckets with duplicatenode in hashtable + - HashCollisions: Number of hash conflicts generated + - HashDuplicateNodes: Number of duplicate nodes with the same buckets in hashtable + - HashFailedProbe: Number of failed probe operations + - HashFilledBuckets: Number of buckets filled data + - HashProbe: Number of hashtable probe + - HashTravelLength: The number of steps moved when hashtable queries + +#### OLAP_SCAN_NODE: + - BytesRead: Total data + - TotalReadThroughput:Throughput = BytesRead / Time + - TabletCount: Number of scanned tablets + - RowsPushedCondFiltered:Number of filters pushed down + - RawRowsRead: Number of rows read + - RowsReturned: Number of rows returned by the node + - RowsReturnedRate: Rate of rows returned + - PeakMemoryUsage: Peak memory usage of the node + +#### Buffer pool: + - AllocTime: Memory allocation time + - CumulativeAllocationBytes: Cumulative amount of memory allocated + - CumulativeAllocations: Cumulative number of memory allocations + - PeakReservation: Peak of reservation + - PeakUnpinnedBytes: Amount of memory data of unpin + - PeakUsedReservation: Peak usage of reservation + - ReservationLimit: Limit of reservation of bufferpool diff --git a/docs/zh-CN/administrator-guide/running-profile.md b/docs/zh-CN/administrator-guide/running-profile.md new file mode 100644 index 00000000000000..b7ea78988de0b3 --- /dev/null +++ b/docs/zh-CN/administrator-guide/running-profile.md @@ -0,0 +1,151 @@ +--- +{ + "title": "查询执行的统计", + "language": "zh-CN" +} +--- + + + +# 查询执行的统计 + +本文档主要介绍Doris在查询执行的统计结果。利用这些统计的信息,可以更好的帮助我们了解Doris的执行情况,并有针对性的进行相应**Debug与调优工作**。 + + +## 名词解释 + +* FE:Frontend,Doris 的前端节点。负责元数据管理和请求接入。 +* BE:Backend,Doris 的后端节点。负责查询执行和数据存储。 +* Fragment:FE会将具体的SQL语句的执行转化为对应的Fragment并下发到BE进行执行。BE上执行对应Fragment,并将结果汇聚返回给FE。 + +## 基本原理 + +FE将查询计划拆分成为Fragment下发到BE进行任务执行。BE在执行Fragment时记录了**运行状态时的统计值**,并将Fragment执行的统计信息输出到日志之中。 FE也可以通过开关将各个Fragment记录的这些统计值进行搜集,并在FE的Web页面上打印结果。 + +## 操作流程 + +通过Mysql命令,将FE上的Report的开关打开 + +``` +mysql> set is_report_success=true; +``` + +之后执行对应的SQL语句之后,在FE的Web页面就可以看到对应SQL语句执行的Report信息: +![image.png](https://upload-images.jianshu.io/upload_images/8552201-f5308be377dc4d90.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) + +这里会列出最新执行完成的**100条语句**,我们可以通过Profile查看详细的统计信息。 +``` +Query: + Summary: + Query ID: 9664061c57e84404-85ae111b8ba7e83a + Start Time: 2020-05-02 10:34:57 + End Time: 2020-05-02 10:35:08 + Total: 10s323ms + Query Type: Query + Query State: EOF + Doris Version: trunk + User: root + Default Db: default_cluster:test + Sql Statement: select max(Bid_Price) from quotes group by Symbol +``` +这里详尽的列出了**查询的ID,执行时间,执行语句**等等的总结信息。接下来内容是打印从BE收集到的各个Fragement的详细信息。 + ``` + Fragment 0: + Instance 9664061c57e84404-85ae111b8ba7e83d (host=TNetworkAddress(hostname:10.144.192.47, port:9060)):(Active: 10s270ms, % non-child: 0.14%) + - MemoryLimit: 2.00 GB + - BytesReceived: 168.08 KB + - PeakUsedReservation: 0.00 + - SendersBlockedTimer: 0ns + - DeserializeRowBatchTimer: 501.975us + - PeakMemoryUsage: 577.04 KB + - RowsProduced: 8.322K (8322) + EXCHANGE_NODE (id=4):(Active: 10s256ms, % non-child: 99.35%) + - ConvertRowBatchTime: 180.171us + - PeakMemoryUsage: 0.00 + - RowsReturned: 8.322K (8322) + - MemoryUsed: 0.00 + - RowsReturnedRate: 811 +``` +这里列出了Fragment的ID;```hostname```指的是执行Fragment的BE节点;```Active:10s270ms```表示该节点的执行总时间;```non-child: 0.14%```表示除了执行节点自身的执行时间,不包含子节点的执行时间。后续依次打印子节点的统计信息,**这里可以通过缩进区分节点之间的父子关系**。 +## Profile参数解析 +BE端收集的统计信息较多,下面列出了各个参数的对应含义: + +#### Fragment + - AverageThreadTokens: 执行Fragment使用线程数目,不包含线程池的使用情况 + - Buffer Pool PeakReservation: Buffer Pool使用的内存的峰值 + - MemoryLimit: 查询时的内存限制 + - PeakMemoryUsage: 内存使用的峰值 + - RowsProduced: 处理列的行数 + - BytesReceived: 通过网络接收的Bytes大小 + - DeserializeRowBatchTimer: 反序列化的耗时 +#### BlockMgr: + - BlocksCreated: 落盘时创建的Blocks数目 + - BlocksRecycled: 重用的Blocks数目 + - BytesWritten: 总的落盘写数据量 + - MaxBlockSize: 单个Block的大小 + - TotalReadBlockTime: 读Block的总耗时 +#### DataStreamSender + - BytesSent: 发送的总数据量 = 接受者 * 发送数据量 + - IgnoreRows: 过滤的行数 + - OverallThroughput: 总的吞吐量 = BytesSent / 时间 + - SerializeBatchTime: 发送数据序列化消耗的时间 + - UncompressedRowBatchSize: 发送数据压缩前的RowBatch的大小 + + #### SORT_NODE + - InMemorySortTime: 内存之中的排序耗时 + - InitialRunsCreated: 初始化排序的趟数(如果内存排序的话,该数为1) + - SortDataSize: 总的排序数据量 + - MergeGetNext: MergeSort从多个sort_run获取下一个batch的耗时 (仅在落盘时计时) + - MergeGetNextBatch: MergeSort提取下一个sort_run的batch的耗时 (仅在落盘时计时) + - TotalMergesPerformed: 进行外排merge的次数 + +#### AGGREGATION_NODE: + - PartitionsCreated: 聚合查询拆分成Partition的个数 + - GetResultsTime: 从各个partition之中获取聚合结果的时间 + - HTResizeTime: HashTable进行resize消耗的时间 + - HTResize: HashTable进行resize的次数 + - HashBuckets: HashTable中Buckets的个数 + - HashBucketsWithDuplicate: HashTable有DuplicateNode的Buckets的个数 + - HashCollisions: HashTable产生哈希冲突的次数 + - HashDuplicateNodes: HashTable出现Buckets相同DuplicateNode的个数 + - HashFailedProbe: HashTable Probe操作失败的次数 + - HashFilledBuckets: HashTable填入数据的Buckets数目 + - HashProbe: HashTable查询的次数 + - HashTravelLength: HashTable查询时移动的步数 + +#### OLAP_SCAN_NODE: + - RowsProduced: 生成结果的行数 + - BytesRead: scan node 扫描数据的总量 + - TotalReadThroughput:吞吐量 + - TabletCount: tablet 的数目 + - RowsPushedCondFiltered:下推的过滤器 + - RawRowsRead: 实际读取的行数 + - RowsReturned: 该节点返回的行数 + - RowsReturnedRate: 返回行数的速率 + - PeakMemoryUsage 内存使用的峰值 + +#### Buffer pool: + - AllocTime: 内存分配耗时 + - CumulativeAllocationBytes: 累计内存分配的量 + - CumulativeAllocations: 累计的内存分配次数 + - PeakReservation: Reservation的峰值 + - PeakUnpinnedBytes: unpin的内存数据量 + - PeakUsedReservation: Reservation的内存使用量 + - ReservationLimit: BufferPool的Reservation的限制量 diff --git a/fe/src/main/java/org/apache/doris/common/util/DebugUtil.java b/fe/src/main/java/org/apache/doris/common/util/DebugUtil.java index 31febdcf2b2ef1..9348447aea2027 100644 --- a/fe/src/main/java/org/apache/doris/common/util/DebugUtil.java +++ b/fe/src/main/java/org/apache/doris/common/util/DebugUtil.java @@ -75,7 +75,7 @@ public static void printTimeMs(long value, StringBuilder builder) { } if (newValue >= MINUTE) { builder.append(newValue / MINUTE).append("m"); - newValue %= newValue; + newValue %= MINUTE; minute = true; } if (!hour && newValue >= SECOND) { From db1861775b8e3bae0c7b3f09f4b4dcac31e9e36b Mon Sep 17 00:00:00 2001 From: HappenLee Date: Sun, 31 May 2020 21:04:25 +0800 Subject: [PATCH 2/2] revise some mistake in running-profile.md --- docs/zh-CN/administrator-guide/running-profile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/zh-CN/administrator-guide/running-profile.md b/docs/zh-CN/administrator-guide/running-profile.md index b7ea78988de0b3..a662d30bf7a19a 100644 --- a/docs/zh-CN/administrator-guide/running-profile.md +++ b/docs/zh-CN/administrator-guide/running-profile.md @@ -83,7 +83,7 @@ Query: - MemoryUsed: 0.00 - RowsReturnedRate: 811 ``` -这里列出了Fragment的ID;```hostname```指的是执行Fragment的BE节点;```Active:10s270ms```表示该节点的执行总时间;```non-child: 0.14%```表示除了执行节点自身的执行时间,不包含子节点的执行时间。后续依次打印子节点的统计信息,**这里可以通过缩进区分节点之间的父子关系**。 +这里列出了Fragment的ID;```hostname```指的是执行Fragment的BE节点;```Active:10s270ms```表示该节点的执行总时间;```non-child: 0.14%```表示执行节点自身的执行时间,不包含子节点的执行时间。后续依次打印子节点的统计信息,**这里可以通过缩进区分节点之间的父子关系**。 ## Profile参数解析 BE端收集的统计信息较多,下面列出了各个参数的对应含义: