[opt](scan) unify the local and remote scan bytes stats for all scanners #40493
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38319 ms |
TPC-DS: Total hot run time: 192956 ms |
ClickBench: Total hot run time: 31.92 s |
cc42838 to
102b368
Compare
64f0607 to
a6d8c99
Compare
|
run buildall |
TPC-H: Total hot run time: 38143 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 193085 ms |
ClickBench: Total hot run time: 31.51 s |
f7ce982 to
57e247a
Compare
53d8028 to
905ef11
Compare
| // first need to update the last statistics in _owned_cache_stats | ||
| // to the file_cache_stats in the input parameter. | ||
| // Then reset _owned_cache_stats | ||
| if (io_ctx->file_cache_stats) { |
Potential data race?
| [buffer_ptr = shared_from_this()]() { buffer_ptr->prefetch_buffer(); }); | ||
| } | ||
|
|
||
| void PrefetchBuffer::_update_and_reset_io_context(const IOContext* io_ctx) { |
When is this method called?
905ef11 to
481472b
Compare
…6119)

Fix the bug that causes the audit loader to fail.

Related PR: #45167 #40493

The bug causes the audit loader to fail with the following error in audit.log:

```
2024-12-27 11:47:47,001 [stream_load] |Label=audit_log_20241227_114552_856_127_0_0_1_8030|Db=__internal_schema|Table=audit_log|User=|ClientIp=10.0.1.3|Status=Success|Message=OK|Url=http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1|TotalRows=34|LoadedRows=0|FilteredRows=34|UnselectedRows=0|LoadBytes=6887|StartTime=2024-12-27 11:45:52.858|FinishTime=2024-12-27 11:45:52.888
```

The detailed error is:

```
curl http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1
Reason: actual column number in csv file is more than schema column number.actual number: 29, schema column number: 27; line delimiter: [ ], column separator: [ ], result values:
```

Co-authored-by: derenli <derenli@tencent.com>
|
We're closing this PR because it hasn't been updated in a while. |
Previously, only queries on OLAP tables reported local and remote bytes-read statistics.
This PR adds these statistics for all scanners.

Use `CachedRemoteFileReader` whether `enable_file_cache` is true or false

Previously, if `enable_file_cache` was true, we used `CachedRemoteFileReader`; otherwise, we used the raw file reader to read data.
To unify the query statistics, this PR uses `CachedRemoteFileReader` regardless of whether `enable_file_cache` is true or false.
When reading data with the cache disabled, `CachedRemoteFileReader` uses the raw file reader it wraps directly.
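
The wrapper behaviour described above can be sketched roughly as follows. This is only an illustration, not the Doris implementation: `ReadStats`, `RawReader`, `StringReader`, and `CachingReader` are hypothetical names, the "cache" is a plain in-memory map, and counting pass-through reads as remote bytes is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are the actual Doris classes.
struct ReadStats {
    int64_t local_bytes = 0;   // bytes served from the local cache
    int64_t remote_bytes = 0;  // bytes fetched through the raw reader
};

class RawReader {
public:
    virtual ~RawReader() = default;
    virtual std::string read_at(size_t offset, size_t len) = 0;
};

// In-memory "remote file", used only to make the sketch runnable.
class StringReader : public RawReader {
public:
    explicit StringReader(std::string data) : _data(std::move(data)) {}
    std::string read_at(size_t offset, size_t len) override {
        if (offset >= _data.size()) return {};
        return _data.substr(offset, len);
    }

private:
    std::string _data;
};

// Always sits in front of the raw reader, so stats are recorded on one path.
class CachingReader {
public:
    CachingReader(std::unique_ptr<RawReader> inner, bool cache_enabled)
            : _inner(std::move(inner)), _cache_enabled(cache_enabled) {
        assert(_inner != nullptr);
    }

    std::string read_at(size_t offset, size_t len, ReadStats* stats) {
        if (!_cache_enabled) {
            // Cache disabled: delegate to the raw reader, but still count bytes.
            std::string out = _inner->read_at(offset, len);
            stats->remote_bytes += static_cast<int64_t>(out.size());
            return out;
        }
        const auto key = std::make_pair(offset, len);
        auto it = _cache.find(key);
        if (it != _cache.end()) {
            // Cache hit: the bytes come from local storage.
            stats->local_bytes += static_cast<int64_t>(it->second.size());
            return it->second;
        }
        // Cache miss: read through the raw reader and remember the result.
        std::string out = _inner->read_at(offset, len);
        stats->remote_bytes += static_cast<int64_t>(out.size());
        _cache.emplace(key, out);
        return out;
    }

private:
    std::unique_ptr<RawReader> _inner;
    bool _cache_enabled;
    std::map<std::pair<size_t, size_t>, std::string> _cache;
};
```

Because every read goes through the same wrapper, the caller gets local and remote byte counts in either mode without a second code path for the cache-disabled case.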

Add the `_update_bytes_and_rows_read()` interface to `VScanner`

This method is called after each `get_block()` call.
It updates the scan bytes and rows in the query statistics, so that real-time statistics are available when querying the system table `backend_active_tasks`.

Add `REMOTE_SCAN_BYTES` and `LOCAL_SCAN_BYTES` columns to `backend_active_tasks`

`REMOTE_SCAN_BYTES` is the number of bytes read from the remote file system.
`LOCAL_SCAN_BYTES` is the number of bytes read from local disks.
`SCAN_BYTES` is now the sum of `REMOTE_SCAN_BYTES` and `LOCAL_SCAN_BYTES`.

Add new columns to the audit log table

- `local_scan_bytes`
- `remote_scan_bytes`
- `shuffle_bytes`
- `shuffle_rows`
- `cloud_cluster_name`
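
As a rough illustration of the per-block update and of `SCAN_BYTES` as a sum, here is a minimal sketch. `QueryStats` and `ScannerSketch` are hypothetical names, not the `VScanner` code from this PR, and the delta-publishing scheme is an assumption about one reasonable way to keep shared counters current.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical query-level counters, readable while the query is running.
struct QueryStats {
    std::atomic<int64_t> local_scan_bytes{0};
    std::atomic<int64_t> remote_scan_bytes{0};
    std::atomic<int64_t> scan_rows{0};

    // Total scan bytes is derived as local + remote.
    int64_t scan_bytes() const {
        return local_scan_bytes.load() + remote_scan_bytes.load();
    }
};

class ScannerSketch {
public:
    explicit ScannerSketch(QueryStats* stats) : _stats(stats) {
        assert(_stats != nullptr);
    }

    // Pretend to read one block; the caller supplies what the block "cost".
    void get_block(int64_t rows, int64_t local_bytes, int64_t remote_bytes) {
        _rows += rows;
        _local += local_bytes;
        _remote += remote_bytes;
        // Publish after every block so observers see near-real-time values.
        update_bytes_and_rows_read();
    }

private:
    // Push only the delta since the last publish into the shared counters.
    void update_bytes_and_rows_read() {
        _stats->scan_rows.fetch_add(_rows - _rows_published);
        _stats->local_scan_bytes.fetch_add(_local - _local_published);
        _stats->remote_scan_bytes.fetch_add(_remote - _remote_published);
        _rows_published = _rows;
        _local_published = _local;
        _remote_published = _remote;
    }

    QueryStats* _stats;
    int64_t _rows = 0, _local = 0, _remote = 0;
    int64_t _rows_published = 0, _local_published = 0, _remote_published = 0;
};
```

Publishing deltas rather than totals lets several scanners share one `QueryStats` object without overwriting each other's contributions.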