@sollhui sollhui commented Oct 28, 2025

What problem does this PR solve?

Introduce a system table that exposes load job statistics:

mysql> show create table information_schema.load_jobs\G
*************************** 1. row ***************************
       Table: load_jobs
Create Table: CREATE TABLE `load_jobs` (
  `JOB_ID` text NULL,
  `LABEL` text NULL,
  `STATE` text NULL,
  `PROGRESS` text NULL,
  `TYPE` text NULL,
  `ETL_INFO` text NULL,
  `TASK_INFO` text NULL,
  `ERROR_MSG` text NULL,
  `CREATE_TIME` text NULL,
  `ETL_START_TIME` text NULL,
  `ETL_FINISH_TIME` text NULL,
  `LOAD_START_TIME` text NULL,
  `LOAD_FINISH_TIME` text NULL,
  `URL` text NULL,
  `JOB_DETAILS` text NULL,
  `TRANSACTION_ID` text NULL,
  `ERROR_TABLETS` text NULL,
  `USER` text NULL,
  `COMMENT` text NULL,
  `FIRST_ERROR_MSG` text NULL
) ENGINE=SCHEMA;
1 row in set (0.01 sec)

Users can run select * from information_schema.load_jobs instead of show load. The advantage is that ordinary SQL predicates, such as the WHERE filter in the example below, can be used to locate specific jobs.

Example:

mysql> SELECT * FROM information_schema.load_jobs WHERE LABEL = 'test_load_job_label_b5347e94f2614e2c92705d6a6824a380'\G
*************************** 1. row ***************************
          JOB_ID: 1761643165987
           LABEL: test_load_job_label_b5347e94f2614e2c92705d6a6824a380
           STATE: FINISHED
        PROGRESS: Unknown id: 1761643165987
            TYPE: INSERT
        ETL_INFO: \N
       TASK_INFO: cluster:N/A; timeout(s):14400; max_filter_ratio:0.0
       ERROR_MSG: \N
     CREATE_TIME: 2025-10-28 17:23:08
  ETL_START_TIME: 2025-10-28 17:23:08
 ETL_FINISH_TIME: 2025-10-28 17:23:08
 LOAD_START_TIME: 2025-10-28 17:23:08
LOAD_FINISH_TIME: 2025-10-28 17:23:09
             URL: 
     JOB_DETAILS: {"ScannedRows":1,"LoadBytes":25,"FileNumber":0,"FileSize":0,"TaskNumber":1,"Unfinished backends":[],"All backends":[1754377661179]}
  TRANSACTION_ID: 72076
   ERROR_TABLETS: {}
            USER: root
         COMMENT: 
 FIRST_ERROR_MSG:
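
As a rough illustration of the flexibility mentioned above, the sketch below filters jobs by state, type, and creation time, then reads ScannedRows out of JOB_DETAILS. It is not taken from this PR. It assumes an in-memory SQLite table as a stand-in for information_schema.load_jobs (so it runs without a Doris cluster), keeps only a subset of the columns, and uses two hypothetical rows alongside the row from the example. The find_jobs helper is likewise hypothetical. Because every column is text, the CREATE_TIME comparison is a string comparison, which orders correctly only for the fixed 'YYYY-MM-DD HH:MM:SS' layout shown in the output.

```python
import json
import sqlite3

# Stand-in for information_schema.load_jobs (assumption: a subset of the
# columns from the PR description, all stored as text as in the real table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE load_jobs ("
    "JOB_ID TEXT, LABEL TEXT, STATE TEXT, TYPE TEXT, "
    "CREATE_TIME TEXT, JOB_DETAILS TEXT)"
)

rows = [
    # Values copied from the example row in the PR description.
    ("1761643165987",
     "test_load_job_label_b5347e94f2614e2c92705d6a6824a380",
     "FINISHED", "INSERT", "2025-10-28 17:23:08",
     '{"ScannedRows":1,"LoadBytes":25,"FileNumber":0,"FileSize":0,'
     '"TaskNumber":1,"Unfinished backends":[],'
     '"All backends":[1754377661179]}'),
    # Hypothetical rows, invented purely for illustration.
    ("1001", "hypothetical_label_a", "CANCELLED", "BROKER",
     "2025-10-27 09:00:00", '{"ScannedRows":0,"LoadBytes":0}'),
    ("1002", "hypothetical_label_b", "FINISHED", "INSERT",
     "2025-10-28 18:00:00", '{"ScannedRows":40,"LoadBytes":900}'),
]
conn.executemany("INSERT INTO load_jobs VALUES (?, ?, ?, ?, ?, ?)", rows)

# Plain SQL filter and ordering over the text columns.
QUERY = """
SELECT JOB_ID, LABEL, JOB_DETAILS
FROM load_jobs
WHERE STATE = ? AND TYPE = ? AND CREATE_TIME >= ?
ORDER BY CREATE_TIME DESC
"""


def find_jobs(state, job_type, created_since):
    """Return matching jobs, newest first, with ScannedRows parsed from JSON."""
    cursor = conn.execute(QUERY, (state, job_type, created_since))
    return [
        {
            "job_id": job_id,
            "label": label,
            "scanned_rows": json.loads(details)["ScannedRows"],
        }
        for job_id, label, details in cursor
    ]


recent = find_jobs("FINISHED", "INSERT", "2025-10-28 00:00:00")
for job in recent:
    print(job)
```

On a real cluster the same SELECT would presumably be issued through the MySQL client against information_schema.load_jobs, with the JSON handling done by the caller or by the engine's own JSON functions.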

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

sollhui commented Oct 28, 2025

run buildall

Thearas commented Oct 28, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Oct 28, 2025
@github-actions

PR approved by anyone and no changes requested.

@hello-stephen

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.64% (1649/2045)
Line Coverage 66.99% (29099/43437)
Region Coverage 67.30% (14415/21418)
Branch Coverage 57.69% (7668/13292)

@hello-stephen

FE Regression Coverage Report

Increment line coverage 31.58% (24/76) 🎉
Increment coverage report
Complete coverage report


@dataroaring dataroaring left a comment

LGTM

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.74% (18055/34231)
Line Coverage 37.98% (163709/431081)
Region Coverage 32.33% (124711/385724)
Branch Coverage 33.71% (54561/161851)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 91.67% (110/120) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.42% (23950/33536)
Line Coverage 57.81% (249087/430874)
Region Coverage 52.89% (206476/390387)
Branch Coverage 54.61% (88806/162610)

sollhui commented Oct 29, 2025

run performance

@doris-robot

TPC-DS: Total hot run time: 190987 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e33c84bf1608485539c1a12cc212206909d710bd, data reload: false

query1	1080	414	400	400
query2	6583	1710	1709	1709
query3	6764	224	231	224
query4	26642	23399	23194	23194
query5	4387	636	468	468
query6	324	245	234	234
query7	4656	497	302	302
query8	317	283	293	283
query9	8700	2595	2608	2595
query10	516	365	295	295
query11	15944	15163	14902	14902
query12	198	127	124	124
query13	1693	587	446	446
query14	11687	9302	9265	9265
query15	209	189	172	172
query16	7686	682	524	524
query17	1621	791	643	643
query18	2044	464	371	371
query19	388	221	184	184
query20	144	138	138	138
query21	232	149	137	137
query22	4564	4661	4533	4533
query23	35451	34276	34503	34276
query24	8442	2552	2523	2523
query25	682	523	463	463
query26	1297	304	167	167
query27	3861	547	371	371
query28	4486	2305	2225	2225
query29	842	649	487	487
query30	313	243	207	207
query31	951	865	832	832
query32	87	73	78	73
query33	628	407	385	385
query34	1602	887	528	528
query35	848	933	800	800
query36	978	1070	894	894
query37	133	118	85	85
query38	3630	3576	3500	3500
query39	1474	1474	1501	1474
query40	224	135	125	125
query41	68	65	63	63
query42	123	109	108	108
query43	490	494	481	481
query44	1233	740	747	740
query45	187	186	180	180
query46	893	983	643	643
query47	1780	1810	1746	1746
query48	420	421	328	328
query49	777	506	430	430
query50	660	700	410	410
query51	3816	3975	3833	3833
query52	110	109	98	98
query53	236	274	197	197
query54	600	593	540	540
query55	93	87	85	85
query56	334	331	324	324
query57	1189	1213	1120	1120
query58	297	281	275	275
query59	2615	2619	2446	2446
query60	354	336	335	335
query61	166	162	164	162
query62	824	756	710	710
query63	239	202	202	202
query64	4495	1190	853	853
query65	4064	3910	3974	3910
query66	1109	435	351	351
query67	15597	15289	15072	15072
query68	8463	924	605	605
query69	493	371	285	285
query70	1328	1280	1277	1277
query71	522	342	326	326
query72	5817	4892	4867	4867
query73	723	563	360	360
query74	8987	9163	8943	8943
query75	3977	3366	2833	2833
query76	3761	1177	752	752
query77	835	411	326	326
query78	9663	9825	8947	8947
query79	2168	830	600	600
query80	635	598	508	508
query81	507	270	229	229
query82	453	167	139	139
query83	271	270	251	251
query84	256	113	97	97
query85	959	477	437	437
query86	384	314	318	314
query87	3720	3838	3703	3703
query88	3572	2262	2250	2250
query89	394	340	297	297
query90	2022	220	223	220
query91	180	177	134	134
query92	89	74	65	65
query93	1804	1006	643	643
query94	681	443	348	348
query95	422	328	320	320
query96	493	590	286	286
query97	2943	2960	2880	2880
query98	237	214	215	214
query99	1344	1413	1331	1331
Total cold run time: 283180 ms
Total hot run time: 190987 ms

@doris-robot

ClickBench: Total hot run time: 28.64 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e33c84bf1608485539c1a12cc212206909d710bd, data reload: false

query1	0.05	0.05	0.05
query2	0.12	0.07	0.07
query3	0.31	0.07	0.07
query4	1.61	0.09	0.09
query5	0.26	0.25	0.26
query6	1.18	0.65	0.64
query7	0.02	0.02	0.02
query8	0.07	0.06	0.06
query9	0.65	0.53	0.54
query10	0.58	0.60	0.60
query11	0.25	0.14	0.14
query12	0.26	0.15	0.15
query13	0.64	0.63	0.62
query14	1.03	1.04	1.04
query15	0.95	0.87	0.87
query16	0.39	0.38	0.39
query17	1.02	1.08	1.05
query18	0.24	0.21	0.23
query19	1.96	1.84	1.82
query20	0.02	0.02	0.02
query21	15.39	0.28	0.24
query22	4.97	0.11	0.10
query23	15.37	0.38	0.22
query24	2.88	0.52	0.32
query25	0.10	0.10	0.09
query26	0.19	0.17	0.17
query27	0.10	0.09	0.08
query28	3.75	1.26	1.12
query29	12.59	4.09	3.39
query30	0.35	0.12	0.11
query31	2.84	0.64	0.45
query32	3.25	0.61	0.51
query33	3.08	3.09	3.13
query34	16.68	5.13	4.49
query35	4.54	4.55	4.57
query36	0.65	0.53	0.50
query37	0.22	0.09	0.09
query38	0.21	0.06	0.06
query39	0.06	0.05	0.06
query40	0.20	0.19	0.17
query41	0.12	0.06	0.07
query42	0.08	0.05	0.05
query43	0.06	0.05	0.05
Total cold run time: 99.29 s
Total hot run time: 28.64 s

@liaoxin01 liaoxin01 merged commit 34f472b into apache:master Oct 29, 2025
29 of 31 checks passed
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request Oct 31, 2025
sollhui added a commit to sollhui/doris that referenced this pull request Dec 9, 2025
yiguolei pushed a commit that referenced this pull request Dec 11, 2025
…8850)

pick #57421

Labels

approved, dev/4.0.2-merged, reviewed
