@morningman morningman commented Apr 18, 2025

What problem does this PR solve?

MetaCache caches external table instances, such as Hive tables.
The cache value type is Optional<Table>.

When a key is loaded for the first time and the table does not exist, the cache loader
returns null, and the cache ends up holding the entry <key, EmptyOptional>.

On the second lookup of that key, the cache returns the EmptyOptional
instead of trying to load the key from the remote data source again.
What we expect is that a key with no table in the cache is loaded from the remote data source.

So we need to check the result returned by the cache: if it is null or EmptyOptional,
we should load the key again. Otherwise, the following case may fail:

  1. Select a nonexistent Hive table in Doris.
  2. Create the table in Hive.
  3. Select the table in Doris. The query is expected to succeed, but it still returns "table does not exist".
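
The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the Doris MetaCache code: the class `NegativeCacheSketch`, its methods `getCached` and `getOrReload`, and the use of a plain `ConcurrentHashMap` are all assumptions made for illustration. `getCached` mimics the reported problem (an empty `Optional` stays cached), and `getOrReload` mimics the described fix (a null or empty result triggers another load).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a metadata cache; not the Doris MetaCache class. */
class NegativeCacheSketch<K, V> {
    private final Map<K, Optional<V>> cache = new ConcurrentHashMap<>();
    // The loader returns null when the remote object (e.g. a Hive table) does not exist.
    private final Function<K, V> loader;

    NegativeCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Problem case: once an empty Optional is cached, it is returned on every later lookup. */
    Optional<V> getCached(K key) {
        return cache.computeIfAbsent(key, k -> Optional.ofNullable(loader.apply(k)));
    }

    /** Described fix: a missing or empty entry is loaded again from the remote source. */
    Optional<V> getOrReload(K key) {
        Optional<V> value = cache.get(key);
        if (value == null || !value.isPresent()) {
            value = Optional.ofNullable(loader.apply(key));
            cache.put(key, value);
        }
        return value;
    }
}
```

With a loader backed by a set of existing table names, looking up a table before it exists and again after it is created returns an empty `Optional` both times through `getCached`, while `getOrReload` finds the table on the second lookup. The trade-off in this sketch is that every lookup of a table that really does not exist goes to the remote source.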

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33735 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 15e69fac1464d85bdb574c6d4d64bc47ef6dc849, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	25823	5066	5032	5032
q2	2062	270	187	187
q3	10420	1245	665	665
q4	10223	1029	549	549
q5	7523	2265	2370	2265
q6	177	161	134	134
q7	899	726	592	592
q8	9434	1283	1140	1140
q9	6922	5126	5096	5096
q10	6856	2278	1884	1884
q11	491	276	255	255
q12	345	357	220	220
q13	17767	3711	3066	3066
q14	224	223	219	219
q15	526	491	475	475
q16	441	441	400	400
q17	601	840	362	362
q18	7431	7120	7058	7058
q19	1628	950	538	538
q20	310	331	231	231
q21	3849	3315	2372	2372
q22	1089	1073	995	995
Total cold run time: 115041 ms
Total hot run time: 33735 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5160	5050	5039	5039
q2	232	324	230	230
q3	2164	2601	2276	2276
q4	1459	1836	1409	1409
q5	4397	4418	4342	4342
q6	208	173	130	130
q7	2009	1878	1745	1745
q8	2554	2572	2550	2550
q9	7295	7149	6978	6978
q10	3045	3158	2748	2748
q11	558	499	471	471
q12	671	761	613	613
q13	3518	3797	3320	3320
q14	294	310	265	265
q15	537	496	508	496
q16	482	517	459	459
q17	1132	1543	1426	1426
q18	7720	7593	7388	7388
q19	800	814	811	811
q20	2004	2025	1814	1814
q21	5021	4986	4825	4825
q22	1094	1044	1052	1044
Total cold run time: 52354 ms
Total hot run time: 50379 ms

@doris-robot

TPC-DS: Total hot run time: 192003 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 15e69fac1464d85bdb574c6d4d64bc47ef6dc849, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1426	1096	1048	1048
query2	6241	1818	1762	1762
query3	11110	4761	4739	4739
query4	25870	23759	23478	23478
query5	4672	627	458	458
query6	319	218	212	212
query7	3988	482	281	281
query8	310	263	236	236
query9	8525	2492	2497	2492
query10	442	315	270	270
query11	15242	15006	14712	14712
query12	167	109	100	100
query13	1550	496	385	385
query14	8841	5988	6087	5988
query15	190	197	166	166
query16	7215	636	492	492
query17	1146	720	569	569
query18	1985	433	341	341
query19	218	197	194	194
query20	129	130	120	120
query21	219	123	106	106
query22	4477	4618	4202	4202
query23	34599	33810	33445	33445
query24	8472	2442	2411	2411
query25	560	523	420	420
query26	1226	277	147	147
query27	2718	497	340	340
query28	4655	2135	2136	2135
query29	724	565	441	441
query30	275	227	202	202
query31	950	866	792	792
query32	73	69	106	69
query33	534	363	314	314
query34	792	857	511	511
query35	800	844	750	750
query36	945	1004	864	864
query37	116	102	82	82
query38	4225	4301	4197	4197
query39	1493	1438	1456	1438
query40	211	119	104	104
query41	54	57	53	53
query42	128	105	108	105
query43	503	490	491	490
query44	1295	817	804	804
query45	193	178	171	171
query46	847	1050	636	636
query47	1848	1857	1806	1806
query48	384	425	306	306
query49	726	501	447	447
query50	658	702	410	410
query51	4269	4290	4238	4238
query52	114	107	96	96
query53	233	259	187	187
query54	574	565	495	495
query55	85	81	84	81
query56	298	289	302	289
query57	1165	1189	1133	1133
query58	277	263	270	263
query59	2724	2844	2652	2652
query60	326	311	370	311
query61	134	130	128	128
query62	789	738	692	692
query63	230	189	193	189
query64	4455	1067	670	670
query65	4413	4383	4428	4383
query66	1206	409	310	310
query67	15760	15517	15135	15135
query68	8480	884	500	500
query69	476	297	265	265
query70	1247	1066	1092	1066
query71	478	316	288	288
query72	5724	4751	4802	4751
query73	713	613	343	343
query74	8855	8988	8860	8860
query75	3942	3164	2689	2689
query76	3708	1195	768	768
query77	784	383	288	288
query78	10007	10129	9338	9338
query79	3215	798	551	551
query80	618	546	445	445
query81	477	251	217	217
query82	439	124	94	94
query83	266	247	233	233
query84	290	105	86	86
query85	797	359	317	317
query86	346	296	310	296
query87	4417	4396	4270	4270
query88	3106	2196	2154	2154
query89	440	312	283	283
query90	1956	205	209	205
query91	141	140	111	111
query92	80	57	59	57
query93	2143	930	574	574
query94	658	416	301	301
query95	379	289	281	281
query96	496	554	271	271
query97	3198	3200	3101	3101
query98	231	206	198	198
query99	1447	1416	1309	1309
Total cold run time: 280892 ms
Total hot run time: 192003 ms

@doris-robot

ClickBench: Total hot run time: 30 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 15e69fac1464d85bdb574c6d4d64bc47ef6dc849, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.04
query2	0.13	0.10	0.11
query3	0.26	0.20	0.20
query4	1.59	0.20	0.19
query5	0.58	0.59	0.58
query6	1.18	0.73	0.71
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.58	0.53	0.52
query10	0.60	0.58	0.56
query11	0.15	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	1.19	1.19	1.22
query15	0.86	0.84	0.84
query16	0.38	0.39	0.42
query17	1.04	1.00	0.99
query18	0.21	0.20	0.20
query19	1.89	1.73	1.76
query20	0.01	0.01	0.02
query21	15.39	0.90	0.56
query22	0.75	1.16	0.65
query23	14.96	1.35	0.62
query24	7.34	1.28	1.16
query25	0.48	0.21	0.08
query26	0.59	0.17	0.15
query27	0.05	0.05	0.05
query28	9.16	0.81	0.43
query29	12.56	4.06	3.41
query30	0.26	0.09	0.08
query31	2.82	0.57	0.39
query32	3.22	0.53	0.46
query33	2.99	3.01	3.04
query34	15.72	5.11	4.50
query35	4.53	4.53	4.53
query36	0.67	0.50	0.47
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.13	0.12
query41	0.09	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.5 s
Total hot run time: 30 s

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Apr 23, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 0949a77 into apache:master Apr 27, 2025
25 of 26 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 27, 2025
github-actions bot pushed a commit that referenced this pull request Apr 27, 2025
dataroaring pushed a commit that referenced this pull request Apr 28, 2025
…ent #50188 (#50450)

Cherry-picked from #50188

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
morningman added a commit that referenced this pull request Apr 28, 2025
yiguolei pushed a commit that referenced this pull request May 6, 2025
…ent #50188 (#50451)

Cherry-picked from #50188

---------

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
Co-authored-by: morningman <yunyou@selectdb.com>
@yiguolei yiguolei mentioned this pull request May 13, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…50188)


Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.10-merged, dev/3.0.6-merged, reviewed, usercase (Important user case type label)


7 participants