@eldenmoon eldenmoon commented Aug 7, 2024

Background

Currently, consider importing nested data such as the following:

{
  "a": [{"nested1": 1}, {"nested2": "123"}]
}

This results in the type of column a becoming JSON, which compresses worse and queries more slowly than native arrays, mainly because low-cardinality optimizations cannot be applied and the JSON must be parsed at query time.

A common example:

{
  "eventId": 1,
  "firstName": "Name1",
  "lastName": "Surname1",
  "body": {
    "phoneNumbers": [
      {
        "number": "5550219210",
        "type": "GSM",
        "callLimit": 5
      },
      {
        "number": "02124713252",
        "type": "HOME",
        "callLimit": 3
      },
      {
        "number": "05550219211",
        "type": "WORK",
        "callLimit": 2
      }
    ]
  }
}

Design

Store the nested structure in expanded form, so that the existing schema merge logic can be reused directly and querying becomes easier. For example, given these two rows:

{
  "n": [{"a": 1, "b": 2}, {"a": 10, "b": 11, "c": 12}, {"a": 1001, "d": "12"}]
},
{
  "n": [{"x": 1, "y": 2}]
}

The data would be stored in the following format:

Column | Row 0 | Row 1
-- | -- | --
n.a (array<int>) | [1, 10, 1001] | [null]
n.b (int) | [2, 11, null] | [null]
n.c (int) | [null, 12, null] | [null]
n.d (text) | [null, null, "12"] | [null]
n.x | [null, null, null] | [1]
n.y | [null, null, null] | [2]

The array offsets of all subcolumns under n are aligned: within each row, every subcolumn array has the same number of elements.
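The expanded layout can be sketched in a few lines of Python. This is only an illustration of the idea, not the Doris implementation (which is C++ and not shown in this description); the function name `flatten_nested` and the dict-of-lists representation are assumptions made for the example.

```python
from typing import Any


def flatten_nested(rows: list[dict[str, list[dict[str, Any]]]], key: str) -> dict[str, list[list[Any]]]:
    """Expand an array of objects into one array subcolumn per nested key.

    Hypothetical helper: for each row, every subcolumn gets an array with as
    many elements as that row's original array, padded with None.
    """
    # Collect every nested key seen in any row, in first-seen order.
    paths: list[str] = []
    for row in rows:
        for obj in row.get(key, []):
            for k in obj:
                if k not in paths:
                    paths.append(k)

    columns: dict[str, list[list[Any]]] = {f"{key}.{p}": [] for p in paths}
    for row in rows:
        objs = row.get(key, [])
        for p in paths:
            # Missing keys become None so that all siblings stay aligned.
            columns[f"{key}.{p}"].append([obj.get(p) for obj in objs])
    return columns


rows = [
    {"n": [{"a": 1, "b": 2}, {"a": 10, "b": 11, "c": 12}, {"a": 1001, "d": "12"}]},
    {"n": [{"x": 1, "y": 2}]},
]
columns = flatten_nested(rows, "n")

# Within a row, every subcolumn array has the same length (aligned offsets).
row_sizes = [{len(col[i]) for col in columns.values()} for i in range(len(rows))]
```

In this sketch `row_sizes` contains exactly one distinct length per row, which is the alignment property stated above.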

Compaction

Compaction must preserve the relationship between sibling nested nodes such as n.a, n.b, n.c, and n.d. If any of these columns is missing from an input, its offsets are filled in from any sibling column's offsets.
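A minimal sketch of this fill step, assuming each subcolumn is held as a flat values list plus a cumulative offsets list (one entry per row). The name `fill_missing_sibling` and this in-memory layout are hypothetical and chosen only for illustration; the actual compaction code is C++ and is not shown here.

```python
def fill_missing_sibling(group, expected_paths):
    """Give every missing sibling the same offsets as an existing one.

    `group` maps a path to (values, offsets), where offsets[i] is the end
    position of row i in `values`. Hypothetical layout for illustration only.
    """
    if not group:
        raise ValueError("need at least one existing sibling to copy offsets from")
    # Any sibling works, because all siblings share identical offsets.
    _, sibling_offsets = next(iter(group.values()))
    total = sibling_offsets[-1] if sibling_offsets else 0
    filled = dict(group)
    for path in expected_paths:
        if path not in filled:
            # All-null values with the same row boundaries as the sibling.
            filled[path] = ([None] * total, list(sibling_offsets))
    return filled


# A segment that only ever saw n.a and n.b (two rows: 3 elements, then 1).
segment = {
    "n.a": ([1, 10, 1001, 7], [3, 4]),
    "n.b": ([2, 11, None, 8], [3, 4]),
}
filled = fill_missing_sibling(segment, ["n.a", "n.b", "n.c", "n.d"])
```

After the call, `n.c` and `n.d` exist with all-null values and the same offsets as `n.a`, so the sibling columns remain aligned row by row.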

Queries

SELECT v['n']['a'] FROM tbl;
-- This outputs [1, 10, 1001].
SELECT v['n'] FROM tbl;
-- This outputs [{"a" : 1, "b" : 2}, {"a" : 10, "b" : 11, "c" : 12}, {"a":1001, "d" : "12"}].

Queries are unaware of whether a path is nested: the nested information is ignored during path evaluation because it is not stored in the subcolumn tree.
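To illustrate how a result like that of `v['n']` could be rebuilt from expanded subcolumns, here is a small Python sketch. It assumes the subcolumns are kept as per-row arrays padded with nulls; the helper `rebuild_objects` and the dict layout are hypothetical and do not mirror the Doris C++ code.

```python
import json


def rebuild_objects(columns, prefix, row):
    """Zip the aligned subcolumn arrays of one row back into a list of objects."""
    subcols = {
        path[len(prefix) + 1:]: arrays[row]
        for path, arrays in columns.items()
        if path.startswith(prefix + ".")
    }
    if not subcols:
        return []
    # All siblings have the same length, so any of them gives the row size.
    size = len(next(iter(subcols.values())))
    objects = []
    for i in range(size):
        # Skip null padding so absent keys do not reappear in the output.
        objects.append({k: vals[i] for k, vals in subcols.items() if vals[i] is not None})
    return objects


columns = {
    "n.a": [[1, 10, 1001], [None]],
    "n.b": [[2, 11, None], [None]],
    "n.c": [[None, 12, None], [None]],
    "n.d": [[None, None, "12"], [None]],
    "n.x": [[None, None, None], [1]],
}
row0 = rebuild_objects(columns, "n", 0)
print(json.dumps(row0))
# prints [{"a": 1, "b": 2}, {"a": 10, "b": 11, "c": 12}, {"a": 1001, "d": "12"}]
```

One limitation of this simplified sketch is that an explicit null in the input cannot be distinguished from a missing key, since both are represented by the same padding value.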

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@eldenmoon eldenmoon marked this pull request as draft August 7, 2024 06:22
@github-actions github-actions bot added the doing label Aug 7, 2024
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@eldenmoon eldenmoon force-pushed the var_nested branch 8 times, most recently from 7b9885c to 5820cda Compare August 7, 2024 10:03
@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41920 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5820cdaa651f1713f77845aae7895f13e866b50e, data reload: false

------ Round 1 ----------------------------------
q1	17646	4177	4056	4056
q2	2032	203	200	200
q3	10464	1323	1415	1323
q4	10172	828	925	828
q5	7655	2977	2988	2977
q6	227	141	145	141
q7	1068	622	620	620
q8	9443	1862	2008	1862
q9	8567	6637	6652	6637
q10	8810	3848	3846	3846
q11	435	250	255	250
q12	422	234	233	233
q13	17936	2968	2983	2968
q14	273	247	253	247
q15	520	486	497	486
q16	526	403	396	396
q17	981	939	931	931
q18	8191	7340	7315	7315
q19	1388	1217	1217	1217
q20	590	325	359	325
q21	5300	4778	4783	4778
q22	370	292	284	284
Total cold run time: 113016 ms
Total hot run time: 41920 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4067	4025	4033	4025
q2	333	232	222	222
q3	3007	3047	3131	3047
q4	2010	2073	2011	2011
q5	5609	5508	5459	5459
q6	216	133	142	133
q7	2094	1754	1839	1754
q8	3339	3379	3378	3378
q9	8759	8673	8781	8673
q10	3952	4050	3952	3952
q11	547	478	485	478
q12	768	608	591	591
q13	16467	3091	3121	3091
q14	308	273	279	273
q15	526	485	494	485
q16	459	444	425	425
q17	1785	1759	1733	1733
q18	8261	7747	7895	7747
q19	1772	1759	1771	1759
q20	2063	1851	1838	1838
q21	5788	5494	5407	5407
q22	531	469	481	469
Total cold run time: 72661 ms
Total hot run time: 56950 ms

@doris-robot

TPC-DS: Total hot run time: 169921 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5820cdaa651f1713f77845aae7895f13e866b50e, data reload: false

query1	905	389	369	369
query2	6469	1760	1724	1724
query3	6666	216	226	216
query4	19358	17614	17304	17304
query5	3638	533	530	530
query6	296	174	173	173
query7	4592	292	294	292
query8	250	192	202	192
query9	8491	2337	2336	2336
query10	411	266	261	261
query11	10496	10179	10104	10104
query12	118	93	87	87
query13	1629	388	382	382
query14	9901	6983	6913	6913
query15	205	164	163	163
query16	6939	466	481	466
query17	974	573	554	554
query18	1872	294	298	294
query19	201	148	166	148
query20	93	91	85	85
query21	208	101	103	101
query22	4185	4133	3964	3964
query23	33996	33669	33524	33524
query24	10300	3128	3070	3070
query25	692	420	427	420
query26	1772	171	165	165
query27	2909	297	298	297
query28	7287	2032	2006	2006
query29	1258	466	472	466
query30	242	157	162	157
query31	980	759	787	759
query32	102	62	58	58
query33	683	345	345	345
query34	959	509	507	507
query35	891	786	756	756
query36	1073	889	889	889
query37	289	84	89	84
query38	2916	2766	2720	2720
query39	919	803	831	803
query40	255	117	116	116
query41	47	45	44	44
query42	121	101	99	99
query43	485	445	424	424
query44	1204	748	762	748
query45	212	180	184	180
query46	1100	803	802	802
query47	1812	1722	1741	1722
query48	374	303	302	302
query49	932	428	516	428
query50	895	444	436	436
query51	6756	6664	6673	6664
query52	98	90	91	90
query53	250	182	188	182
query54	619	455	457	455
query55	78	78	75	75
query56	265	249	257	249
query57	1170	1048	1046	1046
query58	266	266	277	266
query59	2712	2462	2269	2269
query60	297	274	286	274
query61	97	96	118	96
query62	889	643	651	643
query63	213	179	182	179
query64	5562	1898	1895	1895
query65	3182	3112	3076	3076
query66	1304	326	334	326
query67	15390	14767	14859	14767
query68	6601	605	596	596
query69	725	385	326	326
query70	1123	1044	1096	1044
query71	531	272	279	272
query72	7945	2667	2471	2471
query73	961	339	335	335
query74	6066	5649	5677	5649
query75	4249	2752	2737	2737
query76	4528	1259	1294	1259
query77	749	319	317	317
query78	9563	8914	8966	8914
query79	2714	533	530	530
query80	2333	509	510	509
query81	577	222	224	222
query82	771	130	136	130
query83	280	173	176	173
query84	271	82	80	80
query85	1411	308	332	308
query86	459	311	305	305
query87	3330	3104	3134	3104
query88	3724	2511	2529	2511
query89	403	291	287	287
query90	1996	188	193	188
query91	123	101	100	100
query92	60	50	50	50
query93	3448	627	636	627
query94	878	258	312	258
query95	374	276	265	265
query96	632	290	290	290
query97	3269	3049	3046	3046
query98	222	206	195	195
query99	1617	1286	1299	1286
Total cold run time: 273683 ms
Total hot run time: 169921 ms

@doris-robot

ClickBench: Total hot run time: 30.18 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5820cdaa651f1713f77845aae7895f13e866b50e, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.09	0.07
query5	0.50	0.48	0.49
query6	1.14	0.72	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.05
query9	0.57	0.55	0.52
query10	0.57	0.55	0.56
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.63	0.61	0.59
query14	0.75	0.81	0.80
query15	0.90	0.88	0.86
query16	0.35	0.35	0.35
query17	0.97	1.02	0.99
query18	0.22	0.21	0.21
query19	1.88	1.67	1.76
query20	0.01	0.02	0.01
query21	15.41	0.76	0.66
query22	4.12	8.52	1.49
query23	18.02	1.43	1.33
query24	2.27	0.22	0.22
query25	0.19	0.08	0.09
query26	0.32	0.21	0.21
query27	0.46	0.23	0.23
query28	13.16	0.99	0.97
query29	12.61	3.26	3.28
query30	0.25	0.06	0.05
query31	2.88	0.40	0.42
query32	3.25	0.49	0.49
query33	2.95	2.97	2.99
query34	15.42	4.23	4.24
query35	4.30	4.30	4.29
query36	0.68	0.49	0.48
query37	0.18	0.17	0.16
query38	0.16	0.15	0.14
query39	0.05	0.03	0.04
query40	0.16	0.12	0.13
query41	0.09	0.04	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 107.92 s
Total hot run time: 30.18 s

@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39346 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d0ab4c08c630ab5bbfd61e671de08d7ef3be4b88, data reload: false

------ Round 1 ----------------------------------
q1	17631	4436	4281	4281
q2	2020	172	180	172
q3	10516	1194	1093	1093
q4	10138	703	703	703
q5	7516	2557	2523	2523
q6	223	143	142	142
q7	977	605	593	593
q8	9218	1919	1949	1919
q9	8983	6556	6548	6548
q10	7035	2205	2168	2168
q11	477	239	254	239
q12	450	222	219	219
q13	17873	2983	2986	2983
q14	281	241	238	238
q15	524	495	479	479
q16	502	385	383	383
q17	971	657	675	657
q18	8241	7418	7507	7418
q19	3108	975	1019	975
q20	688	323	325	323
q21	5899	4303	4531	4303
q22	1072	987	1008	987
Total cold run time: 114343 ms
Total hot run time: 39346 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4389	4251	4282	4251
q2	400	280	277	277
q3	2817	2621	2556	2556
q4	1963	1701	1707	1701
q5	5535	5621	5476	5476
q6	225	131	143	131
q7	2126	1704	1712	1704
q8	3294	3444	3421	3421
q9	8798	8786	8871	8786
q10	3574	3301	3266	3266
q11	614	512	506	506
q12	837	641	609	609
q13	15896	3153	3223	3153
q14	323	289	284	284
q15	534	501	498	498
q16	488	437	454	437
q17	1847	1575	1503	1503
q18	8191	7988	7953	7953
q19	2437	1654	1656	1654
q20	2185	1921	1904	1904
q21	10143	5325	5225	5225
q22	1111	1000	1011	1000
Total cold run time: 77727 ms
Total hot run time: 56295 ms

@eldenmoon
Member Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

- void ColumnObject::finalize(bool ignore_sparse) {
+ void ColumnObject::finalize(FinalizeMode mode) {
warning: method 'finalize' can be made const [readability-make-member-function-const]

be/src/vec/columns/column_object.h:370:

-     void finalize(FinalizeMode mode);
+     void finalize(FinalizeMode mode) const;
Suggested change
- void ColumnObject::finalize(FinalizeMode mode) {
+ void ColumnObject::finalize(FinalizeMode mode) const {

// and modified by Doris

#pragma once
#include <butil/compiler_specific.h>
warning: 'butil/compiler_specific.h' file not found [clang-diagnostic-error]

#include <butil/compiler_specific.h>
         ^

@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39935 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d46ea49a9a569377a43ac0392dcc058c18d9ec18, data reload: false

------ Round 1 ----------------------------------
q1	18219	4635	4405	4405
q2	2463	185	175	175
q3	11127	1189	1212	1189
q4	10510	792	760	760
q5	7847	2637	2575	2575
q6	228	142	151	142
q7	985	604	621	604
q8	9365	1986	1979	1979
q9	9071	6566	6524	6524
q10	7053	2198	2128	2128
q11	474	239	238	238
q12	394	223	222	222
q13	17768	2963	2983	2963
q14	272	234	230	230
q15	529	483	486	483
q16	533	385	383	383
q17	994	610	782	610
q18	8130	7512	7420	7420
q19	3690	1051	1022	1022
q20	651	321	333	321
q21	5291	4643	4532	4532
q22	1128	1032	1030	1030
Total cold run time: 116722 ms
Total hot run time: 39935 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4402	4272	4234	4234
q2	400	274	265	265
q3	2824	2613	2626	2613
q4	1898	1617	1642	1617
q5	5263	5243	5279	5243
q6	223	131	129	129
q7	2033	1676	1622	1622
q8	3140	3326	3325	3325
q9	8446	8333	8351	8333
q10	3386	3145	3163	3145
q11	570	479	474	474
q12	780	577	588	577
q13	17586	2990	2982	2982
q14	292	272	277	272
q15	516	477	478	477
q16	479	406	421	406
q17	1804	1518	1503	1503
q18	7700	7543	7391	7391
q19	2518	1518	1590	1518
q20	2034	1773	1788	1773
q21	5161	5110	5129	5110
q22	1104	1030	1000	1000
Total cold run time: 72559 ms
Total hot run time: 54009 ms

@doris-robot

TPC-DS: Total hot run time: 202325 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d46ea49a9a569377a43ac0392dcc058c18d9ec18, data reload: false

query1	949	415	404	404
query2	6441	1954	1899	1899
query3	6651	204	212	204
query4	34400	23190	23157	23157
query5	4251	509	492	492
query6	298	208	181	181
query7	4577	293	294	293
query8	238	227	194	194
query9	8367	2453	2409	2409
query10	931	870	927	870
query11	17801	15024	15012	15012
query12	145	99	91	91
query13	1627	372	369	369
query14	10867	7955	7898	7898
query15	403	321	332	321
query16	7832	482	463	463
query17	1731	578	541	541
query18	1973	373	352	352
query19	242	207	224	207
query20	114	103	101	101
query21	210	96	99	96
query22	4284	4245	3948	3948
query23	33632	33085	33115	33085
query24	11216	3028	2952	2952
query25	634	365	375	365
query26	1595	148	149	148
query27	2872	275	279	275
query28	7658	2029	2017	2017
query29	926	396	405	396
query30	301	149	144	144
query31	951	754	764	754
query32	100	60	60	60
query33	749	279	287	279
query34	975	470	471	470
query35	974	843	820	820
query36	1093	926	931	926
query37	158	79	78	78
query38	4290	4160	4235	4160
query39	1448	1364	1355	1355
query40	281	112	113	112
query41	45	44	42	42
query42	112	99	93	93
query43	509	457	496	457
query44	1207	742	727	727
query45	387	375	355	355
query46	1121	749	810	749
query47	1861	1741	1749	1741
query48	375	294	309	294
query49	1164	414	419	414
query50	799	404	400	400
query51	6809	6739	6733	6733
query52	107	91	92	91
query53	254	177	177	177
query54	933	445	450	445
query55	77	76	75	75
query56	278	271	249	249
query57	1134	1045	1027	1027
query58	237	220	231	220
query59	2922	2827	2842	2827
query60	287	259	260	259
query61	98	94	91	91
query62	841	616	632	616
query63	209	183	178	178
query64	10593	2420	1967	1967
query65	3294	3142	3116	3116
query66	1336	337	334	334
query67	15349	14690	14626	14626
query68	5334	552	560	552
query69	413	439	399	399
query70	1127	1083	1093	1083
query71	466	270	274	270
query72	18722	16311	16415	16311
query73	784	324	324	324
query74	8994	8657	8762	8657
query75	3745	2672	2659	2659
query76	3554	945	904	904
query77	667	321	323	321
query78	9663	9059	8954	8954
query79	1362	522	533	522
query80	2179	509	538	509
query81	605	233	224	224
query82	481	137	134	134
query83	295	159	156	156
query84	266	87	80	80
query85	1419	302	305	302
query86	458	287	311	287
query87	4762	4518	4560	4518
query88	4129	2417	2459	2417
query89	387	290	283	283
query90	1859	202	195	195
query91	143	120	121	120
query92	80	48	50	48
query93	2104	541	538	538
query94	965	290	302	290
query95	354	311	258	258
query96	590	273	275	273
query97	3225	3126	3076	3076
query98	222	201	202	201
query99	1437	1262	1259	1259
Total cold run time: 317314 ms
Total hot run time: 202325 ms

@doris-robot

ClickBench: Total hot run time: 30.39 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d46ea49a9a569377a43ac0392dcc058c18d9ec18, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.05	0.04
query3	0.23	0.05	0.05
query4	1.67	0.08	0.08
query5	0.51	0.47	0.50
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.48	0.49
query10	0.54	0.56	0.55
query11	0.15	0.12	0.11
query12	0.15	0.13	0.12
query13	0.58	0.60	0.58
query14	0.75	0.78	0.80
query15	0.87	0.82	0.81
query16	0.37	0.36	0.35
query17	0.96	0.96	0.99
query18	0.23	0.22	0.22
query19	1.85	1.74	1.79
query20	0.01	0.01	0.02
query21	15.39	0.73	0.65
query22	3.92	8.22	1.70
query23	18.24	1.36	1.24
query24	2.11	0.24	0.22
query25	0.15	0.08	0.08
query26	0.31	0.22	0.22
query27	0.46	0.22	0.22
query28	13.18	1.02	0.98
query29	12.60	3.32	3.30
query30	0.23	0.05	0.05
query31	2.89	0.40	0.39
query32	3.28	0.51	0.47
query33	2.94	2.93	2.95
query34	16.78	4.34	4.32
query35	4.41	4.40	4.42
query36	0.65	0.49	0.47
query37	0.19	0.16	0.15
query38	0.15	0.14	0.15
query39	0.05	0.04	0.04
query40	0.15	0.13	0.12
query41	0.10	0.05	0.04
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.05 s
Total hot run time: 30.39 s

@doris-robot

ClickBench: Total hot run time: 30.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 57824499d573331cc905d10ed15688fb5397f4d9, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.08
query5	0.50	0.51	0.49
query6	1.14	0.74	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.49	0.48
query10	0.55	0.53	0.55
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.62	0.59	0.59
query14	0.75	0.79	0.79
query15	0.85	0.82	0.83
query16	0.38	0.37	0.38
query17	1.06	0.99	1.04
query18	0.21	0.20	0.20
query19	1.78	1.75	1.90
query20	0.02	0.01	0.01
query21	15.39	0.67	0.67
query22	4.15	6.92	1.63
query23	18.28	1.47	1.31
query24	2.12	0.23	0.22
query25	0.16	0.09	0.08
query26	0.27	0.18	0.18
query27	0.08	0.07	0.08
query28	13.23	1.01	1.02
query29	12.60	3.38	3.38
query30	0.24	0.06	0.06
query31	2.86	0.40	0.40
query32	3.26	0.47	0.47
query33	2.98	3.00	2.97
query34	17.21	4.36	4.38
query35	4.50	4.46	4.41
query36	0.65	0.48	0.47
query37	0.19	0.17	0.15
query38	0.16	0.14	0.16
query39	0.04	0.04	0.03
query40	0.16	0.13	0.12
query41	0.10	0.05	0.04
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.55 s
Total hot run time: 30.5 s

@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38272 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0ec54cfa0d02006b55179f15b8ea6fced94c2627, data reload: false

------ Round 1 ----------------------------------
q1	18305	4499	4339	4339
q2	2034	186	177	177
q3	11694	1000	1201	1000
q4	10223	837	728	728
q5	7754	2869	2893	2869
q6	231	144	141	141
q7	984	627	607	607
q8	9348	2105	2104	2104
q9	7009	6560	6618	6560
q10	7003	2222	2241	2222
q11	445	251	243	243
q12	407	227	223	223
q13	17860	3060	3030	3030
q14	290	239	239	239
q15	521	486	506	486
q16	493	399	399	399
q17	1012	730	735	730
q18	7497	6802	6927	6802
q19	1388	1090	1093	1090
q20	688	325	333	325
q21	3933	2941	2964	2941
q22	1122	1060	1017	1017
Total cold run time: 110241 ms
Total hot run time: 38272 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4367	4284	4274	4274
q2	392	262	269	262
q3	2899	2680	2658	2658
q4	1929	1679	1632	1632
q5	5633	5714	5734	5714
q6	234	140	144	140
q7	2248	1831	1828	1828
q8	3350	3472	3486	3472
q9	8949	8890	8771	8771
q10	3617	3390	3373	3373
q11	611	520	509	509
q12	875	669	659	659
q13	14513	3141	3217	3141
q14	333	333	300	300
q15	520	497	500	497
q16	500	468	461	461
q17	1858	1551	1548	1548
q18	8171	7840	7827	7827
q19	1751	1592	1648	1592
q20	2164	1906	1900	1900
q21	5778	5465	5534	5465
q22	1163	1060	1039	1039
Total cold run time: 71855 ms
Total hot run time: 57062 ms

@doris-robot

TPC-DS: Total hot run time: 192313 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0ec54cfa0d02006b55179f15b8ea6fced94c2627, data reload: false

query1	1255	894	847	847
query2	6338	1977	1904	1904
query3	10760	4073	4041	4041
query4	59871	25431	23305	23305
query5	5653	518	516	516
query6	442	164	179	164
query7	5941	311	300	300
query8	307	210	213	210
query9	9157	2506	2483	2483
query10	504	278	265	265
query11	17702	14958	15373	14958
query12	165	114	106	106
query13	1533	402	417	402
query14	10301	7195	7558	7195
query15	237	171	176	171
query16	7542	484	492	484
query17	1186	629	591	591
query18	2032	320	310	310
query19	300	160	178	160
query20	122	116	114	114
query21	224	108	98	98
query22	4468	4284	4440	4284
query23	34466	33766	33327	33327
query24	5926	2898	2882	2882
query25	509	379	394	379
query26	692	156	159	156
query27	1774	296	284	284
query28	3781	2076	2031	2031
query29	688	412	412	412
query30	240	150	147	147
query31	925	754	756	754
query32	78	53	57	53
query33	434	286	286	286
query34	853	485	485	485
query35	837	714	730	714
query36	1082	910	972	910
query37	137	87	88	87
query38	3980	3897	3962	3897
query39	1448	1370	1397	1370
query40	196	114	120	114
query41	47	47	45	45
query42	123	106	102	102
query43	546	488	479	479
query44	1146	766	819	766
query45	197	166	164	164
query46	1090	763	765	763
query47	1874	1829	1785	1785
query48	386	301	312	301
query49	761	422	454	422
query50	814	431	425	425
query51	7254	7138	6960	6960
query52	103	91	94	91
query53	253	182	186	182
query54	565	456	450	450
query55	78	79	77	77
query56	284	260	262	260
query57	1180	1048	1083	1048
query58	228	228	239	228
query59	2847	2810	2744	2744
query60	294	282	281	281
query61	108	139	97	97
query62	760	670	665	665
query63	222	189	190	189
query64	3351	1755	1765	1755
query65	3223	3155	3145	3145
query66	689	335	342	335
query67	15376	15124	15286	15124
query68	2886	586	594	586
query69	443	289	287	287
query70	1163	1131	1138	1131
query71	359	284	281	281
query72	2539	2045	2091	2045
query73	738	332	333	332
query74	9209	8740	8807	8740
query75	3371	2715	2754	2715
query76	1498	1048	973	973
query77	551	334	324	324
query78	9765	9210	9515	9210
query79	1012	565	543	543
query80	685	523	515	515
query81	451	226	227	226
query82	295	138	134	134
query83	173	151	146	146
query84	256	76	78	76
query85	684	286	284	284
query86	308	306	306	306
query87	4354	4398	4354	4354
query88	3032	2427	2439	2427
query89	380	291	293	291
query90	1921	205	207	205
query91	140	111	114	111
query92	65	53	56	53
query93	1049	622	557	557
query94	702	306	318	306
query95	316	265	266	265
query96	591	278	276	276
query97	3237	3066	3092	3066
query98	214	203	210	203
query99	1547	1280	1266	1266
Total cold run time: 301819 ms
Total hot run time: 192313 ms

@doris-robot

ClickBench: Total hot run time: 31.2 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0ec54cfa0d02006b55179f15b8ea6fced94c2627, data reload: false

query1	0.05	0.04	0.05
query2	0.09	0.04	0.03
query3	0.22	0.05	0.06
query4	1.67	0.09	0.08
query5	0.52	0.52	0.52
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.05	0.05	0.04
query9	0.54	0.49	0.50
query10	0.55	0.55	0.54
query11	0.14	0.12	0.12
query12	0.15	0.12	0.12
query13	0.62	0.59	0.59
query14	0.77	0.80	0.77
query15	0.84	0.81	0.82
query16	0.38	0.37	0.38
query17	0.99	0.99	1.02
query18	0.21	0.19	0.20
query19	1.82	1.81	1.74
query20	0.01	0.02	0.01
query21	15.39	0.67	0.66
query22	4.06	6.80	2.34
query23	18.25	1.40	1.29
query24	2.09	0.22	0.21
query25	0.15	0.07	0.08
query26	0.28	0.18	0.17
query27	0.08	0.08	0.08
query28	13.22	1.02	1.00
query29	12.56	3.31	3.30
query30	0.25	0.06	0.06
query31	2.88	0.40	0.39
query32	3.24	0.48	0.47
query33	2.97	3.00	2.97
query34	17.06	4.45	4.41
query35	4.46	4.47	4.47
query36	0.65	0.49	0.49
query37	0.18	0.15	0.16
query38	0.15	0.14	0.15
query39	0.05	0.03	0.04
query40	0.16	0.13	0.13
query41	0.09	0.06	0.05
query42	0.07	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.11 s
Total hot run time: 31.2 s

Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 27, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@morningman morningman left a comment

PROTO and THRIFT LGTM

@eldenmoon eldenmoon merged commit 53dcf49 into apache:master Aug 27, 2024
@eldenmoon eldenmoon deleted the var_nested branch August 27, 2024 08:24
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Sep 3, 2024
…pache#39022)
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.2-merged meta-change reviewed

6 participants