@CalvinKirs
Member

@CalvinKirs CalvinKirs commented Oct 11, 2025

What's Changed

  1. Refined Azure Blob Configuration Naming

    • Adopted Azure-native property names for better consistency with Azure SDK conventions:
      • account_name → Azure Storage Account Name
      • account_key → Azure Storage Account Key
    • Ensures compatibility, clarity, and alignment with Azure Blob attribute definitions.
  2. Full Feature Support for Azure Blob Storage

    • Added comprehensive integration for the following modules:
      • TVF (Table-Valued Function)
      • LOAD (Data Loading)
      • CATALOG (Metadata Querying)
    • Azure Blob can now be used as both a data source and destination across all modules.
  3. Protocol Compatibility

    • Added full support for multiple Azure storage access protocols:
      • abfs://
      • abfss://
      • wasb://
      • wasbs://
    • Automatically recognizes protocol prefixes and maps them to the correct Azure storage client implementation.
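The prefix recognition described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class `AzureSchemes` and its methods are hypothetical names, and only the four scheme strings come from the description.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not taken from the PR: classifies storage URIs by scheme. */
final class AzureSchemes {
    // The four schemes listed in the PR description.
    private static final Set<String> AZURE = Set.of("abfs", "abfss", "wasb", "wasbs");

    private AzureSchemes() {}

    /** Extracts the lowercase scheme before "://", or an empty string if there is none. */
    static String schemeOf(String location) {
        int idx = location.indexOf("://");
        return idx <= 0 ? "" : location.substring(0, idx).toLowerCase(Locale.ROOT);
    }

    /** True when the location uses one of the four Azure schemes. */
    static boolean isAzure(String location) {
        return AZURE.contains(schemeOf(location));
    }

    /** The TLS variants are the ones ending in "s": abfss and wasbs. */
    static boolean usesTls(String location) {
        String scheme = schemeOf(location);
        return scheme.equals("abfss") || scheme.equals("wasbs");
    }
}
```

A caller could branch on `AzureSchemes.isAzure(location)` before choosing a storage client; how the PR actually selects the client is not shown in this conversation.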

todo

Unified Connectivity Testing Framework

  • Refactored the connectivity test logic into a unified implementation shared across all object storage backends (S3, OSS, COS, OBS, BOS, and Azure).
  • Improves code reusability and simplifies the process of adding new storage providers.
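One way to picture a connectivity check shared across backends is a small probe interface plus a single runner. The sketch below is an assumption about the general shape only: `ConnectivityProbe`, `ConnectivityTester`, and the result strings are hypothetical and do not come from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical probe: one implementation per storage backend. */
interface ConnectivityProbe {
    /** Throws if the backend cannot be reached with the configured credentials. */
    void check() throws Exception;
}

/** Hypothetical runner that sends every backend through the same code path. */
final class ConnectivityTester {
    private final Map<String, ConnectivityProbe> probes = new LinkedHashMap<>();

    void register(String backend, ConnectivityProbe probe) {
        probes.put(backend, probe);
    }

    /** Maps each backend name to "OK" or to a uniform failure message. */
    Map<String, String> runAll() {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, ConnectivityProbe> entry : probes.entrySet()) {
            try {
                entry.getValue().check();
                results.put(entry.getKey(), "OK");
            } catch (Exception ex) {
                results.put(entry.getKey(), "FAILED: " + ex.getMessage());
            }
        }
        return results;
    }
}
```

With this shape, adding a provider means registering one more probe; the error handling stays in one place.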

@Thearas
Contributor

Thearas commented Oct 11, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 75.00% (63/84) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-DS: Total hot run time: 190544 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f498eaa3adfae99ba962e769c5f52ad8957f4dd5, data reload: false

query1	1105	428	417	417
query2	6567	1716	1717	1716
query3	6757	228	222	222
query4	26203	23516	23400	23400
query5	5829	666	526	526
query6	345	254	213	213
query7	4659	510	291	291
query8	305	270	242	242
query9	8707	2571	2584	2571
query10	552	362	303	303
query11	15757	14990	14814	14814
query12	171	120	117	117
query13	1673	559	440	440
query14	12194	9169	9187	9169
query15	218	201	175	175
query16	7678	661	520	520
query17	1586	788	619	619
query18	2169	463	369	369
query19	270	215	196	196
query20	174	139	140	139
query21	221	142	119	119
query22	4874	4759	4600	4600
query23	34932	34113	33745	33745
query24	8429	2635	2496	2496
query25	614	559	504	504
query26	1249	308	166	166
query27	2715	522	400	400
query28	4384	2237	2224	2224
query29	969	662	562	562
query30	363	248	216	216
query31	938	886	766	766
query32	86	76	87	76
query33	601	415	361	361
query34	833	900	536	536
query35	835	876	793	793
query36	995	1034	979	979
query37	141	121	96	96
query38	3661	3696	3676	3676
query39	1550	1503	1483	1483
query40	294	128	124	124
query41	62	60	60	60
query42	126	113	126	113
query43	498	515	456	456
query44	1322	853	827	827
query45	175	181	176	176
query46	839	982	630	630
query47	1755	1850	1730	1730
query48	386	423	329	329
query49	767	480	421	421
query50	641	679	405	405
query51	3949	3925	3800	3800
query52	109	106	104	104
query53	231	265	198	198
query54	597	582	543	543
query55	89	80	91	80
query56	343	317	304	304
query57	1185	1223	1136	1136
query58	285	295	276	276
query59	2548	2587	2494	2494
query60	341	342	327	327
query61	147	148	157	148
query62	793	738	690	690
query63	226	196	195	195
query64	4373	1154	839	839
query65	4093	3941	3980	3941
query66	1038	430	327	327
query67	15816	15302	15128	15128
query68	9612	950	597	597
query69	535	328	282	282
query70	1386	1196	1303	1196
query71	530	338	321	321
query72	5913	4924	4866	4866
query73	701	577	371	371
query74	9240	8807	8944	8807
query75	4590	3334	2878	2878
query76	4011	1155	722	722
query77	1017	411	322	322
query78	9630	9699	8976	8976
query79	5340	830	578	578
query80	707	552	513	513
query81	494	257	237	237
query82	620	162	130	130
query83	296	269	257	257
query84	300	116	99	99
query85	888	469	498	469
query86	337	314	298	298
query87	3759	3756	3645	3645
query88	2826	2248	2220	2220
query89	444	329	296	296
query90	2049	216	213	213
query91	165	160	132	132
query92	90	67	70	67
query93	3498	944	638	638
query94	681	449	345	345
query95	397	304	322	304
query96	488	579	284	284
query97	2945	2970	2883	2883
query98	237	214	217	214
query99	1409	1414	1326	1326
Total cold run time: 289472 ms
Total hot run time: 190544 ms

@doris-robot

ClickBench: Total hot run time: 30.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f498eaa3adfae99ba962e769c5f52ad8957f4dd5, data reload: false

query1	0.06	0.05	0.04
query2	0.09	0.06	0.06
query3	0.25	0.09	0.08
query4	1.61	0.12	0.11
query5	0.29	0.26	0.24
query6	1.19	0.65	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.66	0.53	0.53
query10	0.58	0.57	0.58
query11	0.16	0.12	0.12
query12	0.16	0.12	0.12
query13	0.63	0.61	0.62
query14	1.05	1.03	1.05
query15	0.87	0.85	0.88
query16	0.41	0.40	0.41
query17	1.04	1.02	1.01
query18	0.21	0.20	0.20
query19	2.01	1.92	1.89
query20	0.01	0.01	0.02
query21	15.42	0.96	0.60
query22	0.76	1.07	0.76
query23	15.02	1.39	0.62
query24	7.51	1.42	0.83
query25	0.48	0.19	0.11
query26	0.59	0.17	0.13
query27	0.07	0.05	0.06
query28	9.90	1.36	0.93
query29	12.59	3.97	3.26
query30	0.28	0.14	0.11
query31	2.83	0.59	0.40
query32	3.25	0.55	0.48
query33	3.11	3.08	3.12
query34	16.12	5.47	4.87
query35	4.94	4.97	4.99
query36	0.70	0.52	0.50
query37	0.11	0.07	0.08
query38	0.07	0.04	0.04
query39	0.06	0.03	0.03
query40	0.18	0.15	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 105.54 s
Total hot run time: 30.67 s

@CalvinKirs
Member Author

run buildall

[feat](storage)Support Azure Blob Storage
@CalvinKirs
Member Author

run buildall

@CalvinKirs
Member Author

run buildall

@doris-robot

ClickBench: Total hot run time: 31.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3318a0883aa9921421a5a37b7cfa22cb7418a5dd, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.06
query3	0.25	0.09	0.08
query4	1.61	0.11	0.12
query5	0.27	0.26	0.25
query6	1.18	0.65	0.66
query7	0.03	0.02	0.02
query8	0.05	0.05	0.04
query9	0.63	0.53	0.52
query10	0.61	0.57	0.57
query11	0.16	0.12	0.11
query12	0.16	0.12	0.12
query13	0.64	0.61	0.62
query14	1.04	1.05	1.04
query15	0.87	0.84	0.86
query16	0.40	0.40	0.39
query17	1.05	1.06	1.06
query18	0.21	0.21	0.20
query19	1.98	1.82	1.82
query20	0.02	0.01	0.01
query21	15.42	0.94	0.57
query22	0.78	1.24	0.69
query23	14.85	1.41	0.62
query24	6.44	1.77	1.43
query25	0.49	0.32	0.08
query26	0.62	0.15	0.14
query27	0.06	0.06	0.06
query28	10.51	1.37	0.96
query29	12.56	3.94	3.29
query30	0.28	0.13	0.11
query31	2.83	0.59	0.39
query32	3.27	0.55	0.49
query33	3.09	3.19	3.15
query34	16.16	5.46	4.86
query35	4.90	4.95	4.95
query36	0.72	0.51	0.49
query37	0.10	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.17	0.16	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.82 s
Total hot run time: 31.19 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.54% (17747/33781)
Line Coverage 37.72% (161176/427340)
Region Coverage 32.19% (123090/382344)
Branch Coverage 33.59% (53984/160715)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.16% (23554/33098)
Line Coverage 57.58% (245841/426926)
Region Coverage 52.67% (203915/387169)
Branch Coverage 54.50% (88022/161522)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 9.20% (8/87) 🎉
Increment coverage report
Complete coverage report

@CalvinKirs
Member Author

run buildall

@doris-robot

ClickBench: Total hot run time: 30.06 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bb1a5a38e4d16cf53208353cbecc51930df91af1, data reload: false

query1	0.05	0.04	0.05
query2	0.09	0.06	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.28	0.26	0.26
query6	1.19	0.63	0.65
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.63	0.52	0.51
query10	0.58	0.58	0.58
query11	0.16	0.11	0.11
query12	0.15	0.14	0.12
query13	0.63	0.63	0.62
query14	1.02	1.00	1.02
query15	0.87	0.88	0.85
query16	0.40	0.41	0.39
query17	1.03	1.07	1.04
query18	0.22	0.20	0.20
query19	1.94	1.85	1.80
query20	0.01	0.02	0.01
query21	15.44	0.94	0.56
query22	0.77	1.18	0.67
query23	14.95	1.38	0.63
query24	6.74	1.56	0.48
query25	0.49	0.26	0.07
query26	0.50	0.17	0.14
query27	0.08	0.06	0.06
query28	9.82	1.38	0.92
query29	12.66	3.94	3.33
query30	0.27	0.14	0.12
query31	2.83	0.59	0.38
query32	3.23	0.56	0.50
query33	3.03	3.19	3.12
query34	16.19	5.49	4.81
query35	4.95	4.94	4.93
query36	0.68	0.53	0.51
query37	0.10	0.07	0.08
query38	0.07	0.05	0.04
query39	0.04	0.03	0.04
query40	0.17	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.02	0.03
query43	0.04	0.03	0.04
Total cold run time: 104.36 s
Total hot run time: 30.06 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.57% (17787/33834)
Line Coverage 37.77% (161646/428015)
Region Coverage 32.22% (123286/382682)
Branch Coverage 33.63% (54144/160995)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.17% (23605/33166)
Line Coverage 57.60% (246410/427758)
Region Coverage 52.76% (204519/387629)
Branch Coverage 54.56% (88328/161884)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 8.74% (16/183) 🎉
Increment coverage report
Complete coverage report

s3Props.put("AWS_SECRET_KEY", secretKey);
s3Props.put("AWS_NEED_OVERRIDE_ENDPOINT", "true");
s3Props.put("provider", "azure");
s3Props.put("PROVIDER", "AZURE");
Contributor

Why is an uppercase entry needed?

Member Author

oops...removed

  @Override
  public String getStorageName() {
-     return "Azure";
+     return "AZURE";
Contributor

Why uppercase?

Member Author

We've updated the logic to keep storage names fully uppercase for consistency, since both HDFS and S3 follow that convention. All callers already perform case-insensitive matching, so this change makes the style uniform without affecting compatibility.
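To illustrate why a casing change is harmless for callers that compare case-insensitively, here is a minimal sketch. `StorageNames.matches` is a hypothetical helper, not a method from the Doris code base.

```java
/** Hypothetical helper showing case-insensitive matching of a storage name. */
final class StorageNames {
    private StorageNames() {}

    /** True when the configured value names the given storage, ignoring case. */
    static boolean matches(String configured, String storageName) {
        return configured != null && configured.equalsIgnoreCase(storageName);
    }
}
```

Under this kind of comparison, `"Azure"`, `"azure"`, and `"AZURE"` all match a storage whose name is `"AZURE"`.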

@ConnectorProperty(names = {"azure.account_name", "azure.access_key", "s3.access_key",
"AWS_ACCESS_KEY", "ACCESS_KEY", "access_key"},
description = "The access key of S3.")
protected String accessKey = "";
Contributor

How about changing this to "accountName"?

Member Author

done
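The `names` list in the annotation above declares several aliases for one field. A rough sketch of how such aliases could be resolved against user-supplied properties follows; it assumes the first alias present wins, which is an assumption for illustration and not confirmed by this conversation. `AliasLookup` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver: returns the value of the first alias found in the properties. */
final class AliasLookup {
    private AliasLookup() {}

    static Optional<String> firstPresent(Map<String, String> props, List<String> aliases) {
        for (String alias : aliases) {
            String value = props.get(alias);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

For example, a property map containing only `s3.access_key` would still populate the field when `azure.account_name` is listed first.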

"AWS_SECRET_KEY", "secret_key"},
sensitive = true,
description = "The secret key of S3.")
protected String secretKey = "";
Contributor

Ditto.


  boolean isPrefix = false;
- while (blobPath.normalize().toString().startsWith(listPrefix)) {
+ while (null != blobPath && blobPath.normalize().toString().startsWith(listPrefix)) {
Contributor

Why add null != blobPath? Is this a bug?
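For context on the question, `java.nio.file.Path.getParent()` returns `null` once a path has no parent. The sketch below assumes the loop body walks up to the parent on each iteration; that body is not shown in this conversation, so treat it as an assumption. `PrefixWalk` is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of walking up a path while it still matches a prefix. */
final class PrefixWalk {
    private PrefixWalk() {}

    static List<String> ancestorsUnderPrefix(String start, String listPrefix) {
        List<String> visited = new ArrayList<>();
        Path blobPath = Paths.get(start);
        // Without the null guard, the call on blobPath throws a NullPointerException
        // after getParent() runs out of parents.
        while (blobPath != null && blobPath.normalize().toString().startsWith(listPrefix)) {
            visited.add(blobPath.normalize().toString());
            blobPath = blobPath.getParent(); // null when there is no parent left
        }
        return visited;
    }
}
```

With a relative path such as `a/b/c` and the prefix `a`, the walk visits three paths and then stops because the parent of `a` is `null`.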

Contributor

Change the name to objCommit?

Member Author

done

@CalvinKirs
Member Author

run buildall

*/
public static String encodeToBase64(int id) {
ByteBuffer buf = ByteBuffer.allocate(4)
.order(ByteOrder.BIG_ENDIAN);
Contributor

Use little-endian (LE) byte order to align with the BE side.
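The byte order matters because it changes the encoded string, so both sides must agree on it. The sketch below mirrors the shape of the hunk above but takes the byte order as a parameter; `IdEncoding.encode` is a hypothetical name, and which order the BE side uses is taken only from the reviewer's comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

/** Hypothetical helper: encodes a 4-byte int as Base64 in a chosen byte order. */
final class IdEncoding {
    private IdEncoding() {}

    static String encode(int id, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(id);
        // array() exposes the 4 backing bytes in the order they were written
        return Base64.getEncoder().encodeToString(buf.array());
    }
}
```

For `id = 1`, big-endian yields `AAAAAQ==` while little-endian yields `AQAAAA==`, so an ID written with one order will not match an ID computed with the other.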

@github-actions github-actions bot added the "approved" label (indicates a PR has been approved by one committer) Oct 21, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@CalvinKirs
Member Author

run buildall

@CalvinKirs
Member Author

run performance

@doris-robot

ClickBench: Total hot run time: 29.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 81d1e0a1106b00532e1050d0326a37852cebf466, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.09	0.08
query4	1.62	0.13	0.11
query5	0.28	0.25	0.26
query6	1.17	0.67	0.68
query7	0.04	0.03	0.03
query8	0.06	0.04	0.04
query9	0.63	0.55	0.52
query10	0.61	0.59	0.58
query11	0.17	0.12	0.12
query12	0.16	0.13	0.12
query13	0.65	0.64	0.62
query14	1.04	1.03	1.03
query15	0.90	0.86	0.87
query16	0.42	0.41	0.39
query17	1.09	1.07	1.07
query18	0.22	0.20	0.21
query19	2.01	1.91	1.90
query20	0.02	0.02	0.01
query21	15.55	0.22	0.13
query22	4.95	0.08	0.04
query23	15.67	0.26	0.12
query24	3.28	0.56	0.86
query25	0.10	0.07	0.06
query26	0.15	0.13	0.14
query27	0.07	0.06	0.07
query28	4.38	1.18	0.94
query29	12.59	4.19	3.50
query30	0.27	0.14	0.12
query31	2.83	0.61	0.41
query32	3.24	0.56	0.51
query33	3.29	3.08	3.13
query34	16.21	5.52	4.89
query35	5.03	4.96	4.92
query36	0.71	0.53	0.52
query37	0.11	0.08	0.08
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.15
query41	0.09	0.04	0.03
query42	0.04	0.03	0.04
query43	0.05	0.04	0.04
Total cold run time: 100.4 s
Total hot run time: 29.09 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.60% (17898/34026)
Line Coverage 37.85% (162365/429023)
Region Coverage 32.24% (123740/383765)
Branch Coverage 33.67% (54247/161127)

@morningman
Contributor

run check_coverage

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.46% (23839/33359)
Line Coverage 57.84% (247992/428775)
Region Coverage 52.78% (205177/388723)
Branch Coverage 54.61% (88481/162022)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 9.64% (19/197) 🎉
Increment coverage report
Complete coverage report

@morningman morningman merged commit 9177047 into apache:master Oct 22, 2025
27 of 30 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 22, 2025
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 9.64% (19/197) 🎉
Increment coverage report
Complete coverage report

@CalvinKirs CalvinKirs deleted the master-azure branch October 22, 2025 03:12
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Oct 22, 2025
yiguolei pushed a commit that referenced this pull request Oct 23, 2025
Cherry-picked from #56861

---------

Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>
Co-authored-by: Pxl <xl@selectdb.com>
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request Oct 24, 2025
morrySnow pushed a commit that referenced this pull request Oct 30, 2025
cherry pick #56861

---------

Co-authored-by: Pxl <xl@selectdb.com>

Labels

approved (indicates a PR has been approved by one committer), dev/3.1.3-merged, dev/4.0.1-merged, reviewed

9 participants