branch-3.1: [opt](kerberos) opt hdfs kerberos logic (#47299 #47826 #48655) #52193
Merged
…de (apache#47299)

Previously, the BE node used a principal and keytab for Kerberos authentication. However, only the modified Hadoop libhdfs supports authenticating this way; the original libhdfs only supports setting a Kerberos ticket cache path or using the system-level Kerberos authentication context.

This pull request introduces a Kerberos authentication module for the BE. The module handles Kerberos ticket management, including initialization, authentication, and periodic ticket refresh, and provides an interface for integrating Kerberos authentication and managing credentials.

1. **KerberosConfig** (`kerberos_config.h` and `kerberos_config.cpp`):
    - Encapsulates the configuration settings required for Kerberos authentication, such as principal, keytab path, and refresh intervals.
    - Provides methods to set and retrieve configuration parameters.
2. **KerberosTicketCache** (`kerberos_ticket_cache.h` and `kerberos_ticket_cache.cpp`):
    - Manages the Kerberos ticket cache, including initialization, login, and periodic refresh of tickets.
    - Supports operations like writing to the ticket cache and checking whether a refresh is needed.
    - Uses a background thread to periodically refresh tickets based on configured intervals.
    - The default cache file is written to the `/tmp` dir, but this can be changed using `kerberos_ccache_path` in be.conf.
3. **KerberosTicketMgr** (`kerberos_ticket_mgr.h` and `kerberos_ticket_mgr.cpp`):
    - Acts as a manager for multiple Kerberos ticket caches, handling their lifecycle, including creation, access, and cleanup.
    - Provides methods to get or set ticket caches and retrieve cache file paths.
    - Includes a background thread that cleans up expired ticket caches every hour. If a cache is no longer being referenced, it is removed.
4. **HdfsMgr**
    - A new, simple class that manages the HDFS file system handlers.
    - It replaces the old `HdfsHandlerCache`.
    - It checks each HdfsHandler every hour and removes any HdfsHandler that has been unused for 24 hours.

1. Introduce Kerberos ticket cache management on the BE side.
    1. Use the ticket cache path instead of principal and keytab for the Kerberos authentication of libhdfs.
2. Fix the issue that `kerberos_krb5_conf_path` in be.conf does not take effect.
3. Add a new system table `backend_kerberos_ticket_cache` to view the krb ticket cache of each backend:

```
Doris > select * from information_schema.backend_kerberos_ticket_cache\G
*************************** 1. row ***************************
                   BE_ID: 1738304534666
                   BE_IP: 172.20.32.136
               PRINCIPAL: hdfs/master-1-1@EMR.C-0596176698BD4D17.COM
                  KEYTAB: /path/to/hdfs.keytab
       SERVICE_PRINCIPAL: krbtgt/EMR@EMR.C-0596176698BD4D17.COM
       TICKET_CACHE_PATH: /tmp/doris_krb_ce93d5ebb2a6554c7ba9f43aee3a9e6c
               HASH_CODE: ce93d5ebb2a6554c7ba9f43aee3a9e6c
              START_TIME: 2025-02-01 00:08:26
             EXPIRE_TIME: 2025-02-01 00:09:26
               AUTH_TIME: 2025-02-01 00:08:26
               REF_COUNT: 1
 REFRESH_INTERVAL_SECOND: 3600
```

The user interface remains unchanged.

1. Set the krb5.conf path in be.conf with `kerberos_krb5_conf_path`; the default is `/etc/krb5.conf`.
2. Provide the Kerberos principal and the keytab path as usual.

be.conf:

1. `kerberos_ccache_path`: The dir where the Kerberos ticket cache file is saved. The file name has the format `doris_krb_xxxx`.
2. `kerberos_krb5_conf_path`: The path of the krb5.conf file.
3. `kerberos_refresh_interval_second`: The minimum interval for refreshing a Kerberos ticket cache file. The default is 1h.
4. Cleanup logic: If the ticket cache is not used for 1 day, it will be deleted.
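The cleanup rule described for `KerberosTicketMgr` (drop a ticket cache once nothing references it and it has gone unused for a day) can be pictured with a minimal sketch. This is not the Doris implementation: `TicketCacheSketch`, `TicketMgrSketch`, and all of their members are hypothetical names, the background thread is replaced by an explicit `cleanup(now)` call so the logic stays deterministic, and no real Kerberos login is performed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one per-credential ticket cache (not the Doris class).
struct TicketCacheSketch {
    std::string principal;
    std::string keytab_path;
};

// Hypothetical manager: hands out shared handles and drops entries that no caller
// still holds once they have been idle for at least `idle_limit`.
class TicketMgrSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit TicketMgrSketch(std::chrono::seconds idle_limit) : _idle_limit(idle_limit) {
        assert(idle_limit.count() >= 0);
    }

    // Returns the existing cache for (principal, keytab) or creates a new one,
    // and records `now` as the last access time.
    std::shared_ptr<TicketCacheSketch> get_or_create(const std::string& principal,
                                                     const std::string& keytab_path,
                                                     Clock::time_point now) {
        std::lock_guard<std::mutex> guard(_mutex);
        Entry& entry = _entries[principal + "|" + keytab_path];
        if (!entry.cache) {
            entry.cache = std::make_shared<TicketCacheSketch>(
                    TicketCacheSketch{principal, keytab_path});
        }
        entry.last_access = now;
        return entry.cache;
    }

    // One pass of what a periodic cleanup thread would run.
    // Returns the number of entries removed.
    std::size_t cleanup(Clock::time_point now) {
        std::lock_guard<std::mutex> guard(_mutex);
        std::size_t removed = 0;
        for (auto it = _entries.begin(); it != _entries.end();) {
            // use_count() == 1 means only the manager itself holds the cache.
            const bool unreferenced = it->second.cache.use_count() == 1;
            const bool idle = now - it->second.last_access >= _idle_limit;
            if (unreferenced && idle) {
                it = _entries.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _entries.size();
    }

private:
    struct Entry {
        std::shared_ptr<TicketCacheSketch> cache;
        Clock::time_point last_access;
    };

    std::chrono::seconds _idle_limit;
    mutable std::mutex _mutex;
    std::unordered_map<std::string, Entry> _entries;
};
```

In this sketch the reference check relies on `std::shared_ptr::use_count()`, which plays the role that the `REF_COUNT` column suggests; how the actual module tracks references is not shown in this PR description.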
…kerberos ticket. (apache#47826)

### What problem does this PR solve?

Related PR: apache#47299

Problem Summary: Fix the `KerberosTicketEntry entry` initialization so that compilation passes.
…stead of using kerberos ticket cache. (apache#48655)

Related PR: apache#47299, apache#49181

This PR mainly changes:

1. Go back to using principal and keytab to log in to Kerberos instead of using the Kerberos ticket cache. This discards what I did in apache#47299: there appear to be many issues when using the ticket cache in a multi-Kerberos environment, so I abandoned that logic.
2. Config default values. Change the default values of the configs related to the HDFS file handle cache:
    1. `max_hdfs_file_handle_cache_num`: from 1000 to 20000
    2. `max_hdfs_file_handle_cache_time_sec`: from 3600 to 28800
3. Fix a bug where the cleanup thread of `FileHandleCache` was not working.
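To illustrate how a count limit and an age limit such as `max_hdfs_file_handle_cache_num` and `max_hdfs_file_handle_cache_time_sec` could interact, here is a minimal sketch of a cache bounded by both. It is an assumption about the general technique (LRU eviction plus a periodic expiry pass), not the actual `FileHandleCache` code; `HandleCacheSketch` and its methods are hypothetical, and timestamps are passed in as plain seconds instead of being read from a clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handle cache bounded by entry count and by entry age.
// Assumes callers pass non-decreasing timestamps, so the list stays ordered by age.
class HandleCacheSketch {
public:
    HandleCacheSketch(std::size_t max_num, std::int64_t max_age_sec)
            : _max_num(max_num), _max_age_sec(max_age_sec) {
        assert(max_num > 0);
    }

    // Inserts or refreshes `path`, then evicts least recently used entries
    // until the count limit is respected.
    void touch(const std::string& path, std::int64_t now_sec) {
        auto it = _index.find(path);
        if (it != _index.end()) {
            _lru.erase(it->second);
        }
        _lru.emplace_front(path, now_sec);
        _index[path] = _lru.begin();
        while (_lru.size() > _max_num) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }

    // What a periodic cleanup pass would do: drop entries whose last use is
    // more than `max_age_sec` seconds in the past. Returns the number removed.
    std::size_t evict_expired(std::int64_t now_sec) {
        std::size_t removed = 0;
        while (!_lru.empty() && now_sec - _lru.back().second > _max_age_sec) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
            ++removed;
        }
        return removed;
    }

    bool contains(const std::string& path) const { return _index.count(path) != 0; }
    std::size_t size() const { return _lru.size(); }

private:
    using Entry = std::pair<std::string, std::int64_t>;  // (path, last use in seconds)

    std::size_t _max_num;
    std::int64_t _max_age_sec;
    std::list<Entry> _lru;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```

In a design like this, the age limit only has an effect if the expiry pass actually runs. With larger limits, the count cap is reached less often, so the periodic pass does more of the eviction work.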
run buildall

Cloud UT Coverage Report: Increment line coverage, Increment coverage report

BE UT Coverage Report: Increment line coverage, Increment coverage report

TPC-H: Total hot run time: 39834 ms

TPC-DS: Total hot run time: 195535 ms

ClickBench: Total hot run time: 31.68 s
morrySnow approved these changes on Jun 24, 2025
bp #47299 #47826 #48655 #49529