[improvement](FQDN)Change the implementation of fqdn #19123

zddr · 2023-04-26T14:08:54Z

Proposed changes

Issue Number: close #xxx

Problem summary

Main changes:

1. If FQDN is enabled in the configuration file, the FE sets localAddr to its FQDN instead of its IP at startup, and priority_networks no longer takes effect.
2. The ip and hostname fields of Backend and Frontend are merged into a single field, host. It holds the hostname when FQDN is enabled and the IP address otherwise.
3. Communication between cluster nodes uses the FQDN directly. The connection pools gain a validation mechanism so that connections between nodes do not break when the IP behind a domain name changes.
4. Polling to check whether an IP has changed is no longer needed, so fqdnManager is removed.
5. Change how FEs verify each other's legitimacy: instead of reading the client IP, each node explicitly passes its own identity in the HTTP request header or the Thrift message body.
6. When processing a heartbeat, if a BE finds that the host it has stored differs from the host stored by the master, it validates the host and then updates its own host instead of reporting an error directly.
7. Simplify the FE name generation logic.
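
To illustrate the connection pool validation described in the changes above, here is a minimal Python sketch. It is not the code in this PR: `Connection`, `ValidatingPool`, and the injectable `resolve` callable are hypothetical names, and the sketch assumes that validation means re-resolving the FQDN on borrow and discarding a cached connection whose IP no longer matches.

```python
import socket
from typing import Callable, Dict, Tuple


class Connection:
    """Stand-in for a real client connection (hypothetical)."""

    def __init__(self, host: str, port: int, ip: str) -> None:
        self.host = host
        self.port = port
        self.ip = ip  # IP the host resolved to when this connection was opened
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ValidatingPool:
    """Caches one connection per (host, port) and re-validates it on borrow."""

    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname) -> None:
        self._resolve = resolve  # injectable so DNS changes can be simulated
        self._cache: Dict[Tuple[str, int], Connection] = {}

    def borrow(self, host: str, port: int) -> Connection:
        current_ip = self._resolve(host)
        conn = self._cache.get((host, port))
        if conn is not None and not conn.closed and conn.ip == current_ip:
            return conn  # cached connection still points at the right IP
        if conn is not None:
            conn.close()  # stale: the FQDN now resolves to a different IP
        conn = Connection(host, port, current_ip)
        self._cache[(host, port)] = conn
        return conn
```

Because the pool is keyed by FQDN rather than IP, no background task has to watch for IP changes; the check happens only when a connection is actually needed.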

Scope of influence:

1. Establishing connections for communication between cluster nodes
2. Determining whether two entries refer to the same node based on attributes such as IP
3. Log output
4. Information display
5. Address concatenation
6. k8s deployment
7. Upgrade compatibility
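
As an illustration of the same-node check listed above, the following Python sketch compares nodes by a single host value plus port. `effective_host`, `NodeAddress`, and `_normalize` are hypothetical names, and the case and trailing-dot normalization is an assumption made for this example, not behavior confirmed by this PR.

```python
from dataclasses import dataclass


def effective_host(ip: str, hostname: str, fqdn_enabled: bool) -> str:
    """Pick the value stored in the merged host field."""
    return hostname if fqdn_enabled else ip


def _normalize(host: str) -> str:
    # Assumption: DNS names compare case-insensitively, and a trailing dot
    # only marks an absolute name. Plain IPv4 strings pass through unchanged.
    return host.rstrip(".").lower()


@dataclass(frozen=True)
class NodeAddress:
    host: str  # FQDN when FQDN mode is enabled, IP address otherwise
    port: int

    def same_node(self, other: "NodeAddress") -> bool:
        """Two addresses name the same node if host and port both match."""
        return self.port == other.port and _normalize(self.host) == _normalize(other.host)
```

Comparing on host rather than on a resolved IP keeps the result stable when the IP behind an FQDN changes, which is the case the test plan exercises.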

Test plan:

1. Node IP change: keeping the FQDN unchanged, change the IPs of the FE and BE and verify that the cluster can still read and write data normally.
2. Generate metadata with the master branch code, then load that metadata on this PR to verify compatibility with older versions (clusters that had already enabled FQDN are no longer supported for upgrade).
3. Deploy FE and BE clusters on k8s and verify that the cluster can read and write data normally.
4. Upgrade an old cluster following https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn
5. Use Stream Load to import data, specifying the FQDN of an FE and of a BE respectively.
6. Use different users to open transactions and write data with INSERT statements.
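
For the Stream Load test case above, a request addressed by FQDN could be built roughly as follows. This is a sketch only: the function name, hostnames, ports, database, table, and label are placeholder assumptions, and the request is built but not sent. When targeting an FE, the client also has to follow the redirect to a BE while keeping its credentials, which this sketch does not handle.

```python
import base64
import urllib.request


def build_stream_load_request(host: str, http_port: int, db: str, table: str,
                              user: str, password: str, data: bytes,
                              label: str) -> urllib.request.Request:
    """Build (but do not send) a Stream Load PUT request for the given node."""
    url = f"http://{host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Assumption: the endpoint expects this header, as in the
        # documented curl examples for Stream Load.
        "Expect": "100-continue",
        "label": label,
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Hypothetical FQDNs and ports; substitute the values of the cluster under test.
fe_request = build_stream_load_request(
    "fe1.example.internal", 8030, "demo_db", "demo_tbl", "root", "", b"1,a\n", "fqdn_fe_test")
be_request = build_stream_load_request(
    "be1.example.internal", 8040, "demo_db", "demo_tbl", "root", "", b"1,a\n", "fqdn_be_test")
```

Running the same load once against an FE FQDN and once against a BE FQDN covers both entry points named in the test plan.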

Checklist(Required)

Does it affect the original behavior
Has unit tests been added
Has documentation been added or modified
Does it need to update dependencies
Does this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

github-actions · 2023-04-26T14:16:46Z