[enhancement](k8s) Support fqdn mode for fe in k8s environment #16315
return localAddr.getHostName();
}

public static String getCanonicalHostName() {
I put the hostname and IP in the /etc/hosts file, but the hostname still can't be resolved.
When can this method actually get the hostname?
Are you running on Docker or physical machines?
This issue may not be related to gethostname or getCanonicalHostName.
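For a question like this, a small standalone probe run once inside the container and once on the physical host can show what the JDK actually reports in each environment. This is a diagnostic sketch, not part of the PR: HostnameProbe and its methods are hypothetical names, and only the standard java.net.InetAddress methods getHostName(), getCanonicalHostName() and getHostAddress() are used. When getCanonicalHostName() cannot resolve a name, it falls back to the textual IP address, which is similar to the situation the hostName.equals(FrontendOptions.getLocalHostAddress()) check in this PR looks for.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of Doris.
public class HostnameProbe {

    /** Summarizes the three views of an address that matter for FQDN mode. */
    static String describe(InetAddress addr) {
        // getHostName(): the name the address was created with, or a reverse lookup if it has none
        // getCanonicalHostName(): best-effort FQDN via reverse lookup; falls back to the IP literal
        // getHostAddress(): the raw IP literal, no lookup involved
        return "hostName=" + addr.getHostName()
                + " canonical=" + addr.getCanonicalHostName()
                + " ip=" + addr.getHostAddress();
    }

    /** Heuristic: if the canonical name is just the IP literal, no hostname could be resolved. */
    static boolean resolvedToIpOnly(InetAddress addr) {
        return addr.getCanonicalHostName().equals(addr.getHostAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(describe(local));
        System.out.println("resolvedToIpOnly=" + resolvedToIpOnly(local));
    }
}
```

Comparing the two outputs shows whether the container's resolver sees the /etc/hosts entry at all, independently of the FE code.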
zddr
left a comment
LGTM
PR approved by anyone and no changes requested.
yangzhg
left a comment
LGTM
PR approved by at least one committer and no changes requested.
…apache#16315)" This reverts commit 48afd77.
this.host = pair.first;
this.port = pair.second;
Preconditions.checkState(!Strings.isNullOrEmpty(host));
HostInfo pair = SystemInfoService.getIpHostAndPort(hostPort, true);
Suggest renaming the variable: HostInfo hostInfo
private int masterRpcPort;
private int masterHttpPort;
private String masterIp;
private String masterHostName;
It would be better to merge masterIp and masterHostName into one field.
+ " in fe.conf to match the host " + split[0]);
}
}
// Notice:
I think we should still keep this check when Config.enable_fqdn_mode is false.
If enable_fqdn_mode has been enabled and the node name is already in hostname format, disabling enable_fqdn_mode will make the consistency check fail, and the FE will fail to start.
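The failure mode in this exchange can be made concrete with a minimal sketch under stated assumptions: a hypothetical check compares the host part of a stored node name (format host_port_timestamp) with the local IP when enable_fqdn_mode is false, and with the local hostname when it is true. NodeNameCheck, hostOf and matchesLocalNode are illustrative names, not Doris code.

```java
// Hypothetical sketch of a node-name consistency check; not the actual Doris implementation.
public class NodeNameCheck {

    /** Extracts the host part of a node name of the form host_port_timestamp. */
    static String hostOf(String nodeName) {
        // Split from the right: the last '_' precedes the timestamp, the one before it the port.
        int tsSep = nodeName.lastIndexOf('_');
        int portSep = nodeName.lastIndexOf('_', tsSep - 1);
        if (tsSep <= 0 || portSep <= 0) {
            throw new IllegalArgumentException("unexpected node name: " + nodeName);
        }
        return nodeName.substring(0, portSep);
    }

    /** Returns true if the stored node name refers to this machine under the current mode. */
    static boolean matchesLocalNode(String nodeName, String localIp, String localHostName,
                                    boolean enableFqdnMode) {
        String expected = enableFqdnMode ? localHostName : localIp;
        return hostOf(nodeName).equals(expected);
    }
}
```

With an illustrative node name such as fe3_9010_1675000000000, matchesLocalNode(name, "172.18.0.50", "fe3", true) returns true, while the same call with enableFqdnMode set to false returns false: the stored name carries a hostname, but the check now expects an IP, which is the startup failure described in the reply.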
String hostName = FrontendOptions.getHostname();
if (hostName.equals(FrontendOptions.getLocalHostAddress())) {
if (Config.enable_fqdn_mode) {
LOG.fatal("Can't get hostname in FQDN mode. Please check your network configuration."
Use LOG.warn instead.
Also, this logic looks odd here. I think it could be done inside the FrontendOptions class, since all of the information comes from FrontendOptions.
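A minimal sketch of what the reviewer suggests, assuming a hypothetical FrontendOptions-like class that already holds both the resolved hostname and the local IP. FrontendOptionsSketch, hostnameResolved and checkForFqdnMode are illustrative names, not the real FrontendOptions API. The check lives next to the data it uses and throws instead of logging, so the caller decides whether to abort startup.

```java
// Hypothetical sketch of moving the FQDN check into a FrontendOptions-like class.
// Field and method names are illustrative; they are not the real FrontendOptions API.
public final class FrontendOptionsSketch {
    private final String hostName;   // result of the local hostname lookup
    private final String localIp;    // textual local IP address

    FrontendOptionsSketch(String hostName, String localIp) {
        this.hostName = hostName;
        this.localIp = localIp;
    }

    /** A lookup that returned the IP literal (or nothing) means no hostname is available. */
    boolean hostnameResolved() {
        return hostName != null && !hostName.isEmpty() && !hostName.equals(localIp);
    }

    /** Validates the node's identity once, at startup, where both values are known. */
    void checkForFqdnMode(boolean enableFqdnMode) {
        if (enableFqdnMode && !hostnameResolved()) {
            throw new IllegalStateException("Can't get hostname in FQDN mode. "
                    + "Please check your network configuration.");
        }
    }
}
```

Keeping the comparison inside the options class also means the startup code no longer needs to know how the hostname was obtained.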
int remoteClusterId = Integer.parseInt(clusterIdString);
if (remoteClusterId != clusterId) {
LOG.error("cluster id is not equal with helper node {}. will exit.", rightHelperNode.first);
LOG.error("cluster id is not equal with helper node {}. will exit.",
Throw an IOException here and let the caller handle the rest.
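A minimal sketch of this suggestion, assuming the cluster-id comparison is pulled into its own method. ClusterIdCheck and checkClusterId are hypothetical names; the message text mirrors the LOG.error call in the hunk. The method only reports the mismatch; logging and exiting move to the caller.

```java
import java.io.IOException;

// Hypothetical sketch: report the mismatch as an exception instead of logging and exiting here.
public class ClusterIdCheck {

    /** Throws if the helper node reports a different cluster id than the local one. */
    static void checkClusterId(String remoteClusterIdString, int localClusterId, String helperNode)
            throws IOException {
        int remoteClusterId;
        try {
            remoteClusterId = Integer.parseInt(remoteClusterIdString);
        } catch (NumberFormatException e) {
            throw new IOException("invalid cluster id from helper node " + helperNode, e);
        }
        if (remoteClusterId != localClusterId) {
            throw new IOException("cluster id is not equal with helper node " + helperNode
                    + ": remote=" + remoteClusterId + ", local=" + localClusterId);
        }
    }
}
```

A caller would wrap the call in try/catch and decide there what to do with the failure, for example log the message and exit.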
throw new DdlException("frontend does not exist, nodeName:" + nodeName);
}
boolean needLog = false;
// we use hostname as address of bdbha, so we not need to update node address when ip changed
we not -> we don't
if (fe.getRole() == FrontendNodeType.FOLLOWER || fe.getRole() == FrontendNodeType.REPLICA) {
haProtocol.removeElectableNode(fe.getNodeName());
helperNodes.remove(Pair.of(host, port));
// ip may be changed, so we need use both ip and hostname to check.
This duplicates the code at line 3335; extract it into a method.
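One possible shape for the extracted method, assuming both call sites need to remove a helper node that may be registered under either its IP or its hostname (per the "ip may be changed, so we need use both ip and hostname to check" comment in the hunk). HostPort is a stand-in for Doris's Pair<String, Integer>, and HelperNodeUtil and removeHelperNode are hypothetical names.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: one helper shared by both call sites instead of duplicated code.
public class HelperNodeUtil {

    /** Stand-in for Pair<String, Integer>; equality is by host and port. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HostPort)) {
                return false;
            }
            HostPort other = (HostPort) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    /**
     * Removes a helper node that may have been registered by IP or by hostname.
     * The IP may have changed, so both forms are tried. Returns true if anything was removed.
     */
    static boolean removeHelperNode(List<HostPort> helperNodes, String ip, String hostName, int port) {
        boolean removed = helperNodes.remove(new HostPort(ip, port));
        if (hostName != null && !hostName.isEmpty()) {
            removed |= helperNodes.remove(new HostPort(hostName, port));
        }
        return removed;
    }
}
```

Both call sites then reduce to a single removeHelperNode(helperNodes, ip, hostName, port) call.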
String oldIp = fe.getIp();
String newIp = inetAddress.getHostAddress();
Env.getCurrentEnv().modifyFrontendIp(fe.getNodeName(), newIp);
LOG.info("ip for {} of fe has been changed from {} to {}",
Use LOG.warn instead.
Text.writeString(out, json);
}

public void readFields(DataInput in) throws IOException {
Change public to private, and mark it as @deprecated.
public class MasterInfo implements Writable {

private String ip;
private String hostName;
hostName may be null; we need to handle that case.
You could also switch the serde of this class to GSON.
Also, a field that can be null is very error-prone.
How about keeping it as an empty string when it is not set, and using Strings.isEmpty() to check it wherever it is used?
Same suggestion for host in Frontend.java.
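A minimal sketch of the "empty string instead of null" suggestion, assuming the fields are only written through setters that normalize their input. MasterInfoSketch, hasHostName and getAddress are hypothetical names, and the hostname-first policy in getAddress is only an illustration. Plain String.isEmpty() is used so the sketch has no dependencies; Guava's Strings.isNullOrEmpty(), already used elsewhere in this diff, would work as well.

```java
// Hypothetical sketch of the "empty string instead of null" suggestion; not the real MasterInfo.
public class MasterInfoSketch {
    // Both fields default to "" so that readers never observe null.
    private String ip = "";
    private String hostName = "";

    void setIp(String ip) {
        this.ip = ip == null ? "" : ip;                 // normalize null at the boundary
    }

    void setHostName(String hostName) {
        this.hostName = hostName == null ? "" : hostName;
    }

    boolean hasHostName() {
        return !hostName.isEmpty();
    }

    /** Illustrative policy: prefer the hostname when set, otherwise fall back to the IP. */
    String getAddress() {
        return hasHostName() ? hostName : ip;
    }
}
```

Because null is converted once in the setters, every reader can rely on a simple emptiness check instead of scattered null checks.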
…viroment (apache#16315)" (apache#17278)" This reverts commit 201cf9c.
…apache#16315)" (apache#17278) This reverts commit 48afd77. There is a meta problem.

Proposed changes
Problem summary
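The overall logic follows #9172.
Key changes:
1. The FE node name changes from ip_port_timestamp to hostname_port_timestamp; for example, fe3 in the figure below is this instance's domain name in k8s.
Reason: to avoid ambiguity when another FE instance later takes over a changed IP. For example, FE3's original IP is 172.18.0.4 and it changes dynamically to 172.18.0.50; a new FE4 is then added with the IP 172.18.0.4. If the ip_port_timestamp form were still used, the two instances could not be told apart at a glance from their names.
Risk: external programs that rely on parsing the IP and port out of the node name may be affected, since they can no longer obtain the correct IP.
2. FQDNManager periodically iterates over all FEs and checks whether the IP behind each domain name has changed. If it has:
2.1 Notify all peers through BDBHA's updateAddress method to keep bdbje consistent. [The domain name is now used directly as the BDBJE address, so nothing needs to be updated when the IP changes.]
2.2 Change the IP recorded in the master's own memory.
2.3 Sync the change to the other FEs through the editlog, updating the FE IP information held in memory by the other followers.
3. Frontend metadata is now persisted in JSON format, which makes future field changes easier.
4. The deploy manager supports IP changes in K8s (after a pod is deleted, the StatefulSet adds it back automatically, with an IP different from the original one).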
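Steps 2.2 and 2.3 of the periodic IP check can be sketched as a refresh loop. This is a minimal sketch, not the PR's actual FQDN manager: FeEntry, Resolver, IpChangeListener and FqdnRefresher are hypothetical names, the listener stands in for writing the edit log entry that followers replay, and step 2.1 is omitted because BDBJE now uses the domain name as its address. The resolver is injected so the logic can be exercised without DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic FQDN refresh; not the actual Doris implementation.
public class FqdnRefresher {

    /** Stand-in for a Frontend record: stable node name and hostname, mutable IP. */
    static final class FeEntry {
        final String nodeName;
        final String hostName;
        String ip;

        FeEntry(String nodeName, String hostName, String ip) {
            this.nodeName = nodeName;
            this.hostName = hostName;
            this.ip = ip;
        }
    }

    /** Resolves a hostname to an IP literal; injectable so tests need no DNS. */
    interface Resolver {
        String resolve(String hostName) throws UnknownHostException;
    }

    /** Invoked once per changed FE; stands in for the edit log write of step 2.3. */
    interface IpChangeListener {
        void onIpChanged(String nodeName, String oldIp, String newIp);
    }

    private final List<FeEntry> frontends;
    private final Resolver resolver;
    private final IpChangeListener listener;

    FqdnRefresher(List<FeEntry> frontends, Resolver resolver, IpChangeListener listener) {
        this.frontends = frontends;
        this.resolver = resolver;
        this.listener = listener;
    }

    /** One pass over all FEs (meant to run on a single thread); returns the number of IP changes. */
    int refreshOnce() {
        int changed = 0;
        for (FeEntry fe : frontends) {
            String newIp;
            try {
                newIp = resolver.resolve(fe.hostName);
            } catch (UnknownHostException e) {
                continue; // name not resolvable right now (e.g. pod restarting); retry next round
            }
            if (!newIp.equals(fe.ip)) {
                String oldIp = fe.ip;
                fe.ip = newIp;                                   // step 2.2: update the in-memory record
                listener.onIpChanged(fe.nodeName, oldIp, newIp); // step 2.3: propagate to other FEs
                changed++;
            }
        }
        return changed;
    }

    /** JDK-backed resolver; the JVM caches results according to networkaddress.cache.ttl. */
    static String resolveWithJdk(String hostName) throws UnknownHostException {
        return InetAddress.getByName(hostName).getHostAddress();
    }

    /** Runs refreshOnce() on a single background thread at a fixed delay. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(this::refreshOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return exec;
    }
}
```

In a real deployment the resolver would be FqdnRefresher::resolveWithJdk. Because the node name keeps the hostname, only the IP field changes when a pod comes back with a new address, matching the FE3 example (172.18.0.4 to 172.18.0.50).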
Checklist (Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...