Skip to content

Conversation

@yinzhijian
Copy link
Contributor

@yinzhijian yinzhijian commented Feb 1, 2023

Proposed changes

Problem summary

大体逻辑沿用#9172

关键改动点:

  1. FE的nodename从ip_port_timestamp改为hostname_port_timestamp,如下图中的fe3就是这个实例在k8s中的域名:image
    原因:为了避免IP变更后,其它FE实例的IP使用了该IP,比如FE3的原始IP为172.18.0.4,动态变更为172.18.0.50,而后新增了一个FE4,它的IP为172.18.0.4,这时如果还使用ip_port_timestamp形式,则无法直观的从name中区分彼此。
    风险点:可能影响依赖Name解析ip、port的外部程序,无法识别正确的IP。

  2. FDQNManager定期便利所有的FE,检查域名对应的IP是否已经改变,如果改变:
    2.1 通过BDBHA的updateAddress方法通知所有peers,保证bdbje层面的一致性。【现在直接以域名作为BDBJE的address,不需要更改IP时再更新】
    2.2 改变master自身内存记录的IP
    2.3 通过editlog同步给其它FE,改变其它follower内存中记录的FE ip信息。

  3. Frontend的meta持久化方式改为了json格式,方便后续字段的变更。

  4. deploy manger K8s支持IP变更(delete pod后stateful自动加回,ip跟原始的不一致)

Changes:

  1. The nodename of FE changes from ip_port_timestamp to hostname_port_timestamp, as shown in the following figure, fe3 is the hostname of this instance in k8s:

image

Reason: To avoid the scenario that other FE instances use the changed IP after IP change, for example, the original IP of FE3 is 172.18.0.4, changed dynamically to 172.18.0.50, and later a new FE4 is added with IP 172.18.0.4, if the ip_port_timestamp form is still used, the existing program cannot distinguish each other from the name.

Risk: The external program that relies on Name resolution for IP and port recognition may be affected and cannot recognize the correct IP.

  1. FDQNManager regularly visits all FEs to check if the IP corresponding to the hostname has changed. If changed:
    2.1 Notify all peers through the updateAddress method of BDBHA to ensure consistency at the bdbje level.
    2.2 Change the IP information recorded in memory by the master itself
    2.3 Synchronize to other FEs through the editlog to change the IP information recorded in memory by other followers.
  2. The persistence method of Frontend meta is changed to json format for ease of future field changes.

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Feb 1, 2023
@hello-stephen
Copy link
Contributor

hello-stephen commented Feb 1, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.47 seconds
stream load tsv: 468 seconds loaded 74807831229 Bytes, about 152 MB/s
stream load json: 37 seconds loaded 2358488459 Bytes, about 60 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 27 seconds loaded 861443392 Bytes, about 30 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230228110332_clickbench_pr_105675.html

return localAddr.getHostName();
}

public static String getCanonicalHostName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the hostname and IP in the etc/hosts file, but it can't be resolved,
when this method can get HostName?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you running on Docker or physical machines?
This issue may not be related to gethostname or getCanonicalHostName.

@yinzhijian
Copy link
Contributor Author

run buildall

@yinzhijian yinzhijian force-pushed the dev.fqdn_fe branch 3 times, most recently from aca49c9 to 68c67e0 Compare February 27, 2023 08:40
@yinzhijian
Copy link
Contributor Author

run p0

@yinzhijian
Copy link
Contributor Author

run buildall

@yinzhijian
Copy link
Contributor Author

run buildall

@yinzhijian
Copy link
Contributor Author

run p0

@yinzhijian yinzhijian force-pushed the dev.fqdn_fe branch 2 times, most recently from 2c865f6 to f76f960 Compare February 28, 2023 03:02
@yinzhijian
Copy link
Contributor Author

run buildall

@yinzhijian
Copy link
Contributor Author

run p0

@yinzhijian
Copy link
Contributor Author

run ut

@yinzhijian
Copy link
Contributor Author

run clickbench

@yinzhijian
Copy link
Contributor Author

run fe ut

@morningman morningman added the kind/meta-version-change Categorizes issue or PR as related to changing meta version label Mar 1, 2023
Copy link
Contributor

@zddr zddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

PR approved by anyone and no changes requested.

Copy link
Member

@yangzhg yangzhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 1, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

PR approved by at least one committer and no changes requested.

@yinzhijian yinzhijian merged commit 48afd77 into apache:master Mar 1, 2023
@Gabriel39
Copy link
Contributor

This PR incurs a compatible problem. I got the message after this PR:
555a1abc-cd39-43f0-9366-cc2efe866230

@apache apache deleted a comment from yinzhijian Mar 1, 2023
@apache apache deleted a comment from yinzhijian Mar 1, 2023
@apache apache deleted a comment from yinzhijian Mar 1, 2023
@apache apache deleted a comment from yinzhijian Mar 1, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 1, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 1, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 1, 2023
morningman pushed a commit that referenced this pull request Mar 1, 2023
this.host = pair.first;
this.port = pair.second;
Preconditions.checkState(!Strings.isNullOrEmpty(host));
HostInfo pair = SystemInfoService.getIpHostAndPort(hostPort, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HostInfo hostInfo

private int masterRpcPort;
private int masterHttpPort;
private String masterIp;
private String masterHostName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better merge masterIp and masterHostName into one field

+ " in fe.conf to match the host " + split[0]);
}
}
// Notice:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still keep this check if Config.enable_fqdn_mode is false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If enable_fqdn_mode has been enabled and nodename is already in hostname format, disabling enable_fqdn_mode will result in consistency check failure and startup failure.

String hostName = FrontendOptions.getHostname();
if (hostName.equals(FrontendOptions.getLocalHostAddress())) {
if (Config.enable_fqdn_mode) {
LOG.fatal("Can't get hostname in FQDN mode. Please check your network configuration."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.warn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this logic is strange, I think it can be done inside the FrontendOptions class, because all info is got from FrontendOptions

int remoteClusterId = Integer.parseInt(clusterIdString);
if (remoteClusterId != clusterId) {
LOG.error("cluster id is not equal with helper node {}. will exit.", rightHelperNode.first);
LOG.error("cluster id is not equal with helper node {}. will exit.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw IOException, and the caller will do the rest

throw new DdlException("frontend does not exist, nodeName:" + nodeName);
}
boolean needLog = false;
// we use hostname as address of bdbha, so we not need to update node address when ip changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we not -> we don't

if (fe.getRole() == FrontendNodeType.FOLLOWER || fe.getRole() == FrontendNodeType.REPLICA) {
haProtocol.removeElectableNode(fe.getNodeName());
helperNodes.remove(Pair.of(host, port));
// ip may be changed, so we need use both ip and hostname to check.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate code with line 3335, extract to a method

String oldIp = fe.getIp();
String newIp = inetAddress.getHostAddress();
Env.getCurrentEnv().modifyFrontendIp(fe.getNodeName(), newIp);
LOG.info("ip for {} of fe has been changed from {} to {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.warn

Text.writeString(out, json);
}

public void readFields(DataInput in) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public -> private, and mark it as @deprecated

public class MasterInfo implements Writable {

private String ip;
private String hostName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hostName maybe null, we need to handle it.
And you can change the serde to GSON for this class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it is every error prone to be null for a field.
How about keep it as empty string is not set, and use Strings.isEmpty() to check it when use it?
Same suggestion for host in Frontend.java

yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 2, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 2, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 3, 2023
yinzhijian added a commit to yinzhijian/doris that referenced this pull request Mar 3, 2023
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner kind/meta-version-change Categorizes issue or PR as related to changing meta version reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants