@caiconghui
Contributor

@caiconghui caiconghui commented Apr 22, 2022

Proposed changes

Issue Number: close #9171

Problem Summary:

In a k8s environment, a pod's IP can change, but its hostname is stable. When the host machine of a pod fails, k8s can reschedule the failed pod onto a new host machine and rebuild it there. The newly created pod keeps the same hostname but gets a different IP address. When enable_fqdn_mode is true, FQDNManager can detect this change in the BE node's IP address.
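
As a rough illustration of the idea (not the code in this PR), an IP change behind a stable hostname can be noticed by resolving the hostname again and comparing the result with the recorded address. The class and method names below are hypothetical; only java.net.InetAddress is a real API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class HostResolveSketch {
    /** Resolve a hostname (or literal IP) to a textual IP; empty when resolution fails. */
    static Optional<String> resolveIp(String hostName) {
        try {
            return Optional.of(InetAddress.getByName(hostName).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** True when the host resolves and its address differs from the recorded one. */
    static boolean ipChanged(String hostName, String recordedIp) {
        return resolveIp(hostName).map(ip -> !ip.equals(recordedIp)).orElse(false);
    }
}
```

A failed lookup is treated here as "no change" so that a transient DNS error does not overwrite a known address; that choice is an assumption of the sketch.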

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/planner Issues or PRs related to the query planner kind/docs Categorizes issue or PR as related to documentation. labels Apr 22, 2022
@stalary
Contributor

stalary commented Apr 26, 2022

triple.getMiddle() == null ? triple.getLeft() : triple.getMiddle()
This expression appears many times. Can we handle it where the triple is created instead?
Would return Triple.of(ip, Config.enable_k8s_container_drift_mode ? hostName : ip, heartbeatPort) be enough?

@caiconghui
Contributor Author

triple.getMiddle() == null ? triple.getLeft() : triple.getMiddle() This expression appears many times. Can we handle it where the triple is created instead? Would return Triple.of(ip, Config.enable_k8s_container_drift_mode ? hostName : ip, heartbeatPort) be enough?

Config.enable_k8s_container_drift_mode is a master-only config, so the replay operation should not depend on it.
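
One possible way to keep the fallback in a single place without consulting a master-only config during replay is to put it in an accessor on the value itself. The sketch below is illustrative only: HostInfo and getHostOrIp are hypothetical names standing in for the Triple used in this PR, and it assumes the left element is the ip and the middle element is the hostName, as in the Triple.of call quoted above.

```java
import java.util.Objects;

// Hypothetical holder mirroring the (ip, hostName, heartbeatPort) triple.
final class HostInfo {
    private final String ip;
    private final String hostName; // may be null, e.g. for entries that never recorded one
    private final int heartbeatPort;

    HostInfo(String ip, String hostName, int heartbeatPort) {
        this.ip = Objects.requireNonNull(ip, "ip");
        this.hostName = hostName;
        this.heartbeatPort = heartbeatPort;
    }

    String getIp() {
        return ip;
    }

    int getHeartbeatPort() {
        return heartbeatPort;
    }

    /** The only place that applies the fallback: prefer hostName, otherwise ip. */
    String getHostOrIp() {
        return hostName != null ? hostName : ip;
    }
}
```

Because the accessor looks only at the stored fields, it behaves the same on the master and during replay.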

@stalary
Contributor

stalary commented Apr 27, 2022

LGTM

@caiconghui caiconghui added the dev/1.0.1-deprecated should be merged into dev-1.0.1 branch label Apr 27, 2022
@caiconghui caiconghui changed the title [enhancement](k8s) Support k8s_container_drift_mode for be and broker in k8s enviroment [enhancement](k8s) Support k8s_container_detect_drift_mode for be and broker in k8s enviroment Apr 29, 2022
@morningman
Contributor

This PR needs a more detailed description:

  1. Background and motivation for the issue
  2. Detailed solution
  3. Which interfaces and behaviors are affected
  4. Documentation for the feature
    1. What scenarios this feature applies to
    2. What problems it solves
    3. Whether there are any precautions
  5. Unit tests

@morningman morningman added dev/backlog waiting to be merged in future dev branch and removed dev/1.0.1-deprecated should be merged into dev-1.0.1 branch labels May 15, 2022
@caiconghui caiconghui changed the title [enhancement](k8s) Support k8s_container_detect_drift_mode for be and broker in k8s enviroment [enhancement](k8s) Support enable_fqdn_mode for be in k8s enviroment Nov 26, 2022
@caiconghui caiconghui changed the title [enhancement](k8s) Support enable_fqdn_mode for be in k8s enviroment [enhancement](k8s) Support fqdn mode for be in k8s enviroment Nov 26, 2022
@caiconghui caiconghui force-pushed the ip_change branch 2 times, most recently from 94dc530 to 430342d Compare November 26, 2022 15:26
@caiconghui caiconghui removed the area/planner Issues or PRs related to the query planner label Nov 26, 2022
@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Nov 26, 2022
@hello-stephen
Contributor

hello-stephen commented Nov 26, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.81 seconds
load time: 436 seconds
storage size: 17123343166 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221130092924_clickbench_pr_55460.html

@caiconghui caiconghui force-pushed the ip_change branch 2 times, most recently from 74e0b66 to e03a45e Compare November 27, 2022 01:33
@caiconghui
Contributor Author

caiconghui commented Nov 27, 2022

  1. enable_fqdn_mode is mainly intended for k8s environments and is not compatible with old clusters. If it is set to true on an old cluster, FQDN may not work because the existing BEs' hostName is null, but this does not affect the behavior of the cluster as a whole.
  2. When enable_fqdn_mode is true, FQDNManager checks for BE IP changes every few seconds, and adding a backend requires a hostName. You should specify "hostName:port", or you can still give "ip:port" as long as the cluster can resolve the hostName from the ip; otherwise, adding the backend fails.
  3. For BE, when dropping or modifying a backend, the cluster now first checks whether the backend's hostname and port match, and then whether the ip and port match.
  4. The check interval is 5 seconds, meaning FQDNManager checks every 5 seconds. Other values also work; the interval only affects how quickly a BE IP change is detected.
  5. If a BE's IP changes, FQDNManager invalidates all cached clients for that BE and sets the BE's new IP.
  6. In the end, all connections between BE and FE are still based on IP; this logic remains unchanged.
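
The matching rule in item 3 and the detection pass in items 4 and 5 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the FQDNManager code from this PR: FqdnCheckSketch, Backend, matches, and checkOnce are hypothetical names, and the hostname lookup is passed in as a function so the sketch does not depend on real DNS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch, for illustration only.
final class FqdnCheckSketch {
    /** Stand-in for a backend entry; hostName may be null on clusters created before FQDN mode. */
    static final class Backend {
        final String hostName;
        final int port;
        volatile String ip;

        Backend(String ip, String hostName, int port) {
            this.ip = Objects.requireNonNull(ip, "ip");
            this.hostName = hostName;
            this.port = port;
        }

        /** Item 3: match on hostName and port, or otherwise on ip and port. */
        boolean matches(String hostOrIp, int port) {
            if (this.port != port) {
                return false;
            }
            return hostOrIp.equals(hostName) || hostOrIp.equals(ip);
        }
    }

    /**
     * One detection pass: re-resolve every hostName and, when the address differs,
     * store the new ip. Returns the backends whose ip changed so that the caller
     * can invalidate any cached clients for them (item 5).
     */
    static List<Backend> checkOnce(List<Backend> backends, Function<String, String> resolver) {
        List<Backend> changed = new ArrayList<>();
        for (Backend be : backends) {
            if (be.hostName == null) {
                continue; // nothing to resolve for entries without a hostName
            }
            String resolved = resolver.apply(be.hostName);
            if (resolved != null && !resolved.equals(be.ip)) {
                be.ip = resolved;
                changed.add(be);
            }
        }
        return changed;
    }
}
```

A pass like this could be driven every 5 seconds by, for example, ScheduledExecutorService.scheduleWithFixedDelay; a longer interval would only delay detection of the new ip.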

  private long id;
  @SerializedName("host")
- private String host;
+ private volatile String host;
Contributor

We had better distinguish between the IP address and the host name here; right now the difference between host and hostname feels unclear.

Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 30, 2022
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@caiconghui caiconghui added area/compute-node k8s and removed area/compute-node area/planner Issues or PRs related to the query planner labels Nov 30, 2022
@caiconghui caiconghui merged commit 9bbbcf0 into apache:master Nov 30, 2022
@zddr zddr mentioned this pull request May 23, 2023
3 tasks
@caiconghui caiconghui deleted the ip_change branch June 13, 2023 07:08

Labels

approved Indicates a PR has been approved by one committer. dev/backlog waiting to be merged in future dev branch k8s kind/docs Categorizes issue or PR as related to documentation. reviewed

Development

Successfully merging this pull request may close these issues.

[Enhancement] Support fqdn mode for be in k8s enviroment

5 participants