@yinzhijian
Contributor

Proposed changes

Problem summary

The overall logic follows #9172.

Key changes:

  1. The FE node name changes from ip_port_timestamp to hostname_port_timestamp. In the figure below, fe3 is the hostname of this instance in k8s:

[image]

    Reason: after an FE's IP changes, another FE instance may be assigned the old IP. For example, FE3 originally has IP 172.18.0.4 and is dynamically changed to 172.18.0.50; later a new FE4 is added with IP 172.18.0.4. With the ip_port_timestamp form, the two can no longer be told apart by name.
    Risk: external programs that parse the IP and port out of the node name may be affected and fail to obtain the correct IP.

  2. FQDNManager periodically iterates over all FEs and checks whether the IP corresponding to each hostname has changed. If it has:
    2.1 Notify all peers through the updateAddress method of BDBHA to ensure consistency at the bdbje level. [The hostname is now used directly as the BDBJE address, so no update is needed when the IP changes.]
    2.2 Update the IP recorded in the master's own memory.
    2.3 Synchronize the change to the other FEs through the editlog, so that the followers update the FE IP recorded in their memory.

  3. The Frontend meta is now persisted in json format, which makes future field changes easier.

  4. The K8s deploy manager supports IP changes (after a pod is deleted, the StatefulSet adds it back automatically with an IP different from the original).
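
The periodic hostname check described above can be illustrated with a minimal sketch. This is not the code from this PR (the FE is written in Java); the `Frontend` record, `make_node_name`, `refresh_frontend_ips`, the port 9010 and the timestamp are all hypothetical names and values chosen for illustration. It only shows the general idea: re-resolve each hostname, compare against the recorded IP, and collect the changes that a master would then write to the edit log.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Frontend:
    # Hypothetical in-memory record, not the actual Doris Frontend class.
    node_name: str  # hostname_port_timestamp
    host_name: str
    ip: str
    edit_log_port: int


def make_node_name(host_name: str, port: int, timestamp_ms: int) -> str:
    """Build a node name of the form hostname_port_timestamp."""
    return f"{host_name}_{port}_{timestamp_ms}"


def refresh_frontend_ips(
    frontends: Iterable[Frontend],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[Tuple[str, str, str]]:
    """Re-resolve every FE hostname and update records whose IP changed.

    Returns (node_name, old_ip, new_ip) tuples. In a real system the master
    would persist an equivalent record so that followers can replay it.
    """
    changes = []
    for fe in frontends:
        try:
            new_ip = resolve(fe.host_name)
        except OSError:
            # Resolution failed: keep the last known IP and retry next round.
            continue
        if new_ip != fe.ip:
            changes.append((fe.node_name, fe.ip, new_ip))
            fe.ip = new_ip
    return changes


# Usage with a fake resolver, mirroring the FE3 example from the description.
fe3 = Frontend(make_node_name("fe3", 9010, 1677700000000), "fe3", "172.18.0.4", 9010)
fake_dns = {"fe3": "172.18.0.50"}
print(refresh_frontend_ips([fe3], resolve=fake_dns.__getitem__))
# prints [('fe3_9010_1677700000000', '172.18.0.4', '172.18.0.50')]
```

Because the node name is built from the hostname, it stays the same when the IP changes; only the `ip` field of the record is updated.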

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Mar 2, 2023
@yinzhijian
Contributor Author

run buildall

@hello-stephen
Contributor

hello-stephen commented Mar 2, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.37 seconds
stream load tsv: 471 seconds loaded 74807831229 Bytes, about 151 MB/s
stream load json: 39 seconds loaded 2358488459 Bytes, about 57 MB/s
stream load orc: 74 seconds loaded 1101869774 Bytes, about 14 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230304043511_clickbench_pr_108371.html
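
For readers checking the figures: the reported rates appear to be bytes divided by seconds, expressed in units of 1024 * 1024 bytes and truncated. That is inferred from the numbers in this comment, not from the pipeline's source, and the function name below is hypothetical.

```python
def throughput_mib_per_s(total_bytes: int, seconds: int) -> int:
    """Whole MiB/s (1 MiB = 1024 * 1024 bytes), truncated."""
    return total_bytes // seconds // (1024 * 1024)


# (bytes loaded, seconds) as reported in the comment above.
loads = {
    "tsv": (74807831229, 471),
    "json": (2358488459, 39),
    "orc": (1101869774, 74),
    "parquet": (861443392, 32),
}
for fmt, (total_bytes, seconds) in loads.items():
    print(fmt, throughput_mib_per_s(total_bytes, seconds))
# The results match the reported 151, 57, 14 and 25.
```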

@yinzhijian
Contributor Author

run buildp0

@yinzhijian
Contributor Author

run buildp0

@yinzhijian
Contributor Author

run p0

@yinzhijian
Contributor Author

run p0

@yinzhijian
Contributor Author

run buildall

morningman
morningman previously approved these changes Mar 2, 2023
Contributor

@morningman morningman left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Mar 2, 2023

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Mar 2, 2023
@github-actions
Contributor

github-actions bot commented Mar 2, 2023

PR approved by anyone and no changes requested.

@yinzhijian
Contributor Author

run buildall

@yinzhijian
Contributor Author

run buildall

@yinzhijian
Contributor Author

run p0

@yinzhijian
Contributor Author

run beut build CheckStyle ClangFormatter LicenseCheck ShellCheck

@yinzhijian
Contributor Author

sh buildall

@morningman
Contributor

run buildall

@yinzhijian yinzhijian force-pushed the dev.fixed_fqdn branch 3 times, most recently from a3ccd92 to 04dd2ed Compare March 3, 2023 13:52
@yinzhijian
Contributor Author

run buildall

@morningman
Contributor

run buildall

Contributor

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 627b5ee into apache:master Mar 5, 2023
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
morningman added a commit that referenced this pull request Apr 10, 2023
1. If we set hadoop user property along with kerberos info, the authentication will fail.
2. fix some minor issue of local fs, follow up #18397
3. Add KW_HOSTNAME to keywords region, follow up #17329
4. Fix tvf not working with pipeline engine, follow up #18376
morningman added a commit that referenced this pull request Apr 11, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
…#18485)

1. If we set hadoop user property along with kerberos info, the authentication will fail.
2. fix some minor issue of local fs, follow up apache#18397
3. Add KW_HOSTNAME to keywords region, follow up apache#17329
4. Fix tvf not working with pipeline engine, follow up apache#18376
@zddr zddr mentioned this pull request May 23, 2023
3 tasks
mongo360 pushed a commit to mongo360/doris that referenced this pull request Jul 12, 2023
…#18485)

Labels

approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner reviewed
