
@github-actions
Contributor

Cherry-picked from #40764

## Proposed changes

Currently, the storage-compute decoupled (cloud) mode and the storage-compute integrated (local) mode cannot be converted into each other. However, if a user mixes the two anyway, nothing in the code prevents it. The following scenarios may occur:

| Case | Node was previously in a cloud cluster | Node was previously in a local cluster | Node has never been in any cluster |
| -- | -- | -- | -- |
| Add BE to local cluster | Added successfully, but the error `invalid cluster id. ignore.` is reported. No negative impact on the two original clusters. | Added successfully, but the error `invalid cluster id. ignore.` is reported. No negative impact on the two original clusters. | If cloud configuration has not been added, it works normally.<br />If cloud configuration has been added, it fails to start. |
| Add FE to local cluster | Added successfully, but the error `Socket is closed by peer.` is reported. No negative impact on the two original clusters. | Added successfully, but the error `Socket is closed by peer.` is reported. No negative impact on the two original clusters. | If cloud configuration has not been added, it works normally.<br />If cloud configuration has been added, it fails to start. |
| Add BE to cloud cluster | Added successfully, but the error `invalid cluster id. ignore.` is reported. No negative impact on the two original clusters. | Added successfully, but the error `invalid cluster id. ignore.` is reported. No negative impact on the two original clusters. | If cloud configuration has not been added, the BE starts, but inserts fail.<br />If cloud configuration has been added, it works normally. |
| Add FE to cloud cluster | Added successfully, but the error `Socket is closed by peer.` is reported. No negative impact on the two original clusters. | Added successfully, but the error `Socket is closed by peer.` is reported. No negative impact on the two original clusters. | If cloud configuration has not been added, the FE hangs with the error `Unknown meta module: cloudWarmUpJob.`<br />If cloud configuration has been added, it works normally. |

----

| Case | Situation |
| --- | --- |
| BE in a local cluster has cloud config items added | Hangs |
| FE in a local cluster has cloud config items added | Hangs |
| BE in a cloud cluster has cloud config items removed | Starts, but queries and inserts fail |
| FE in a cloud cluster has cloud config items removed | Service goes down |

This PR makes each node check its Doris deployment mode. If the deployment mode is changed later, the service stops and reports a clear error message instead of hanging or failing at query time.
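The fail-fast behavior can be illustrated with the following minimal sketch. It is not the PR's implementation, which lives in the Doris FE/BE code. The sketch assumes that a node persists the mode it was first started in, here a hypothetical marker file named `deploy_mode` in its data directory. On every later start, the node compares that recorded mode with the mode implied by its current configuration. The names `check_deploy_mode`, `DeployModeMismatch`, and `MARKER_NAME` are illustrative only.

```python
import os
import tempfile


class DeployModeMismatch(RuntimeError):
    """Raised when a node's configured mode differs from its recorded mode."""


# Hypothetical marker file name; not taken from the Doris source tree.
MARKER_NAME = "deploy_mode"
VALID_MODES = ("cloud", "local")


def check_deploy_mode(data_dir: str, configured_mode: str) -> str:
    """Record the mode on first start and fail fast if it changes later.

    Returns the mode the node is allowed to run in.
    """
    if configured_mode not in VALID_MODES:
        raise ValueError(f"unknown deploy mode: {configured_mode!r}")
    marker = os.path.join(data_dir, MARKER_NAME)
    if not os.path.exists(marker):
        # First start: persist the mode so that later restarts can be checked.
        with open(marker, "w", encoding="utf-8") as f:
            f.write(configured_mode)
        return configured_mode
    with open(marker, encoding="utf-8") as f:
        recorded_mode = f.read().strip()
    if recorded_mode != configured_mode:
        # Refuse to start, rather than hang or fail later on queries/inserts.
        raise DeployModeMismatch(
            f"this node was deployed in {recorded_mode} mode but is now "
            f"configured for {configured_mode} mode; converting between "
            f"cloud and local mode is not supported"
        )
    return recorded_mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(check_deploy_mode(d, "cloud"))  # first start records "cloud"
        print(check_deploy_mode(d, "cloud"))  # same mode: allowed
        try:
            check_deploy_mode(d, "local")     # cloud config was removed
        except DeployModeMismatch as e:
            print("refusing to start:", e)
```

Checking at startup turns the hangs and delayed query/insert failures listed in the tables above into a single, explicit error at the moment the mode changes.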

----

## Proposed changes (translated from Chinese)


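Currently, the storage-compute decoupled (cloud) mode and the storage-compute integrated (local) mode cannot be converted into each other. In most cases, deployments of the two modes should not get mixed up, but some users may still add a node to the wrong cluster by mistake. Users may also accidentally delete cloud-related configuration, for example by overwriting the current configuration with one copied from elsewhere. In that case, the node starts in local mode.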

Behavior for each node type and target cluster:

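| Case | Node was previously in another cloud cluster | Node was previously in another local cluster | Node has never been added to any cluster |
| :-- | :-- | :-- | :-- |
| Add BE to a local cluster | Can be added, but heartbeats report `invalid cluster id. ignore.` Normal use of the two original clusters is not affected. | Can be added, but heartbeats report `invalid cluster id. ignore.` Normal use of the two original clusters is not affected. | If no cloud-related configuration has been added, it works normally.<br />If cloud-related configuration has been added, it starts with cloud logic and fails to start normally. |
| Add FE to a local cluster | Can be added, but heartbeats report `Socket is closed by peer.` Normal use of the two original FEs is not affected. | Can be added, but heartbeats report `Socket is closed by peer.` Normal use of the two original FEs is not affected. | If no cloud-related configuration has been added, it works normally.<br />If cloud-related configuration has been added, it starts with cloud logic and fails to start normally. |
| Add BE to a cloud cluster | Can be added, but heartbeats report `invalid cluster id. ignore.` Normal use of the two original clusters is not affected. | Can be added, but heartbeats report `invalid cluster id. ignore.` Normal use of the two original clusters is not affected. | If no cloud-related configuration has been added, it can be added, but operations such as insert fail. This may even cause previously healthy BEs to core dump.<br />If cloud-related configuration has been added, it works normally. |
| Add FE to a cloud cluster | Can be added, but heartbeats report `Socket is closed by peer.` Normal use of the two original FEs is not affected. | Can be added, but heartbeats report `Socket is closed by peer.` Normal use of the two original FEs is not affected. | If no cloud-related configuration has been added and the node has not joined the cloud cluster, it reports `failed to get local fe's type, sleep 5 s, try again.` If it has joined the cloud cluster, reading metadata fails with `Unknown meta module: cloudWarmUpJob.` and the node hangs.<br />If cloud-related configuration has been added, it works normally. |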

----

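| Case | Symptom |
| :-- | :-- |
| BE in a local cluster has cloud configuration added | Starts with cloud logic and hangs during startup |
| FE in a local cluster has cloud configuration added | Starts with cloud logic and hangs during startup |
| BE in a cloud cluster has cloud configuration removed | Starts normally, but queries and loads fail |
| FE in a cloud cluster has cloud configuration removed | Repeatedly logs `get version from meta service failed`, then crashes |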

In all of these cases, a node whose mode has been switched between cloud and local should fail fast and tell the user why.

---------

Co-authored-by: yagagagaga <zhangminke@selectdb.com>
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally include the specific error messages), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts the change might have.
  3. What features were added, and why they were added.
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

run buildall

@gavinchou gavinchou closed this Nov 13, 2024
@gavinchou gavinchou reopened this Nov 13, 2024
@github-actions
Contributor Author

clang-tidy review says "All clean, LGTM! 👍"

@dataroaring dataroaring merged commit bb09971 into branch-3.0 Nov 14, 2024
@CalvinKirs CalvinKirs deleted the auto-pick-40764-branch-3.0 branch November 14, 2024 06:20
@gavinchou gavinchou mentioned this pull request Nov 26, 2024