can't remove a router if its setup aborts unexpectedly #2152

@yongshengma

I'm opening this as a new issue because I think its scenario is different from #2149.

I finished installing and setting up 192.168.2.181 as the first storage router of the cluster. I then installed 192.168.2.182 as the second router but had not set it up yet. In the meantime, someone else was making unrelated changes on 192.168.2.182. When I then ran ovs setup, it failed with this error:

Configuring/updating model
root@192.168.2.182's password: 
root@192.168.2.182's password: 
root@192.168.2.182's password: 
ERROR: Failed to setup extra node
ERROR: Command line: [u'/usr/bin/ssh', u'root@192.168.2.182', u'cd', u'/root', u'&&', u'/usr/bin/python2.7', u'/root/tmp.NMakYxqDhg/deployed-rpyc.py']
Exit code: 255
Stderr:  | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
         | Permission denied, please try again.
         | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
         | Permission denied, please try again.
         | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
         | Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++  An unexpected error occurred:                                                    +++
+++  Command line: [u'/usr/bin/ssh', u'root@192.168.2.182', u'cd', u'/root', u'&&',   +++
+++  u'/usr/bin/python2.7', u'/root/tmp.NMakYxqDhg/deployed-rpyc.py']                 +++
+++  Exit code: 255                                                                   +++
+++  Stderr:  | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or  +++
+++  directory                                                                        +++
+++           | Permission denied, please try again.                                  +++
+++  | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or           +++
+++  directory                                                                        +++
+++           | Permission denied, please try again.                                  +++
+++  | ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or           +++
+++  directory                                                                        +++
+++           | Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).  +++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I found that this was caused by someone setting mode 777 on the /root directory. This is not ovs's fault, and the permissions were corrected later.
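For anyone hitting the same SSH failure: the mechanism is most likely sshd's StrictModes option, which is on by default and makes sshd refuse key-based logins when the user's home directory or key files are group- or world-writable. That link is an assumption, since I did not confirm it in the sshd logs. Below is a minimal Python sketch of a check for this situation. The function names and the list of paths are only illustrative and are not part of ovs.

```python
import os
import stat


def writable_by_others(mode):
    """Return True if the mode bits grant write access to group or others."""
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def check_paths(paths):
    """Return (path, octal mode) for each existing path that is group- or world-writable."""
    problems = []
    for path in paths:
        try:
            mode = stat.S_IMODE(os.stat(path).st_mode)
        except OSError:
            # Path is missing or not readable by this user; skip it.
            continue
        if writable_by_others(mode):
            problems.append((path, oct(mode)))
    return problems


if __name__ == "__main__":
    # Illustrative path list (an assumption): root's home directory and SSH key files.
    candidates = ["/root", "/root/.ssh", "/root/.ssh/authorized_keys"]
    for path, mode in check_paths(candidates):
        print("%s has mode %s, which is group- or world-writable" % (path, mode))
```

Running this as root on the node being added, before ovs setup, would flag a /root directory with mode 777. A directory with mode 700 or 755 would not be reported.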

However, the second router's information has evidently already been stored, because it shows up in the UI. I retried ovs setup on the second router (192.168.2.182), but it reported that this node already exists. I also tried running ovs remove node 192.168.2.182 on the first node, but it failed without giving any details:

[root@test-1 ~]# ovs remove node 192.168.2.182
+++++++++++++++++++++
+++  Remove node  +++
+++++++++++++++++++++
WARNING: Some of these steps may take a very long time, please check the logs for more information


Creating SSH connections to remaining master nodes
  * Node with IP 192.168.2.181  - Successfully connected
  * Node with IP 192.168.2.182  - Successfully connected

+++ Running "noderemoval - validate_removal" hooks +++

Executing alba._validate_removal
Are you sure you want to remove node test-2? (y/[n]): y
Starting removal of node test-2 - 192.168.2.182
  Removing vPools from node
Stopping and removing services
Removing services
Removing service workers
Removing service support-agent
Removing service watcher-framework
Removing service watcher-config

+++ Running "noderemoval - remove" hooks +++

Executing storagedriver._on_remove
Executing alba._on_remove
Removing node from model
  [192.168.2.181] watcher-framework stopped
  [192.168.2.181] memcached restarted
  [192.168.2.181] watcher-framework started
  [192.168.2.181] support-agent restarted


+++++++++++++++++++++++++++++++++++++++
+++  An unexpected error occurred:  +++
+++++++++++++++++++++++++++++++++++++++


As a result, the second router is now left dangling in the cluster, and it might prevent the next router from joining.
