@jacobtomlinson
Member

There is a discussion in #3420 around older versions of OpenSSH not respecting the kill command, which results in SSH connections being left open after SSHCluster is done.

This issue (ronf/asyncssh#112) discusses the problem and explains that upgrading your OpenSSH version will resolve it. However, as not all users are able to do this, it also explains a workaround, which involves sending a keyboard interrupt character to stdin on the process to manually kill it.

This PR implements that workaround.
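
For illustration, here is a minimal sketch of the idea, not the actual code in this PR. It assumes `proc` behaves like an asyncssh `SSHClientProcess` that was started with a `term_type` (so a pseudo-terminal exists on the remote side to turn the character into an interrupt). The names `interrupt_remote` and `timeout` are hypothetical.

```python
import asyncio

CTRL_C = "\x03"  # ETX, the character a terminal sends for Ctrl-C


async def interrupt_remote(proc, timeout=5.0):
    """Type Ctrl-C into the remote process's stdin and wait for it to exit.

    Assumption: `proc` exposes `stdin.write(str)` and an awaitable `wait()`,
    as an asyncssh SSHClientProcess does. Returns True if the process exited
    within `timeout` seconds, False otherwise.
    """
    # With a PTY allocated, the remote terminal driver should translate
    # this character into an interrupt for the foreground process.
    proc.stdin.write(CTRL_C)
    try:
        await asyncio.wait_for(proc.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        # The process ignored the interrupt or the connection is stuck.
        return False
```

A caller could fall back to whatever termination method it used before if this returns `False`.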

@Keou0007

Keou0007 commented Jun 12, 2020

Tested this on macOS and it works as expected, but mac worked before anyway...

Tried on CentOS 7 and ran into an issue where the scheduler process is spawned but the worker processes never appear. It sits stalled, seemingly indefinitely, and when I killed it with CTRL-C, I got this traceback.

>>> from distributed import SSHCluster 
>>> cluster = SSHCluster(['localhost']*4, connect_options={'known_hosts':None}) 
^CTraceback (most recent call last): 
File "<stdin>", line 1, in <module>
File "/opt/miniconda3/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 353, in SSHCluster 
	return SpecCluster(workers, scheduler, name="SSHCluster", **kwargs) 
File "/opt/miniconda3/lib/python3.8/site-packages/distributed/deploy/spec.py", line 256, in __init__ 
	self.sync(self._start) 
File "/opt/miniconda3/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 163, in sync 
	return sync(self.loop, func, *args, **kwargs) 
File "/opt/miniconda3/lib/python3.8/site-packages/distributed/utils.py", line 336, in sync 
	e.wait(10) 
File "/opt/miniconda3/lib/python3.8/threading.py", line 558, in wait 
	signaled = self._cond.wait(timeout) 
File "/opt/miniconda3/lib/python3.8/threading.py", line 306, in wait 
	gotit = waiter.acquire(True, timeout) 
KeyboardInterrupt 

It seems that specifying term_type="ansi" is preventing the scheduler from initialising correctly, since the INFO messages that usually show up when launching the SSHCluster aren't even appearing.

@Keou0007

I made a mistake: I also see the stalling behaviour on macOS. I had applied the workaround to dask in a different environment and was testing the old version.

@mrocklin
Member

Checking in on this. Do you think that this is worth continuing, @jacobtomlinson?

@jacobtomlinson
Member Author

Thanks for trying this out @Keou0007. Sorry it didn't work out.
