Skip to content

Queue cleanup#159

Merged
alexcjohnson merged 15 commits intomasterfrom
queue-cleanup
May 24, 2016
Merged

Queue cleanup#159
alexcjohnson merged 15 commits intomasterfrom
queue-cleanup

Conversation

@alexcjohnson
Copy link
Contributor

Taking care of the last 2 points in #136 to make a more robust server architecture:

  • there's a new BaseServer class that all the servers inherit from, so we have a unified, well-tested pair of pathways for ask and write queries.
  • query handlers are no longer responsible for replying to the query; instead, the caller can invoke any handler with the ServerManager methods ask or write, and these methods encode into the query which flavor it is, invoking the appropriate processor on the other end: ask will reply with the handler's return value, write puts nothing on the queue.
  • there is no error_queue anymore, the response (to ask only - write never responds) just indicates whether it is valid or an error and is handled accordingly.
  • write queries report their errors locally (using logging), rather than to the server's response queue. While this might make these errors less visible because they will never bubble up to the main process, it means these errors get reported faster (you will see them in the subprocess monitor right away) and it relieves potential confusion in that the next query after the error occurs will be the one to throw the error, even though that next query is likely not a problem itself.

This is all kind of low level so I'm not sure who's interested in looking through it ( @guenp ? @spauka ? @AdriaanRol ? @damazter ?) , but it makes me a lot happier anyway about deleting the timeout in #144

try:
# the only way you'll get a response without asking for one
# is if we don't understand the query type code
self._response_queue.put((RESPONSE_ERROR, error_str))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is dangerous (or at least annoying) to put something in the return queue if the user doesn't expect it. Granted, if the user uses QUERY_WRITE then there is no problem, but I see mysef making the typo WIRTE [sic] which would lead to quite annoying situations.

I also see that there is no solution to this as the opposite solution will cause problems when the user types AKS [sic].

When I write my servers in python I use the convention that every command returns something. In the case of a write command, the return could just be OK.
Then I can envision problems with multiprocessing as you don't want to wait for your write to finish before continuing to the next command, but I guess that there is always a moment that you want to make sure that the WRITE command executed well, and that would be the time to check what is in your return_queue.

bottom line: I don't like the Write command a lot. but I would like to see the error handeling make an explicit case for a code_value error:

if code == QUERY_WRITE:
    logging.error(error_str)
elif code == QUERY_ASK:
    self._response_queue.put((RESPONSE_ERROR, error_str))
else:
    logging.error(error_str)
     self._response_queue.put((RESPONSE_ERROR, error_str))

thereby You are posting the error double, but you are sure the user will actually see the error some way or the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, double-posting if the mode is unclear. Granted, these codes are constants, not strings - well, the constants are strings... but if you use them as constants, typos will appear while you're coding rather than at run time.

You're right to be wary of write - there's really only one place I strongly prefer it, and that's posting data from a Loop to the DataServer. In that case I really don't even want to wait for confirmation that the write has happened, because the DataServer could be doing other long-running things like writing a data file, or syncing to a live plot, and we don't want that to slow down the Loop. But if we want to have other commands happen in a non-blocking way we could certainly make a third mode, QUERY_CONFIRMED_WRITE or something, that would acknowledge receipt after it parsed the basic structure of the query but before acting on it.

@damazter
Copy link
Contributor

Reading this code, I get quite confused. (this is not specific to this pull request, but If I am to look at this code, I need a bit better understanding)

While reading the code I get confused which thread is containing which classes.
I got as far as that ServerManager creates a new thread, which will behave comensurate to server_class, I think this serverclass is intended to be a child of BaseServer.
would it be i good idea to put this intention explicitely in the docstring of ServerManager?
I get that this intention is not a necessity (it shouldn't be), but it would make it more clear to get a quick understanding of the grammar while reading the code. That the reader can look up BaseServer, To understand what kind of class is intended here.

similarly the QcodesProcess object is not very intuitive for readers not familiar with this muliprocessing package.
Would it be possible to put a few helpful comments in the docstring which explains that this object represents a runnable thread, and explain what the target kwarg does (although it is part of the parent, mp.Process)

From the code it very hard to see now, that QcodesProcess will behave according to the server_class argument passed in ServerManager.
In the QcodesProcess code it is completely invisable how behavior is passed to this class.
A simple line stating that QcodesProcess will "run", using the command passed to target which is fetched from the server_class variable.

tl;dr
Thinking about it, I think the solution should be, which I would think is a lot more clear, is passing the server_class to QcodesProcess instead of target. Then, looking at QCodesProcess it is immediately clear that this is some sort of server.

@damazter
Copy link
Contributor

Now for some annoying suggestions (sorry :( )
Shouldn't the class InstrumentManager be called InstrumentServerManager
the managing refers to a server of the InstrumentServer kind, not to the managing of instruments and definitely not to managing a single instrument.

I don't find the name InstrumentConnection intuitive. It seems to serve the exact same purpouse as the RemoteInstrument classs. Shouldn't these two classes be merged? This merging isn't trivial, but it feels like everything that is part of InstrumentConnection should be part of RemoteInstrument

This would make the hierargy of the classes a bit more clear. An InstrumentServerManager contains an InstrumentServer (which will contain all real instrument drivers) and RemoteInstruments which represent the local mirror version of the real drivers. Hence there is nice symmetry between managers containing remoteinstruments and servers containing real instruments,
managers are responsible for talking to servers and remoteinstruments are responsible for talking to real instruments (via the manager and the server)

@alexcjohnson
Copy link
Contributor Author

Shouldn't the class InstrumentManager be called InstrumentServerManager

Yes, that makes sense.

it feels like everything that is part of InstrumentConnection should be part of RemoteInstrument

Yep, that would be much cleaner too, I will try to do this. It only exists because I was originally using the instrument class itself as the remote, but as the remote is now a separate dedicated class it is unnecessary and confusing.

@damazter
Copy link
Contributor

@alexcjohnson , Then I will wait with continuing looking at this code until you make these changes, and then go through it again.

I think QUERY_CONFIRMED_WRITE is not really a priority at this point as a third mode, as if you want confirmation, you should just use ask.
On the other hand, it might be a good idea though to change WRITE such that it behaves as QUERY_CONFIRMED_WRITE, then you are always garantueed to get a reply back from the server after every command, unless stuff is really going wrong, which will then also be indicated by not getting a response from the server.

forcibly terminating

traceback: (default True) whether to print a traceback at the point
of interrupt, for debugging purposes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MerlinSmiles re #178 - plain qc.halt_bg(), if you enter this in a cell, will only wait max 5 sec, and try to print a traceback (into the subprocess widget, if you're in a notebook). But the abort button in the subprocess widget (here) will wait longer (for whatever get() or set() is happening to finish and leave the system in a clean state) and not print any traceback.

@MerlinSmiles
Copy link
Contributor

@alexcjohnson
Great! The last changes seem to work just fine! Nice to have the option with and without traceback.

manager.connect(self, instrument_class, args, kwargs)
self._server_name = server_name
self._shared_kwargs = shared_kwargs
self._manager = get_instrument_server(self._server_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson looking at this line of code,

self._manager = get_instrument_server(self._server_name,

is this really the manager?
then get_instrument_server maybe be ``get_instrument_server_manager ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that was a sloppy name :)

@alexcjohnson
Copy link
Contributor Author

@damazter I think I've addressed all your comments, want to take another look? Since I had already made so many changes to multiprocessing.py I took the opportunity per #181 to rename and break it up. Instead of qcodes.utils.multiprocessing it now lives in qcodes.process.* - I figured this stuff had graduated beyond just being utils :)

@alexcjohnson alexcjohnson mentioned this pull request May 19, 2016
@peendebak
Copy link
Contributor

@alexcjohnson

While this might make these errors less visible because they will never bubble up to the main process, it means these errors get reported faster (you will see them in the subprocess monitor right away) and ...

I have not read the entire PR, but is the subprocess monitor the qc.show_subprocess_widget()? In that case anyone not using the notebook will not see the error messages.

@alexcjohnson
Copy link
Contributor Author

@peendebak

I have not read the entire PR, but is the subprocess monitor the qc.show_subprocess_widget()? In that case anyone not using the notebook will not see the error messages.

There are two safeguards against this (which are not new in this PR, they've been there since I made the StreamQueue):

  • If in_notebook() is False the subprocesses will not connect to this anyway. We should make sure this test is as accurate as possible - refining it ala your spyder test in make plot updates work outside ipython notebook #155
  • If the subprocess does connect, then prints something but the queue hasn't been read recently (3 seconds by default) it will also print to the default stdout or stderr (sys.__stdout__ and sys.__stderr__). Though in some of these situations the initial stdout is different from sys.__stdout__. In a notebook for example, sys.stdout is an ipykernel stream that gets picked up by the notebook, but sys.__stdout__ prints to the notebook server terminal console. If in this case we just want to make sure the user sees the message no matter what, we might be better off printing to both streams if they're different...

@MerlinSmiles
Copy link
Contributor

I can still mess the halt things up.
When I start a loop, hit the stop button in the notebook, and then hit the abort button in the widget, the notebook freezes. Once I got this error in the terminal:

Traceback (most recent call last):
  File "c:\users\qdev\anaconda3\lib\site-packages\ipykernel\eventloops.py", line 25, in pr
ocess_stream_events
    def process_stream_events():
KeyboardInterrupt

But that was only once out of 10 times.

@alexcjohnson
Copy link
Contributor Author

@MerlinSmiles

When I start a loop, hit the stop button in the notebook, and then hit the abort button in the widget, the notebook freezes.

Interesting... I can also get some horrible things to happen by pressing the stop button while pyqtgraph is open (even if no plots are open but pyqtgraph has been called in the past). Perhaps we could find a way to have pyqtgraph processes also ignore KeyboardInterrupt?

Anyway, this symptom is the same on master and on this branch, can we get this merged and then troubleshoot that separately?

@MerlinSmiles
Copy link
Contributor

Yes, sure. Not sure if I should send the 💃 if thats the case, here you have it :)

@alexcjohnson
Copy link
Contributor Author

Thanks @MerlinSmiles
@damazter I think I've addressed all your earlier comments, feel free to create an issue (or discuss on slack!) for anything else that occurs to you after I merge this.

@alexcjohnson alexcjohnson merged commit 6375f61 into master May 24, 2016
@alexcjohnson alexcjohnson deleted the queue-cleanup branch May 24, 2016 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants