Johnsonshi/refactor azure report ready codepath#468
This PR refactors Azure report ready code to include more robust tests and telemetry.
|
Adding @anhvoms as well for visibility. |
OddBloke
left a comment
Hi @johnsonshi, thanks for this! This was much easier to review and, at a high level, it looks good. I have a bunch of inline comments; most of these are minor (including a few suggested type annotations).
I have one high-level comment on the tests: per our Unit Testing docs:
Variables/parameter names for Mock or MagicMock instances should start with m_ to clearly distinguish them from non-mock variables
Could you go through and ensure that this naming convention is being followed for the new tests you are adding here?
Thanks again!
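As an illustration of the naming convention quoted above, here is a minimal sketch. The `report_ready` helper and the `send_report` method are hypothetical names invented for this example; they are not taken from the cloud-init codebase.

```python
from unittest import mock


def report_ready(client):
    """Hypothetical helper: ask a client to send a report, return its result."""
    return client.send_report("ready")


def test_report_ready_sends_report():
    # The mock carries the m_ prefix so it stands out from non-mock variables.
    m_client = mock.MagicMock()
    m_client.send_report.return_value = True

    # A plain (non-mock) value keeps an ordinary name.
    result = report_ready(m_client)

    assert result is True
    m_client.send_report.assert_called_once_with("ready")


test_report_ready_sends_report()
```

The same prefix would apply to mocks injected as test parameters, for example by `mock.patch` decorators.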
|
Hi @OddBloke, thanks for the very detailed and thorough review! I have followed the mock test naming conventions you mentioned earlier, your type annotation PR comments, refactoring |
|
Hey @johnsonshi, thanks for the update! I was off at the end of last week, so I'm playing catch-up; I'm expecting to get to a re-review of this tomorrow. |
OddBloke
left a comment
This is looking really good now, @johnsonshi, thanks for your work! I have a few inline comments: the two most important are a concern remaining around flushing, and around some testing. But we're very close!
|
Looking at the Hyper-V KVP Reporter's threaded publisher subroutine (here and here), we see that it calls We see that We also see that You are right that In fact, it seems like |
|
Thank you so much @OddBloke. I can sense that this is getting really close indeed. Thanks for your extensive reviews each pass :) |
HyperVKvpReportingHandler.flush() is reliable since it calls a queue join, but you risk blocking the report ready if the KVPs cannot be flushed for some reason. I ran into this once when I had a malformed KVP due to UTF-8 encoding. Regardless, my opinion is that the telemetry channel should not fail the VM reporting ready. There could be bugs in the telemetry channel that we should figure out to make sure it never causes issues, but that should not be in the scope of this PR. What we observed was that, at the time of reporting ready, there were sometimes KVPs in the queue that hadn't been processed yet. time.sleep(0) was a low-cost, best-effort way to yield to the scheduler so that some of these events could be processed. Is it random luck? I'm not quite sure. When I looked into this the last time, running strace showed that time.sleep(0) did trigger a context switch and a scheduler yield. |
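The trade-off described above can be sketched with a toy queue and worker thread. This is not the HyperVKvpReportingHandler code: `run_publisher`, `demo`, and the event names are assumptions made up for illustration. It only shows that `time.sleep(0)` gives no guarantee about how much of the queue has been drained, while `Queue.join()` blocks until every queued item has been marked done.

```python
import queue
import threading
import time


def run_publisher(q, published):
    """Toy stand-in for a threaded publisher: drain the queue forever."""
    while True:
        event = q.get()
        try:
            published.append(event)
        finally:
            # join() only returns once task_done() was called for every item.
            q.task_done()


def demo(num_events=100):
    q = queue.Queue()
    published = []
    threading.Thread(
        target=run_publisher, args=(q, published), daemon=True
    ).start()

    for i in range(num_events):
        q.put("event-%d" % i)

    # Best effort: yield to other threads. Any number of events, from none
    # to all of them, may have been published by the time this returns.
    time.sleep(0)
    seen_after_yield = len(published)

    # Reliable but blocking: waits until the worker has handled everything.
    # If the worker never called task_done(), this would never return.
    q.join()
    seen_after_join = len(published)

    return seen_after_yield, seen_after_join


if __name__ == "__main__":
    print(demo())
```

The first number printed can vary from run to run, which matches the "random luck" concern; the second always equals the number of queued events.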
|
@OddBloke I believe this is ready for merging now :) |
OddBloke
left a comment
Hey Johnson, apologies for the delay in this review! Two minor requests, plus a follow-up on that time.sleep(0) part. This is really close, thanks for your continuing work!
|
@OddBloke As per @anhvoms's comment above, there were issues with Hyper-V's flush KVP method due to encoding errors, causing the telemetry channel to be blocked and preventing cloud-init from ever flushing the KVPs. We'd prefer to scope this PR to refactoring only, and have the next PR deal with telemetry/reporting/logging/flushing KVPs as previously discussed :) |
I'm perfectly happy with that approach. If we are going to deal with such things in a follow-up PR, then we should drop the |
|
Just to clarify @OddBloke, the |
TL;DR: no. :-)
Function annotations have been supported in every version of Python 3: https://www.python.org/dev/peps/pep-3107/ What was introduced in Python 3.5 was the
cloud-init upstream is Python 3-only, we have already introduced changes which break Python 2 compatibility (including function annotations), and our codebase is better with type annotations than without: I don't think we should drop them. |
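To make the point about function annotations concrete, here is a small sketch that uses only built-in types and needs no imports. The function name and parameters are hypothetical, chosen for this example only.

```python
def should_report(lease: dict, retries: int = 3) -> bool:
    """Hypothetical function annotated with built-in types only."""
    return bool(lease) and retries > 0


# Annotations are stored on the function object; Python does not enforce
# them at call time.
annotations = should_report.__annotations__
print(annotations)
```

Because the annotations are plain metadata, calling the function with unexpected argument types does not raise an error by itself; a separate type checker is what gives them teeth.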
Aha, I see that now: thanks for pushing back! Given that, we should certainly leave it in. |
Refactor Azure report ready and Goal State code path.