Johnsonshi/refactor azure report ready codepath by johnsonshi · Pull Request #468 · canonical/cloud-init

johnsonshi · 2020-06-30T21:29:32Z

Refactor Azure report ready and Goal State code path.

This PR refactors Azure report ready code to include more robust tests and telemetry.

johnsonshi · 2020-06-30T21:35:03Z

Hi @OddBloke, as discussed the closed PR #422, I have removed the diagnostic/error logging changes to the PR. The code changes now solely focus on refactoring the Azure report ready codepath and Goal State related code. Thanks for the suggestions!

…epath

johnsonshi · 2020-06-30T21:41:33Z

Adding @anhvoms as well for visibility.

OddBloke

Hi @johnsonshi, thanks for this! This was much easier to review and, at a high-level, it looks good. I have a bunch of inline comments: most of these are minor (including a few suggested type annotations).

I have one high-level comment on the tests: per our Unit Testing docs:

Variables/parameter names for Mock or MagicMock instances should start with m_ to clearly distinguish them from non-mock variables

Could you go through and ensure that this naming convention is being followed for the new tests you are adding here?

Thanks again!

…epath

johnsonshi · 2020-07-09T10:03:00Z

Hi @OddBloke, thanks for the very detailed and thorough review! I have followed the mock test naming conventions you mentioned earlier, your type annotation PR comments, refactoring AzureEndpointHttpClient to be in self (instead of getting passed around), and a bunch of other PR comments. Let me know if you need anything else.

OddBloke · 2020-07-13T20:25:03Z

Hey @johnsonshi, thanks for the update! I was off at the end of last week, so I'm playing catch-up; I'm expecting to get to a re-review of this tomorrow.

OddBloke

This is looking really good now, @johnsonshi, thanks for your work! I have a few inline comments: the two most important are a concern remaining around flushing, and around some testing. But we're very close!

…epath

johnsonshi · 2020-07-17T10:55:47Z

Looking at the Hyper-V KVP Reporter's threaded publisher subroutine (here and here), we see that it calls _append_kvp_item in order to write encoded_data (encoded kvp diagnostic messages) to the kvp pool file.

We see that _append_kvp_item does hold an exclusive lock and flushes output to the file, so we can be sure that diagnostic messages are indeed flushed to the Hyper-V kvp file.

We also see that HyperVKvpReportingHandler.flush() does block the main thread until the publisher thread completely empties queued kvp diagnostic messages.

You are right that time.sleep(0) is not needed since (1) the threaded publishing subroutine flushes output when it writes and (2) the main thread blocks until threaded publisher empties and writes all queued events. Thus we can be sure that all queued events are written up to this point.

In fact, it seems like HyperVKvpReportingHandler.flush() is a reliable method to ensure that queued events are indeed empties and flushed. The previous time.sleep(0) did not provide any guarantees of flushing all events. In fact it was just relying on yielding to the task scheduler and random luck. :)

johnsonshi · 2020-07-17T11:08:16Z

Thank you so much @OddBloke. I can sense that this is getting really close indeed. Thanks for your extensive reviews each pass :)

anhvoms · 2020-07-17T12:00:14Z

Looking at the Hyper-V KVP Reporter's threaded publisher subroutine (here and here), we see that it calls _append_kvp_item in order to write encoded_data (encoded kvp diagnostic messages) to the kvp pool file.

We see that _append_kvp_item does hold an exclusive lock and flushes output to the file, so we can be sure that diagnostic messages are indeed flushed to the Hyper-V kvp file.

We also see that HyperVKvpReportingHandler.flush() does block the main thread until the publisher thread completely empties queued kvp diagnostic messages.

You are right that time.sleep(0) is not needed since (1) the threaded publishing subroutine flushes output when it writes and (2) the main thread blocks until threaded publisher empties and writes all queued events. Thus we can be sure that all queued events are written up to this point.

In fact, it seems like HyperVKvpReportingHandler.flush() is a reliable method to ensure that queued events are indeed empties and flushed. The previous time.sleep(0) did not provide any guarantees of flushing all events. In fact it was just relying on yielding to the task scheduler and random luck. :)

HyperVKvpReportingHandler.flush() is reliable since it calls a queue join, but you risk blocking the report ready if the KVPs could not be flushed due to some reasons. I ran into this once when I had some malformed KVP due to utf-8 encoding. Regarldess, my opinion is that the telemetry channel should not fail the VM reporting ready. There could be bugs in the telemetry channel that we should figure out to make sure it would never cause issues, but that should not be the scope of this PR.

What we observed was that at the time of reporting ready, sometimes there were KVPs that weren't processed yet in the queue. Time.sleep(0) was a low cost best-effort to yield the scheduler so that some of these events could be processed. Is it random luck? I'm not quite sure. When i looked into this the last time, running strace showed that time.sleep(0) did trigger a context switch and schedule yielding.
https://stackoverflow.com/questions/7273474/behavior-of-pythons-time-sleep0-under-linux-does-it-cause-a-context-switch

…epath

johnsonshi · 2020-07-23T00:17:45Z

After discussing with @anhvoms, I've decided to revert time.sleep(0) back as it was a proven and reliable method for KVPs to be written to disk before the VM reports ready. Agree with @anhvoms and his points. This PR should be scoped to refactoring only.

…epath

johnsonshi · 2020-07-27T21:07:07Z

@OddBloke I believe this is ready for merging now :)

OddBloke

Hey Johnson, apologies for the delay in this review! Two minor requests, plus a follow-up on that time.sleep(0) part. This is really close, thanks for your continuing work!

…epath

johnsonshi · 2020-08-13T19:54:15Z

@OddBloke As per @anhvoms 's comment above, there were issues with Hyper-V's flush KVP method due to encoding errors, causing the telemetry channel to be blocked and preventing cloud-init from ever flushing the KVPs. We'd prefer to scope this PR to only be refactoring, and have the next PR deal with telemetry/reporting/logging/flushing KVPs as previously discussed :)

…epath

OddBloke · 2020-08-13T20:13:27Z

@OddBloke As per @anhvoms 's comment above, there were issues with Hyper-V's flush KVP method due to encoding errors, causing the telemetry channel to be blocked and preventing cloud-init from ever flushing the KVPs. We'd prefer to scope this PR to only be refactoring, and have the next PR deal with telemetry/reporting/logging/flushing KVPs as previously discussed :)

I'm perfectly happy with that approach. If we are going to deal with such things in a follow-up PR, then we should drop the time.sleep(0) call: I don't want us to land only part of the full change which you intend to make.

johnsonshi · 2020-08-13T20:22:50Z

Just to clarify @OddBloke, the time.sleep(0) call was previously there (currently in master). Then there was a proposal to change the time.sleep(0) to use flush KVP instead. However, after consulting with @anhvoms, we discovered that he did try to use flush KVP but ran into issues, so I reverted back to using time.sleep(0), which was the original implementation.

…epath

OddBloke · 2020-08-13T20:23:38Z

Also @OddBloke should the type hints in this PR be removed?

TL;DR: no. :-)

Type hints were introduced in Python 3.5, and cloud-init still supports Python 3.4 I believe.

Function annotations have been supported in every version of Python 3: https://www.python.org/dev/peps/pep-3107/

What was introducing in Python 3.5 was the typing module, which our hacking doc prohibits the use of (for the compatibility reason you identified); see https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#type-annotations for more details.

Additionally, older distros such as RHEL/CentOS (which only ships with Python 2.7) might not be able to easily backport changes that are Python 3 specific. What's your take on this?

cloud-init upstream is Python 3-only, we have already introduced changes which break Python 2 compatibility (including function annotations), and our codebase is better with type annotations than without: I don't think we should drop them.

OddBloke · 2020-08-13T20:25:07Z

Just to clarify @OddBloke, the time.sleep(0) call was previously there (currently in master). Then there was a proposal to change the time.sleep(0) to use flush KVP instead. However, after consulting with @anhvoms, we discovered that he did try to use flush KVP but ran into issues, so I reverted back to using time.sleep(0), which was the original implementation.

Aha, I see that now: thanks for pushing back! Given that, we should certainly leave it in.

OddBloke

Thanks!!

johnsonshi and others added 5 commits June 30, 2020 18:18

Refactor Azure report ready code.

f21b4b3

This PR refactors Azure report ready code to include more robust tests and telemetry.

Add telemetry for obtaining Azure certificates XML

6b3088e

Add extra telemetry for marker files

7452026

github-cla-signers: add johnsonshi

c1eca52

Restrict PR to only Goal State and Report Ready refactor

9c04700

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

18244b4

…epath

OddBloke self-assigned this Jul 3, 2020

OddBloke self-requested a review July 3, 2020 13:46

OddBloke suggested changes Jul 3, 2020

View reviewed changes

johnsonshi added 11 commits July 9, 2020 02:17

Address PR comments on Azure GoalState class

3979661

Azure helper: remove inheriting from object for classes

9ced3c8

Azure: address PR comment on getting goal state from Azure confusion

a9f0d7e

Azure: Address PR comments wrt type annotations

b36da25

Address more type annotation comments

e8084a1

Address PR comment on storing AzureHttpClient in self

24724ac

Address azure test helper PR comments

afefdf8

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

4ae5f2b

…epath

Address PR comments on mock var naming convention

1473720

Flush KVP events before reporting to fabric

a44e75e

Address PR comment on filtering/getting admin user's pubkeys

9be69e5

OddBloke suggested changes Jul 14, 2020

View reviewed changes

johnsonshi added 2 commits July 17, 2020 10:27

Address PR Comments

54e9939

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

de5377c

…epath

Remove time.sleep(0)

98d2391

johnsonshi added 3 commits July 22, 2020 20:46

Scope this PR to not include time.sleep(0) change

c77ebd3

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

412b32f

…epath

Remove unused import

1733312

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

4b082cd

…epath

OddBloke suggested changes Aug 4, 2020

View reviewed changes

Comment thread cloudinit/sources/helpers/azure.py Outdated

Comment thread cloudinit/sources/helpers/azure.py

Comment thread tests/unittests/test_datasource/test_azure_helper.py

johnsonshi added 3 commits August 13, 2020 11:43

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

f05c8e0

…epath

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

5a8e379

…epath

Incorporate PR comments

cc59f08

johnsonshi added 2 commits August 13, 2020 12:54

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

255020f

…epath

Remove type hints for backwards compat

8d25d5e

Merge branch 'master' into johnsonshi/refactor-azure-report-ready-cod…

1f80068

…epath

OddBloke approved these changes Aug 13, 2020

View reviewed changes

OddBloke merged commit c3556ae into canonical:master Aug 13, 2020

johnsonshi deleted the johnsonshi/refactor-azure-report-ready-codepath branch August 14, 2020 01:26

johnsonshi mentioned this pull request Aug 21, 2020

Add method type hints for Azure helper #540

Merged

ubuntu-server-builder mentioned this pull request May 12, 2023

Release 20.3 #3774

Closed

Conversation

johnsonshi commented Jun 30, 2020

Uh oh!

johnsonshi commented Jun 30, 2020

Uh oh!

johnsonshi commented Jun 30, 2020

Uh oh!

OddBloke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

johnsonshi commented Jul 9, 2020

Uh oh!

OddBloke commented Jul 13, 2020

Uh oh!

OddBloke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

johnsonshi commented Jul 17, 2020

Uh oh!

johnsonshi commented Jul 17, 2020

Uh oh!

anhvoms commented Jul 17, 2020

Uh oh!

johnsonshi commented Jul 23, 2020

Uh oh!

johnsonshi commented Jul 27, 2020

Uh oh!

OddBloke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

johnsonshi commented Aug 13, 2020

Uh oh!

OddBloke commented Aug 13, 2020

Uh oh!

johnsonshi commented Aug 13, 2020

Uh oh!

OddBloke commented Aug 13, 2020

Uh oh!

OddBloke commented Aug 13, 2020

Uh oh!

OddBloke left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants