Skip to content

Pushing cloud-init log to the KVP#529

Merged
blackboxsw merged 18 commits into
canonical:masterfrom
Moustafa-Moustafa:momousta-push-log-to-kvp
Aug 20, 2020
Merged

Pushing cloud-init log to the KVP#529
blackboxsw merged 18 commits into
canonical:masterfrom
Moustafa-Moustafa:momousta-push-log-to-kvp

Conversation

@Moustafa-Moustafa
Copy link
Copy Markdown
Contributor

@Moustafa-Moustafa Moustafa-Moustafa commented Aug 10, 2020

Pushing the cloud-init.log file (Up to 500KB at once) to the KVP before reporting ready to the Azure platform.
Based on the analysis done on a large sample of cloud-init.log files, Here's the statistics collected on the log file size:

P50 P90 P95 P99 P99.9 P99.99
137K 423K 537K 3.5MB 6MB 16MB

This change is limiting the size of cloud-init.log file data that gets dumped to KVP to 500KB. So for ~95% of the cases, the whole log file will be dumped and for the remaining ~5%, we will get the last 500KB of the cloud-init.log file.
To asses the performance of the 500KB limit, 250 VM were deployed with a 500KB cloud-init.log file and the time taken to compress, encode and dump the entries to KVP was measured. Here's the time in milliseconds percentiles:

P50 P99 P999
75.705 232.701 1169.636

Another 250 VMs were deployed with this logic dumping their normal cloud-init.log file to KVP, the same timing was measured as above. Here's the time in milliseconds percentiles:

P50 P99 P999
1.88 5.277 6.992

Added excluded_handlers to the report_event function to be able to opt-out from reporting the events of the compressed cloud-init.log file to the cloud-init.log file.

The KVP break_down logic had a bug, where it will reuse the same key for all the split chunks of KVP which results in overwriting the split KVPs by the last one when consumed by Hyper-V. I added the split chunk index as a differentiator to the KVP key.

The Hyper-V consumes the KVPs from the KVP file as chunks whose key is 512KB and value is 2048KB but the Azure platform expects the value to be 1024KB, thus I introduced the Azure value limit.

Moustafa-Moustafa and others added 4 commits August 10, 2020 14:52
fixing the UT

Using the Prod KVP max limit

Pushing the logs on DS Azure activate, ensuring logs are only pushed once

Fixing formatting

Fixing formatting

Pushing a portion of the log file to KVP

reporting the number of bytes dumped to kvp

Adding more tests

fixing unit tests

Fixing indentation, deleting unneeded files

Setting the maz limit to 500KB
@blackboxsw blackboxsw self-assigned this Aug 12, 2020
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
Comment thread cloudinit/reporting/events.py Outdated
Comment thread cloudinit/reporting/handlers.py
Co-authored-by: Chad Smith <chad.smith@canonical.com>
Comment thread cloudinit/sources/helpers/azure.py
Comment thread cloudinit/sources/helpers/azure.py Outdated
Comment thread tests/unittests/test_reporting_hyperv.py
Comment thread cloudinit/sources/helpers/azure.py Outdated
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully minor changes on commenting. I think ultimately it would be neat for the handlers themselves to announce whether they accept compressed type events or "bytes" vs "text". But, I think that could be something we could discuss with a broader audience at the cloud-init summit next month. I'll add a topic for input.

Comment thread cloudinit/sources/helpers/azure.py Outdated
Comment thread cloudinit/sources/helpers/azure.py
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
@Moustafa-Moustafa Moustafa-Moustafa marked this pull request as draft August 19, 2020 00:46
@Moustafa-Moustafa Moustafa-Moustafa marked this pull request as ready for review August 19, 2020 23:41
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good @Moustafa-Moustafa thank you for addressing this with the semaphore file in /var/lib/cloud/data. That makes the most sense to gate and prevent multiple runs across reboot.
2 tiny comments left inline (about adding webhook and print to excluded handlers, and changing a log message).

Only question for you to confrm is:

  • cloud-init will still upload the compressed log file to kvp one time after an upgrade and reboot on an existing VM. Do you wish to prevent that kvp compressed upload on existing instances which had already run cloud-init prior to upgrade?

Comment thread cloudinit/sources/helpers/azure.py Outdated
Comment thread cloudinit/sources/helpers/azure.py Outdated
Moustafa-Moustafa and others added 3 commits August 20, 2020 09:41
Improving log message

Co-authored-by: Chad Smith <chad.smith@canonical.com>
Excluding print and webhook

Co-authored-by: Chad Smith <chad.smith@canonical.com>
Breaking the long line
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent thanks so much for the discussion and fixes here. Validated behavior on Azure bionic instances.

@blackboxsw blackboxsw merged commit d941c7f into canonical:master Aug 20, 2020
@Moustafa-Moustafa
Copy link
Copy Markdown
Contributor Author

Excellent thanks so much for the discussion and fixes here. Validated behavior on Azure bionic instances.

Thanks for the great feedback and the through review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants