Pushing cloud-init log to the KVP#529
Conversation
fixing the UT Using the Prod KVP max limit Pushing the logs on DS Azure activate, ensuring logs are only pushed once Fixing formatting Fixing formatting Pushing a portion of the log file to KVP reporting the number of bytes dumped to kvp Adding more tests fixing unit tests Fixing indentation, deleting unneeded files Setting the maz limit to 500KB
Co-authored-by: Chad Smith <chad.smith@canonical.com>
blackboxsw
left a comment
There was a problem hiding this comment.
hopefully minor changes on commenting. I think ultimately it would be neat for the handlers themselves to announce whether they accept compressed type events or "bytes" vs "text". But, I think that could be something we could discuss with a broader audience at the cloud-init summit next month. I'll add a topic for input.
blackboxsw
left a comment
There was a problem hiding this comment.
This looks really good @Moustafa-Moustafa thank you for addressing this with the semaphore file in /var/lib/cloud/data. That makes the most sense to gate and prevent multiple runs across reboot.
2 tiny comments left inline (about adding webhook and print to excluded handlers, and changing a log message).
Only question for you to confrm is:
- cloud-init will still upload the compressed log file to kvp one time after an upgrade and reboot on an existing VM. Do you wish to prevent that kvp compressed upload on existing instances which had already run cloud-init prior to upgrade?
Improving log message Co-authored-by: Chad Smith <chad.smith@canonical.com>
Excluding print and webhook Co-authored-by: Chad Smith <chad.smith@canonical.com>
Breaking the long line
blackboxsw
left a comment
There was a problem hiding this comment.
Excellent thanks so much for the discussion and fixes here. Validated behavior on Azure bionic instances.
Thanks for the great feedback and the through review |
Pushing the cloud-init.log file (Up to 500KB at once) to the KVP before reporting ready to the Azure platform.
Based on the analysis done on a large sample of cloud-init.log files, Here's the statistics collected on the log file size:
This change is limiting the size of cloud-init.log file data that gets dumped to KVP to 500KB. So for ~95% of the cases, the whole log file will be dumped and for the remaining ~5%, we will get the last 500KB of the cloud-init.log file.
To asses the performance of the 500KB limit, 250 VM were deployed with a 500KB cloud-init.log file and the time taken to compress, encode and dump the entries to KVP was measured. Here's the time in milliseconds percentiles:
Another 250 VMs were deployed with this logic dumping their normal cloud-init.log file to KVP, the same timing was measured as above. Here's the time in milliseconds percentiles:
Added excluded_handlers to the report_event function to be able to opt-out from reporting the events of the compressed cloud-init.log file to the cloud-init.log file.
The KVP break_down logic had a bug, where it will reuse the same key for all the split chunks of KVP which results in overwriting the split KVPs by the last one when consumed by Hyper-V. I added the split chunk index as a differentiator to the KVP key.
The Hyper-V consumes the KVPs from the KVP file as chunks whose key is 512KB and value is 2048KB but the Azure platform expects the value to be 1024KB, thus I introduced the Azure value limit.