{Telemetry} Fix telemetry lost due to concurrency issue#29717
Merged
{Telemetry} Fix telemetry lost due to concurrency issue#29717
Conversation
️✔️AzureCLI-FullTest
|
️✔️AzureCLI-BreakingChangeTest
|
Collaborator
|
Telemetry |
MoChilia
approved these changes
Aug 19, 2024
Member
MoChilia
left a comment
There was a problem hiding this comment.
Tested on GitHub Actions. Everything looks great, and telemetry is fully recorded.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR is for fixing telemetry lost due to concurrency issue.
At the stage when we still had telemetry local cache, here's the process of uploading telemetry:
2.1 if it's not enough time since last telemetry uploading time, exit
2.2 if it has been long enough since last telemetry uploading time, start a subprocess for uploading
3.1 check telemetry note file again to ensure it has been enough time since last telemetry sent time
3.2 get
last_sent_timefrom telemetry note file3.3 read all cli records from telemetry cache file where
record_time > last_sent_time3.4 remove cache file as it has been loaded in 3.3
3.5 uploading all records from 3.3
3.6 update
last_sent_timefor telemetry note fileSince #26854, we have dropped telemetry cache. So step 2 and step 3.1 are removed. We won't check if it has been "enough time" since last uploading, we do the uploading every time. So here's the new process:
3.1 get
last_sent_timefrom telemetry note file3.2 read all cli records from telemetry cache file where
record_time > last_sent_time3.3 remove cache file as it has been loaded in 3.2
3.4 uploading all records from 3.3
3.5 update
last_sent_timefor telemetry note fileBut this enlarges concurrency issue as we increased the frequency. Operations on two files may encounter concurrency issue, one is the telemetry cache file (step 1, step 3.2, step 3.3), another is telemetry note file (step 3.1, step 3.5):
portallocker.AlreadyLockederror, seeazure-cli/src/azure-cli-telemetry/azure/cli/telemetry/components/telemetry_note.py
Lines 72 to 76 in ad96375
azure-cli/src/azure-cli-telemetry/azure/cli/telemetry/__init__.py
Lines 109 to 112 in ad96375
So the fix also comes from two part:
Testing Guide
Run a cli script which contains many cli commands
Check $HOME/.azure/logs/telemetry.log and see if all command records have been uploaded
This checklist is used to make sure that common guidelines for a pull request are followed.
The PR title and description has followed the guideline in Submitting Pull Requests.
I adhere to the Command Guidelines.
I adhere to the Error Handling Guidelines.