adr0001 pipeline logging and telemetry #33
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| # Pipeline Logging and Telemetry | ||
|
|
||
| ## Status | ||
|
|
||
| Proposed | ||
|
|
||
| ## Context | ||
|
|
||
| Typically CI/CD pipelines use file-based logging or integrate with the logging features of the CI/CD platform to surface progress and diagnostic information. This presents the following challenges: | ||
|
|
||
| * Unstructured output | ||
| * Portability issues across CI/CD platforms | ||
| * Balancing log verbosity in day-to-day usage against the need for additional diagnostics when an issue occurs. This can mean re-running a failed pipeline in 'diagnostic' mode to get the required information, which makes intermittently reproducible issues harder to troubleshoot | ||
| * Limited support for metrics & telemetry | ||
| * Limited support for reporting & trend analysis over time | ||
|
|
||
| It is proposed that the above issues are addressed by decoupling the logging activities from the CI/CD platform - such that the CI/CD platform becomes a consumer of the underlying logging system rather than the primary publisher of the log data. | ||
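A minimal sketch of the decoupling proposed above, assuming Python purely for illustration. The names `LogEvent`, `PipelineLogger`, `make_store_sink` and `make_console_sink` are hypothetical and do not come from the ADR. The workload publishes every event at full verbosity to a store, while the CI/CD platform's console is just one consumer that filters what it displays.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Callable, List

# Hypothetical severity ordering, lowest to highest.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


@dataclass
class LogEvent:
    """One structured log event emitted by a pipeline run (illustrative schema)."""
    pipeline_run_id: str
    level: str
    message: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


Sink = Callable[[LogEvent], None]


class PipelineLogger:
    """Publishes every event to all registered sinks, regardless of level."""

    def __init__(self, pipeline_run_id: str, sinks: List[Sink]) -> None:
        self.pipeline_run_id = pipeline_run_id
        self.sinks = sinks

    def log(self, level: str, message: str) -> LogEvent:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        event = LogEvent(self.pipeline_run_id, level, message)
        for sink in self.sinks:
            sink(event)
        return event


def make_store_sink(store: List[str]) -> Sink:
    """Stand-in for the external log store: keeps everything as JSON lines."""
    def sink(event: LogEvent) -> None:
        store.append(json.dumps(asdict(event)))
    return sink


def make_console_sink(
    write: Callable[[str], None] = print, min_level: str = "INFO"
) -> Sink:
    """Stand-in for the CI/CD platform console: shows only min_level and above."""
    threshold = LEVELS.index(min_level)

    def sink(event: LogEvent) -> None:
        if LEVELS.index(event.level) >= threshold:
            write(f"[{event.level}] {event.message}")
    return sink
```

With this shape, diagnostic detail is always captured in the store, so a failed run would not need to be repeated in 'diagnostic' mode just to see lower-level messages.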
|
|
||
| The following implementation options are considered below: | ||
|
|
||
| * Azure Application Insights / Azure Log Analytics | ||
| * Azure Data Lake Storage Gen2 | ||
| * Azure CosmosDB (serverless) | ||
|
|
||
| ### Azure Application Insights / Azure Log Analytics | ||
| This option sends all log messages to an AppInsights workspace. It would require the types of log data produced by the CI/CD workload to be mapped to the application logging semantics offered by AppInsights. | ||
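To illustrate the kind of mapping this option would require, here is a hedged sketch in Python. The mapping table and the `map_record` helper are hypothetical, and no Application Insights SDK is called. The telemetry type and severity names are written as plain strings, chosen to resemble Application Insights' trace/exception concepts and severity levels, not taken from the ADR.

```python
from typing import Dict, NamedTuple


class TelemetryMapping(NamedTuple):
    telemetry_type: str  # e.g. "trace" or "exception" (assumed target concepts)
    severity: str        # assumed severity label for the target system


# Hypothetical mapping from PowerShell output streams to telemetry semantics.
STREAM_TO_TELEMETRY: Dict[str, TelemetryMapping] = {
    "Debug": TelemetryMapping("trace", "Verbose"),
    "Verbose": TelemetryMapping("trace", "Verbose"),
    "Information": TelemetryMapping("trace", "Information"),
    "Warning": TelemetryMapping("trace", "Warning"),
    "Error": TelemetryMapping("exception", "Error"),
}

# Fallback for streams with no obvious equivalent (an assumption of this sketch).
DEFAULT_MAPPING = TelemetryMapping("trace", "Information")


def map_record(stream: str, message: str) -> Dict[str, str]:
    """Translate one pipeline log record into a telemetry-shaped dictionary."""
    mapping = STREAM_TO_TELEMETRY.get(stream, DEFAULT_MAPPING)
    return {
        "type": mapping.telemetry_type,
        "severity": mapping.severity,
        "message": message,
        "source_stream": stream,  # keep the original stream so nothing is lost
    }
```

Note that in this sketch the Debug and Verbose streams collapse into a single severity, which is one example of where the built-in semantics may not line up cleanly with CI/CD workloads.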
|
|
||
| Pros: | ||
| * Built-in semantics for tracking events, exceptions etc. | ||
| * OOTB visualisation and query tools in the Azure Portal | ||
| * .NET SDK reduces up-front integration effort | ||
|
|
||
| Cons: | ||
|
Contributor
Application Insights also drops data if it decides it needs to, which it calls "sampling" IIRC. This is often a useful feature, enabling you to maintain an overview of what's going on without paying to capture every last bit of information. But for this sort of diagnostic logging of a single process, it could be a liability. I've never been entirely certain if it's possible to completely disable this sampling feature. I know you can ask for it not to happen, but it's not clear to me whether it takes that as an inviolable instruction, or just a suggestion. |
||
| * Price premium compared to purely data storage-based pricing of other options, both for ingestion >5GB/month and retention >90 days | ||
| * Potential lag in data being available (potentially mitigated by Live Metrics Stream if supported for this scenario) | ||
|
Agree that the delay (circa 5 minutes) can be a frustration. But would you be using this to debug live issues, for example whilst getting a new CI/CD pipeline up and running for the first time? |
||
| * Built-in telemetry semantics may not fully translate to CI/CD workloads (e.g. events, dependencies etc.) and their use may feel somewhat contrived | ||
| * Potentially more difficult to perform deeper or longer-term analytics | ||
|
|
||
|
|
||
| ### Azure Data Lake Storage Gen2 | ||
| This option sends all log messages to table storage in an ADLS Gen2 storage account (to facilitate subsequent analytics). Initially the schema of the log data could be closely aligned to that of the CI/CD workloads (e.g. PowerShell streams, pipeline instances etc.) but could evolve as necessary in the future. | ||
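As a rough illustration of a schema aligned to CI/CD workloads, the following Python sketch defines one possible record shape. Everything here is an assumption made for illustration: the field names, the `schema_version` field (included to allow the schema to evolve), and the choice of JSON lines as the serialisation format are not specified by the ADR.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict

# PowerShell output stream names used as the allowed values for `stream`.
POWERSHELL_STREAMS = {
    "Success", "Error", "Warning", "Verbose", "Debug", "Information", "Progress",
}


@dataclass(frozen=True)
class PipelineLogRecord:
    """One log record, keyed by pipeline instance (hypothetical schema)."""
    pipeline_instance_id: str
    stream: str
    message: str
    timestamp: str  # ISO 8601 string supplied by the caller
    properties: Dict[str, str] = field(default_factory=dict)
    schema_version: int = 1  # bump when the record shape changes

    def __post_init__(self) -> None:
        if self.stream not in POWERSHELL_STREAMS:
            raise ValueError(f"unknown stream: {self.stream}")


def to_json_line(record: PipelineLogRecord) -> str:
    """Serialise a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Carrying an explicit version on every record is one way to let downstream analytics handle old and new shapes side by side as the schema changes.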
|
|
||
| Pros: | ||
| * Freedom to define a telemetry scheme suitable for CI/CD workloads | ||
| * Easy to integrate with other analytical services (e.g. PowerBI, Synapse etc.) | ||
| * Lower costs for long-term retention | ||
|
|
||
| Cons: | ||
| * Any visualisation interface will need to be built | ||
|
Lots of small events needing to be consolidated in the data lake reminds me of the challenges Ed has been having with SmartR? |
||
| * Potentially more effort to integrate with CI/CD workloads | ||
|
|
||
|
|
||
| ### Azure CosmosDB (serverless) | ||
| This would be similar to the ADLS Gen2 option, except using CosmosDB to take advantage of its richer query API features. | ||
|
|
||
| Pros: | ||
| * Freedom to define a telemetry scheme suitable for CI/CD workloads | ||
| * Choice of APIs for integrating with CI/CD workloads | ||
| * Ability to integrate with analytical services via Synapse Link | ||
|
|
||
| Cons: | ||
| * Any visualisation interface will need to be built | ||
| * Potentially more effort to integrate with CI/CD workloads | ||
| * Potentially more expensive at higher throughput volumes | ||
|
|
||
| ## Decision | ||
|
|
||
| TBC | ||
|
|
||
| ## Consequences | ||
|
|
||
| * A CI/CD platform-agnostic approach for pipeline logging | ||
| * The ability to capture telemetry from CI/CD workloads | ||
| * Retention of telemetry beyond limits imposed by CI/CD platform | ||
| * The ability to report on pipeline activities and relationships | ||
| * The potential for applying further analytics to track/discover trends | ||
One pro is that it applies the "buy before build" principle?
Another pro is that it would extend the concept of a central logging solution for "BuildDevOps", assuming Application Insights is already being used for infrastructure / application telemetry? That would be an opportunity to look at pain points with applications through their end-to-end lifecycle, not just after they have been deployed.