docs/adr/0001-pipeline-logging-and-telemetry.md
# Pipeline Logging and Telemetry

## Status

Proposed

## Context

CI/CD pipelines typically use file-based logging, or integrate with the logging features of the CI/CD platform, to surface progress and diagnostic information. This presents the following challenges:

* Unstructured output
* Portability issues across CI/CD platforms
* Balancing log verbosity in day-to-day usage against the need for additional diagnostics when an issue occurs. This can force re-running a failed pipeline in 'diagnostic' mode to capture the required information, which makes inconsistently reproducible issues harder to troubleshoot
* Limited support for metrics & telemetry
* Limited support for reporting & trend analysis over time

It is proposed that the above issues be addressed by decoupling the logging activities from the CI/CD platform, such that the CI/CD platform becomes a consumer of the underlying logging system rather than the primary publisher of the log data.
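As a minimal sketch of this decoupling (all names here are illustrative assumptions, not a settled design): pipeline code emits structured events to a logger that fans them out to pluggable sinks. The CI/CD platform's console becomes just one sink, typically filtered by level, while a backing store receives full-fidelity diagnostics — addressing the re-run-in-diagnostic-mode problem above.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Callable, List

# Hypothetical event shape -- field names are illustrative only.
@dataclass
class LogEvent:
    pipeline_run_id: str
    level: str          # e.g. DEBUG, INFO, WARNING, ERROR
    message: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class PipelineLogger:
    """Fans every event out to all registered sinks; the CI/CD platform
    is just one consumer among several."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[LogEvent], None]] = []

    def add_sink(self, sink: Callable[[LogEvent], None]) -> None:
        self._sinks.append(sink)

    def log(self, event: LogEvent) -> None:
        for sink in self._sinks:
            sink(event)

# Example wiring: the console shows INFO and above, while the backing
# store captures everything (so no diagnostic-mode re-run is needed).
LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}
captured = []

logger = PipelineLogger()
logger.add_sink(lambda e: captured.append(asdict(e)))            # full-fidelity store
logger.add_sink(lambda e: LEVELS[e.level] >= LEVELS["INFO"]
                and print(f"{e.level}: {e.message}"))            # filtered console

logger.log(LogEvent("run-001", "DEBUG", "resolving module versions"))
logger.log(LogEvent("run-001", "INFO", "deployment started"))
```

Each of the storage options below then amounts to a different implementation of the backing sink.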

The following implementation options are considered:

* Azure Application Insights / Azure Log Analytics
* Azure Data Lake Storage Gen2
* Azure CosmosDB (serverless)

### Azure Application Insights / Azure Log Analytics
This option sends all log messages to an AppInsights workspace. It would require mapping the types of log data produced by the CI/CD workload onto the application logging semantics offered by AppInsights.

Pros:
* Built-in semantics for tracking events, exceptions etc.
* OOTB visualisation and query tools in the Azure Portal
* .Net SDK reduces up-front integration effort
* Applies the "buy before build" principle
* Could extend an existing central logging solution, assuming Application Insights is already used for infrastructure/application telemetry — an opportunity to look at application pain points across the end-to-end lifecycle, not just after deployment

Cons:
* Application Insights may drop data via its "sampling" feature. Sampling is often useful — it maintains an overview without paying to capture every last bit of information — but for diagnostic logging of a single process it could be a liability, and it is not clear whether sampling can be fully disabled or merely requested off
* Price premium compared to the purely data storage-based pricing of the other options, both for ingestion above 5 GB/month and for retention beyond 90 days
* Lag (circa 5 minutes) before data is available for querying, which can frustrate live debugging, e.g. while getting a new CI/CD pipeline running for the first time (potentially mitigated by Live Metrics Stream, if supported for this scenario)
* Built-in telemetry semantics may not fully translate to CI/CD workloads (e.g. events, dependencies etc.) and their use may feel somewhat contrived
* Potentially more difficult to perform deeper or longer-term analytics
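To make the semantic-mapping effort concrete, it could start as a simple lookup from PowerShell output streams to Application Insights telemetry types and severity levels. This is a hypothetical sketch — the right-hand values are assumptions for illustration, not a settled design:

```python
# Hypothetical mapping from PowerShell output streams to Application
# Insights telemetry concepts; the chosen targets are assumptions.
STREAM_TO_TELEMETRY = {
    "Error":       ("exception", "Error"),
    "Warning":     ("trace", "Warning"),
    "Information": ("trace", "Information"),
    "Verbose":     ("trace", "Verbose"),
    "Debug":       ("trace", "Verbose"),
    "Progress":    ("event", "Information"),  # arguably contrived, per the con above
}

def to_telemetry(stream: str, message: str) -> dict:
    """Translate one pipeline log record into a telemetry-shaped dict."""
    telemetry_type, severity = STREAM_TO_TELEMETRY[stream]
    return {"type": telemetry_type, "severity": severity, "message": message}

# e.g. to_telemetry("Warning", "quota nearly exhausted")
# -> {"type": "trace", "severity": "Warning", "message": "quota nearly exhausted"}
```

Cases like `Progress` illustrate where the fit is awkward: it is neither a trace nor a genuine application event.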


### Azure Data Lake Storage Gen2
This option sends all log messages to table storage in an ADLS Gen2 storage account (to facilitate subsequent analytics). Initially, the schema of the log data could be closely aligned to that of the CI/CD workloads (e.g. PowerShell streams, pipeline instances etc.), but it could evolve as necessary.

Pros:
* Freedom to define a telemetry schema suitable for CI/CD workloads
* Easy to integrate with other analytical services (e.g. PowerBI, Synapse etc.)
* Lower costs for long-term retention

Cons:
* Any visualisation interface will need to be built
* Large numbers of small events will need to be consolidated in the data lake (cf. the consolidation challenges encountered with SmartR)
* Potentially more effort to integrate with CI/CD workloads
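A minimal sketch of how log events might be laid out for later analytics. The conventions here — date-partitioned folders and JSON Lines files — are assumptions chosen because analytics engines can prune partitions by date, not a decided schema:

```python
import json
from datetime import datetime, timezone

def event_path(pipeline_id: str, run_id: str, ts: datetime) -> str:
    """Hypothetical date-partitioned layout, so query engines
    (e.g. Synapse) can prune by year/month/day."""
    return (f"logs/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"
            f"pipeline={pipeline_id}/{run_id}.jsonl")

def event_line(level: str, message: str, ts: datetime) -> str:
    # One JSON object per line (JSON Lines) keeps appends cheap
    # and downstream parsing simple.
    return json.dumps({"timestamp": ts.isoformat(),
                       "level": level, "message": message})

ts = datetime(2021, 3, 14, 9, 26, tzinfo=timezone.utc)
print(event_path("deploy-app", "run-001", ts))
# logs/year=2021/month=03/day=14/pipeline=deploy-app/run-001.jsonl
```

Writing one file per run (rather than one per event) is one way to limit the small-files consolidation problem noted above.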


### Azure CosmosDB (serverless)
This would be similar to the ADLS Gen2 option, except using CosmosDB to take advantage of its richer query API features.

Pros:
* Freedom to define a telemetry schema suitable for CI/CD workloads
* Choice of APIs for integrating with CI/CD workloads
* Ability to integrate with analytical services via Synapse Link

Cons:
* Any visualisation interface will need to be built
* Potentially more effort to integrate with CI/CD workloads
* Potentially more expensive at higher throughput volumes
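As a sketch of what the CosmosDB shape might look like, each log event could become a document partitioned by pipeline run, so a single run's logs live in one logical partition and can be queried cheaply. The field names and partition-key choice are illustrative assumptions:

```python
import uuid
from datetime import datetime, timezone

def to_cosmos_document(run_id: str, level: str, message: str) -> dict:
    """Hypothetical document shape; /pipelineRunId is assumed as the
    partition key so one run's events co-locate in one partition."""
    return {
        "id": str(uuid.uuid4()),     # Cosmos DB requires a unique 'id' per item
        "pipelineRunId": run_id,     # assumed partition key
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }

doc = to_cosmos_document("run-001", "ERROR", "deployment step failed")
```

With this shape, queries such as "all errors for run X" stay within a single partition, which is where serverless CosmosDB pricing is most favourable; cross-run trend queries would instead go through Synapse Link.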

## Decision

TBC

## Consequences

* A CI/CD platform-agnostic approach for pipeline logging
* The ability to capture telemetry from CI/CD workloads
* Retention of telemetry beyond limits imposed by CI/CD platform
* The ability to report on pipeline activities and relationships
* The potential for applying further analytics to track/discover trends