Merged
97 commits
eb2c5f3
separate build yamls for ci_prod branch (#415)
ganga1980 Aug 5, 2020
df29e35
re-enable adx path (#420)
vishiy Aug 6, 2020
bcc8506
Gangams/release changes (#419)
ganga1980 Aug 6, 2020
39534d6
fix for zero filled metrics (#423)
rashmichandrashekar Aug 6, 2020
5e0b429
consolidate windows agent image docker files (#422)
ganga1980 Aug 7, 2020
c5c28f0
Gangams/cluster creation scripts (#414)
ganga1980 Aug 13, 2020
d7a3750
fix: Pin to a particular version of ltsc2019 by SHA (#427)
bragi92 Aug 14, 2020
5e8de91
enable collecting npm metrics (optionally) (#425)
vishiy Aug 14, 2020
17e7ff8
Saaror patch 3 (#426)
saaror Aug 17, 2020
6c7c675
Gangams/add containerd support to windows agent (#428)
ganga1980 Aug 18, 2020
bac8a32
Gangams/arc k8s metrics (#413)
ganga1980 Aug 20, 2020
ab03640
fix: Reverting back to ltsc2019 tag (#429)
bragi92 Aug 21, 2020
af0f981
more kubelet metrics (#430)
vishiy Aug 27, 2020
7fc4d4c
fix nom issue when config is empty (#432)
vishiy Sep 1, 2020
281a77c
support multiple docker paths when docker root is updated thru knode …
vishiy Sep 1, 2020
d8d7f9f
Gangams/doc and other related updates (#434)
ganga1980 Sep 11, 2020
2d56087
add missing serviceprincipal in ps scripts (#435)
ganga1980 Sep 14, 2020
a28aaf0
fix telemetry bug (#436)
vishiy Sep 15, 2020
0062b32
Gangams/readmeupdates non aks 09162020 (#437)
ganga1980 Sep 16, 2020
1a7ef1c
Gangams/fix weird conflicts (#439)
ganga1980 Sep 16, 2020
bf75bf0
fix quote issue for the region (#441)
ganga1980 Sep 21, 2020
6287724
fix cpucapacity/limit bug (#442)
vishiy Sep 21, 2020
bd30a47
grwehner/pv-usage-metrics (#431)
gracewehner Sep 23, 2020
7304a6b
add new custom metric regions (#444)
vishiy Sep 23, 2020
2d8c03f
add 'Terminating' state (#443)
vishiy Sep 23, 2020
da06d76
Gangams/sept agent release tasks (#445)
ganga1980 Sep 25, 2020
5453054
grwehner/pv-collect-volume-name (#448)
gracewehner Sep 28, 2020
fe9f14d
Changes for september agent release (#449)
rashmichandrashekar Sep 30, 2020
f1657c6
Gangams/arc k8s related scripts, charts and doc updates (#450)
ganga1980 Oct 1, 2020
e6dad83
Install CA certs from wireserver (#451)
rashmichandrashekar Oct 1, 2020
23397ed
grwehner/pv-volume-name-in-mdm (#452)
gracewehner Oct 1, 2020
7562a96
Release changes for 10052020 release (#453)
vishiy Oct 5, 2020
4b47f44
Update onboarding_instructions.md (#456)
saaror Oct 12, 2020
3f86b23
chart update for sept2020 release (#457)
ganga1980 Oct 19, 2020
6203c3a
add missing version update in the script (#458)
ganga1980 Oct 19, 2020
5b15469
November release fixes - activate one agent, adx schema v2, win perf …
vishiy Oct 27, 2020
157ba20
remove hiphen for params in chart (#462)
vishiy Oct 28, 2020
7c448bc
Changes for cutting a new build for ciprod10272020 release (#460)
vishiy Oct 28, 2020
62b27d7
using latest stable version of msys2 (#465)
ganga1980 Oct 29, 2020
909cc16
fixing the windows-perf-dups (#466)
rashmichandrashekar Oct 29, 2020
d481c06
chart updates related to new microsoft/charts repo (#467)
ganga1980 Nov 6, 2020
aff1e13
Changes for creating 11092020 release (#468)
vishiy Nov 9, 2020
ca18850
MDM exception aggregation (#470)
rashmichandrashekar Nov 10, 2020
18c27dd
grwehner/mdm custom metric regions (#471)
gracewehner Nov 23, 2020
a5c12e9
updaitng rs limit to 1gb (#474)
rashmichandrashekar Dec 4, 2020
7453fd4
grwehner/pv inventory (#455)
gracewehner Dec 10, 2020
24b709f
Gangams/fix for build release pipeline issue (#476)
ganga1980 Dec 15, 2020
9061201
add pv fluentd plugin config to helm rs config (#477)
gracewehner Dec 15, 2020
064bc06
Gangams/fix rs ooming (#473)
ganga1980 Dec 16, 2020
9cb058c
Gangams/enable arc onboarding to ff (#478)
ganga1980 Dec 18, 2020
ef9d726
Convert PV type dictionary to json for telemetry so it shows up in lo…
gracewehner Jan 4, 2021
97bdb94
fix 2 windows tasks - 1) Dont log to termination log 2) enable ADX ro…
vishiy Jan 6, 2021
94237be
fix ci envvar collection in large pods (#483)
ganga1980 Jan 6, 2021
aacd496
grwehner/jan agent tasks (#481)
gracewehner Jan 7, 2021
148d739
updating fbit version and cpu limit (#485)
rashmichandrashekar Jan 8, 2021
bd33dd9
reverting to older version (#487)
rashmichandrashekar Jan 8, 2021
d5164d2
Gangams/add fbsettings configurable via configmap (#486)
ganga1980 Jan 11, 2021
908d9b0
Gangams/jan agent release tasks (#484)
ganga1980 Jan 11, 2021
8ede536
remove per container logs in ci (#488)
ganga1980 Jan 11, 2021
37e5218
updates for ciprod01112021 release (#489)
ganga1980 Jan 12, 2021
3c97af6
new yaml files (#491)
deagraw Jan 14, 2021
90e1a5b
Use cloud-specific instrumentation keys (#494)
daweim0 Jan 22, 2021
98b6d77
upgrade apt to latest version (#492)
ganga1980 Jan 22, 2021
ddcd3ee
Gangams/add support for extension msi for arc k8s cluster (#495)
ganga1980 Jan 27, 2021
0cd99e4
Gangams/arm template arc k8s extension (#496)
ganga1980 Jan 27, 2021
13521c5
Gangams/aks monitoring via policy (#497)
ganga1980 Feb 1, 2021
e4f36c7
revert to use operatingSystem from osImage for node os telemety (#498)
ganga1980 Feb 1, 2021
ec15ac1
Container log v2 schema changes (#499)
vishiy Feb 4, 2021
6031be8
Add priority class to the daemonsets (#500)
Michael-Sinz Feb 9, 2021
4212e1a
fix node metric issue (#502)
ganga1980 Feb 11, 2021
24644ce
Bug fixes for Feb release (#504)
rashmichandrashekar Feb 18, 2021
e56104c
Gangams/feb 2021 agent bug fix (#505)
ganga1980 Feb 23, 2021
e00b2aa
changes for release -ciprod02232021 (#506)
vishiy Feb 23, 2021
31f0e5f
Gangams/e2e test framework (#503)
ganga1980 Feb 23, 2021
91f954f
scrape new kubelet pod count metric name (#508)
gracewehner Feb 25, 2021
4a8ff23
Adding explicit json output to az commands as the script fails if az …
nyuen Mar 20, 2021
512e5c0
Gangams/arc proxy contract and token renewal updates (#511)
ganga1980 Mar 22, 2021
6b48b6a
doc updates for microsoft charts repo release (#512)
ganga1980 Mar 22, 2021
d93c680
Update enable-monitoring.sh (#514)
seenu433 Mar 23, 2021
4d386ce
Prometheus scraping from sidecar and OSM changes (#515)
rashmichandrashekar Mar 25, 2021
16936aa
add liveness timeout for exec (#518)
vishiy Mar 26, 2021
12964be
chart and other updates (#519)
rashmichandrashekar Mar 26, 2021
73548c0
Saaror osmdoc (#523)
saaror Apr 5, 2021
fea4ffa
telemetry bug fix (#527)
rashmichandrashekar Apr 6, 2021
e31cc87
Fix conflicting logrotate settings (#526)
gracewehner Apr 6, 2021
ca8fa12
bug fix (#528)
rashmichandrashekar Apr 6, 2021
1f6f6d2
Gangams/arc ev2 deployment (#522)
ganga1980 Apr 7, 2021
97678b6
added liveness and telemetry for telegraf (#517)
daweim0 Apr 9, 2021
63ea896
Windows metric fix (#530)
daweim0 Apr 13, 2021
42730a4
OSM doc update (#533)
rashmichandrashekar Apr 13, 2021
7ad52cd
Adding MDM metrics for threshold violation (#531)
rashmichandrashekar Apr 14, 2021
34d1f64
Rashmi/april agent 2021 (#538)
rashmichandrashekar Apr 21, 2021
fcc5048
add Read_from_Head config for all fluentbit tail plugins (#539)
gracewehner Apr 21, 2021
01e5529
fix programdata mount issue on containerd win nodes (#542)
ganga1980 Apr 22, 2021
b5d074a
Update sidecar mem limits (#541)
rashmichandrashekar Apr 22, 2021
5feeb3e
David/release 4 22 2021 (#544)
daweim0 Apr 22, 2021
f1c055a
Merge branch 'ci_dev' of github.com:microsoft/Docker-Provider into da…
daweim0 Apr 22, 2021
5 changes: 5 additions & 0 deletions .pipelines/build-linux.sh
@@ -14,3 +14,8 @@ cd $DIR/../build/linux
echo "----------- Build Docker Provider -------------------------------"
make
cd $DIR

echo "------------ Bundle Shell Extension Scripts & HELM chart -------------------------"
cd $DIR/../deployment/arc-k8s-extension/ServiceGroupRoot/Scripts
tar -czvf ../artifacts.tar.gz ../../../../charts/azuremonitor-containers/ pushChartToAcr.sh

7 changes: 6 additions & 1 deletion .pipelines/pipeline.user.linux.yml
@@ -24,10 +24,15 @@ restore:

build:
commands:
- !!defaultcommand
- !!buildcommand
name: 'Build Docker Provider Shell Bundle'
command: '.pipelines/build-linux.sh'
fail_on_stderr: false
artifacts:
- from: 'deployment'
to: 'build'
include:
- '**'

package:
commands:
38 changes: 32 additions & 6 deletions .pipelines/pull-from-cdpx-and-push-to-ci-acr-linux-image.sh
100755 → 100644
@@ -35,12 +35,22 @@ echo "end: read appid and appsecret which has read access on cdpx acr"
# suffix 00 primary and 01 secondary, and we only use primary
# This configured via pipeline variable
echo "login to cdpxlinux acr:${CDPX_ACR}"
docker login $CDPX_ACR --username $CDPX_ACR_APP_ID --password $CDPX_ACR_APP_SECRET
echo "login to cdpxlinux acr completed: ${CDPX_ACR}"
echo $CDPX_ACR_APP_SECRET | docker login $CDPX_ACR --username $CDPX_ACR_APP_ID --password-stdin
if [ $? -eq 0 ]; then
echo "login to cdpxlinux acr: ${CDPX_ACR} completed successfully."
else
echo "-e error login to cdpxlinux acr: ${CDPX_ACR} failed. Please see release task logs."
exit 1
fi

echo "pull agent image from cdpxlinux acr: ${CDPX_ACR}"
docker pull ${CDPX_ACR}/official/${CDPX_REPO_NAME}:${CDPX_AGENT_IMAGE_TAG}
echo "pull image from cdpxlinux acr completed: ${CDPX_ACR}"
if [ $? -eq 0 ]; then
echo "pulling of agent image from cdpxlinux acr: ${CDPX_ACR} completed successfully."
else
echo "-e error pulling of agent image from cdpxlinux acr: ${CDPX_ACR} failed. Please see release task logs."
exit 1
fi

echo "CI Release name is:"$CI_RELEASE
imagetag=$CI_RELEASE$CI_IMAGE_TAG_SUFFIX
@@ -51,13 +61,29 @@ echo "CI AGENT REPOSITORY NAME : ${CI_AGENT_REPO}"

echo "tag linux agent image"
docker tag ${CDPX_ACR}/official/${CDPX_REPO_NAME}:${CDPX_AGENT_IMAGE_TAG} ${CI_ACR}/public/azuremonitor/containerinsights/${CI_AGENT_REPO}:${imagetag}
if [ $? -eq 0 ]; then
echo "tagging of linux agent image completed successfully."
else
echo "-e error tagging of linux agent image failed. Please see release task logs."
exit 1
fi

echo "login ciprod acr":$CI_ACR
docker login $CI_ACR --username $ACR_APP_ID --password $ACR_APP_SECRET
echo "login to ${CI_ACR} acr completed"
echo $ACR_APP_SECRET | docker login $CI_ACR --username $ACR_APP_ID --password-stdin
if [ $? -eq 0 ]; then
echo "login to ciprod acr: ${CI_ACR} completed successfully"
else
echo "-e error login to ciprod acr: ${CI_ACR} failed. Please see release task logs."
exit 1
fi

echo "pushing the image to ciprod acr:${CI_ACR}"
docker push ${CI_ACR}/public/azuremonitor/containerinsights/${CI_AGENT_REPO}:${imagetag}
echo "pushing the image to ciprod acr completed"
if [ $? -eq 0 ]; then
echo "pushing of the image to ciprod acr completed successfully"
else
echo "-e error pushing of image to ciprod acr failed. Please see release task logs."
exit 1
fi

echo "end: pull linux agent image from cdpx and push to ciprod acr"
39 changes: 33 additions & 6 deletions .pipelines/pull-from-cdpx-and-push-to-ci-acr-windows-image.sh
100755 → 100644
@@ -34,12 +34,22 @@ echo "end: read appid and appsecret which has read access on cdpx acr"
# suffix 00 primary and 01 secondary, and we only use primary
# This configured via pipeline variable
echo "login to cdpxwindows acr:${CDPX_ACR}"
docker login $CDPX_ACR --username $CDPX_ACR_APP_ID --password $CDPX_ACR_APP_SECRET
echo "login to cdpxwindows acr:${CDPX_ACR} completed"
echo $CDPX_ACR_APP_SECRET | docker login $CDPX_ACR --username $CDPX_ACR_APP_ID --password-stdin
if [ $? -eq 0 ]; then
echo "login to cdpxwindows acr: ${CDPX_ACR} completed successfully."
else
echo "-e error login to cdpxwindows acr: ${CDPX_ACR} failed. Please see release task logs."
exit 1
fi

echo "pull image from cdpxwin acr: ${CDPX_ACR}"
docker pull ${CDPX_ACR}/official/${CDPX_REPO_NAME}:${CDPX_AGENT_IMAGE_TAG}
echo "pull image from cdpxwin acr completed: ${CDPX_ACR}"
if [ $? -eq 0 ]; then
echo "pulling of image from cdpxwin acr: ${CDPX_ACR} completed successfully."
else
echo "pulling of image from cdpxwin acr: ${CDPX_ACR} failed. Please see release task logs."
exit 1
fi

echo "CI Release name:"$CI_RELEASE
echo "CI Image Tag suffix:"$CI_IMAGE_TAG_SUFFIX
@@ -49,13 +59,30 @@ echo "agentimagetag="$imagetag

echo "tag windows agent image"
docker tag ${CDPX_ACR}/official/${CDPX_REPO_NAME}:${CDPX_AGENT_IMAGE_TAG} ${CI_ACR}/public/azuremonitor/containerinsights/${CI_AGENT_REPO}:${imagetag}
if [ $? -eq 0 ]; then
echo "tagging of windows agent image completed successfully."
else
echo "-e error tagging of windows agent image failed. Please see release task logs."
exit 1
fi

echo "login to ${CI_ACR} acr"
docker login $CI_ACR --username $ACR_APP_ID --password $ACR_APP_SECRET
echo "login to ${CI_ACR} acr completed"
echo $ACR_APP_SECRET | docker login $CI_ACR --username $ACR_APP_ID --password-stdin
if [ $? -eq 0 ]; then
echo "login to acr: ${CI_ACR} completed successfully."
else
echo "login to acr: ${CI_ACR} failed. Please see release task logs."
exit 1
fi


echo "pushing the image to ciprod acr"
docker push ${CI_ACR}/public/azuremonitor/containerinsights/${CI_AGENT_REPO}:${imagetag}
echo "pushing the image to ciprod acr completed"
if [ $? -eq 0 ]; then
echo "pushing the image to ciprod acr completed successfully."
else
echo "pushing the image to ciprod acr failed. Please see release task logs"
exit 1
fi

echo "end: pull windows agent image from cdpx and push to ciprod acr"
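
Both release scripts repeat the same `if [ $? -eq 0 ]` block after every docker command. As a minimal sketch (the `run_step` helper and its message format are hypothetical and not part of this PR), the pattern could be factored into one function that runs a command, reports the outcome, and hands the exit status back to the caller. It reports failures with `printf` on stderr, because `echo "-e error ..."` with the flag inside the quotes prints a literal `-e`.

```bash
# Hypothetical helper (not part of the PR): run a command, report the outcome,
# and return the command's exit status so the caller can decide to abort.
run_step() {
  local description="$1"
  shift
  if "$@"; then
    echo "${description} completed successfully."
  else
    local status=$?
    # printf to stderr sidesteps echo's option handling entirely.
    printf 'error: %s failed (exit %d). Please see release task logs.\n' \
      "$description" "$status" >&2
    return "$status"
  fi
}

# Stand-in commands so the sketch runs anywhere; in the release scripts these
# would be the docker login / pull / tag / push invocations.
run_step "example step" true
run_step "failing step" false || echo "caller would exit 1 here"
```

In the scripts, a call could look like `printf '%s' "$ACR_APP_SECRET" | run_step "login to ciprod acr" docker login "$CI_ACR" --username "$ACR_APP_ID" --password-stdin || exit 1`. The function passes stdin through to the command it runs, and quoting the secret avoids the word splitting that an unquoted `echo $ACR_APP_SECRET` is subject to.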
Binary file added Documentation/OSMPrivatePreview/Image1.jpg
71 changes: 71 additions & 0 deletions Documentation/OSMPrivatePreview/ReadMe.md
@@ -0,0 +1,71 @@
Note - This is a private preview. For any support issues, please reach out to us at [askcoin@microsoft.com](mailto:askcoin@microsoft.com). Please don't open a support ticket.

# Azure Monitor Container Insights Open Service Mesh Monitoring

Azure Monitor Container Insights now supports a preview of [Open Service Mesh (OSM)](https://docs.microsoft.com/azure/aks/servicemesh-osm-about) monitoring. As part of this support, customers can:
1. Filter & view inventory of all the services that are part of your service mesh.
2. Visualize and monitor requests between services in your service mesh, with request latency, error rate & resource utilization by services.
3. View a connection summary for OSM infrastructure running on AKS.

## How to onboard Container Insights OSM monitoring?
OSM exposes Prometheus metrics that Container Insights can collect. To have the Container Insights agent collect OSM metrics, follow these steps.

1. Follow this [link](https://docs.microsoft.com/en-us/azure/aks/servicemesh-osm-about?pivots=client-operating-system-linux#register-the-aks-openservicemesh-preview-feature) as a prerequisite before enabling the addon.

2. Enable the AKS OSM addon on your
- [New AKS cluster](https://docs.microsoft.com/en-us/azure/aks/servicemesh-osm-about?pivots=client-operating-system-linux#install-open-service-mesh-osm-azure-kubernetes-service-aks-add-on-for-a-new-aks-cluster)
- [Existing AKS cluster](https://docs.microsoft.com/en-us/azure/aks/servicemesh-osm-about?pivots=client-operating-system-linux#enable-open-service-mesh-osm-azure-kubernetes-service-aks-add-on-for-an-existing-aks-cluster)
3. Configure OSM to allow Prometheus scraping by following the steps [here](https://docs.microsoft.com/en-us/azure/aks/servicemesh-osm-about?pivots=client-operating-system-linux#configure-osm-to-allow-prometheus-scraping)
4. To enable namespace(s), download the osm client library [here](https://docs.microsoft.com/en-us/azure/aks/servicemesh-osm-about?pivots=client-operating-system-linux#osm-service-quotas-and-limits-preview) & then enable metrics on the namespaces
```bash
# With osm
osm metrics enable --namespace test
osm metrics enable --namespace "test1, test2"

```
5. If you are using Azure Monitor Container Insights, follow the steps below; if not, onboard [here.](https://docs.microsoft.com/azure/azure-monitor/containers/container-insights-overview)
* Download the configmap from [here](https://github.com/microsoft/Docker-Provider/blob/ci_prod/kubernetes/container-azm-ms-osmconfig.yaml)
* Add the namespaces you want to monitor in the configmap: `monitor_namespaces = ["namespace1", "namespace2"]`
* Run the following kubectl command: `kubectl apply -f <configmap_yaml_file.yaml>`
* Example: `kubectl apply -f container-azm-ms-agentconfig.yaml`
6. The configuration change can take up to 15 minutes to take effect, and all omsagent pods in the cluster will restart. The restart is rolling: the omsagent pods do not all restart at the same time.
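
The `monitor_namespaces` setting in the configmap can simply be written by hand. As a small illustration of its expected shape, the sketch below builds the line from a list of namespace names. The function name `format_monitor_namespaces` is hypothetical and is not part of the agent or of the osm CLI.

```bash
# Hypothetical helper: print the monitor_namespaces setting for the
# namespaces passed as arguments.
format_monitor_namespaces() {
  local line='monitor_namespaces = ['
  local sep=''
  local ns
  for ns in "$@"; do
    line="${line}${sep}\"${ns}\""
    sep=', '
  done
  printf '%s]\n' "$line"
}

format_monitor_namespaces test1 test2
# prints: monitor_namespaces = ["test1", "test2"]
```

The printed line can be pasted into the downloaded configmap before running `kubectl apply`.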


## Validate the metrics flow
1. Query the InsightsMetrics table in the cluster's Log Analytics workspace to check whether metrics are flowing
```
InsightsMetrics
| where Name contains "envoy"
| summarize count() by Name
```

## How to consume the OSM monitoring dashboard?
1. Access your AKS cluster & Container Insights through this [link.](https://aka.ms/azmon/osmux)
2. Go to the Reports tab and open the Open Service Mesh (OSM) workbook.
3. Select the time range & namespace to scope your services. By default, we only show services deployed by customers and we exclude internal service communication. To view internal service communication as well, select Show All in the filter. Please note that OSM is a managed service mesh; we show all internal connections for transparency.

![alt text](https://github.com/microsoft/Docker-Provider/blob/saarorOSMdoc/Documentation/OSMPrivatePreview/Image1.jpg)
### Requests Tab
1. This tab provides a summary of all the HTTP requests sent from service to service in OSM.
2. You can view all the services, and the services a given service communicates with, by selecting that service in the grid.
3. You can view total requests, request error rate & P90 latency.
4. You can drill down to a destination and view trends for HTTP error/success codes, success rate, pod resource utilization, and latencies at different percentiles.

### Connections Tab
1. This tab provides a summary of all the connections between your services in Open Service Mesh.
2. Outbound connections: Total number of connections between source and destination services.
3. Outbound active connections: Last count of active connections between source and destination in the selected time range.
4. Outbound failed connections: Total number of failed connections between source and destination services.

### Troubleshooting guidance when Outbound active connections is 0 or failed connection count is >10k.
1. Please check your connection policy in the OSM configuration.
2. If the connection policy is fine, please refer to the OSM documentation: https://aka.ms/osm/tsg
3. From this view as well, you can drill down to a destination and view trends for HTTP error/success codes, success rate, pod resource utilization, and latencies at different percentiles.


### Known Issues
1. The workbook has a scale limit of 50 pods per namespace. If you have more than 50 pods in the mesh, the workbook may have loading issues.
2. When the source or destination is osmcontroller, we show no latency, and for internal services we show no resource utilization.
3. When both Prometheus scraping using pod annotations and OSM monitoring are enabled on the same set of namespaces, the default set of metrics (envoy_cluster_upstream_cx_total, envoy_cluster_upstream_cx_connect_fail, envoy_cluster_upstream_rq, envoy_cluster_upstream_rq_xx, envoy_cluster_upstream_rq_total, envoy_cluster_upstream_rq_time_bucket, envoy_cluster_upstream_cx_rx_bytes_total, envoy_cluster_upstream_cx_tx_bytes_total, envoy_cluster_upstream_cx_active) will be collected twice. You can follow [this](https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration#prometheus-scraping-settings) documentation to exclude these namespaces from pod annotation scraping using the setting monitor_kubernetes_pods_namespaces to work around this issue.
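
For the double-collection issue, the namespaces that matter are the ones listed both for OSM monitoring and for pod annotation scraping. A minimal sketch, assuming the two namespace lists have already been copied out of the configmaps as space-separated strings (the `overlapping_namespaces` function is hypothetical, not an agent feature):

```bash
# Hypothetical helper: print every namespace that appears in both
# space-separated lists, one per line.
overlapping_namespaces() {
  local osm_list="$1"
  local annotation_list="$2"
  local ns other
  # Unquoted expansion is intentional: the lists are split on whitespace.
  for ns in $osm_list; do
    for other in $annotation_list; do
      if [ "$ns" = "$other" ]; then
        printf '%s\n' "$ns"
        break
      fi
    done
  done
}

# Example namespace names are made up for illustration.
overlapping_namespaces "bookstore bookbuyer" "bookbuyer default"
# prints: bookbuyer
```

Any namespace it prints is a candidate to leave out of `monitor_kubernetes_pods_namespaces`.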

This is a private preview; our goal is to get feedback. Please feel free to reach out to us at [askcoin@microsoft.com](mailto:askcoin@microsoft.com) with any feedback and questions!
17 changes: 17 additions & 0 deletions ReleaseNotes.md
@@ -11,6 +11,23 @@ additional questions or comments.

Note : The agent version(s) below has dates (ciprod<mmddyyyy>), which indicate the agent build dates (not release dates)
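
To make the `ciprod<mmddyyyy>` convention concrete, here is a small sketch that extracts the build date from an image tag. The function name `build_date_from_tag` is hypothetical, and the sketch assumes the tag ends with exactly eight digits after `ciprod`.

```bash
# Hypothetical helper: print the build date (YYYY-MM-DD) encoded in a tag
# such as ciprod04222021 or win-ciprod04222021.
build_date_from_tag() {
  local tag="$1"
  local digits="${tag##*ciprod}"   # drop everything through the last "ciprod"
  case "$digits" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) printf 'unrecognized tag: %s\n' "$tag" >&2; return 1 ;;
  esac
  local month="${digits%??????}"   # first two digits
  local rest="${digits#??}"        # remaining ddyyyy
  local day="${rest%????}"
  local year="${rest#??}"
  printf '%s-%s-%s\n' "$year" "$month" "$day"
}

build_date_from_tag ciprod04222021
build_date_from_tag win-ciprod04222021
# both print: 2021-04-22
```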

### 04/22/2021 -
##### Version microsoft/oms:ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04222021 (linux)
##### Version microsoft/oms:win-ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod04222021 (windows)
##### Code change log
- Bug fixes for metrics cpuUsagePercentage and memoryWorkingSetPercentage for windows nodes
- Added metrics for threshold violation
- Made Job completion metric configurable
- Updated default buffer sizes in fluent-bit
- Updated recommended alerts
- Fixed bug where logs written before agent starts up were not collected
- Fixed bug which kept agent logs from being rotated
- Bug fix for Windows Containerd container log collection
- Bug fixes
- Doc updates
- Minor telemetry changes


### 03/26/2021 -
##### Version microsoft/oms:ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03262021 (linux)
##### Version microsoft/oms:win-ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod03262021 (windows)
19 changes: 10 additions & 9 deletions ReleaseProcess.md
@@ -35,20 +35,21 @@ Image automatically synched to MCR CN from Public cloud MCR.

- Refer to internal docs for the release process and instructions.

## ARO v3

This needs to be co-ordinated with Red hat and ARO-RP team for the release and Red hat team will pick up the changes for the release.

## AKS-Engine

Make PR against [AKS-Engine](https://github.com/Azure/aks-engine). Refer PR https://github.com/Azure/aks-engine/pull/2318

## ARO v4, Azure Arc K8s and OpenShift v4 clusters

Make sure azuremonitor-containers chart yamls updates with all changes going with the release and also make sure to bump the chart version, imagetag and docker provider version etc. Similar to agent container image, build pipeline automatically push the chart to container insights prod acr for canary and prod repos accordingly.
Both the agent and helm chart will be replicated to `mcr.microsoft.com`.
## Arc for Kubernetes

The way, customers will be onboard the monitoring to these clusters using onboarding scripts under `onboarding\managed` directory so please bump chart version for prod release. Once we move to Arc K8s Monitoring extension Public preview, these will be taken care so at that point of time no manual changes like this required.
The Ev2 pipeline is used to deploy the chart of the Arc K8s Container Insights Extension as per the Safe Deployment Process.
Here is the high-level process
```
1. Specify the chart version of the release candidate and trigger [container-insights-arc-k8s-extension-ci_prod-release](https://github-private.visualstudio.com/microsoft/_release?_a=releases&view=all)
2. Get approval for the release from one of the team members
3. Once approved, the release should be triggered automatically
4. Use `cimon-arck8s-eastus2euap` to validate the latest release in the canary region
5. TBD - Notify the vendor team for validation on all Arc K8s supported platforms
```

## Microsoft Charts Repo release for On-prem K8s

19 changes: 16 additions & 3 deletions build/linux/installer/conf/td-agent-bit-prom-side-car.conf
@@ -11,14 +11,27 @@
Parsers_File /etc/opt/microsoft/docker-cimprov/azm-containers-parser.conf
Log_File /var/opt/microsoft/docker-cimprov/log/fluent-bit.log

[INPUT]
Name tail
Tag oms.container.log.flbplugin.terminationlog.*
Path /dev/write-to-traces
Read_from_Head true
DB /var/opt/microsoft/docker-cimprov/state/terminationlog-ai.db
DB.Sync Off
Parser docker
Mem_Buf_Limit 1m
Path_Key filepath
Skip_Long_Lines On
Ignore_Older 2m

[INPUT]
Name tcp
Tag oms.container.perf.telegraf.*
Listen 0.0.0.0
Port 25229
Chunk_Size 1m
Buffer_Size 1m
Mem_Buf_Limit 20m
Chunk_Size 10m
Buffer_Size 10m
Mem_Buf_Limit 200m

[OUTPUT]
Name oms
13 changes: 13 additions & 0 deletions build/linux/installer/conf/td-agent-bit-rs.conf
@@ -10,6 +10,19 @@
Parsers_File /etc/opt/microsoft/docker-cimprov/azm-containers-parser.conf
Log_File /var/opt/microsoft/docker-cimprov/log/fluent-bit.log

[INPUT]
Name tail
Tag oms.container.log.flbplugin.terminationlog.*
Path /dev/write-to-traces
Read_from_Head true
DB /var/opt/microsoft/docker-cimprov/state/terminationlog-ai.db
DB.Sync Off
Parser docker
Mem_Buf_Limit 1m
Path_Key filepath
Skip_Long_Lines On
Ignore_Older 2m

[INPUT]
Name tcp
Tag oms.container.perf.telegraf.*
16 changes: 16 additions & 0 deletions build/linux/installer/conf/td-agent-bit.conf
@@ -15,6 +15,7 @@
Name tail
Tag oms.container.log.la.*
Path ${AZMON_LOG_TAIL_PATH}
Read_from_Head true
DB /var/log/omsagent-fblogs.db
DB.Sync Off
Parser docker
@@ -32,6 +33,7 @@
Name tail
Tag oms.container.log.flbplugin.*
Path /var/log/containers/omsagent*.log
Read_from_Head true
DB /var/opt/microsoft/docker-cimprov/state/omsagent-ai.db
DB.Sync Off
Parser docker
@@ -44,6 +46,7 @@
Name tail
Tag oms.container.log.flbplugin.mdsd.*
Path /var/opt/microsoft/linuxmonagent/log/mdsd.err
Read_from_Head true
DB /var/opt/microsoft/docker-cimprov/state/mdsd-ai.db
DB.Sync Off
Parser docker
@@ -52,6 +55,19 @@
Skip_Long_Lines On
Ignore_Older 2m

[INPUT]
Name tail
Tag oms.container.log.flbplugin.terminationlog.*
Path /dev/write-to-traces
Read_from_Head true
DB /var/opt/microsoft/docker-cimprov/state/terminationlog-ai.db
DB.Sync Off
Parser docker
Mem_Buf_Limit 1m
Path_Key filepath
Skip_Long_Lines On
Ignore_Older 2m

[INPUT]
Name tcp
Tag oms.container.perf.telegraf.*
9 changes: 9 additions & 0 deletions build/linux/installer/scripts/livenessprobe.sh
@@ -26,6 +26,15 @@ then
exit 1
fi

#test to exit non zero value if telegraf is not running
(ps -ef | grep telegraf | grep -v "grep")
if [ $? -ne 0 ]
then
echo "Telegraf is not running" > /dev/termination-log
echo "Telegraf is not running (controller: ${CONTROLLER_TYPE}, container type: ${CONTAINER_TYPE})" > /dev/write-to-traces # this file is tailed and sent to traces
exit 1
fi

if [ -s "inotifyoutput.txt" ]
then
# inotifyoutput file has data(config map was applied)
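
The telegraf check in `livenessprobe.sh` uses `ps -ef | grep telegraf | grep -v "grep"`, which matches any command line containing the string `telegraf`. As a hedged alternative sketch (the `process_listed` helper is hypothetical and not part of this PR), an exact match on the command name is stricter and needs no second `grep` to filter out the search itself:

```bash
# Hypothetical helper: succeed if a process named exactly "$1" appears in a
# list of command names read from stdin, one per line.
process_listed() {
  grep -qxF -- "$1"
}

# Stand-in listings so the sketch runs anywhere; in the probe the input
# would come from: ps -eo comm=
if printf '%s\n' fluent-bit telegraf | process_listed telegraf; then
  echo "telegraf is running"
fi
if ! printf '%s\n' fluent-bit | process_listed telegraf; then
  echo "Telegraf is not running"
fi
```

In the probe this would read `ps -eo comm= | process_listed telegraf`, with the existing writes to `/dev/termination-log` and `/dev/write-to-traces` kept in the failure branch.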