Merged
2 changes: 1 addition & 1 deletion README.md
@@ -228,7 +228,7 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
| global.coder.coderdSelector | string | `"pod=~`coder.*`, pod!~`.*provisioner.*`"` | series selector for Prometheus/Loki to locate provisioner pods. ensure this uses backticks for quotes! |
| global.coder.controlPlaneNamespace | string | `"coder"` | the namespace into which the control plane has been deployed. |
| global.coder.externalProvisionersNamespace | string | `"coder"` | the namespace into which any external provisioners have been deployed. |
7 changes: 7 additions & 0 deletions coder-observability/runbooks/coderd.md
@@ -76,3 +76,10 @@ Terraform plugin.
Your Enterprise license is approaching or has exceeded the number of seats purchased.

Please contact your Coder sales contact, or visit https://coder.com/contact/sales.

## CoderdIneligiblePrebuilds

A prebuilt workspace only becomes eligible to be claimed by a user once a) its agent is running and b) all of the agent's
startup scripts have completed.

If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.
23 changes: 22 additions & 1 deletion coder-observability/templates/configmap-prometheus-alerts.yaml
@@ -4,7 +4,7 @@ metadata:
name: metrics-alerts
namespace: {{ .Release.Namespace }}
data:
{{- $service := dict "service" "coder" -}}
{{- $service := dict "service" "coderd" -}}

{{- with .Values.global.coder.alerts.coderd }} {{/* start-section */}}
coderd.yaml: |-
@@ -104,6 +104,27 @@ data:
{{- end }}
{{- end }}

{{- with .groups.IneligiblePrebuilds }}
{{- $group := . }}
{{- if .enabled }}
- name: Coderd Ineligible Prebuilds
rules:
{{ $alert := "CoderdIneligiblePrebuilds" }}
{{- range $severity, $threshold := .thresholds }}
- alert: {{ $alert }}
expr: max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
for: {{ $group.delay }}
annotations:
summary: >
{{ `{{ $value }}` }} prebuilt workspace(s) are currently ineligible for claiming for the "{{ `{{ $labels.template_name }}` }}" template and "{{ `{{ $labels.preset_name }}` }}" preset.
This usually indicates that the agent has not started correctly, or is still running its startup scripts after an extended period of time.
labels:
severity: {{ $severity }}
runbook_url: {{ template "runbook-url" (deepCopy $ | merge (dict "alert" $alert) $service) }}
{{- end }}
{{- end }}
{{- end }}

{{- end }} {{/* end-section */}}
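
For orientation, with the defaults this change adds to values.yaml (`delay: 10m` and a single `notify` threshold), the new template block would render to roughly the rule group below. This is a hand-rendered sketch, not captured `helm template` output: the indentation is assumed, and the `runbook_url` value is a placeholder because the `runbook-url` helper is not shown in this diff.

```yaml
# Sketch of the rendered rule group, assuming the default values from this change.
- name: Coderd Ineligible Prebuilds
  rules:
    - alert: CoderdIneligiblePrebuilds
      # Fires per template/preset pair when running prebuilds outnumber eligible ones.
      expr: max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
      for: 10m  # from .groups.IneligiblePrebuilds.delay
      annotations:
        summary: >
          {{ $value }} prebuilt workspace(s) are currently ineligible for claiming for the "{{ $labels.template_name }}" template and "{{ $labels.preset_name }}" preset.
          This usually indicates that the agent has not started correctly, or is still running its startup scripts after an extended period of time.
      labels:
        severity: notify  # the key of the single entry in .thresholds
        runbook_url: "<placeholder: output of the runbook-url helper>"
```

Because the `range` iterates over the `thresholds` map, one rule is emitted per key. The key becomes the `severity` label, while the threshold value itself does not appear in the expression as written, which compares against `0`.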


5 changes: 5 additions & 0 deletions coder-observability/values.yaml
@@ -76,6 +76,11 @@ global:
notify: 2
warning: 5
critical: 10
IneligiblePrebuilds:
enabled: true
delay: 10m
thresholds:
notify: 1
provisionerd:
groups:
Replicas:
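
As a usage illustration, a deployment that wants a longer grace period before the alert fires could override only the new group in its own values file. This is a minimal sketch that assumes the key path `global.coder.alerts.coderd.groups.IneligiblePrebuilds` shown in the README default and relies on Helm's usual merging of user-supplied values over chart defaults; the `30m` figure is purely illustrative.

```yaml
# Hypothetical override file; only the keys being changed need to be listed.
global:
  coder:
    alerts:
      coderd:
        groups:
          IneligiblePrebuilds:
            enabled: true   # set to false to omit the rule group entirely
            delay: 30m      # illustrative value; the chart default is 10m
```

A file like this can be passed with `-f` to `helm install` or `helm upgrade`. Keys that are left out, such as `thresholds`, should keep their chart defaults.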