This project sets up a complete observability stack using Terraform, Prometheus, and Grafana to monitor both EC2 and EKS resources. It includes alerting rules for key metrics like CPU, memory, network usage, and application availability. Additional tools like Blackbox Exporter and Postman collections are included for active endpoint testing and validation.
⚠️ Disclaimer:
This project is intended as a demonstration of observability concepts and SRE practices.
It is not hardened for production use and intentionally omits concerns such as authentication, encryption, network security, and cost optimization to focus on the core monitoring stack and alerting pipeline.
.
├── document/
│ └── SRE Project Template – Observing Cloud Resources.pdf
├── screenshots/
│ └── *.png (dashboard + alerts)
├── starter/
│ ├── terraform/ # Infra code (modularized)
│ │ ├── main.tf
│ │ └── modules/{vpc,ec2,eks}
│ ├── prometheus-additional.yaml
│ ├── blackbox-values.yaml
│ ├── debug-pod.yaml
│ ├── values.yaml # Helm override values
│ ├── SRE-Project.postman_environment.json
│ └── SRE-Project-postman-collection.json
└── README.md
- Initialize Terraform:
    cd starter/terraform
    terraform init
- Deploy the infrastructure:
    terraform apply
  This creates the VPC, EC2 instances, and an EKS cluster with IAM and networking preconfigured.
- Configure Prometheus:
  - Edit prometheus-additional.yaml to define scrape targets and alert rules.
- Install using Helm (run from the repository root, since the -f paths are relative to it):
    helm upgrade --install prometheus prometheus-community/prometheus -f starter/values.yaml
- Add Blackbox Exporter (for HTTP endpoint checks):
    helm upgrade --install blackbox-exporter prometheus-community/prometheus-blackbox-exporter -f starter/blackbox-values.yaml
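As a rough illustration of the "define scrape targets" step, the fragment below sketches a Blackbox Exporter scrape job in standard Prometheus `scrape_configs` syntax. It is not taken from this repository's prometheus-additional.yaml. The job name, the probed URL, and the exporter service address are assumptions and need adjusting to your deployment, and how the fragment is wired into the Helm values may differ in this project.

```yaml
# Hypothetical sketch: probe an HTTP endpoint through Blackbox Exporter.
scrape_configs:
  - job_name: blackbox-http            # assumed job name
    metrics_path: /probe               # Blackbox Exporter probe endpoint
    params:
      module: [http_2xx]               # module must exist in the exporter config
    static_configs:
      - targets:
          - http://example.com/health  # placeholder URL, replace with your endpoint
    relabel_configs:
      # Pass the target URL to the exporter as the ?target= query parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label on the resulting series.
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the exporter, not to the probed URL.
      # Assumed in-cluster service name and default port; check with kubectl get svc.
      - target_label: __address__
        replacement: blackbox-exporter-prometheus-blackbox-exporter:9115
```

The relabeling is what lets one exporter probe many URLs: each entry under targets becomes a separate probe, while Prometheus only ever connects to the exporter itself.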
- Alerts are defined in prometheus-additional.yaml.
- Dashboards are accessed via:
  - Grafana on EC2: http://<ec2-ip>:3000
  - Grafana on EKS: use the LoadBalancer IP from kubectl get svc
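For orientation, alert rules for the metrics mentioned above (CPU, memory, availability) might look roughly like the following rule group in standard Prometheus rule syntax. This is a sketch, not the contents of this repository's prometheus-additional.yaml. The group name, alert names, thresholds, and durations are all assumed values, and the expressions presume Node Exporter and Blackbox Exporter metrics are being scraped.

```yaml
# Hypothetical sketch of a Prometheus rule group; names and thresholds are assumptions.
groups:
  - name: example-alerts
    rules:
      - alert: HighCpuUsage
        # Percentage of non-idle CPU time per instance over the last 5 minutes.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        # Share of memory that is not available to new workloads.
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 85% on {{ $labels.instance }}"
      - alert: EndpointDown
        # probe_success is 0 when a Blackbox Exporter probe fails.
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
```

The `for` clause keeps an alert in the pending state until the condition has held for the full duration, which avoids firing on short spikes.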
Run included Postman collections to simulate endpoint monitoring and observe alerts.
- Open starter/SRE-Project-postman-collection.json in Postman.
- Use starter/SRE-Project.postman_environment.json for the test environment setup.
- AWS EC2, EKS
- Terraform – Infrastructure provisioning
- Prometheus & Alertmanager – Metrics + alerting
- Grafana – Dashboards
- Blackbox Exporter – Endpoint probes
- Postman – Manual test collections
Visuals include:
- Node Exporter setup
- Prometheus and Grafana dashboards
- Memory, CPU, and network alerts
- Blackbox probe results
(See /screenshots folder)
Made with ❤️ and 🧉 by Cristian Cevasco