gpuaudit

Scan your AWS account for GPU waste and get actionable recommendations to cut your cloud spend.

$ gpuaudit scan --profile ml-prod

  GPU Fleet Summary
  Total GPU instances:       14
  Total monthly GPU spend:   $47,832
  Estimated monthly waste:   $18,240  (38%)

  CRITICAL (3 instances, $8,940/mo potential savings)

  i-0a1b2c3d4e  g5.12xlarge (4x A10G)     $4,380/mo   Idle — no activity for 18 days → terminate
  i-9f8e7d6c5b  p4d.24xlarge (8x A100)    $23,652/mo   Idle — <1% CPU for 6 days → terminate
  sagemaker:asr ml.g6.48xlarge (8x L4)     $9,490/mo   GPU util avg 8% → downsize to ml.g5.xlarge

What it detects

Idle GPU instances — running but doing nothing (low CPU + near-zero network for 24+ hours)
Oversized GPU — multi-GPU instances where utilization suggests a single GPU would suffice
Pricing mismatch — on-demand instances running 30+ days that should be Reserved Instances
Stale instances — non-production instances running 90+ days
SageMaker low utilization — endpoints with <10% GPU utilization
SageMaker oversized — endpoints using <30% GPU memory on multi-GPU instances
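
The thresholds above can be read as a small rules function. The following is a minimal sketch, not gpuaudit's actual implementation: the `Resource` struct, its fields, and the signal names are hypothetical, and the pricing and stale rules are assumed to apply to EC2 only. The EC2 "Oversized GPU" rule is left out because the list does not state a numeric threshold for it.

```go
package main

import "fmt"

// Resource is a hypothetical summary of one scanned resource; the real
// gpuaudit types (GPUInstance, WasteSignal) may look quite different.
type Resource struct {
	SageMaker  bool    // true for a SageMaker endpoint, false for an EC2 instance
	MultiGPU   bool    // instance type carries more than one GPU
	OnDemand   bool    // billed at on-demand rates
	Production bool    // known to be a production resource
	UptimeDays int     // days since launch
	IdleHours  int     // consecutive hours of low CPU and near-zero network
	GPUUtilPct float64 // average GPU utilization, SageMaker only
	GPUMemPct  float64 // average GPU memory use, SageMaker only
}

// Detect returns the names of the rules that fire for r, using only the
// thresholds stated in the list above.
func Detect(r Resource) []string {
	var signals []string
	if r.SageMaker {
		if r.GPUUtilPct < 10 {
			signals = append(signals, "sagemaker-low-utilization")
		}
		if r.MultiGPU && r.GPUMemPct < 30 {
			signals = append(signals, "sagemaker-oversized")
		}
		return signals
	}
	if r.IdleHours >= 24 {
		signals = append(signals, "idle")
	}
	if r.OnDemand && r.UptimeDays >= 30 {
		signals = append(signals, "pricing-mismatch")
	}
	if !r.Production && r.UptimeDays >= 90 {
		signals = append(signals, "stale")
	}
	return signals
}

func main() {
	ec2 := Resource{OnDemand: true, UptimeDays: 45, IdleHours: 36}
	endpoint := Resource{SageMaker: true, MultiGPU: true, GPUUtilPct: 8, GPUMemPct: 22}
	fmt.Println(Detect(ec2))
	fmt.Println(Detect(endpoint))
}
```

A single resource can trigger several rules at once, which is why the sketch returns a slice rather than one label.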

Install

go install github.com/gpuaudit/cli/cmd/gpuaudit@latest

Or build from source:

git clone https://github.com/gpuaudit/cli.git
cd cli
go build -o gpuaudit ./cmd/gpuaudit

Quick start

# Uses default AWS credentials (~/.aws/credentials or environment variables)
gpuaudit scan

# Specific profile and region
gpuaudit scan --profile production --region us-east-1

# JSON output for automation
gpuaudit scan --format json --output report.json

# Markdown for docs/PRs
gpuaudit scan --format markdown

# Slack Block Kit payload (pipe to webhook)
gpuaudit scan --format slack --output - | curl -X POST -H 'Content-Type: application/json' -d @- "$SLACK_WEBHOOK"

# Skip CloudWatch metrics (faster, less accurate)
gpuaudit scan --skip-metrics

# Skip SageMaker scanning
gpuaudit scan --skip-sagemaker
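
For automation, a first step is often to inspect what the JSON report contains. The JSON schema is not documented in this README, so the sketch below assumes only that `report.json` holds a JSON object and lists its top-level keys; the function name `TopLevelKeys` is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// TopLevelKeys returns the sorted top-level keys of a JSON object.
// It makes no assumption about what those keys are.
func TopLevelKeys(data []byte) ([]string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, fmt.Errorf("report is not a JSON object: %w", err)
	}
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	data, err := os.ReadFile("report.json")
	if err != nil {
		fmt.Println("report.json not found; run the scan with --format json --output report.json first")
		return
	}
	keys, err := TopLevelKeys(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, k := range keys {
		fmt.Println(k)
	}
}
```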

IAM permissions

gpuaudit is read-only. It never modifies your infrastructure. Generate the minimal IAM policy:

gpuaudit iam-policy

This prints a JSON policy that grants only Describe*, List*, and Get* permissions on the EC2, SageMaker, CloudWatch, Cost Explorer, and Pricing APIs.
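
Before attaching a policy, you may want to confirm the read-only claim yourself. The sketch below checks that every allowed action in an IAM policy document uses a Describe, List, or Get verb. It assumes `Statement` is an array, and the embedded policy is an illustrative sample, not the actual output of `gpuaudit iam-policy`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statement holds the parts of an IAM statement this check needs.
// Action may be a string or a list, so it is decoded in a second step.
type statement struct {
	Effect string          `json:"Effect"`
	Action json.RawMessage `json:"Action"`
}

type policy struct {
	Statement []statement `json:"Statement"`
}

// actions normalizes the Action field to a slice of strings.
func actions(raw json.RawMessage) ([]string, error) {
	var many []string
	if err := json.Unmarshal(raw, &many); err == nil {
		return many, nil
	}
	var one string
	if err := json.Unmarshal(raw, &one); err != nil {
		return nil, err
	}
	return []string{one}, nil
}

// NonReadOnlyActions returns every allowed action whose verb does not
// start with Describe, List, or Get (IAM action names are case-insensitive).
func NonReadOnlyActions(doc []byte) ([]string, error) {
	var p policy
	if err := json.Unmarshal(doc, &p); err != nil {
		return nil, err
	}
	var bad []string
	for _, s := range p.Statement {
		if s.Effect != "Allow" {
			continue
		}
		acts, err := actions(s.Action)
		if err != nil {
			return nil, err
		}
		for _, a := range acts {
			_, verb, ok := strings.Cut(a, ":")
			v := strings.ToLower(verb)
			readOnly := strings.HasPrefix(v, "describe") ||
				strings.HasPrefix(v, "list") || strings.HasPrefix(v, "get")
			if !ok || !readOnly {
				bad = append(bad, a)
			}
		}
	}
	return bad, nil
}

func main() {
	// Illustrative sample only; the real policy comes from `gpuaudit iam-policy`.
	doc := []byte(`{
	  "Version": "2012-10-17",
	  "Statement": [
	    {"Effect": "Allow", "Action": ["ec2:Describe*", "sagemaker:List*"], "Resource": "*"},
	    {"Effect": "Allow", "Action": "cloudwatch:Get*", "Resource": "*"}
	  ]
	}`)
	bad, err := NonReadOnlyActions(doc)
	if err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	fmt.Println("non-read-only actions:", len(bad))
}
```

A bare `*` action has no service prefix and is reported as non-read-only, which is the conservative choice for an audit.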

GPU pricing reference

# List all GPU instance pricing
gpuaudit pricing

# Filter by GPU model
gpuaudit pricing --gpu H100
gpuaudit pricing --gpu A10G
gpuaudit pricing --gpu T4

Output formats

Format	Flag	Use case
Table	`--format table` (default)	Terminal viewing
JSON	`--format json`	Automation, CI/CD pipelines
Markdown	`--format markdown`	PRs, wikis, docs
Slack	`--format slack`	Slack webhook integration

How it works

Discovery: Scans EC2 and SageMaker in each target region for GPU and accelerator instance families (g4dn, g5, g6, g6e, p4d, p4de, p5, plus the Inferentia and Trainium families inf2 and trn1)
Metrics: Collects 7 days of CloudWatch metrics (CPU and network I/O for EC2; GPU utilization, GPU memory, and invocations for SageMaker)
Analysis: Applies 6 waste detection rules, each with a severity level (critical or warning)
Recommendations: Generates specific actions (terminate, downsize, switch pricing) with estimated monthly savings
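
The discovery step depends on recognizing which instance types belong to the listed families. Here is a minimal sketch of one way to do that, assuming an exact match on the family part of the type name; the helper names `Family` and `IsScanned` are hypothetical and not taken from the gpuaudit source.

```go
package main

import (
	"fmt"
	"strings"
)

// scannedFamilies lists the instance families named in the discovery step.
var scannedFamilies = map[string]bool{
	"g4dn": true, "g5": true, "g6": true, "g6e": true,
	"p4d": true, "p4de": true, "p5": true, "inf2": true, "trn1": true,
}

// Family extracts the family from an EC2 type ("g5.12xlarge") or a
// SageMaker type ("ml.g5.xlarge"). It returns "" for malformed input.
func Family(instanceType string) string {
	t := strings.TrimPrefix(instanceType, "ml.")
	family, _, ok := strings.Cut(t, ".")
	if !ok {
		return ""
	}
	return family
}

// IsScanned reports whether the instance type belongs to a listed family.
func IsScanned(instanceType string) bool {
	return scannedFamilies[Family(instanceType)]
}

func main() {
	for _, t := range []string{"g5.12xlarge", "ml.g6.48xlarge", "g5g.xlarge", "m5.large"} {
		fmt.Printf("%-16s scanned=%v\n", t, IsScanned(t))
	}
}
```

Matching the whole family name rather than a prefix keeps families such as g5g, which is not in the list, from being picked up by the g5 entry.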

Regions scanned by default: us-east-1, us-east-2, us-west-2, eu-west-1, eu-west-2, eu-central-1, ap-southeast-1, ap-northeast-1, ap-south-1.
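
Scanning nine regions one after another can be slow, so a scanner of this kind would plausibly fan the work out. The sketch below shows that pattern with goroutines and a `sync.WaitGroup`. Whether gpuaudit actually scans regions concurrently is an assumption, and `ScanFunc`, `Result`, and `ScanAll` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultRegions mirrors the default region list above.
var defaultRegions = []string{
	"us-east-1", "us-east-2", "us-west-2",
	"eu-west-1", "eu-west-2", "eu-central-1",
	"ap-southeast-1", "ap-northeast-1", "ap-south-1",
}

// ScanFunc is a hypothetical per-region scanner that returns the number
// of GPU instances it found.
type ScanFunc func(region string) (int, error)

// Result holds the outcome of scanning one region.
type Result struct {
	Region string
	Count  int
	Err    error
}

// ScanAll runs scan for every region concurrently and returns the
// results in the same order as the input slice.
func ScanAll(regions []string, scan ScanFunc) []Result {
	results := make([]Result, len(regions))
	var wg sync.WaitGroup
	for i, r := range regions {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			n, err := scan(r)
			// Each goroutine writes to its own index, so no mutex is needed.
			results[i] = Result{Region: r, Count: n, Err: err}
		}(i, r)
	}
	wg.Wait()
	return results
}

func main() {
	// Stub scanner standing in for real EC2 and SageMaker API calls.
	stub := func(region string) (int, error) { return 0, nil }
	for _, res := range ScanAll(defaultRegions, stub) {
		fmt.Printf("%-15s instances=%d err=%v\n", res.Region, res.Count, res.Err)
	}
}
```

Keeping a per-region error in `Result` lets one failing region (for example, one that is not enabled in the account) be reported without discarding the others.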

Project structure

gpuaudit/
├── cmd/gpuaudit/          CLI entry point (cobra)
├── internal/
│   ├── models/            Core data types (GPUInstance, WasteSignal, Recommendation)
│   ├── pricing/           Bundled GPU pricing database (40+ instance types)
│   ├── analysis/          Waste detection rules engine
│   ├── output/            Formatters (table, JSON, markdown, Slack)
│   └── providers/aws/     EC2, SageMaker, CloudWatch, scanner orchestrator
└── LICENSE                Apache 2.0

Roadmap

AWS Cost Explorer integration (actual vs projected spend)
EKS GPU pod discovery
SageMaker training job analysis
Multi-account (AWS Organizations) scanning
GCP + Azure support
GitHub Action for scheduled scans
Historical scan comparison (gpuaudit diff)

License

Apache 2.0
