AI for Infrastructure
5 min read
DevOps
Use AI for Terraform modules, pipeline YAML, and runbook drafts. Never apply without review.
SRE
Paste alert definitions, runbooks, and incident notes. Ask for improvements and missing scenarios.
TL;DR
- AI is good at generating Terraform, Kubernetes manifests, and pipeline YAML. It usually gets the syntax right, but not always the details.
- Always review. Wrong Terraform can cost money. Wrong K8s can take down prod.
- Use AI for drafting and explanation. You own validation and application.
Infrastructure-as-code, configs, and runbooks are highly structured. AI can produce them quickly. Your job is to make sure they're correct and safe.
Terraform and IaC
Good use cases:
- "Generate Terraform for an S3 bucket with versioning and lifecycle rules"
- "Convert this AWS config to GCP equivalent"
- "Add a new module for X. Follow our existing pattern in modules/."
Cautions:
- AI may use deprecated resources or wrong provider versions. Check the provider docs.
- State and sensitive data: never paste real state files or keys into a prompt.
- Complex dependencies: AI can miss ordering. Review depends_on and implicit dependencies.
Workflow: Generate → review → plan (dry run) → apply. Never skip review for prod.
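The plan step is easier to review when destructive actions are surfaced explicitly. The following is a minimal sketch, assuming the plan has been exported with `terraform show -json` and loaded into a dict. The helper name `risky_changes` and the sample data are hypothetical.

```python
def risky_changes(plan: dict) -> list[str]:
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "delete" appears for pure deletes and for delete+create replacements.
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# Hypothetical excerpt of a JSON plan; a real one has many more fields.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["no-op"]}},
    ]
}

print(risky_changes(sample_plan))
```

A non-empty result does not mean the plan is wrong. It marks the resources a human should look at before apply.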
Kubernetes
Good use cases:
- "Generate a Deployment + Service for a stateless app"
- "Add resource limits and probes to this manifest"
- "Convert this Docker Compose to K8s manifests"
Cautions:
- Security contexts, RBAC, and network policies: AI often skips or oversimplifies them. Add them yourself.
- Version skew: make sure the API versions in the manifest match your cluster.
- Production hardening: AI may leave out replicas, PodDisruptionBudgets (PDBs), and topology spread constraints. Add them yourself.
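Some of the cautions above can be turned into a quick pre-review check. This is a rough sketch, assuming the manifest has already been parsed into a dict (for example with a YAML library). The function name `missing_hardening` and the sample Deployment are hypothetical, and the checks are deliberately shallow.

```python
def missing_hardening(deployment: dict) -> list[str]:
    """List common hardening gaps in a parsed Deployment manifest."""
    findings = []
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if "limits" not in container.get("resources", {}):
            findings.append(f"{name}: no resource limits")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: no {probe}")
        # Accept a securityContext at either container or pod level.
        if "securityContext" not in container and "securityContext" not in pod_spec:
            findings.append(f"{name}: no securityContext")
    if spec.get("replicas", 1) < 2:
        findings.append("fewer than 2 replicas")
    return findings


# Hypothetical AI-generated Deployment: limits are set, the rest is missing.
sample = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "web",
            "image": "nginx:1.27",
            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
        }]}},
    },
}

for finding in missing_hardening(sample):
    print(finding)
```

The check only confirms that fields exist, not that their values are sensible, and it does not cover RBAC, network policies, or PDBs, which live in separate objects.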
CI/CD Pipelines
Good use cases:
- "Generate a GitHub Actions workflow for test + build + deploy"
- "Convert this Jenkins pipeline to GitLab CI"
- "Add a step to run security scans"
Cautions:
- Secrets — never put real secrets in prompts. Use placeholders.
- Conditional logic and matrix builds — AI can get it wrong. Test in a branch first.
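One way to follow the placeholder rule is to scrub a pipeline file before pasting it into a prompt. The sketch below is a heuristic only, assuming secrets look like AWS access key IDs or like `password`/`token`/`secret`/`api_key` assignments. The `redact` helper and its patterns are illustrative and will miss other formats.

```python
import re

# Illustrative patterns only; extend for the secret formats you actually use.
PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_ACCESS_KEY_ID>"),
    (re.compile(r"(?i)(password|token|secret|api_key)(\s*[:=]\s*)\S+"), r"\1\2<REDACTED>"),
]


def redact(text: str) -> str:
    """Replace likely secrets with placeholders before sharing text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("DB_PASSWORD=hunter2"))
print(redact("aws_key: AKIAIOSFODNN7EXAMPLE"))
```

Treat the output as a first pass and still read the text before sending it. A regex cannot recognize every credential.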
Monitoring and Runbooks
Good use cases:
- "Draft a runbook for 'database connection pool exhausted'"
- "Improve this alert description. Add runbook link and severity"
- "Summarize this incident timeline. Extract action items"
Cautions:
- Runbooks must match your actual systems. AI drafts need customization.
- Alerts — verify thresholds and query correctness. AI doesn't know your baselines.
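Checking an AI-suggested threshold against your own history can be as simple as counting how often it would have fired. This is a minimal sketch, assuming you can export recent metric samples as a list of numbers. The function name `firing_fraction` and the sample values are hypothetical.

```python
def firing_fraction(samples: list[float], threshold: float) -> float:
    """Fraction of historical samples that would have exceeded the threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical connection-pool usage (%) sampled over a normal period.
history = [40, 42, 45, 50, 55, 61, 48, 44, 90, 43]

print(firing_fraction(history, 60))  # suggested threshold
print(firing_fraction(history, 95))  # looser alternative
```

A threshold that fires on a large share of normal samples is likely to be noisy, and one that never fires may be too loose. Where the right value sits depends on baselines only you have.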
# AI generated this. Review before apply.
# Check: Provider version? Deprecated resources? depends_on?
# Never paste real state or keys.
resource "aws_s3_bucket" "example" {
  bucket = "my-bucket-name"

  # Review catch: inline lifecycle_rule is deprecated in AWS provider v4 and later
  # in favor of a separate aws_s3_bucket_lifecycle_configuration resource.
  lifecycle_rule {
    enabled = true
    expiration { days = 90 }
  }
}
# Workflow: Generate → review → terraform plan → apply
# Wrong Terraform can cost money. Wrong K8s can take down prod.
Quick Check
AI generates Terraform for a new S3 bucket. What's the critical step before applying to production?
Do This Next
- Generate one Terraform or K8s resource with AI. Review it line by line. Apply only to a dev/test env.
- Have AI draft or improve one runbook for an incident type you've seen. Customize it for your setup.