Introduction
In this final part, we'll transform our Terraform code into production-ready, maintainable infrastructure. We'll cover module design, environment management, CI/CD integration, monitoring strategies, and operational best practices used in enterprise environments.
Terraform Module Architecture
Modules are the building blocks of reusable Terraform code. Let's restructure our infrastructure using a modular approach.
Project Structure
terraform-aws-infrastructure/
├── modules/
│ ├── network/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── security/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── compute/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── database/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── monitoring/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── outputs.tf
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── outputs.tf
│ └── prod/
│ ├── main.tf
│ ├── variables.tf
│ ├── terraform.tfvars
│ └── outputs.tf
├── scripts/
│ ├── deploy.sh
│ ├── validate.sh
│ └── destroy.sh
└── README.md
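The scripts/ directory above is referenced throughout the series but its contents are never shown. As a minimal sketch (the layout is the only thing assumed from the tree above), validate.sh could simply format-check and validate every module and environment:

```bash
#!/usr/bin/env bash
# scripts/validate.sh - illustrative sketch; adjust paths if your layout differs.
set -euo pipefail

# Fail if any file is not canonically formatted
terraform fmt -check -recursive

# Validate each module and environment in isolation (no backend required)
for dir in modules/* environments/*; do
  [ -d "$dir" ] || continue
  echo "Validating $dir"
  (cd "$dir" && terraform init -backend=false -input=false >/dev/null && terraform validate)
done
```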
Network Module
modules/network/main.tf
# VPC
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-vpc"
})
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-igw"
})
}
# Availability Zones
data "aws_availability_zones" "available" {
state = "available"
}
# Public Subnets
resource "aws_subnet" "public" {
count = var.az_count
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 1)
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-public-${count.index + 1}"
Type = "public"
})
}
# Private Subnets
resource "aws_subnet" "private" {
count = var.az_count
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-private-${count.index + 1}"
Type = "private"
})
}
# Database Subnets
resource "aws_subnet" "database" {
count = var.az_count
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-database-${count.index + 1}"
Type = "database"
})
}
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? var.az_count : 0
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-nat-eip-${count.index + 1}"
})
}
# NAT Gateways
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? var.az_count : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-nat-${count.index + 1}"
})
depends_on = [aws_internet_gateway.main]
}
# Public Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-public-rt"
})
}
# Public Route Table Association
resource "aws_route_table_association" "public" {
count = var.az_count
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Private Route Tables
resource "aws_route_table" "private" {
count = var.enable_nat_gateway ? var.az_count : 1
vpc_id = aws_vpc.main.id
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
}
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-private-rt-${count.index + 1}"
})
}
# Private Route Table Association
resource "aws_route_table_association" "private" {
count = var.az_count
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[var.enable_nat_gateway ? count.index : 0].id
}
# Database Route Table Association
resource "aws_route_table_association" "database" {
count = var.az_count
subnet_id = aws_subnet.database[count.index].id
route_table_id = aws_route_table.private[var.enable_nat_gateway ? count.index : 0].id
}
# VPC Flow Logs
resource "aws_flow_log" "vpc_flow_log" {
count = var.enable_vpc_flow_logs ? 1 : 0
iam_role_arn = aws_iam_role.flow_log[0].arn
log_destination = aws_cloudwatch_log_group.vpc_flow_log[0].arn
traffic_type = "ALL"
vpc_id = aws_vpc.main.id
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-vpc-flow-log"
})
}
# CloudWatch Log Group for VPC Flow Logs
resource "aws_cloudwatch_log_group" "vpc_flow_log" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "/aws/vpc/flowlogs/${var.name_prefix}"
retention_in_days = var.flow_log_retention_days
tags = var.common_tags
}
# IAM Role for VPC Flow Logs
resource "aws_iam_role" "flow_log" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "${var.name_prefix}-vpc-flow-log-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "vpc-flow-logs.amazonaws.com"
}
}
]
})
tags = var.common_tags
}
# IAM Policy for VPC Flow Logs
resource "aws_iam_role_policy" "flow_log" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "${var.name_prefix}-vpc-flow-log-policy"
role = aws_iam_role.flow_log[0].id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams"
]
Effect = "Allow"
Resource = "*"
}
]
})
}
modules/network/variables.tf
variable "name_prefix" {
description = "Prefix for resource names"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "az_count" {
description = "Number of Availability Zones"
type = number
default = 2
validation {
condition = var.az_count >= 2 && var.az_count <= 4
error_message = "AZ count must be between 2 and 4."
}
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "enable_vpc_flow_logs" {
description = "Enable VPC Flow Logs"
type = bool
default = true
}
variable "flow_log_retention_days" {
description = "VPC Flow Log retention period"
type = number
default = 30
}
variable "common_tags" {
description = "Common tags to apply to all resources"
type = map(string)
default = {}
}
modules/network/outputs.tf
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "IDs of the database subnets"
value = aws_subnet.database[*].id
}
output "internet_gateway_id" {
description = "ID of the Internet Gateway"
value = aws_internet_gateway.main.id
}
output "nat_gateway_ids" {
description = "IDs of the NAT Gateways"
value = aws_nat_gateway.main[*].id
}
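Each module should also pin the providers it depends on, so consumers get a clear error instead of a surprise upgrade. A minimal sketch of a modules/network/versions.tf, mirroring the constraints used by the root configuration below (the exact constraints are an assumption, not something the series prescribes):

```hcl
# modules/network/versions.tf (illustrative)
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```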
Environment Configuration
environments/prod/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
random = {
source = "hashicorp/random"
version = "~> 3.1"
}
}
backend "s3" {
bucket = "mycompany-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-state-lock"
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
# Note: timestamp() produces a new value on every run, which causes perpetual
# tag drift in plans; prefer a static value or drop this tag.
CreatedDate = timestamp()
}
}
}
# Data sources
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
# Local values for consistent naming
locals {
name_prefix = "${var.project_name}-${var.environment}"
common_tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
}
}
# Network Module
module "network" {
source = "../../modules/network"
name_prefix = local.name_prefix
vpc_cidr = var.vpc_cidr
az_count = var.az_count
enable_nat_gateway = var.enable_nat_gateway
enable_vpc_flow_logs = var.enable_vpc_flow_logs
flow_log_retention_days = var.flow_log_retention_days
common_tags = local.common_tags
}
# Security Module
module "security" {
source = "../../modules/security"
name_prefix = local.name_prefix
vpc_id = module.network.vpc_id
vpc_cidr = module.network.vpc_cidr_block
common_tags = local.common_tags
# Security configuration
allowed_cidr_blocks = var.allowed_cidr_blocks
enable_waf = var.enable_waf
}
# Database Module
module "database" {
source = "../../modules/database"
name_prefix = local.name_prefix
vpc_id = module.network.vpc_id
subnet_ids = module.network.database_subnet_ids
security_group_ids = [module.security.database_sg_id]
# Database configuration
engine = var.db_engine
engine_version = var.db_engine_version
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
max_allocated_storage = var.db_max_allocated_storage
# Credentials
master_username = var.db_master_username
master_password = var.db_master_password
database_name = var.db_name
# Backup and maintenance
backup_retention_period = var.db_backup_retention_period
backup_window = var.db_backup_window
maintenance_window = var.db_maintenance_window
# Monitoring and performance
monitoring_interval = var.db_monitoring_interval
performance_insights_enabled = var.db_performance_insights_enabled
# Security
deletion_protection = var.db_deletion_protection
storage_encrypted = var.db_storage_encrypted
common_tags = local.common_tags
}
# Compute Module
module "compute" {
source = "../../modules/compute"
name_prefix = local.name_prefix
vpc_id = module.network.vpc_id
# Subnets
public_subnet_ids = module.network.public_subnet_ids
private_subnet_ids = module.network.private_subnet_ids
# Security Groups
alb_security_group_id = module.security.alb_sg_id
ecs_security_group_id = module.security.ecs_sg_id
# Load Balancer configuration
enable_https = var.enable_https
ssl_certificate_arn = var.ssl_certificate_arn
domain_name = var.domain_name
# ECS configuration
ecs_cluster_name = "${local.name_prefix}-cluster"
ecs_service_desired_count = var.ecs_service_desired_count
ecs_task_cpu = var.ecs_task_cpu
ecs_task_memory = var.ecs_task_memory
# Auto Scaling
enable_auto_scaling = var.enable_auto_scaling
auto_scaling_min_capacity = var.auto_scaling_min_capacity
auto_scaling_max_capacity = var.auto_scaling_max_capacity
common_tags = local.common_tags
}
# Monitoring Module
module "monitoring" {
source = "../../modules/monitoring"
name_prefix = local.name_prefix
environment = var.environment
# Resources to monitor (the dashboard's ALB dimension needs the ARN suffix,
# so the compute module is assumed to expose aws_lb's arn_suffix attribute)
alb_arn_suffix = module.compute.alb_arn_suffix
ecs_cluster_name = module.compute.ecs_cluster_name
ecs_service_name = module.compute.ecs_service_name
rds_instance_id = module.database.db_instance_id
# Notification
sns_topic_arn = var.sns_alert_topic_arn
# Thresholds
cpu_threshold_high = var.cpu_threshold_high
memory_threshold_high = var.memory_threshold_high
common_tags = local.common_tags
}
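The prod configuration reads its values from environments/prod/terraform.tfvars. A sketch of what that file could contain, using only variable names referenced above; every value is illustrative, not a recommendation:

```hcl
# environments/prod/terraform.tfvars (illustrative values)
aws_region   = "us-east-1"
environment  = "prod"
project_name = "myapp"

# Network
vpc_cidr                = "10.0.0.0/16"
az_count                = 3
enable_nat_gateway      = true
enable_vpc_flow_logs    = true
flow_log_retention_days = 90

# Database
db_engine                = "postgres"
db_engine_version        = "15.4"
db_instance_class        = "db.r6g.large"
db_allocated_storage     = 100
db_max_allocated_storage = 500
db_deletion_protection   = true
db_storage_encrypted     = true

# Compute
ecs_service_desired_count = 3
ecs_task_cpu              = 512
ecs_task_memory           = 1024
enable_auto_scaling       = true
auto_scaling_min_capacity = 3
auto_scaling_max_capacity = 12

# Secrets such as db_master_username / db_master_password are better supplied via
# TF_VAR_* environment variables or a secrets store than a committed tfvars file.
```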
CI/CD Integration
GitHub Actions Workflow
Create .github/workflows/terraform.yml:
name: Terraform
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main ]
env:
AWS_REGION: us-east-1
TF_VERSION: 1.5.0
jobs:
validate:
name: Validate
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Terraform Format Check
run: terraform fmt -check -recursive
- name: Terraform Init
run: terraform init -backend=false
- name: Terraform Validate
run: terraform validate
security-scan:
name: Security Scan
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Run Checkov
uses: bridgecrewio/checkov-action@master
with:
directory: .
framework: terraform
output_format: sarif
output_file_path: reports/results.sarif
- name: Upload SARIF file
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: reports/results.sarif
plan-dev:
name: Plan Dev
runs-on: ubuntu-latest
needs: [validate, security-scan]
# Plan on PRs and on pushes to develop, so deploy-dev (which needs this job's artifact) can run
if: github.event_name == 'pull_request' || (github.event_name == 'push' && github.ref == 'refs/heads/develop')
defaults:
run:
working-directory: ./environments/dev
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Terraform Init
run: terraform init
- name: Terraform Plan
run: terraform plan -var-file="terraform.tfvars" -out=tfplan
- name: Save Plan
uses: actions/upload-artifact@v3
with:
name: dev-tfplan
path: ./environments/dev/tfplan
deploy-dev:
name: Deploy Dev
runs-on: ubuntu-latest
needs: plan-dev
if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
environment: development
defaults:
run:
working-directory: ./environments/dev
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Terraform Init
run: terraform init
- name: Download Plan
uses: actions/download-artifact@v3
with:
name: dev-tfplan
path: ./environments/dev/
- name: Terraform Apply
run: terraform apply tfplan
plan-prod:
name: Plan Prod
runs-on: ubuntu-latest
needs: [validate, security-scan]
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
defaults:
run:
working-directory: ./environments/prod
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID_PROD }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY_PROD }}
aws-region: ${{ env.AWS_REGION }}
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Terraform Init
run: terraform init
- name: Terraform Plan
run: terraform plan -var-file="terraform.tfvars" -out=tfplan
- name: Save Plan
uses: actions/upload-artifact@v3
with:
name: prod-tfplan
path: ./environments/prod/tfplan
deploy-prod:
name: Deploy Prod
runs-on: ubuntu-latest
needs: plan-prod
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
environment: production
defaults:
run:
working-directory: ./environments/prod
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID_PROD }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY_PROD }}
aws-region: ${{ env.AWS_REGION }}
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Terraform Init
run: terraform init
- name: Download Plan
uses: actions/download-artifact@v3
with:
name: prod-tfplan
path: ./environments/prod/
- name: Terraform Apply
run: terraform apply tfplan
- name: Notify Deployment
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
channel: '#deployments'
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
if: always()
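The workflow above authenticates with long-lived access keys stored as repository secrets. If your AWS setup allows it, aws-actions/configure-aws-credentials also supports OpenID Connect, which removes static keys entirely. A hedged sketch of the changed pieces (the role ARN is a placeholder you would create and scope yourself):

```yaml
# Job-level permissions required for OIDC
permissions:
  id-token: write
  contents: read

steps:
  - name: Configure AWS credentials (OIDC)
    uses: aws-actions/configure-aws-credentials@v2
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform   # placeholder
      aws-region: ${{ env.AWS_REGION }}
```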
State Management and Remote Backend
S3 Backend with DynamoDB Locking
# backend-setup/main.tf
resource "aws_s3_bucket" "terraform_state" {
bucket = "mycompany-terraform-state"
tags = {
Name = "Terraform State Bucket"
Environment = "shared"
}
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_encryption" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_state_lock" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
tags = {
Name = "Terraform State Lock Table"
Environment = "shared"
}
}
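Because the state bucket and lock table must exist before any environment can use the S3 backend, bootstrapping is a two-step process. Roughly, using the directory layout from earlier:

```bash
# 1. Create the state bucket and lock table (this run uses local state)
cd backend-setup
terraform init
terraform apply

# 2. Initialize an environment against the S3 backend
cd ../environments/prod
terraform init   # add -migrate-state if the environment previously used local state
```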
Advanced Monitoring and Observability
CloudWatch Dashboard Module
# modules/monitoring/main.tf
# The dashboard widgets below reference the current region, so declare the data source here
data "aws_region" "current" {}
resource "aws_cloudwatch_dashboard" "main" {
dashboard_name = "${var.name_prefix}-dashboard"
dashboard_body = jsonencode({
widgets = [
# Load Balancer Metrics
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.alb_arn_suffix],
[".", "TargetResponseTime", ".", "."],
[".", "HTTPCode_Target_2XX_Count", ".", "."],
[".", "HTTPCode_Target_4XX_Count", ".", "."],
[".", "HTTPCode_Target_5XX_Count", ".", "."]
]
view = "timeSeries"
stacked = false
region = data.aws_region.current.name
title = "Load Balancer Metrics"
period = 300
stat = "Sum"
}
},
# ECS Service Metrics
{
type = "metric"
x = 12
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/ECS", "CPUUtilization", "ServiceName", var.ecs_service_name, "ClusterName", var.ecs_cluster_name],
[".", "MemoryUtilization", ".", ".", ".", "."]
]
view = "timeSeries"
region = data.aws_region.current.name
title = "ECS Service Metrics"
period = 300
}
},
# Database Metrics
{
type = "metric"
x = 0
y = 6
width = 12
height = 6
properties = {
metrics = [
["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", var.rds_instance_id],
[".", "DatabaseConnections", ".", "."],
[".", "ReadLatency", ".", "."],
[".", "WriteLatency", ".", "."]
]
view = "timeSeries"
region = data.aws_region.current.name
title = "Database Metrics"
period = 300
}
},
# Error Rate
{
type = "metric"
x = 12
y = 6
width = 12
height = 6
properties = {
metrics = [
["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count", "LoadBalancer", var.alb_arn_suffix]
]
view = "timeSeries"
region = data.aws_region.current.name
title = "Error Rate"
period = 300
stat = "Sum"
}
}
]
})
}
# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
alarm_name = "${var.name_prefix}-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/ECS"
period = "300"
statistic = "Average"
threshold = var.cpu_threshold_high
alarm_description = "This metric monitors ECS service CPU utilization"
alarm_actions = [var.sns_topic_arn]
dimensions = {
ServiceName = var.ecs_service_name
ClusterName = var.ecs_cluster_name
}
tags = var.common_tags
}
resource "aws_cloudwatch_metric_alarm" "high_memory" {
alarm_name = "${var.name_prefix}-high-memory"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "MemoryUtilization"
namespace = "AWS/ECS"
period = "300"
statistic = "Average"
threshold = var.memory_threshold_high
alarm_description = "This metric monitors ECS service memory utilization"
alarm_actions = [var.sns_topic_arn]
dimensions = {
ServiceName = var.ecs_service_name
ClusterName = var.ecs_cluster_name
}
tags = var.common_tags
}
# Application Insights
resource "aws_applicationinsights_application" "main" {
resource_group_name = aws_resourcegroups_group.main.name
auto_config_enabled = true
tags = var.common_tags
}
resource "aws_resourcegroups_group" "main" {
name = "${var.name_prefix}-resources"
resource_query {
query = jsonencode({
ResourceTypeFilters = ["AWS::AllSupported"]
TagFilters = [
{
# common_tags set Project to the project name, not the name prefix, so group
# resources by the Environment tag that this module actually receives
Key = "Environment"
Values = [var.environment]
}
]
})
}
tags = var.common_tags
}
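For completeness, the inputs this module references would be declared in modules/monitoring/variables.tf. A minimal sketch based on the references above (the numeric defaults are illustrative):

```hcl
# modules/monitoring/variables.tf (sketch)
variable "name_prefix" {
  type = string
}

variable "environment" {
  type = string
}

variable "alb_arn_suffix" {
  description = "ARN suffix of the ALB (aws_lb.arn_suffix), used as a CloudWatch dimension"
  type        = string
}

variable "ecs_cluster_name" {
  type = string
}

variable "ecs_service_name" {
  type = string
}

variable "rds_instance_id" {
  type = string
}

variable "sns_topic_arn" {
  description = "SNS topic that receives alarm notifications"
  type        = string
}

variable "cpu_threshold_high" {
  type    = number
  default = 80
}

variable "memory_threshold_high" {
  type    = number
  default = 80
}

variable "common_tags" {
  type    = map(string)
  default = {}
}
```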
Cost Optimization Strategies
Cost Monitoring and Budgets
# Cost monitoring
resource "aws_budgets_budget" "monthly" {
name = "${var.name_prefix}-monthly-budget"
budget_type = "COST"
limit_amount = var.monthly_budget_limit
limit_unit = "USD"
time_unit = "MONTHLY"
# Filter spend by the Project tag; tag filters use the "user:<key>$<value>" format
cost_filter {
name = "TagKeyValue"
values = [format("user:Project$%s", var.project_name)]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Cost anomaly detection
resource "aws_ce_anomaly_detector" "main" {
name = "${var.name_prefix}-cost-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["Amazon Elastic Compute Cloud - Compute", "Amazon Relational Database Service"]
})
tags = var.common_tags
}
resource "aws_ce_anomaly_subscription" "main" {
name = "${var.name_prefix}-cost-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.main.arn
]
subscriber {
type = "EMAIL"
address = var.cost_anomaly_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
tags = var.common_tags
}
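The budget and anomaly resources reference a few inputs that are not declared anywhere else in the series. A sketch of those declarations (the default budget amount is purely illustrative):

```hcl
variable "monthly_budget_limit" {
  description = "Monthly cost budget in USD"
  type        = string
  default     = "500"
}

variable "budget_notification_emails" {
  description = "Email addresses that receive budget alerts"
  type        = list(string)
}

variable "cost_anomaly_email" {
  description = "Email address that receives cost anomaly notifications"
  type        = string
}
```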
Security Best Practices
Security Module with WAF
# modules/security/main.tf
# WAF Web ACL
resource "aws_wafv2_web_acl" "main" {
count = var.enable_waf ? 1 : 0
name = "${var.name_prefix}-web-acl"
scope = "REGIONAL"
default_action {
allow {}
}
# Rate limiting rule
rule {
name = "rate-limit-rule"
priority = 1
# Rate-based rules defined directly in a web ACL take "action" (set below), not "override_action"
statement {
rate_based_statement {
limit = 2000
aggregate_key_type = "IP"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "RateLimitRule"
sampled_requests_enabled = true
}
action {
block {}
}
}
# AWS Managed Rules
rule {
name = "aws-managed-common-rule-set"
priority = 2
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesCommonRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "CommonRuleSetMetric"
sampled_requests_enabled = true
}
}
# SQL injection protection
rule {
name = "aws-managed-sql-injection-rule-set"
priority = 3
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesSQLiRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "SQLiRuleSetMetric"
sampled_requests_enabled = true
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "${var.name_prefix}-waf-metric"
sampled_requests_enabled = true
}
tags = var.common_tags
}
# Security Groups
resource "aws_security_group" "alb" {
name = "${var.name_prefix}-alb-sg"
description = "Security group for ALB"
vpc_id = var.vpc_id
ingress {
description = "HTTP"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = var.allowed_cidr_blocks
}
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = var.allowed_cidr_blocks
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-alb-sg"
})
}
resource "aws_security_group" "ecs" {
name = "${var.name_prefix}-ecs-sg"
description = "Security group for ECS tasks"
vpc_id = var.vpc_id
ingress {
description = "From ALB"
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-ecs-sg"
})
}
resource "aws_security_group" "database" {
name = "${var.name_prefix}-db-sg"
description = "Security group for database"
vpc_id = var.vpc_id
ingress {
description = "From ECS tasks"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.ecs.id]
}
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-db-sg"
})
}
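One thing this module does not show is attaching the Web ACL to the load balancer; without an association the WAF inspects nothing. A hedged sketch, assuming the ALB's ARN is passed into the module (var.alb_arn is not defined above):

```hcl
# Attach the WAF Web ACL to the Application Load Balancer.
# var.alb_arn is an assumed input, not part of the module shown above.
resource "aws_wafv2_web_acl_association" "alb" {
  count        = var.enable_waf ? 1 : 0
  resource_arn = var.alb_arn
  web_acl_arn  = aws_wafv2_web_acl.main[0].arn
}
```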
Disaster Recovery and Backup
RDS Automated Backups and Cross-Region Replication
# modules/database/main.tf (additional configuration)
# Automated backups
resource "aws_db_instance" "main" {
# ... existing configuration ...
backup_retention_period = var.backup_retention_period
backup_window = var.backup_window
copy_tags_to_snapshot = true
delete_automated_backups = false
# Export PostgreSQL logs to CloudWatch; point-in-time recovery itself comes
# from backup_retention_period being greater than zero
enabled_cloudwatch_logs_exports = ["postgresql"]
# Replication is configured on the aws_db_instance.read_replica resource below,
# not on the primary instance
}
# Read replica for disaster recovery. As written it lives in the same region (different AZ);
# a true cross-region replica needs a provider alias for the target region and the source instance ARN.
resource "aws_db_instance" "read_replica" {
count = var.create_read_replica ? 1 : 0
identifier = "${var.name_prefix}-read-replica"
# Point to main instance
replicate_source_db = aws_db_instance.main.identifier
# Different AZ for high availability
availability_zone = var.replica_availability_zone
# Can be smaller instance for cost optimization
instance_class = var.replica_instance_class
# Monitoring
monitoring_interval = var.monitoring_interval
monitoring_role_arn = aws_iam_role.enhanced_monitoring.arn
# Security
publicly_accessible = false
# Backup settings are inherited from the source instance
skip_final_snapshot = false
# Note: timestamp() re-evaluates on every plan, so this identifier shows perpetual drift;
# a static suffix or lifecycle ignore_changes avoids that.
final_snapshot_identifier = "${var.name_prefix}-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
tags = merge(var.common_tags, {
Name = "${var.name_prefix}-read-replica"
Type = "read-replica"
})
}
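The read replica above references aws_iam_role.enhanced_monitoring, which is never shown. A sketch of that role, trusting the RDS monitoring service and attaching the AWS-managed Enhanced Monitoring policy:

```hcl
# IAM role assumed by RDS for Enhanced Monitoring (referenced by monitoring_role_arn above)
resource "aws_iam_role" "enhanced_monitoring" {
  name = "${var.name_prefix}-rds-monitoring-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "monitoring.rds.amazonaws.com"
        }
      }
    ]
  })

  tags = var.common_tags
}

resource "aws_iam_role_policy_attachment" "enhanced_monitoring" {
  role       = aws_iam_role.enhanced_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
```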
Testing and Validation
Terraform Test Configuration
Create tests/integration_test.go:
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/gruntwork-io/terratest/modules/aws"
"github.com/stretchr/testify/assert"
)
func TestTerraformInfrastructure(t *testing.T) {
t.Parallel()
// Configure Terraform options
terraformOptions := &terraform.Options{
TerraformDir: "../environments/dev",
Vars: map[string]interface{}{
"environment": "test",
"project_name": "terratest",
},
}
// Clean up resources with "defer"
defer terraform.Destroy(t, terraformOptions)
// Deploy the infrastructure
terraform.InitAndApply(t, terraformOptions)
// Validate outputs
vpcID := terraform.Output(t, terraformOptions, "vpc_id")
assert.NotEmpty(t, vpcID)
// Test AWS resources
aws.GetVpcById(t, vpcID, "us-east-1")
// Test ALB is accessible
albDNS := terraform.Output(t, terraformOptions, "alb_dns_name")
assert.NotEmpty(t, albDNS)
}
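To run the test locally, the flow is roughly as follows, assuming Go modules are initialized in the tests/ directory and AWS credentials are available in the environment (the module path is a placeholder):

```bash
cd tests
go mod init example.com/infrastructure-tests   # one-time setup, placeholder module path
go mod tidy                                    # pulls terratest and testify
go test -v -timeout 90m -run TestTerraformInfrastructure
```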
Documentation and Maintenance
README Template
# Infrastructure as Code with Terraform
## Overview
This repository contains Terraform configurations for provisioning and managing AWS infrastructure across multiple environments.
## Architecture
- **Network**: VPC with public/private subnets across multiple AZs
- **Compute**: ECS Fargate with Application Load Balancer
- **Database**: RDS PostgreSQL with read replicas
- **Storage**: S3 with CloudFront CDN
- **Monitoring**: CloudWatch dashboards and alarms
- **Security**: WAF, security groups, and IAM roles
## Prerequisites
- Terraform >= 1.0
- AWS CLI configured
- S3 bucket for state storage
- DynamoDB table for state locking
## Usage
### Initialize Backend
```bash
cd backend-setup
terraform init
terraform apply
```
### Deploy Environment
```bash
cd environments/dev
terraform init
terraform plan -var-file="terraform.tfvars"
terraform apply
```
### Destroy Environment
```bash
terraform destroy -var-file="terraform.tfvars"
```
## Module Structure
- `modules/network/`: VPC, subnets, routing
- `modules/security/`: Security groups, WAF, IAM
- `modules/compute/`: ALB, ECS, Auto Scaling
- `modules/database/`: RDS, backups, monitoring
- `modules/monitoring/`: CloudWatch, alarms, dashboards
## Contributing
- Create feature branch
- Make changes
- Run `terraform fmt` and `terraform validate`
- Submit pull request
- Automated tests will run
- Deploy after approval
## Cost Management
- Monthly budget alerts configured
- Cost anomaly detection enabled
- Resource tagging for cost allocation
- Regular resource cleanup scheduled
## Key Takeaways
You now have enterprise-ready Terraform infrastructure with:
- ✅ **Modular Design**: Reusable modules for different components
- ✅ **Multi-Environment**: Separate configurations for dev/staging/prod
- ✅ **CI/CD Integration**: Automated testing and deployment
- ✅ **State Management**: Remote state with locking
- ✅ **Security**: WAF, security groups, encryption
- ✅ **Monitoring**: Comprehensive observability
- ✅ **Cost Control**: Budgets and anomaly detection
- ✅ **Disaster Recovery**: Backups and cross-region replication
- ✅ **Documentation**: Clear structure and maintenance guides
## Production Checklist
Before deploying to production:
- [ ] Enable MFA for AWS accounts
- [ ] Set up least-privilege IAM policies
- [ ] Configure backup and disaster recovery
- [ ] Enable logging and monitoring
- [ ] Set up alerting and notifications
- [ ] Document runbooks and procedures
- [ ] Train team on infrastructure management
- [ ] Establish change management process
---
**Final Thoughts:**
Building infrastructure with Terraform is a journey. Start simple, iterate frequently, and always prioritize security and maintainability. The patterns shown in this series are battle-tested in production environments and will serve as a solid foundation for your cloud infrastructure.
Remember: Infrastructure as Code is not just about automation—it's about creating reliable, scalable, and maintainable systems that enable your business to grow.