Amazon ECS Cluster
Overview
This service contains Terraform code to deploy a production-grade cluster on AWS using Elastic Container Service (ECS).
This service launches an ECS cluster on top of an Auto Scaling Group that you manage. If you wish to launch an ECS cluster on top of Fargate that is completely managed by AWS, refer to the ecs-fargate-cluster module. Refer to the section EC2 vs Fargate Launch Types for more information on the differences between the two flavors.
ECS architecture
Features
This Terraform module launches an Amazon ECS cluster that you can use to run Docker containers. The cluster consists of a configurable number of instances in an Auto Scaling Group (ASG). Each instance:
- Runs the ECS Container Agent so it can communicate with the ECS scheduler.
- Authenticates with a Docker repo so it can download private images. The Docker repo auth details should be encrypted using AWS Key Management Service (KMS) and passed in as input variables. When booting up, the instances use gruntkms to decrypt the data in memory. Note that the IAM role for these instances, which uses var.cluster_name as its name, must be granted access to the Customer Master Key (CMK) used to encrypt the data.
- Runs the CloudWatch Logs Agent to send all logs in syslog to CloudWatch Logs. This is configured using the cloudwatch-agent module.
- Emits custom metrics that are not available by default in CloudWatch, including memory and disk usage. This is configured using the cloudwatch-agent module.
- Runs the syslog module to automatically rotate and rate limit syslog so that your instances don’t run out of disk space from large volumes of logs.
- Runs the ssh-grunt module so that developers can upload their public SSH keys to IAM and use those SSH keys, along with their IAM user names, to SSH to the ECS nodes.
- Runs the auto-update module so that the ECS nodes install security updates automatically.
Learn
Note: This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!
Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-ecs repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.
Core concepts
To understand core concepts such as what ECS is and how the different cluster types compare, see the documentation in the terraform-aws-ecs repo.
To use ECS, you first deploy one or more EC2 Instances into a "cluster". The ECS scheduler can then deploy Docker containers across any of the instances in this cluster. Each instance needs to have the Amazon ECS Agent installed so it can communicate with ECS and register itself as part of the right cluster.
For more info on ECS clusters, including how to run Docker containers in a cluster, how to add additional security group rules, how to handle IAM policies, and more, check out the ecs-cluster documentation in the terraform-aws-ecs repo.
For info on finding your Docker container logs and custom metrics in CloudWatch, check out the cloudwatch-agent documentation.
Repo organization
- modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
- examples: This folder contains working examples of how to use the submodules.
- test: Automated tests for the modules and examples.
Deploy
Non-production deployment (quick start for learning)
If you just want to try this repo out for experimenting and learning, check out the following resources:
- examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).
Production deployment
If you want to deploy this repo in production, check out the following resources:
- examples/for-production folder: The examples/for-production folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.
Manage
For information on how to configure cluster autoscaling, see How do you configure cluster autoscaling?
For information on how to manage your ECS cluster, see the documentation in the terraform-aws-ecs repo.
Reference
- Inputs
- Outputs
Required
cluster_instance_ami
string: The AMI to run on each instance in the ECS cluster. You can build the AMI using the Packer template ecs-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required.
cluster_instance_ami_filters
object(…): Properties on the AMI that can be used to look up a prebuilt AMI for use with ECS workers. You can build the AMI using the Packer template ecs-node-al2.json. Only used if cluster_instance_ami is null. One of cluster_instance_ami or cluster_instance_ami_filters is required. Set to null if cluster_instance_ami is set.
object({
  # List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
  owners = list(string)

  # Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
  # documentation for describe-images in the AWS CLI reference
  # (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
  filters = list(object({
    name   = string
    values = list(string)
  }))
})
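As an illustration of the type above, a value for cluster_instance_ami_filters might look like the following sketch, written as entries in a .tfvars file (the same assignments also work as arguments in a module block). The owner value and the AMI name pattern are hypothetical; `name` is one of the filter keys listed in the describe-images reference.

```hcl
# Hypothetical values: adjust the owner and name pattern to match your own AMIs.
# cluster_instance_ami is left null so that the filters are used for the lookup.
cluster_instance_ami = null

cluster_instance_ami_filters = {
  # "self" limits the search to AMIs owned by the current account.
  owners = ["self"]

  filters = [
    # Hypothetical naming scheme for AMIs built from the Packer template.
    { name = "name", values = ["ecs-node-*"] },
  ]
}
```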
cluster_instance_type
string: The type of instances to run in the ECS cluster (e.g. t2.medium)
cluster_max_size
number: The maximum number of instances to run in the ECS cluster
cluster_min_size
number: The minimum number of instances to run in the ECS cluster
cluster_name
string: The name of the ECS cluster
vpc_id
string: The ID of the VPC in which the ECS cluster should be launched
vpc_subnet_ids
list(string): The IDs of the subnets in which to deploy the ECS cluster instances
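Putting the required inputs together, a minimal module block might look like the sketch below. The module label, the source address (including the version placeholder), and every ID shown are assumptions for illustration only; substitute the source and version you actually use, plus your own AMI, VPC, and subnet IDs.

```hcl
module "ecs_cluster" {
  # Assumption: placeholder source address and version, not taken from this page.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/ecs-cluster?ref=<VERSION>"

  cluster_name          = "example-ecs-cluster"
  cluster_instance_type = "t2.medium"
  cluster_min_size      = 2
  cluster_max_size      = 4

  # One of cluster_instance_ami or cluster_instance_ami_filters is required.
  cluster_instance_ami         = "ami-0123456789abcdef0" # hypothetical AMI ID
  cluster_instance_ami_filters = null

  vpc_id         = "vpc-0123456789abcdef0" # hypothetical VPC ID
  vpc_subnet_ids = [
    "subnet-0123456789abcdef0", # hypothetical subnet IDs
    "subnet-0fedcba9876543210",
  ]
}
```

All optional inputs keep their defaults in this sketch.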
Optional
alarms_sns_topic_arn
list(string): The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications
[]
allow_ssh_from_cidr_blocks
list(string): The IP address ranges in CIDR format from which to allow incoming SSH requests to the ECS instances.
[]
allow_ssh_from_security_group_ids
list(string): The IDs of security groups from which to allow incoming SSH requests to the ECS instances.
[]
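For example, SSH access could be limited to an internal network range and a bastion host's security group with a .tfvars fragment along these lines. The CIDR block and the security group ID are hypothetical placeholders.

```hcl
# Hypothetical values: replace with your own network range and security group.
allow_ssh_from_cidr_blocks        = ["10.0.0.0/16"]
allow_ssh_from_security_group_ids = ["sg-0123456789abcdef0"]
```

Leaving both lists empty (the default) adds no SSH ingress sources through these two inputs.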
Protect EC2 instances running ECS tasks from being terminated due to scale in (spot instances do not support lifecycle modifications). Note that the behavior of termination protection differs between clusters with capacity providers and clusters without. When capacity providers are turned on and this flag is true, only instances that have 0 ECS tasks running will be scaled in, regardless of capacity_provider_target. If capacity providers are turned off and this flag is true, this will prevent ANY instances from being scaled in.
false
Enable a capacity provider to autoscale the EC2 ASG created for this ECS cluster.
false
Maximum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.
null
Minimum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.
null
capacity_provider_target
number: Target cluster utilization for the ASG capacity provider; a number from 1 to 100. This number influences when scale out happens, and when instances should be scaled in. For example, a setting of 90 means that new instances will be provisioned when all instances are at 90% utilization, while instances that are only 10% utilized (CPU and Memory usage from tasks = 10%) will be scaled in.
null
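A sketch of how the capacity provider settings might be combined, written as a .tfvars fragment. It uses only the capacity_provider_enabled and capacity_provider_target inputs named in this reference; the value 90 is the example figure from the description above, not a recommendation.

```hcl
# Turn on the capacity provider so ECS manages the size of the ASG.
capacity_provider_enabled = true

# Aim for 90% cluster utilization: scale out as instances approach 90%,
# and scale in instances that are only lightly utilized.
capacity_provider_target = 90
```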
cloud_init_parts
map(object(…)): Cloud init scripts to run on the ECS cluster instances during boot. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax.
map(object({
  filename     = string
  content_type = string
  content      = string
}))
{}
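To make the shape of cloud_init_parts concrete, here is a minimal sketch of a single shell-script part, written as a .tfvars entry. The map key, filename, and script body are hypothetical; `text/x-shellscript` is the content type cloud-init uses for shell scripts.

```hcl
cloud_init_parts = {
  # Hypothetical part name; the key only identifies the part within the map.
  "write-marker-file" = {
    filename     = "write-marker-file.sh"
    content_type = "text/x-shellscript"
    content      = <<-EOF
      #!/bin/bash
      # Runs once during boot on each ECS cluster instance.
      echo "booted via cloud-init" > /tmp/cloud-init-marker.txt
    EOF
  }
}
```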
The ID (ARN, alias ARN, AWS ID) of a customer managed KMS Key to use for encrypting log data.
null
The name of the log group to create in CloudWatch. Defaults to the value of cluster_name with the suffix -logs.
""
The number of days to retain log events in the log group. Refer to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group#retention_in_days for all the valid values. When null, the log events are retained forever.
null
cloudwatch_log_group_tags
map(string): Tags to apply on the CloudWatch Log Group, encoded as a map where the keys are tag keys and values are tag values.
null
cluster_access_from_sgs
list(any): Specify a list of Security Groups that will have access to the ECS cluster. Only used if enable_cluster_access_ports is set to true.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]
Whether to associate a public IP address with an instance in a VPC
false
The name of the Key Pair that can be used to SSH to each instance in the ECS cluster
null
A list of custom tags to apply to the EC2 Instances in this ASG. Each item in this list should be a map with the parameters key, value, and propagate_at_launch.
[]
custom_tags_ecs_cluster
map(string): Custom tags to apply to the ECS cluster
{}
custom_tags_security_group
map(string): A map of custom tags to apply to the Security Group for this ECS Cluster. The key is the tag name and the value is the tag value.
{}
default_user
string: The default OS user for the ECS worker AMI. For Amazon Linux AMIs, which is what the Packer template in ecs-node-al2.json uses, the default OS user is 'ec2-user'.
"ec2-user"
disallowed_availability_zones
list(string): A list of availability zones in the region that should be skipped when deploying ECS. You can use this to avoid availability zones that may not be able to provision the resources (e.g., because the instance type does not exist there). If empty, allows all availability zones.
[]
Set to true to enable CloudWatch log aggregation for the ECS cluster
true
Set to true to enable CloudWatch metrics collection for the ECS cluster
true
enable_cluster_access_ports
list(any): Specify a list of ECS Cluster ports which should be accessible from the security groups given in cluster_access_from_sgs.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]
Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn.
true
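A sketch of wiring the alarms to a notification topic, written as a .tfvars fragment. It assumes the flag described above is the enable_ecs_cloudwatch_alarms input referenced by the alarm settings in this reference; the SNS topic ARN (region, account ID, and topic name) is a hypothetical placeholder.

```hcl
# Assumption: enable_ecs_cloudwatch_alarms is the flag that turns on the
# basic CPU, memory, and disk space alarms.
enable_ecs_cloudwatch_alarms = true

# Hypothetical SNS topic ARN that receives the alarm notifications.
alarms_sns_topic_arn = [
  "arn:aws:sns:us-east-1:123456789012:ecs-cluster-alarms",
]
```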
enable_fail2ban
bool: Enable fail2ban to block brute-force login attempts. Defaults to true.
true
enable_imds
bool: Set this variable to true to enable the Instance Metadata Service (IMDS) endpoint, which is used to fetch information such as user-data scripts, instance IP address and region, etc. Set this variable to false if you do not want the IMDS endpoint enabled for instances launched into the Auto Scaling Group for the workers.
true
Enable ip-lockdown to block access to the instance metadata. Defaults to true.
true
enable_ssh_grunt
bool: Set to true to add IAM permissions for ssh-grunt (https://github.com/gruntwork-io/terraform-aws-security/tree/master/modules/ssh-grunt), which will allow you to manage SSH access via IAM groups.
true
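The ssh-grunt inputs in this reference (enable_ssh_grunt, ssh_grunt_iam_group, and ssh_grunt_iam_group_sudo) could be set together as in the following .tfvars sketch. The group names shown are hypothetical; the IAM groups themselves must already exist and contain the users who should have access.

```hcl
enable_ssh_grunt = true

# Hypothetical IAM group names: members of the first group can SSH to the
# ECS nodes, members of the second can SSH with sudo permissions.
ssh_grunt_iam_group      = "ecs-ssh-users"
ssh_grunt_iam_group_sudo = "ecs-ssh-sudo-users"
```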
If your IAM users are defined in a separate AWS account, use this variable to specify the ARN of an IAM role that allows ssh-grunt to retrieve IAM group and public SSH key info from that account.
""
The number of periods over which data is compared to the specified threshold
2
The period, in seconds, over which to measure the CPU utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true.
300
The statistic to apply to the alarm's high CPU metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum.
"Average"
Trigger an alarm if the ECS Cluster has a CPU utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true.
90
Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.
"missing"
The number of periods over which data is compared to the specified threshold
2
The period, in seconds, over which to measure the memory utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true.
300
The statistic to apply to the alarm's high memory metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum.
"Average"
Trigger an alarm if the ECS Cluster has a memory utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true.
90
Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.
"missing"
The desired HTTP PUT response hop limit for instance metadata requests for the workers.
null
internal_alb_sg_ids
list(string): The Security Group ID for the internal ALB
[]
Enable a multi-AZ capacity provider to autoscale the EC2 ASGs created for this ECS cluster. Only used if capacity_provider_enabled = true.
false
public_alb_sg_ids
list(string): The Security Group ID for the public ALB
[]
When true, precreate the CloudWatch Log Group to use for log aggregation from the EC2 instances. This is useful if you wish to customize the CloudWatch Log Group with various settings such as retention periods and KMS encryption. When false, the CloudWatch agent will automatically create a basic log group to use.
true
ssh_grunt_iam_group
string: If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster. This value is only used if enable_ssh_grunt=true.
"ssh-grunt-users"
ssh_grunt_iam_group_sudo
string: If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster with sudo permissions. This value is only used if enable_ssh_grunt=true.
"ssh-grunt-sudo-users"
tenancy
string: The tenancy of this server. Must be one of: default, dedicated, or host.
"default"
use_imdsv1
bool: Set this variable to true to enable the use of Instance Metadata Service Version 1 in this module's aws_launch_configuration. Note that while IMDSv2 is preferred due to its special security hardening, we allow this in order to support the use case of AMIs built outside of these modules that depend on IMDSv1.
true
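If your AMI and the software running on it work with IMDSv2, the two metadata inputs could be combined as in this .tfvars sketch. Whether your workloads tolerate IMDSv1 being disabled is an assumption you should verify before applying it.

```hcl
# Keep the metadata endpoint available to the instances...
enable_imds = true

# ...but do not allow the less hardened IMDSv1 protocol.
# Assumption: nothing on the AMI depends on IMDSv1.
use_imdsv1 = false
```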
When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.
true
Outputs
A list of all the CloudWatch Dashboard metric widgets available in this module.
The ID of the ECS cluster
The name of the ECS cluster's autoscaling group (ASG)
For configurations with multiple ASGs, this contains a list of ASG names.
For configurations with multiple capacity providers, this contains a list of all capacity provider names.
The ID of the launch configuration used by the ECS cluster's auto scaling group (ASG)
The name of the ECS cluster
The ID of the VPC into which the ECS cluster is launched
The IDs of the VPC subnets into which the ECS cluster can launch resources
The ARN of the IAM role applied to ECS instances
The ID of the IAM role applied to ECS instances
The name of the IAM role applied to ECS instances
The ID of the security group applied to ECS instances
The CloudWatch Dashboard metric widget for the ECS cluster workers' CPU utilization metric.
The CloudWatch Dashboard metric widget for the ECS cluster workers' Memory utilization metric.