Amazon EKS Workers
Overview
This service contains Terraform and Packer code to deploy a production-grade EC2 server cluster as workers for Elastic Kubernetes Service (EKS) on AWS.
EKS architecture
Features
Deploy self-managed worker nodes in an Auto Scaling Group
Deploy managed workers nodes in a Managed Node Group
Zero-downtime, rolling deployment for updating worker nodes
Auto scaling and auto healing
For Nodes:
- Server-hardening with fail2ban, ip-lockdown, auto-update, and more
- Manage SSH access via IAM groups via ssh-grunt
- CloudWatch log aggregation
- CloudWatch metrics and alerts
Learn
note
This repo is a part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!
Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-eks repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.
Core concepts
To understand core concepts like what is Kubernetes, the different worker types, how to authenticate to Kubernetes, and more, see the documentation in the terraform-aws-eks repo.
Repo organization
- modules: the main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
- examples: This folder contains working examples of how to use the submodules.
- test: Automated tests for the modules and examples.
Deploy
Non-production deployment (quick start for learning)
If you just want to try this repo out for experimenting and learning, check out the following resources:
- examples/for-learning-and-testing folder: The
examples/for-learning-and-testing
folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).
Production deployment
If you want to deploy this repo in production, check out the following resources:
examples/for-production folder: The
examples/for-production
folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.How to deploy a production-grade Kubernetes cluster on AWS: A step-by-step guide for deploying a production-grade EKS cluster on AWS using the code in this repo.
Manage
For information on registering the worker IAM role to the EKS control plane, refer to the IAM Roles and Kubernetes API Access section of the documentation.
For information on how to perform a blue-green deployment of the worker pools, refer to the How do I perform a blue green release to roll out new versions of the module section of the documentation.
For information on how to manage your EKS cluster, including how to deploy Pods on Fargate, how to associate IAM roles to Pod, how to upgrade your EKS cluster, and more, see the documentation in the terraform-aws-eks repo.
Reference
- Inputs
- Outputs
Required
Configure one or more self-managed Auto Scaling Groups (ASGs) to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure self-managed ASGs.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
cluster_instance_ami
stringThe AMI to run on each instance in the EKS cluster. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami
or cluster_instance_ami_filters
is required. Only used if cluster_instance_ami_filters
is null. Set to null if cluster_instance_ami_filters is set.
cluster_instance_ami_filters
object(…)Properties on the AMI that can be used to lookup a prebuilt AMI for use with self managed workers. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami
or cluster_instance_ami_filters
is required. If both are defined, cluster_instance_ami_filters
will be used. Set to null if cluster_instance_ami is set.
object({
# List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
owners = list(string)
# Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
# documentation for describe-images in the AWS CLI reference
# (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
filters = list(object({
name = string
values = list(string)
}))
})
eks_cluster_name
stringThe name of the EKS cluster. The cluster must exist/already be deployed.
Configure one or more Node Groups to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure managed node groups.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
Optional
additional_security_groups_for_workers
list(string)A list of additional security group IDs to be attached on worker groups.
[]
alarms_sns_topic_arn
list(string)The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications.
[]
allow_inbound_ssh_from_cidr_blocks
list(string)The list of CIDR blocks to allow inbound SSH access to the worker groups.
[]
allow_inbound_ssh_from_security_groups
list(string)The list of security group IDs to allow inbound SSH access to the worker groups.
[]
asg_custom_iam_role_name
stringCustom name for the IAM role for the Self-managed workers. When null, a default name based on worker_name_prefix will be used. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
null
Default value for enable_detailed_monitoring field of autoscaling_group_configurations.
true
Default value for the http_put_response_hop_limit field of autoscaling_group_configurations.
null
Default value for the asg_instance_root_volume_encryption field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_encryption will use this value.
true
Default value for the asg_instance_root_volume_iops field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_iops will use this value.
null
Default value for the asg_instance_root_volume_size field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_size will use this value.
40
Default value for the asg_instance_root_volume_throughput field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_throughput will use this value.
null
Default value for the asg_instance_root_volume_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_type will use this value.
"standard"
Default value for the asg_instance_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_type will use this value.
"t3.medium"
Default value for the max_pods_allowed field of autoscaling_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
null
asg_default_max_size
numberDefault value for the max_size field of autoscaling_group_configurations. Any map entry that does not specify max_size will use this value.
2
asg_default_min_size
numberDefault value for the min_size field of autoscaling_group_configurations. Any map entry that does not specify min_size will use this value.
1
Default value for the multi_instance_overrides field of autoscaling_group_configurations. Any map entry that does not specify multi_instance_overrides will use this value.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]
Default value for the on_demand_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify on_demand_allocation_strategy will use this value.
null
Default value for the on_demand_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_base_capacity will use this value.
null
Default value for the on_demand_percentage_above_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_percentage_above_base_capacity will use this value.
null
Default value for the spot_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify spot_allocation_strategy will use this value.
null
Default value for the spot_instance_pools field of autoscaling_group_configurations. Any map entry that does not specify spot_instance_pools will use this value.
null
Default value for the spot_max_price field of autoscaling_group_configurations. Any map entry that does not specify spot_max_price will use this value. Set to empty string (default) to mean on-demand price.
null
asg_default_tags
list(object(…))Default value for the tags field of autoscaling_group_configurations. Any map entry that does not specify tags will use this value.
list(object({
key = string
value = string
propagate_at_launch = bool
}))
[]
Default value for the use_multi_instances_policy field of autoscaling_group_configurations. Any map entry that does not specify use_multi_instances_policy will use this value.
false
Custom name for the IAM instance profile for the Self-managed workers. When null, the IAM role name will be used. If asg_use_resource_name_prefix
is true, this will be used as a name prefix.
null
Whether or not the IAM role used for the Self-managed workers already exists. When false, this module will create a new IAM role.
false
asg_iam_role_arn
stringARN of the IAM role to use if iam_role_already_exists = true. When null, uses asg_custom_iam_role_name to lookup the ARN. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
null
asg_security_group_tags
map(string)A map of tags to apply to the Security Group of the ASG for the self managed worker pool. The key is the tag name and the value is the tag value.
{}
When true, all the relevant resources for self managed workers will be set to use the name_prefix attribute so that unique names are generated for them. This allows those resources to support recreation through create_before_destroy lifecycle rules. Set to false if you were using any version before 0.65.0 and wish to avoid recreating the entire worker pool on your cluster.
true
Adds additional tags to each ASG that allow a cluster autoscaler to auto-discover them. Only used for self-managed workers.
true
Namespace where the AWS Auth Merger is deployed. If configured, the worker IAM role will be mapped to the Kubernetes RBAC group for Nodes using a ConfigMap in the auth merger namespace.
null
cloud_init_parts
map(object(…))Cloud init scripts to run on the EKS worker nodes when it is booting. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax. To override the default boot script installed as part of the module, use the key default
.
map(object({
# A filename to report in the header for the part. Should be unique across all cloud-init parts.
filename = string
# A MIME-style content type to report in the header for the part. For example, use "text/x-shellscript" for a shell
# script.
content_type = string
# The contents of the boot script to be called. This should be the full text of the script as a raw string.
content = string
}))
{}
The ID (ARN, alias ARN, AWS ID) of a customer managed KMS Key to use for encrypting log data. Only used if enable_cloudwatch_log_aggregation
is true.
null
Name of the CloudWatch Log Group where server system logs are reported to. Only used if enable_cloudwatch_log_aggregation
is true.
null
The number of days to retain log events in the log group. Refer to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group#retention_in_days for all the valid values. When null, the log events are retained forever. Only used if enable_cloudwatch_log_aggregation
is true.
null
cloudwatch_log_group_tags
map(string)Tags to apply on the CloudWatch Log Group, encoded as a map where the keys are tag keys and values are tag values. Only used if enable_cloudwatch_log_aggregation
is true.
null
Whether or not to associate a public IP address to the instances of the self managed ASGs. Will only work if the instances are launched in a public subnet.
false
The name of the Key Pair that can be used to SSH to each instance in the EKS cluster.
null
custom_egress_security_group_rules
map(object(…))A map of unique identifiers to egress security group rules to attach to the worker groups.
map(object({
# The network ports and protocol (tcp, udp, all) for which the security group rule applies to.
from_port = number
to_port = number
protocol = string
# The target of the traffic. Only one of the following can be defined; the others must be configured to null.
target_security_group_id = string # The ID of the security group to which the traffic goes to.
cidr_blocks = list(string) # The list of IP CIDR blocks to which the traffic goes to.
}))
{}
custom_ingress_security_group_rules
map(object(…))A map of unique identifiers to ingress security group rules to attach to the worker groups.
map(object({
# The network ports and protocol (tcp, udp, all) for which the security group rule applies to.
from_port = number
to_port = number
protocol = string
# The source of the traffic. Only one of the following can be defined; the others must be configured to null.
source_security_group_id = string # The ID of the security group from which the traffic originates from.
cidr_blocks = list(string) # The list of IP CIDR blocks from which the traffic originates from.
}))
{}
Parameters for the worker cpu usage widget to output for use in a CloudWatch dashboard.
object({
# The period in seconds for metrics to sample across.
period = number
# The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
# space.
width = number
height = number
})
{
height = 6,
period = 60,
width = 8
}
Parameters for the worker disk usage widget to output for use in a CloudWatch dashboard.
object({
# The period in seconds for metrics to sample across.
period = number
# The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
# space.
width = number
height = number
})
{
height = 6,
period = 60,
width = 8
}
Parameters for the worker memory usage widget to output for use in a CloudWatch dashboard.
object({
# The period in seconds for metrics to sample across.
period = number
# The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
# space.
width = number
height = number
})
{
height = 6,
period = 60,
width = 8
}
Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn
.
true
Set to true to send logs to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/logs/cloudwatch-log-aggregation-scripts to do log aggregation in CloudWatch. Note that this is only recommended for aggregating system level logs from the server instances. Container logs should be managed through fluent-bit deployed with eks-core-services.
false
Set to true to add IAM permissions to send custom metrics to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/agents/cloudwatch-agent to get memory and disk metrics in CloudWatch for your Bastion host.
true
enable_fail2ban
boolEnable fail2ban to block brute force log in attempts. Defaults to true.
true
If you are using ssh-grunt and your IAM users / groups are defined in a separate AWS account, you can use this variable to specify the ARN of an IAM role that ssh-grunt can assume to retrieve IAM group and public SSH key info from that account. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
""
The period, in seconds, over which to measure the CPU utilization percentage for the ASG.
60
Trigger an alarm if the ASG has an average cluster CPU utilization percentage above this threshold.
90
Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.
"missing"
The period, in seconds, over which to measure the root disk utilization percentage for the ASG.
60
Trigger an alarm if the ASG has an average cluster root disk utilization percentage above this threshold.
90
Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.
"missing"
The period, in seconds, over which to measure the Memory utilization percentage for the ASG.
60
Trigger an alarm if the ASG has an average cluster Memory utilization percentage above this threshold.
90
Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.
"missing"
Custom name for the IAM role for the Managed Node Groups. When null, a default name based on worker_name_prefix will be used. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
null
Whether or not the IAM role used for the Managed Node Group workers already exists. When false, this module will create a new IAM role.
false
ARN of the IAM role to use if iam_role_already_exists = true. When null, uses managed_node_group_custom_iam_role_name to lookup the ARN. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
null
Default value for capacity_type field of managed_node_group_configurations.
"ON_DEMAND"
Default value for desired_size field of managed_node_group_configurations.
1
Default value for enable_detailed_monitoring field of managed_node_group_configurations.
true
Default value for http_put_response_hop_limit field of managed_node_group_configurations. Any map entry that does not specify http_put_response_hop_limit will use this value.
null
Default value for the instance_root_volume_encryption field of managed_node_group_configurations.
true
Default value for the instance_root_volume_size field of managed_node_group_configurations.
40
Default value for the instance_root_volume_type field of managed_node_group_configurations.
"gp3"
node_group_default_instance_types
list(string)Default value for instance_types field of managed_node_group_configurations.
null
node_group_default_labels
map(string)Default value for labels field of managed_node_group_configurations. Unlike common_labels which will always be merged in, these labels are only used if the labels field is omitted from the configuration.
{}
Default value for the max_pods_allowed field of managed_node_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
null
Default value for max_size field of managed_node_group_configurations.
1
Default value for min_size field of managed_node_group_configurations.
1
node_group_default_subnet_ids
list(string)Default value for subnet_ids field of managed_node_group_configurations.
null
node_group_default_tags
map(string)Default value for tags field of managed_node_group_configurations. Unlike common_tags which will always be merged in, these tags are only used if the tags field is omitted from the configuration.
{}
node_group_default_taints
list(map(…))Default value for taint field of node_group_configurations. These taints are only used if the taint field is omitted from the configuration.
list(map(string))
[]
The instance type to configure in the launch template. This value will be used when the instance_types field is set to null (NOT omitted, in which case node_group_default_instance_types
will be used).
null
Tags assigned to a node group are mirrored to the underlaying autoscaling group by default. If you want to disable this behaviour, set this flag to false. Note that this assumes that there is a one-to-one mappping between ASGs and Node Groups. If there is more than one ASG mapped to the Node Group, then this will only apply the tags on the first one. Due to a limitation in Terraform for_each where it can not depend on dynamic data, it is currently not possible in the module to map the tags to all ASGs. If you wish to apply the tags to all underlying ASGs, then it is recommended to call the aws_autoscaling_group_tag resource in a separate terraform state file outside of this module, or use a two-stage apply process.
true
node_group_names
list(string)The names of the node groups. When null, this value is automatically calculated from the managed_node_group_configurations map. This variable must be set if any of the values of the managed_node_group_configurations map depends on a resource that is not available at plan time to work around terraform limitations with for_each.
null
node_group_security_group_tags
map(string)A map of tags to apply to the Security Group of the ASG for the managed node group pool. The key is the tag name and the value is the tag value.
{}
ssh_grunt_iam_group
stringIf you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
"ssh-grunt-users"
ssh_grunt_iam_group_sudo
stringIf you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers with sudo permissions. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
"ssh-grunt-sudo-users"
tenancy
stringThe tenancy of the servers in the self-managed worker ASG. Must be one of: default, dedicated, or host.
"default"
If this variable is set to true, then use an exec-based plugin to authenticate and fetch tokens for EKS. This is useful because EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy', and since the native Kubernetes provider in Terraform doesn't have a way to fetch up-to-date tokens, we recommend using an exec-based provider as a workaround. Use the use_kubergrunt_to_fetch_token input variable to control whether kubergrunt or aws is used to fetch tokens.
true
use_imdsv1
boolSet this variable to true to enable the use of Instance Metadata Service Version 1 in this module's aws_launch_template. Note that while IMDsv2 is preferred due to its special security hardening, we allow this in order to support the use case of AMIs built outside of these modules that depend on IMDSv1.
false
EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy'. To avoid this issue, we use an exec-based plugin to fetch an up-to-date token. If this variable is set to true, we'll use kubergrunt to fetch the token (in which case, kubergrunt must be installed and on PATH); if this variable is set to false, we'll use the aws CLI to fetch the token (in which case, aws must be installed and on PATH). Note this functionality is only enabled if use_exec_plugin_for_auth is set to true.
true
When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.
true
When true, assumes prefix delegation mode is in use for the AWS VPC CNI component of the EKS cluster when computing max pods allowed on the node. In prefix delegation mode, each ENI will be allocated 16 IP addresses (/28) instead of 1, allowing you to pack more Pods per node.
false
Name of the IAM role to Kubernetes RBAC group mapping ConfigMap. Only used if aws_auth_merger_namespace is not null.
"eks-cluster-worker-iam-mapping"
worker_name_prefix
stringPrefix EKS worker resource names with this string. When you have multiple worker groups for the cluster, you can use this to namespace the resources. Defaults to empty string so that resource names are not excessively long by default.
""
Map of Node Group names to ARNs of the created EKS Node Groups.
The ARN of the IAM role associated with the Managed Node Group EKS workers.
The name of the IAM role associated with the Managed Node Group EKS workers.
Map of Node Group names to Auto Scaling Group security group IDs. Empty if cluster_instance_keypair_name
is not set.
The ID of the common AWS Security Group associated with all the managed EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the self-managed EKS workers.
The ARN of the IAM role associated with the self-managed EKS workers.
The name of the IAM role associated with the self-managed EKS workers.
The ID of the AWS Security Group associated with the self-managed EKS workers.
The list of names of the ASGs that were deployed to act as EKS workers.