Service Catalog Version 0.96.9. Last updated in version 0.95.1.

Amazon EKS Workers


Overview

This service contains Terraform and Packer code to deploy a production-grade EC2 server cluster as workers for Elastic Kubernetes Service (EKS) on AWS.

[Figure: EKS architecture]

Features

  • Deploy self-managed worker nodes in an Auto Scaling Group

  • Deploy managed worker nodes in a Managed Node Group

  • Zero-downtime, rolling deployment for updating worker nodes

  • Auto scaling and auto healing

  • For Nodes:

    • Server-hardening with fail2ban, ip-lockdown, auto-update, and more
    • Manage SSH access through IAM groups with ssh-grunt
    • CloudWatch log aggregation
    • CloudWatch metrics and alerts

Learn

note

This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!

Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-eks repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.

Core concepts

To understand core concepts such as what Kubernetes is, the different worker types, how to authenticate to Kubernetes, and more, see the documentation in the terraform-aws-eks repo.

Repo organization

  • modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
  • examples: This folder contains working examples of how to use the submodules.
  • test: Automated tests for the modules and examples.

Deploy

Non-production deployment (quick start for learning)

If you just want to try this repo out for experimenting and learning, check out the following resources:

  • examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).

Production deployment

If you want to deploy this repo in production, check out the following resources:

Manage

For information on registering the worker IAM role to the EKS control plane, refer to the IAM Roles and Kubernetes API Access section of the documentation.

For information on how to perform a blue-green deployment of the worker pools, refer to the How do I perform a blue green release to roll out new versions of the module section of the documentation.

For information on how to manage your EKS cluster, including how to deploy Pods on Fargate, how to associate IAM roles with Pods, how to upgrade your EKS cluster, and more, see the documentation in the terraform-aws-eks repo.

Reference

Required

Configure one or more self-managed Auto Scaling Groups (ASGs) to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure self-managed ASGs.

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
cluster_instance_ami string required

The AMI to run on each instance in the EKS cluster. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. Only used if cluster_instance_ami_filters is null. Set to null if cluster_instance_ami_filters is set.

cluster_instance_ami_filters object(…) required

Properties on the AMI that can be used to look up a prebuilt AMI for use with self-managed workers. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. If both are defined, cluster_instance_ami_filters will be used. Set to null if cluster_instance_ami is set.

object({
  # List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
  owners = list(string)

  # Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
  # documentation for describe-images in the AWS CLI reference
  # (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
  filters = list(object({
    name   = string
    values = list(string)
  }))
})
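
As a rough illustration of the object shape above, the following sketch looks up an AMI by name within the current account. The owner value and the name pattern are placeholders chosen for this example, not values taken from the module; only the `owners`/`filters` structure comes from the type definition.

```hcl
# Sketch only: the name pattern below is hypothetical.
cluster_instance_ami = null # leave null when filters are used

cluster_instance_ami_filters = {
  # Limit the search to AMIs owned by the account running Terraform.
  owners = ["self"]

  filters = [
    {
      # "name" is one of the filter keys accepted by describe-images.
      name   = "name"
      values = ["eks-workers-*"] # hypothetical AMI naming pattern
    },
  ]
}
```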
eks_cluster_name string required

The name of the EKS cluster. The cluster must exist/already be deployed.

Configure one or more Node Groups to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure managed node groups.

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
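
To show how the required inputs fit together, here is a minimal sketch of a module call that uses a single managed node group and no self-managed ASGs. The `source` path, the cluster name, the AMI ID, the subnet ID, and the `general` key are all assumptions made for illustration; the field names inside the node group entry are the ones referenced by the `node_group_default_*` descriptions on this page. Consult `variables.tf` for the authoritative schema.

```hcl
module "eks_workers" {
  # Assumption: illustrative source path and ref; confirm against the Service Catalog repo.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/eks-workers?ref=v0.96.9"

  # The cluster must already be deployed. Hypothetical name.
  eks_cluster_name = "example-cluster"

  # Provide either an AMI ID or AMI filters, and set the other to null.
  cluster_instance_ami         = "ami-0123456789abcdef0" # placeholder ID
  cluster_instance_ami_filters = null

  # No self-managed ASGs in this sketch.
  autoscaling_group_configurations = {}

  # One managed node group, keyed by a hypothetical name.
  managed_node_group_configurations = {
    general = {
      subnet_ids   = ["subnet-0123456789abcdef0"] # placeholder ID
      min_size     = 1
      max_size     = 3
      desired_size = 2
    }
  }
}
```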

Optional

A list of additional security group IDs to be attached on worker groups.

[]
alarms_sns_topic_arn list(string) optional

The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications.

[]

The list of CIDR blocks to allow inbound SSH access to the worker groups.

[]

The list of security group IDs to allow inbound SSH access to the worker groups.

[]

Custom name for the IAM role for the Self-managed workers. When null, a default name based on worker_name_prefix will be used. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.

null

Default value for enable_detailed_monitoring field of autoscaling_group_configurations.

true

Default value for the http_put_response_hop_limit field of autoscaling_group_configurations.

null

Default value for the asg_instance_root_volume_encryption field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_encryption will use this value.

true

Default value for the asg_instance_root_volume_iops field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_iops will use this value.

null

Default value for the asg_instance_root_volume_size field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_size will use this value.

40

Default value for the asg_instance_root_volume_throughput field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_throughput will use this value.

null

Default value for the asg_instance_root_volume_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_type will use this value.

"standard"

Default value for the asg_instance_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_type will use this value.

"t3.medium"

Default value for the max_pods_allowed field of autoscaling_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.

null
asg_default_max_size number optional

Default value for the max_size field of autoscaling_group_configurations. Any map entry that does not specify max_size will use this value.

2
asg_default_min_size number optional

Default value for the min_size field of autoscaling_group_configurations. Any map entry that does not specify min_size will use this value.

1

Default value for the multi_instance_overrides field of autoscaling_group_configurations. Any map entry that does not specify multi_instance_overrides will use this value.

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]

Default value for the on_demand_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify on_demand_allocation_strategy will use this value.

null

Default value for the on_demand_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_base_capacity will use this value.

null

Default value for the on_demand_percentage_above_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_percentage_above_base_capacity will use this value.

null

Default value for the spot_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify spot_allocation_strategy will use this value.

null

Default value for the spot_instance_pools field of autoscaling_group_configurations. Any map entry that does not specify spot_instance_pools will use this value.

null

Default value for the spot_max_price field of autoscaling_group_configurations. Any map entry that does not specify spot_max_price will use this value. Set to empty string (default) to mean on-demand price.

null
asg_default_tags list(object(…)) optional

Default value for the tags field of autoscaling_group_configurations. Any map entry that does not specify tags will use this value.

list(object({
  key                 = string
  value               = string
  propagate_at_launch = bool
}))
[]

Default value for the use_multi_instances_policy field of autoscaling_group_configurations. Any map entry that does not specify use_multi_instances_policy will use this value.

false
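
The defaulting behavior described for the `asg_default_*` variables can be sketched as follows: each entry in `autoscaling_group_configurations` inherits a default unless it sets the corresponding field itself. The group keys and subnet IDs are hypothetical, and `subnet_ids` is an assumed field that this page does not document for ASG entries; check `variables.tf` before relying on it.

```hcl
# Module inputs (sketch). Defaults apply to any map entry that omits the field.
asg_default_min_size = 1
asg_default_max_size = 2

autoscaling_group_configurations = {
  # Inherits min_size = 1 and max_size = 2 from the defaults above.
  general = {
    subnet_ids = ["subnet-0123456789abcdef0"] # assumed field, placeholder ID
  }

  # Overrides max_size and the instance type for this group only.
  batch = {
    subnet_ids        = ["subnet-0123456789abcdef0"] # assumed field, placeholder ID
    max_size          = 6
    asg_instance_type = "m5.large"
  }
}
```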

Custom name for the IAM instance profile for the Self-managed workers. When null, the IAM role name will be used. If asg_use_resource_name_prefix is true, this will be used as a name prefix.

null

Whether or not the IAM role used for the Self-managed workers already exists. When false, this module will create a new IAM role.

false
asg_iam_role_arn string optional

ARN of the IAM role to use if asg_iam_role_already_exists = true. When null, uses asg_custom_iam_role_name to look up the ARN. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.

null
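
As a sketch of the rule above, reusing an existing IAM role for the self-managed workers means setting the flag and supplying either the ARN or the role name. The account ID and role name in the ARN are placeholders.

```hcl
# Reuse an existing IAM role instead of letting the module create one.
asg_iam_role_already_exists = true

# Supply one of the two identifiers; this sketch uses the ARN (placeholder value).
asg_iam_role_arn         = "arn:aws:iam::123456789012:role/example-eks-worker"
asg_custom_iam_role_name = null
```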
asg_security_group_tags map(string) optional

A map of tags to apply to the Security Group of the ASG for the self managed worker pool. The key is the tag name and the value is the tag value.

{}

When true, all the relevant resources for self managed workers will be set to use the name_prefix attribute so that unique names are generated for them. This allows those resources to support recreation through create_before_destroy lifecycle rules. Set to false if you were using any version before 0.65.0 and wish to avoid recreating the entire worker pool on your cluster.

true

Adds additional tags to each ASG that allow a cluster autoscaler to auto-discover them. Only used for self-managed workers.

true

Namespace where the AWS Auth Merger is deployed. If configured, the worker IAM role will be mapped to the Kubernetes RBAC group for Nodes using a ConfigMap in the auth merger namespace.

null
cloud_init_parts map(object(…)) optional

Cloud init scripts to run on the EKS worker nodes when they are booting. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax. To override the default boot script installed as part of the module, use the key default.

map(object({
  # A filename to report in the header for the part. Should be unique across all cloud-init parts.
  filename = string

  # A MIME-style content type to report in the header for the part. For example, use "text/x-shellscript" for a shell
  # script.
  content_type = string

  # The contents of the boot script to be called. This should be the full text of the script as a raw string.
  content = string
}))
{}
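
For illustration, a single extra boot script could be supplied as shown below. The map key, filename, and script body are hypothetical; only the three attributes come from the type definition. Using the key `default` instead would replace the module's own boot script, as noted above.

```hcl
cloud_init_parts = {
  # Hypothetical key; any key other than "default" adds a part alongside the default boot script.
  extra_boot_step = {
    filename     = "extra-boot-step.sh"
    content_type = "text/x-shellscript"
    content      = <<-EOF
      #!/bin/bash
      set -euo pipefail
      # Placeholder action: record that the extra step ran.
      echo "extra boot step ran" >> /var/log/extra-boot-step.log
    EOF
  }
}
```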

The ID (ARN, alias ARN, AWS ID) of a customer managed KMS Key to use for encrypting log data. Only used if enable_cloudwatch_log_aggregation is true.

null

Name of the CloudWatch Log Group where server system logs are reported to. Only used if enable_cloudwatch_log_aggregation is true.

null

The number of days to retain log events in the log group. Refer to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group#retention_in_days for all the valid values. When null, the log events are retained forever. Only used if enable_cloudwatch_log_aggregation is true.

null
cloudwatch_log_group_tags map(string) optional

Tags to apply on the CloudWatch Log Group, encoded as a map where the keys are tag keys and values are tag values. Only used if enable_cloudwatch_log_aggregation is true.

null

Whether or not to associate a public IP address to the instances of the self managed ASGs. Will only work if the instances are launched in a public subnet.

false

The name of the Key Pair that can be used to SSH to each instance in the EKS cluster.

null
custom_egress_security_group_rules map(object(…)) optional

A map of unique identifiers to egress security group rules to attach to the worker groups.

map(object({
  # The network ports and protocol (tcp, udp, all) to which the security group rule applies.
  from_port = number
  to_port   = number
  protocol  = string

  # The target of the traffic. Only one of the following can be defined; the others must be configured to null.
  target_security_group_id = string       # The ID of the security group to which the traffic goes.
  cidr_blocks              = list(string) # The list of IP CIDR blocks to which the traffic goes.
}))
{}
custom_ingress_security_group_rules map(object(…)) optional

A map of unique identifiers to ingress security group rules to attach to the worker groups.

map(object({
  # The network ports and protocol (tcp, udp, all) to which the security group rule applies.
  from_port = number
  to_port   = number
  protocol  = string

  # The source of the traffic. Only one of the following can be defined; the others must be configured to null.
  source_security_group_id = string       # The ID of the security group from which the traffic originates.
  cidr_blocks              = list(string) # The list of IP CIDR blocks from which the traffic originates.
}))
{}
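
A minimal sketch of both rule maps follows. The rule keys, port ranges, and CIDR blocks are illustrative choices, not recommendations from the module; note that each rule sets exactly one of the security group ID or the CIDR list and leaves the other null, as the type comments require.

```hcl
custom_ingress_security_group_rules = {
  # Hypothetical rule: allow NodePort traffic from a private network range.
  nodeport_from_vpc = {
    from_port                = 30000
    to_port                  = 32767
    protocol                 = "tcp"
    source_security_group_id = null
    cidr_blocks              = ["10.0.0.0/16"] # placeholder CIDR
  }
}

custom_egress_security_group_rules = {
  # Hypothetical rule: allow outbound HTTPS to any destination.
  https_out = {
    from_port                = 443
    to_port                  = 443
    protocol                 = "tcp"
    target_security_group_id = null
    cidr_blocks              = ["0.0.0.0/0"]
  }
}
```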

Parameters for the worker cpu usage widget to output for use in a CloudWatch dashboard.

object({
  # The period in seconds for metrics to sample across.
  period = number

  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}

Parameters for the worker disk usage widget to output for use in a CloudWatch dashboard.

object({
  # The period in seconds for metrics to sample across.
  period = number

  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}

Parameters for the worker memory usage widget to output for use in a CloudWatch dashboard.

object({
  # The period in seconds for metrics to sample across.
  period = number

  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}

Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn.

true

Set to true to send logs to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/logs/cloudwatch-log-aggregation-scripts to do log aggregation in CloudWatch. Note that this is only recommended for aggregating system level logs from the server instances. Container logs should be managed through fluent-bit deployed with eks-core-services.

false

Set to true to add IAM permissions to send custom metrics to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/agents/cloudwatch-agent to get memory and disk metrics in CloudWatch for your EKS workers.

true
enable_fail2ban bool optional

Enable fail2ban to block brute-force login attempts. Defaults to true.

true

If you are using ssh-grunt and your IAM users / groups are defined in a separate AWS account, you can use this variable to specify the ARN of an IAM role that ssh-grunt can assume to retrieve IAM group and public SSH key info from that account. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).

""

The period, in seconds, over which to measure the CPU utilization percentage for the ASG.

60

Trigger an alarm if the ASG has an average cluster CPU utilization percentage above this threshold.

90

Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.

"missing"

The period, in seconds, over which to measure the root disk utilization percentage for the ASG.

60

Trigger an alarm if the ASG has an average cluster root disk utilization percentage above this threshold.

90

Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.

"missing"

The period, in seconds, over which to measure the Memory utilization percentage for the ASG.

60

Trigger an alarm if the ASG has an average cluster Memory utilization percentage above this threshold.

90

Sets how this alarm should handle entering the INSUFFICIENT_DATA state. Based on https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data. Must be one of: 'missing', 'ignore', 'breaching' or 'notBreaching'.

"missing"

Custom name for the IAM role for the Managed Node Groups. When null, a default name based on worker_name_prefix will be used. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.

null

Whether or not the IAM role used for the Managed Node Group workers already exists. When false, this module will create a new IAM role.

false

ARN of the IAM role to use if managed_node_group_iam_role_already_exists = true. When null, uses managed_node_group_custom_iam_role_name to look up the ARN. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.

null

Default value for capacity_type field of managed_node_group_configurations.

"ON_DEMAND"

Default value for desired_size field of managed_node_group_configurations.

1

Default value for enable_detailed_monitoring field of managed_node_group_configurations.

true

Default value for http_put_response_hop_limit field of managed_node_group_configurations. Any map entry that does not specify http_put_response_hop_limit will use this value.

null

Default value for the instance_root_volume_encryption field of managed_node_group_configurations.

true

Default value for the instance_root_volume_size field of managed_node_group_configurations.

40

Default value for the instance_root_volume_type field of managed_node_group_configurations.

"gp3"

Default value for instance_types field of managed_node_group_configurations.

null
node_group_default_labels map(string) optional

Default value for labels field of managed_node_group_configurations. Unlike common_labels which will always be merged in, these labels are only used if the labels field is omitted from the configuration.

{}

Default value for the max_pods_allowed field of managed_node_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.

null

Default value for max_size field of managed_node_group_configurations.

1

Default value for min_size field of managed_node_group_configurations.

1
node_group_default_subnet_ids list(string) optional

Default value for subnet_ids field of managed_node_group_configurations.

null
node_group_default_tags map(string) optional

Default value for tags field of managed_node_group_configurations. Unlike common_tags which will always be merged in, these tags are only used if the tags field is omitted from the configuration.

{}
node_group_default_taints list(map(…)) optional

Default value for taint field of managed_node_group_configurations. These taints are only used if the taint field is omitted from the configuration.

list(map(string))
[]
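
The three fallback inputs for labels, tags, and taints might be set as in the sketch below. All values are placeholders. The taint map keys (`key`, `value`, `effect`) and the effect value are assumptions based on the taint block of the AWS provider's aws_eks_node_group resource; this page only states that each taint is a map of strings.

```hcl
# Used only by node groups that omit the labels field.
node_group_default_labels = {
  workload = "general" # hypothetical label
}

# Used only by node groups that omit the tags field.
node_group_default_tags = {
  Team = "platform" # hypothetical tag
}

# Used only by node groups that omit the taint field.
node_group_default_taints = [
  {
    # Assumed keys, mirroring the aws_eks_node_group taint block.
    key    = "dedicated"
    value  = "batch"
    effect = "NO_SCHEDULE"
  },
]
```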

The instance type to configure in the launch template. This value will be used when the instance_types field is set to null (NOT omitted, in which case node_group_default_instance_types will be used).

null

Tags assigned to a node group are mirrored to the underlying autoscaling group by default. If you want to disable this behaviour, set this flag to false. Note that this assumes that there is a one-to-one mapping between ASGs and Node Groups. If there is more than one ASG mapped to the Node Group, then this will only apply the tags on the first one. Due to a limitation in Terraform for_each where it cannot depend on dynamic data, it is currently not possible in the module to map the tags to all ASGs. If you wish to apply the tags to all underlying ASGs, then it is recommended to call the aws_autoscaling_group_tag resource in a separate Terraform state file outside of this module, or use a two-stage apply process.

true
node_group_names list(string) optional

The names of the node groups. When null, this value is automatically calculated from the managed_node_group_configurations map. To work around Terraform limitations with for_each, this variable must be set if any of the values of the managed_node_group_configurations map depends on a resource that is not available at plan time.

null
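
A sketch of the workaround described above: when a node group's configuration references a value that is only known after apply, the group names are listed statically so that for_each has keys available at plan time. The `aws_subnet.private` resource and the `general` key are hypothetical and stand in for whatever dynamic dependency your configuration has.

```hcl
# Inputs inside the module block (sketch).

# Static list of keys, matching the keys of the map below.
node_group_names = ["general"]

managed_node_group_configurations = {
  general = {
    # Hypothetical dependency on a resource created in the same apply.
    subnet_ids = aws_subnet.private[*].id
    min_size   = 1
    max_size   = 3
  }
}
```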

A map of tags to apply to the Security Group of the ASG for the managed node group pool. The key is the tag name and the value is the tag value.

{}
ssh_grunt_iam_group string optional

If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).

"ssh-grunt-users"

If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers with sudo permissions. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).

"ssh-grunt-sudo-users"
tenancy string optional

The tenancy of the servers in the self-managed worker ASG. Must be one of: default, dedicated, or host.

"default"

If this variable is set to true, then use an exec-based plugin to authenticate and fetch tokens for EKS. This is useful because EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy', and since the native Kubernetes provider in Terraform doesn't have a way to fetch up-to-date tokens, we recommend using an exec-based provider as a workaround. Use the use_kubergrunt_to_fetch_token input variable to control whether kubergrunt or aws is used to fetch tokens.

true
use_imdsv1 bool optional

Set this variable to true to enable the use of Instance Metadata Service Version 1 in this module's aws_launch_template. Note that while IMDSv2 is preferred due to its special security hardening, we allow this in order to support the use case of AMIs built outside of these modules that depend on IMDSv1.

false

EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy'. To avoid this issue, we use an exec-based plugin to fetch an up-to-date token. If this variable is set to true, we'll use kubergrunt to fetch the token (in which case, kubergrunt must be installed and on PATH); if this variable is set to false, we'll use the aws CLI to fetch the token (in which case, aws must be installed and on PATH). Note this functionality is only enabled if use_exec_plugin_for_auth is set to true.

true
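
The two token-related flags combine as sketched below. This example selects the aws CLI rather than kubergrunt, which assumes `aws` is installed and on PATH wherever Terraform runs.

```hcl
# Fetch fresh EKS tokens through an exec-based plugin during apply and destroy.
use_exec_plugin_for_auth = true

# false: use the aws CLI to fetch tokens; true (the default): use kubergrunt.
use_kubergrunt_to_fetch_token = false
```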

When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.

true

When true, assumes prefix delegation mode is in use for the AWS VPC CNI component of the EKS cluster when computing max pods allowed on the node. In prefix delegation mode, each ENI will be allocated 16 IP addresses (/28) instead of 1, allowing you to pack more Pods per node.

false

Name of the IAM role to Kubernetes RBAC group mapping ConfigMap. Only used if aws_auth_merger_namespace is not null.

"eks-cluster-worker-iam-mapping"
worker_name_prefix string optional

Prefix EKS worker resource names with this string. When you have multiple worker groups for the cluster, you can use this to namespace the resources. Defaults to empty string so that resource names are not excessively long by default.

""