# How to update your Reference Architecture to use the Gruntwork Service Catalog

This guide walks you through updating v1.0 of the Gruntwork Reference Architecture, which uses a private `infrastructure-modules` repo, to v2.0, which uses the Gruntwork Service Catalog. The Gruntwork Service Catalog is a set of dozens of highly configurable, battle-tested, production-grade services that you can deploy off the shelf without writing any code. Many modules offered in the `infrastructure-modules` repository of the Gruntwork Reference Architecture are fully replaced by the Service Catalog, eliminating the need to maintain snapshots of the services in your own repository.
> **Note:** Dedicated guides for updating Landing Zone and Gruntwork Pipelines haven’t been written yet. Until then, please consult this guide on adopting the Landing Zone and the Gruntwork Pipelines deployment guide.

## Core Concepts

On August 26th, 2020, we announced the initial release of the Service Catalog in a private, invite-only alpha program. Since then we have expanded the catalog to be generally available, with all Gruntwork subscribers getting access to the services in the catalog. Each service includes everything you need in production: the Terraform code for provisioning, Packer templates or Dockerfiles for configuration management, built-in monitoring (metrics, logging, alerting), built-in security features (SSH access, secrets management, server hardening, encryption in transit and at rest), and so on. This allows you to deploy infrastructure without writing any code.
Prior to the release of the Service Catalog, the Gruntwork Reference Architecture included a customized set of services in the form of a private `infrastructure-modules` repository for each customer. These services required continuous maintenance from end users to keep up to date with each change, including updating module references when new versions were released.
Now with the Service Catalog, there is no longer a need to maintain a separate service module for every component you use. Instead, you can rely on the battle-tested, Gruntwork-maintained service modules in the Service Catalog!
If you haven’t made any modifications to a service since receiving the Reference Architecture, we recommend updating your `infrastructure-live` configuration to use the service modules from the Service Catalog instead of using and maintaining your copy in `infrastructure-modules`. Using the Service Catalog directly has several advantages:
- Keep up to date with module releases with a single version bump in your Terragrunt live configuration (see the example below), as opposed to updating each module block reference.
- Update to a new, backward-incompatible Terraform version with a single version bump as well. No need to track backward-incompatible syntax changes in Terraform!
- Rely on Gruntwork to provide feature updates and bug fixes to the services.
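
For example, upgrading a service is typically a one-line change to the `ref` parameter in the `source` URL of your Terragrunt configuration (the module path and version numbers below are illustrative):

```hcl
terraform {
  # Upgrading the service is a single version bump in the ref parameter, e.g.
  # from ?ref=v0.35.5 to ?ref=v0.36.0; no module block references to chase down.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/networking/vpc?ref=v0.35.5"
}
```
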
Though the Service Catalog fully replaces the services defined in your `infrastructure-modules` repository, there are some differences that require you to make modifications to the Terragrunt configurations in your `infrastructure-live` repository before you can start using the Service Catalog.

## Deployment Walkthrough

Your goal in this walkthrough is to replace references to `infrastructure-modules` with `terraform-aws-service-catalog` within your `infrastructure-live` repo. You will follow a two-step process:

- Prepare your Terragrunt `infrastructure-live` configuration to support the Service Catalog
- Update each Terragrunt live configuration to use the Service Catalog
This guide assumes a minimum Terragrunt version of 0.28.18. If you are using an older Terragrunt version, be sure to update to at least this version first.
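
You can confirm the version you have installed from the command line:

```bash
# Print the installed Terragrunt version; this guide assumes at least v0.28.18.
terragrunt --version
```
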
You can also watch a video walkthrough of a single module update.

### Step 1: Prepare your Terragrunt configuration to support the Service Catalog

The Terraform modules in the Service Catalog intentionally omit some blocks that Terraform requires to operate (e.g., the `provider` and `terraform` state backend blocks). This is by design, to allow the modules to be used flexibly in different contexts (unlike the `infrastructure-modules` modules, which are only intended to be used in the Terragrunt context). As such, you need to use advanced Terragrunt features to inject these required blocks. Here we define a new root `terragrunt.hcl` configuration that includes these directives so that we can continue to use the original, non-Service Catalog based live configuration.

For reference, you can download the `refarch-folder-structure` zip file, which contains all the following files in their respective locations in the folder structure. You can copy these files into your own `infrastructure-live` repository.
1. Create a new file `terragrunt_service_catalog.hcl` at the root of the `infrastructure-live` repository. We name this differently from the existing `terragrunt.hcl` so that the modules that haven’t migrated yet won’t be affected. Insert the following contents:
```hcl
# ---------------------------------------------------------------------------------------------------------------------
# TERRAGRUNT CONFIGURATION
# ---------------------------------------------------------------------------------------------------------------------

locals {
  common_vars  = read_terragrunt_config(find_in_parent_folders("common.hcl"))
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))

  # Optional: Update to use HCL instead of YAML
  region_vars = yamldecode(file(find_in_parent_folders("region.yaml")))

  name_prefix    = local.common_vars.locals.name_prefix
  account_name   = local.account_vars.locals.account_name
  account_id     = local.account_vars.locals.account_id
  default_region = local.common_vars.locals.default_region
  aws_region     = local.region_vars["aws_region"]
}

# ----------------------------------------------------------------------------------------------------------------
# GENERATED PROVIDER BLOCK
# ----------------------------------------------------------------------------------------------------------------

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region  = "${local.aws_region}"
  version = ">= 3.13.0"

  # Only these AWS Account IDs may be operated on by this template
  allowed_account_ids = ["${local.account_id}"]
}
EOF
}

# ----------------------------------------------------------------------------------------------------------------
# GENERATED REMOTE STATE BLOCK
# ----------------------------------------------------------------------------------------------------------------

# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state {
  backend = "s3"
  config = {
    encrypt        = true
    bucket         = "${local.name_prefix}-${local.account_name}-terraform-state"
    region         = local.default_region
    dynamodb_table = "terraform-locks"

    # To ensure that the state paths are the same as before, we drop the account folder (the first path element)
    # which is now included in the relative path.
    key = trimprefix("${path_relative_to_include()}/terraform.tfstate", "${local.account_name}/")
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

# ---------------------------------------------------------------------------------------------------------------------
# GLOBAL PARAMETERS
# These variables apply to all configurations in this subfolder. These are automatically merged into the child
# `terragrunt.hcl` config via the include block.
# ---------------------------------------------------------------------------------------------------------------------

inputs = {
  # Many modules require these two inputs, so we set them globally here to keep all the child terragrunt.hcl files
  # more DRY
  aws_account_id = local.account_id
  aws_region     = local.aws_region
}
```

**Explanation**
In the non-Service Catalog flavor of the Reference Architecture, we had a root configuration for each account to ensure that we could create a different state bucket for each account. While this isn’t necessary for the Service Catalog, we switch to a single root `terragrunt.hcl` config here (which is possible due to the advanced functions that are available in newer Terragrunt versions) because there are more common blocks that are necessary, and we want to keep these blocks DRY.

To support the new requirements of the Service Catalog, we also introduce two code generation configurations:

- `generate "provider"`: uses the Terragrunt code generation feature to inject the `provider` block into the module prior to invoking Terraform.
- The `generate` attribute of the `remote_state` block: similar to the provider block generation, this attribute injects the `terraform.backend` configuration.

Finally, we introduce a `locals` block to define references that can be reused throughout the configuration. Note that for the new commonly used variables, we use `read_terragrunt_config` instead of `yamldecode(file())` to allow for the use of Terragrunt functions in the config. Note that the suggested config continues to use the `region.yaml` data file to simplify the migration process. You can optionally update this file to `hcl` for consistency.

2. Create new data files for the root config (these are the files that are read in the `locals` blocks):

- In the root of the `infrastructure-live` repository, add a `common.hcl` file with the following contents:
  ```hcl
  locals {
    # TODO: A unique name prefix to set for all infrastructure resources created in your accounts.
    name_prefix = ""

    # TODO: The default AWS region to use. This should be the same as where the terraform state S3 bucket is
    # currently provisioned.
    default_region = ""
  }
  ```

- In each account folder (e.g., `infrastructure-live/dev` or `infrastructure-live/shared`), add a file named `account.hcl` with the following contents:

  ```hcl
  locals {
    # TODO: Update with the actual information for each account

    # The user friendly name of the AWS account. Usually matches the folder name.
    account_name = ""

    # The 12 digit ID number for your AWS account.
    account_id = ""
  }
  ```

- (Optional) If you wish to replace your YAML variable files with HCL, in each region folder (e.g., `infrastructure-live/dev/us-east-2`), add a file named `region.hcl` with the following contents:

  ```hcl
  locals {
    # TODO: Enter the region to use for all resources in this subfolder.
    aws_region = ""
  }
  ```

  Note that you will want to have a `region.hcl` file for the `_global` folder as well. In this case, set the `aws_region` to `us-east-1`.
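
Once these data files are in place, your repository should look roughly like the following (the account and region folder names here are illustrative and will differ per deployment):

```
infrastructure-live/
├── terragrunt.hcl                    # existing root config, still used by un-migrated modules
├── terragrunt_service_catalog.hcl    # new Service Catalog root config
├── common.hcl
└── dev/
    ├── account.hcl
    ├── _global/
    │   └── region.hcl                # aws_region = "us-east-1"
    └── us-east-2/
        └── region.hcl
```
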
3. Create migration helper scripts (these are used by the Gruntwork Service Catalog Terraform state migration bash scripts):

- Create a new directory `_scripts` at the root of the `infrastructure-live` repository.
- Create a new file `migration_helpers.sh` in the newly created `_scripts` folder and paste in the following contents:

```bash
#!/usr/bin/env bash
# Helper functions for implementing state migrations for updating terraform modules to newer versions.

function log {
  >&2 echo -e "$@"
}

# find_state_address uses the provided query string to identify the full resource address to use in the state file.
function find_state_address {
  local -r query="$1"

  log "Identifying real state address of $query"
  terragrunt state list \
    | grep -E "$query" || true
}

# strip_bash_color will strip out bash color/bold escape sequences.
function strip_bash_color {
  local -r input="$1"

  # Based on this stack overflow post: https://stackoverflow.com/questions/6534556/how-to-remove-and-all-of-the-escape-sequences-in-a-file-using-linux-shell-sc.
  # None of the sed calls worked to completely strip out the escape sequences by itself, but the following combination worked.
  echo "$input" | cat -v | sed 's/\^\[\[[10]m//g'
}

# Check that the given binary is available on the PATH. If it's not, exit with an error.
function assert_is_installed {
  local -r name="$1"
  local -r help_url="$2"

  if ! command -v "$name" > /dev/null; then
    log "ERROR: The command '$name' is required by this script but is not installed or in the system's PATH. Visit $help_url for instructions on how to install."
    exit 1
  fi
}

# Make sure that the hcledit utility is installed and available on the system.
function assert_hcledit_is_installed {
  assert_is_installed 'hcledit' 'https://github.com/minamijoyo/hcledit#install'
}

# Make sure that the jq utility is installed and available on the system.
function assert_jq_is_installed {
  assert_is_installed 'jq' 'https://stedolan.github.io/jq/download/'
}

# Move resources in a literal terraform state move. Unlike a simple terragrunt state mv call, this does error checking
# to make sure the given state actually exists.
function move_state {
  local -r original_addr="$1"
  local -r new_addr="$2"

  if [[ "$original_addr" == "$new_addr" ]]; then
    echo "Nothing to change. Skipping state migration."
    return
  fi

  if ! terragrunt state show "$original_addr" >/dev/null 2>&1; then
    echo "Original address $original_addr not found in state file. Nothing to change. Skipping state migration."
    return
  fi

  echo "Migrating state:"
  echo
  echo "  $original_addr =>"
  echo "  $new_addr"
  echo
  terragrunt state mv "$original_addr" "$new_addr"
}

# Move resources in terraform state using fuzzy matches.
function fuzzy_move_state {
  local -r original_addr_query="$1"
  local -r new_addr="$2"
  local -r friendly_name="$3"

  log "Checking if $friendly_name needs to be migrated"
  local original_addr
  original_addr="$(find_state_address "$original_addr_query")"

  if [[ -z "$original_addr" ]] || [[ "$original_addr" == "$new_addr" ]]; then
    echo "Nothing to change. Skipping state migration."
  else
    echo "Migrating state:"
    echo
    echo "  $original_addr =>"
    echo "  $new_addr"
    echo
    terragrunt state mv "$original_addr" "$new_addr"
  fi
}

# The following routine extracts a resource attribute.
function extract_attribute {
  local -r address="$1"
  local -r resource_basename="$2"
  local -r attribute="$3"

  local state
  state="$(terragrunt state show "$address")"
  local state_nocolor
  state_nocolor="$(strip_bash_color "$state")"

  echo "$state_nocolor" \
    | hcledit attribute get "$resource_basename"."$attribute" \
    | jq -r '.'
}

# Move resources in terraform state using an import call instead of state mv. This is useful when moving resources
# across aliased resources (e.g., aws_alb => aws_lb).
function fuzzy_import_move_state {
  local -r original_addr_query="$1"
  local -r new_addr="$2"
  local -r resource_basename="$3"
  local -r friendly_name="$4"

  log "Checking if $friendly_name needs to be migrated."
  local original_addr
  original_addr="$(find_state_address "$original_addr_query")"
  if [[ -z "$original_addr" ]]; then
    log "$friendly_name is already migrated. Skipping import."
    return
  fi

  log "$friendly_name needs to be migrated"
  log "Identifying $friendly_name ID to import into new resource."
  local resource_id
  resource_id="$(extract_attribute "$original_addr" "$resource_basename" "id")"
  if [[ -z "$resource_id" ]]; then
    log "ERROR: could not identify $friendly_name ID to import."
    exit 1
  fi

  log "Importing $friendly_name to new resource:"
  log
  log "  ID: $resource_id"
  log "  ResourceAddr: $new_addr"
  terragrunt import "$new_addr" "$resource_id"

  log "Removing old $friendly_name state."
  terragrunt state rm "$original_addr"
}

# This function migrates the original list into a new map at the original address.
# First it moves each item in the list into a new map.
# Then it moves the new map back to the original address.
# The keys of the new map correspond to the value of the attribute that is passed into `attribute`.
# We look up this value in the function.
function move_state_list_to_map {
  local -r original_addr="$1"
  local -r attribute="$2"

  # Move the list to a temporary map
  local map_addr
  map_addr="${original_addr}_tmp"

  # Pull the full list of addresses in the original_addr (aws_ecr_repository.repos[0], aws_ecr_repository.repos[1], etc)
  local list_addresses
  list_addresses="$(find_state_address "$original_addr")"

  for addr in $list_addresses; do
    # Extract the attribute that will be the new key in the map.
    local key
    key="$(extract_attribute "$addr" "resource.$original_addr" "$attribute")"

    echo "Migrating state:"
    echo
    echo "  $addr =>"
    echo "  $map_addr[$key]"
    echo

    # Move the resource to the new temporary map address.
    terragrunt state mv "$addr" "$map_addr[\"$key\"]"
  done

  # Move the temporary map back to the original address, now as a map
  terragrunt state mv "$map_addr" "$original_addr"
}

# This function migrates a list to a new list with a different name.
function fuzzy_move_list {
  local -r original_list="$1"
  local -r new_list="$2"
  local -r friendly_name="$3"

  log "Checking if $friendly_name needs to be migrated"
  if [[ -z "$original_list" ]] || [[ "$original_list" == "$new_list" ]]; then
    echo "Nothing to change. Skipping state migration."
  else
    local original_addrs
    original_addrs="$(find_state_address "$original_list")"

    for addr in $original_addrs; do
      local new_addr
      new_addr="$(
        echo "$addr" \
          | sed "s/$original_list/$new_list/"
      )"

      echo "Migrating state:"
      echo
      echo "  $addr =>"
      echo "  $new_addr"
      echo

      terragrunt state mv "$addr" "$new_addr"
    done
  fi
}
```
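
To give a sense of how these helpers are used, here is a minimal sketch of a per-service migration script (the relative path, resource addresses, and names below are hypothetical; the dedicated service migration guides provide the real scripts):

```bash
#!/usr/bin/env bash
# Hypothetical migration script, run from a Terragrunt module directory such as
# infrastructure-live/dev/us-east-2/dev/vpc. Adjust the relative path as needed.
set -e

source "../../../../_scripts/migration_helpers.sh"

# Fail fast if the required utilities are missing.
assert_hcledit_is_installed
assert_jq_is_installed

# Move a resource from its old address to its new address in the Service
# Catalog module (both addresses here are made up for illustration).
fuzzy_move_state 'module.vpc.aws_vpc.main' 'module.vpc.aws_vpc.app' 'app VPC'
```
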

### Step 2: Update each Terragrunt live configuration to use the Service Catalog

At this point, you are ready to update each live configuration! It’s important to take a bottom-up approach for migrating the live configurations. That is, update live configurations that don’t have any downstream dependencies first, then work your way up the dependency graph.
This ensures that:
- Each update is self-contained. Changing the live configuration of leaf services will not affect other live configurations, allowing you to continue to make changes to un-migrated live configurations.
- The migration is low risk. The leaf nodes in the Terragrunt infrastructure graph tend to be lower-risk services. That is, the closer you are to the root of the graph, the more things depend on that infrastructure, which gives that service a larger surface area. E.g., the VPC has many downstream dependencies, which means that messing it up can cause lots of other services to fail.
However, this does mean that you will need to update previously migrated services when an upstream service changes. For example, Service Catalog services sometimes have output name changes, which means that you will need to update the references in the downstream services when you update the upstream service.
To handle this, run `terragrunt validate-all` each time a service is updated to identify all the downstream services affected by broken output references, and fix them in the same PR.
Don’t worry — we’re going to walk you through every step right now. At a high level, here’s what you’ll do:

- Choose a service.
- Refer to the dedicated guide for that service.
- Back up the state file.
- Modify the `terragrunt.hcl` live configuration for it, following the guide.
- Validate the backend configuration with `terragrunt state list`.
- Validate the inputs with `terragrunt validate-inputs`.
- Run the state migration script, if any.
- Sanity check the changes with `terragrunt plan`.
- Roll out with `terragrunt apply`.
Some of the services, such as EKS and ASG, have steps that differ slightly from the list above, so please pay close attention.
Now for the full-fledged instructions to upgrade a single service:
1. Check the service’s downstream dependencies. Use the `graph-dependencies` command to create a visual representation. The arrow points from the leaf to the root, toward the dependency; thus, in the graph, the top nodes are leaf nodes and the bottom nodes are root nodes.

   ```bash
   terragrunt graph-dependencies | dot -Tpng > graph.png
   ```

   If you get an error that `dot` is not available, install graphviz, which installs the `dot` utility.

   Here is an example of a dependency tree for the `dev` account using Reference Architecture v1.
2. Ensure the module is updated to the version used in Reference Architecture release 20201125.

   - If you’re running a newer version, continue.
   - If you’re running an older version, follow the migration guides referenced in the Reference Architecture releases to update to the latest version. This is important because the Service Catalog module references use newer versions from the Module Catalog than what shipped with v1.0 of the Reference Architecture. Once you’ve upgraded to 20201125, you can automate any state manipulations that are required to update a service using the provided guides and scripts.
3. Make a backup of the state file using the following command:

   ```bash
   terragrunt state pull > backup.tfstate
   ```

   You can use this to roll back the state to before you attempted the migration with the following command: `terragrunt state push "$(pwd)/backup.tfstate"`.

   - NOTE: Make sure to use the Terraform version that is required for your module, as specified in the `required_version` configuration of the module.
4. Modify the `terragrunt.hcl` file to be compatible with the Service Catalog:

   - Change the `include` path to `find_in_parent_folders("terragrunt_service_catalog.hcl")`. This ensures that you use the Service Catalog compatible root config you created in the previous step.
   - Change the `terraform.source` attribute to point to the corresponding Terraform module in the `terraform-aws-service-catalog` repo (see the example below). When updating the source, make sure to set the ref to target `v0.35.5`, unless otherwise noted below in Appendix: Dedicated service migration guides.
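
   For example, after these two changes, a migrated child `terragrunt.hcl` might begin with a sketch like the following (the module path here is illustrative; use the path the dedicated guide specifies for your service):

   ```hcl
   include {
     # Use the new Service Catalog compatible root configuration.
     path = find_in_parent_folders("terragrunt_service_catalog.hcl")
   }

   terraform {
     # Point at the Service Catalog module instead of your private
     # infrastructure-modules copy, pinned to the target release.
     source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/networking/vpc?ref=v0.35.5"
   }
   ```
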
   **Explanation**

   This migration guide targets `v0.35.5` of the Service Catalog, unless otherwise noted below in Appendix: Dedicated service migration guides. Newer versions may require additional state migrations that are not covered by the automated scripts. If you wish to update further, first update to `v0.35.5` and then read the migration guides in the release notes of the Service Catalog to bump beyond that version.

   - Find the dedicated service migration guide for the service.
   - Using that guide, update the inputs to adapt to the Service Catalog Terraform module.
   - You can use `terragrunt validate-inputs` as a sanity check.
   - Remove the `dependencies` block, if any.
   - Use `dependency` blocks, using the dedicated service migration guide as a reference for which dependency blocks are needed (see the sketch after this list).
   - Add new required inputs, using `dependency` references as needed.
   - Remove or rename unused variables.
   - Ensure you include the inputs for backward compatibility mentioned in the dedicated guide!
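
   As a sketch of the `dependency` block pattern (the dependency name, path, and input/output names here are hypothetical; the dedicated guide lists the exact blocks each service needs):

   ```hcl
   # A dependency block exposes the outputs of an upstream module so they can be
   # wired into this module's inputs; it replaces the old dependencies block.
   dependency "vpc" {
     config_path = "../vpc"
   }

   inputs = {
     # Wire an upstream output into a newly required input.
     vpc_id = dependency.vpc.outputs.vpc_id
   }
   ```
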
5. Run `terragrunt state list` to sanity check the state backend configuration. Watch for the following:

   - You should NOT get any prompts from Terragrunt to create a new S3 state bucket. If you get the prompt, this means either that you are authenticating to the wrong account, or that the bucket name was misconfigured in the root `terragrunt_service_catalog.hcl` file.
   - You should see resources listed in the state. If the command returns nothing, that means you are not properly linked to the old state file. Double-check the `key` attribute of the `remote_state` block in the root `terragrunt_service_catalog.hcl` config.
6. Once you verify the state backend configuration is valid, perform the state migration operations:

   - Run the provided migration script for the service. Not all services have a migration script. Refer to the dedicated service migration guide for the script to run.
   - Sanity check the migration operation by running `terragrunt plan`. If the guide states that the upgrade is fully backward compatible, then you should only see backward compatible changes (only `~` or `+` operations, not `-` operations). Otherwise, expect some destroys.
   - NOTE: If you run into any errors related to code verification during provider plugin initialization, you will need to update to the latest Terraform patch release of your minor version line, which contains the latest Terraform GPG key used to sign the providers. When updating the Terraform version, you also need to run `terragrunt init` to reinitialize the providers.
7. Once you’re satisfied with the plan, roll out the changes using `terragrunt apply`.

8. If the service has downstream dependencies, run `terragrunt validate-all` from the ACCOUNT directory to identify any outputs that have changed. Fix the output references on the `dependency` block for each error.
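
For instance, assuming your account folders sit at the root of `infrastructure-live` (a sketch; adjust the paths to match your layout):

```bash
# Run from the account directory after applying an upstream service to surface
# downstream configurations whose dependency output references have broken.
cd infrastructure-live/dev
terragrunt validate-all
```
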

## Appendix: Dedicated service migration guides

These dedicated guides are meant to be used in tandem with the main detailed guide above. They are stored in the `infrastructure-live-multi-account-acme` repository, which is now archived because it was used to share examples of Reference Architecture 1.0. You can still interact with the archived repo and use it to help you upgrade your existing Reference Architecture.

### Exceptions to Service Catalog Versions

The following modules require a different version of the Service Catalog than `v0.35.5` to migrate properly:

- `route53-public`: `v0.32.0`
- ALB Service Migration Guide
- ASG Service Migration Guide
- Aurora Service Migration Guide
- CloudTrail Service Migration Guide
- cloudwatch-dashboard Service Migration Guide
- ecr-repos Service Migration Guide
- ecs-cluster Service Migration Guide
- ecs-service-with-alb Service Migration Guide
- EKS Service Migration Guide
- iam-cross-account Service Migration Guide
- iam-groups Service Migration Guide
- iam-user-password-policy Service Migration Guide
- Jenkins Service Migration Guide
- kms-master-key Service Migration Guide
- Memcached Service Migration Guide
- OpenVPN Server Service Migration Guide
- RDS Service Migration Guide
- Redis Service Migration Guide
- Route 53 (private) Migration Guide
- Route 53 (public) Migration Guide
- sns-topics Service Migration Guide
- VPC (app) Migration Guide
- VPC (mgmt) Migration Guide