We have used Terraform to stand up a typical collection of systems and Ansible to configure them. What we haven't done is tied those two together into a single step. Because showing this has some additional requirements and permissions, I am going to run this as a demonstration from my laptop.
To make our system more realistic, I'll put a load balancer in place in front of the web servers. Other than that, I'll just repeat the previous lesson, but add in the pieces to run Terraform and Ansible together.
I have prepared a video of these steps to accompany the descriptions below if you want to see them in action. I regularly update the demonstration, so the video may show an older version and not match exactly.
I've installed Ansible locally. I'll also need to install the Ansible roles and collections locally, since they were previously only on our control node.
- geerlingguy.nginx role
- geerlingguy.php role
- community.mongodb collection
$ ansible-galaxy install geerlingguy.nginx
...
- geerlingguy.nginx (3.1.4) was installed successfully
$ ansible-galaxy install geerlingguy.php
...
- geerlingguy.php (5.0.0) was installed successfully
$ ansible-galaxy collection install community.mongodb
Process install dependency map
Starting collection install process
Installing 'community.mongodb:1.6.0' to '/home/ggotimer/.ansible/collections/ansible_collections/community/mongodb'
Installing 'ansible.posix:1.5.4' to '/home/ggotimer/.ansible/collections/ansible_collections/ansible/posix'
Installing 'community.general:7.0.1' to '/home/ggotimer/.ansible/collections/ansible_collections/community/general'
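If you want this to be a single, repeatable step, the roles and collections could also be listed in a requirements file. This is just a convenience sketch, not something the demo itself uses:
# requirements.yml (illustrative) - the roles and collections installed above
roles:
  - name: geerlingguy.nginx
  - name: geerlingguy.php
collections:
  - name: community.mongodb
Then both installs read from the same file:
$ ansible-galaxy role install -r requirements.yml
$ ansible-galaxy collection install -r requirements.yml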
Since Ansible will need to ssh
from the local system to the systems in AWS,
I'll configure the SSH key I was using on the control node to be my default key.
$ eval $(ssh-agent)
Agent pid 7953
$ ssh-add /home/ggotimer/.ssh/gene-test-us-east-2.pem
Identity added: /home/ggotimer/.ssh/gene-test-us-east-2.pem (/home/ggotimer/.ssh/gene-test-us-east-2.pem)
$ ssh-add -l
2048 SHA256:BdmxwBiUP11ZL1X1Qw6M3st8k7nWVGWkNCIpj3vhkmc /home/ggotimer/.ssh/gene-test-us-east-2.pem (RSA)
Terraform was always running locally, so there isn't anything extra to prepare for this demo.
This demonstration will take about 10 minutes to complete, so I'll kick it off first and then describe the important changes.
$ terraform init
...
$ terraform apply
...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
...
That's it. And it is actually vital that this is all we need to do. A prime benefit of using these tools, and infrastructure-as-code in general, is making our jobs automated and therefore repeatable and reliable. They can't very well meet those expectations if there are a bunch of manual steps.
There are a few changes in this lesson that should be explained.
In Lesson 04, we set up SSH access from the internet to the control node for us and then from the control node to the target web servers and database for Ansible. If I am running Ansible from my laptop, I'll need to open SSH access to all the target systems.
Opening SSH from the internet in general to these systems is more open than I am comfortable with. So in network.tf, I use the http provider's data source in Terraform to look up my current public IP address from https://ifconfig.me/. When I set up the security groups for SSH, each only allows access from my current address. And if I move to a new IP address, I can just run terraform apply again to modify the security groups to allow access only from my new address.
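A minimal sketch of how that looks in Terraform follows; the data source and security group names are illustrative (only aws_vpc.sandbox_vpc is taken from the demo), and the exact URL path is an assumption:
# Look up my current public IP when Terraform runs (http provider data source)
data "http" "my_public_ip" {
  url = "https://ifconfig.me/ip"
}

# Allow SSH only from that one address
resource "aws_security_group" "ssh_from_me" {
  name   = "ssh-from-my-ip"
  vpc_id = aws_vpc.sandbox_vpc.id

  ingress {
    description = "SSH from my current public IP only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${chomp(data.http.my_public_ip.response_body)}/32"]
  }
}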
We set up two web servers in the previous Ansible lessons. There isn't much real-world value in that unless we have a load balancer in front of them to distribute the load between them.
Setting up a load balancer is a little tedious. It requires setting up:
- the application load balancer (ALB) instance in multiple availability zones
- security groups into the ALB
- security groups from the ALB to the target systems
- the group of target systems (target group)
- the ALB listener for the port to a target group
The example code in load-balancer.tf takes some significant shortcuts, but it is a good starting point. The HTTP listeners should redirect to HTTPS and host all traffic there. There should be a web application firewall (WAF) protecting the ALB from attack. The ALB is available in multiple availability zones (AZs), but there should be web servers and databases in those AZs as well for true resiliency. The ALB should have logging and deletion protection turned on.
Feel free to use it as a starting point, but remember it is just a starting point.
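To give a sense of the shape of load-balancer.tf, here is a trimmed-down sketch. The aws_lb and listener names match the Checkov output later in this lesson, but the subnet, security group, and target group names are illustrative, and it assumes the web servers are created with count:
# Application load balancer spanning public subnets in multiple AZs
resource "aws_lb" "webserver_alb" {
  name               = "planets-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  security_groups    = [aws_security_group.alb_http.id]
}

# The pool of web servers the ALB forwards traffic to
resource "aws_lb_target_group" "webservers" {
  name     = "planets-webservers"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.sandbox_vpc.id
}

resource "aws_lb_target_group_attachment" "webserver" {
  count            = length(aws_instance.webserver)
  target_group_arn = aws_lb_target_group.webservers.arn
  target_id        = aws_instance.webserver[count.index].id
  port             = 80
}

# Listener: HTTP in, forwarded to the target group (the missing HTTPS redirect is one of the shortcuts)
resource "aws_alb_listener" "webservers_http" {
  load_balancer_arn = aws_lb.webserver_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.webservers.arn
  }
}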
We tagged all our instances and most resources earlier, but it was just for reference before. Now it will serve a more concrete purpose.
tags = {
  Name        = "planet-mongodb"
  Project     = "planets"
  Environment = "demo"
  Role        = "database"
  Owner       = var.owner_email
}
If we look at the database tags, we see the Name
and Owner
as before. Those
are good practice. But we've hard-coded the Project
(it was a variable before),
and added Environment
and Role
. The names and values are up to us, but
almost everyone will need something along these lines.
We are going to generate our Ansible inventory based on the Project
and
Environment
, using the Role
tag to name the groups for Ansible.
Here is where the changes start getting really interesting.
Our old inventory looked like:
[workstation]
10.8.0.26
[targets]
10.8.0.10
10.8.0.41
10.8.0.178
[all:children]
workstation
targets
[webserver]
10.8.0.41
10.8.0.178
[database]
10.8.0.10
Particularly important to Ansible were the webserver
and database
groups.
For each of the playbooks we used, nginx-playbook.yml
and mongodb-playbook.yml
,
we ran them against our entire system and let the playbook pattern (i.e., the hosts
line) determine what systems to target.
For example, mongodb-playbook.yml
starts with:
---
- hosts: database
so it targeted 10.8.0.10
. That's convenient, since we can point our playbooks
at all of our servers and just let each playbook sort out which servers to run
against.
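nginx-playbook.yml presumably starts the same way, targeting the webserver group; its hosts line isn't shown here, but the group names we generate below confirm it:
---
- hosts: webserver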
Our problem is that the IP addresses were determined when Terraform ran. So we needed to look at the output, set up our inventory file, and then run Ansible.
With the tagging we've added and a dynamic inventory plugin we can let Ansible
query our AWS account using the same credentials we set up for Terraform. Since
it queries our entire AWS account, not limited to the systems that Terraform
just stood up, we will filter for only the systems in the Project
and
Environment
we are applying Ansible to. And then we will use the Role
tag as
the key for our groups.
I created inventory-aws_ec2.yml with those filters and
grouping instructions. (The file name was up to me, as long as it ended with
aws_ec2.yml
or aws_ec2.yaml
.)
---
plugin: aws_ec2
regions:
  - us-east-2
hostnames:
  - ip-address
filters:
  tag:Project: planets
  tag:Environment: demo
keyed_groups:
  - key: tags.Role
    separator: ''
  - key: tags.Environment
    prefix: env
Since I am using this from my laptop, the IP addresses will be the public addresses rather than the private ones. Also, we can't just look at a file to see the results, since this is truly dynamic: no file is written out; the plugin determines the inventory at runtime. Instead, I can run a command to see a representation of the current inventory.
$ ansible-inventory -i inventory-aws_ec2.yml --graph
@all:
|--@aws_ec2:
| |--18.220.32.12
| |--18.221.23.189
| |--3.145.43.100
|--@database:
| |--3.145.43.100
|--@env_demo:
| |--18.220.32.12
| |--18.221.23.189
| |--3.145.43.100
|--@ungrouped:
|--@webserver:
| |--18.220.32.12
| |--18.221.23.189
The @
symbols are just part of the graph representation, not part of the group
names.
For my use, this generates the same group names (database
and webserver
) as
we used in the previous lesson. As such I can use the playbooks exactly as they
were.
The aws_ec2
inventory
plugin has many options. Also, the playbook
patterns
can be more than just a single group. With a well-planned tagging strategy, a
single inventory plugin configuration could take care of your entire AWS
ecosystem, making it easy to target the right servers for any project in any
environment. For example, env_demo:&webserver for web servers in the demo environment, or database:!env_prod for databases not in the prod environment, if we had such an environment and had not filtered it out.
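As a hypothetical one-off example, limiting a run to just the demo web servers could look like this (same inventory file, with the pattern passed via --limit):
$ ansible-playbook --inventory inventory-aws_ec2.yml --user ubuntu --limit 'env_demo:&webserver' nginx-playbook.yml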
Now that we've handled the inventory using automation, we just need to call Ansible once Terraform is finished. We could just wrap the Terraform and Ansible commands in a shell script, but there is a more flexible and elegant way.
At the end of main.tf, I added a resource with a call to the local-exec provisioner.
resource "null_resource" "ansible" {
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=false ansible-playbook --inventory inventory-aws_ec2.yml --user ubuntu site.yml"
}
triggers = {
always_run = timestamp()
}
depends_on = [
aws_instance.webserver,
aws_instance.mongodb
]
}
The null_resource
is handled by Terraform like any other resource, except it
has no action until we give it one. The local-exec
runs locally on my laptop,
as opposed to a remote-exec
which runs on the resource. We used the
remote-exec
to bootstrap Ansible on our control node in Lesson 04.
The triggers argument tells Terraform when this resource needs to be replaced, which re-runs the provisioner.
In this case, since the timestamp changes on every apply, the provisioner will always run. That might
seem anti-idempotent, but as long as our Ansible code is idempotent itself,
calling it each time is cleaner than jumping through hoops to determine if one
of the resources and/or playbooks changed. Plus, if someone or something makes
a change on one of the systems, Ansible will figure out that it needs to reapply
the configuration.
The depends_on
argument tells Terraform that I need those resources built
before this resource is applied. Most of the time Terraform figures that out on
its own, but with the null_resource
we need to explicitly spell it out.
Ansible needs the webserver and database EC2 instances, so I'll call them out
as dependencies.
The Ansible command line looks a little more involved than we used before, but
it is really just putting the options from the ansible.cfg
directly into the
command. That way I don't have to rely on a local configuration file.
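For comparison, here is a sketch of an ansible.cfg that would carry roughly the same settings; this is my own illustration, not the control node's actual file:
# ansible.cfg (illustrative) - the same options the command line passes explicitly
[defaults]
inventory = inventory-aws_ec2.yml
remote_user = ubuntu
host_key_checking = False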
The last change is the site.yml playbook.
---
- hosts: all
  gather_facts: false
  become: true
  tasks:
    - name: Wait 600 seconds for targets to become reachable/usable
      ansible.builtin.wait_for_connection:
    - name: Update repositories caches
      ansible.builtin.apt:
        update_cache: yes
- import_playbook: nginx-playbook.yml
- import_playbook: mongodb-playbook.yml
It imports the two playbooks we used before so that I can call them from one place. But before it invokes them, it waits until the target systems are responding, so the Ansible tasks don't start running until the systems are ready. Internally, wait_for_connection relies on the ping module we used earlier. By default, it will wait for up to 10 minutes (600 seconds) before it times out. This playbook also forces an update of the package repository cache on each system, to make sure the following playbooks can find any packages they need.
The result of my single terraform apply is that all three systems plus a load balancer were stood up, and Ansible configured the web servers and the database.
$ terraform apply
...
Outputs:
planets_url = "http://planet20220519132008987000000001-1532753543.us-east-2.elb.amazonaws.com/"
webserver_private_ips = [
"10.8.17.120",
"10.8.18.228",
]
webserver_public_ips = [
"18.220.32.12",
"18.221.23.189",
]
When I point a browser to that long, unwieldy URL, I see the same planets demo as before, with the addition of Pluto just so I can see a difference.
If I hit this URL from multiple locations, I can see the private IP address of the serving web server change between the two, thanks to the load balancer.
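A quick, informal way to watch the distribution is to request the page repeatedly and pull out the reported address; this assumes, as described above, that the page includes the private IP of the web server that served it:
$ for i in 1 2 3 4 5; do curl -s "$(terraform output -raw planets_url)" | grep -o '10\.8\.[0-9]*\.[0-9]*'; done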
As mentioned earlier, this demo takes some shortcuts; remember that before using it as the model for a production system.
An easy way to see some of those IaC shortcuts is to look at the Checkov warnings that were suppressed:
$ checkov -d .
...
Check: CKV_AWS_91: "Ensure the ELBv2 (Application/Network) has access logging enabled"
SKIPPED for resource: aws_lb.webserver_alb
Suppress comment: No logging since it is a demo- not good for production
File: /load-balancer.tf:53-72
Check: CKV_AWS_150: "Ensure that Load Balancer has deletion protection enabled"
SKIPPED for resource: aws_lb.webserver_alb
Suppress comment: Allow deletion since it is a demo- not good for production
File: /load-balancer.tf:53-72
Check: CKV_AWS_2: "Ensure ALB protocol is HTTPS"
SKIPPED for resource: aws_alb_listener.webservers_http
Suppress comment: HTTP only for demo- not good for production
File: /load-balancer.tf:93-104
Check: CKV_AWS_103: "Ensure that load balancer is using TLS 1.2"
SKIPPED for resource: aws_alb_listener.webservers_http
Suppress comment: HTTP only for demo- not good for production
File: /load-balancer.tf:93-104
Check: CKV_AWS_88: "EC2 instance should not have public IP."
SKIPPED for resource: aws_instance.webserver
Suppress comment: Allowing public access for Ansible
File: /main.tf:21-51
Check: CKV_AWS_88: "EC2 instance should not have public IP."
SKIPPED for resource: aws_instance.mongodb
Suppress comment: Allowing public access for Ansible
File: /main.tf:53-82
Check: CKV2_AWS_11: "Ensure VPC flow logging is enabled in all VPCs"
SKIPPED for resource: aws_vpc.sandbox_vpc
Suppress comment: Skipping logging to make permissions easier- not a generally good idea
File: /network.tf:24-33
Check: CKV2_AWS_20: "Ensure that ALB redirects HTTP requests into HTTPS ones"
SKIPPED for resource: aws_lb.webserver_alb
Suppress comment: HTTP only for demo- not good for production
File: /load-balancer.tf:53-72
Check: CKV2_AWS_28: "Ensure public facing ALB are protected by WAF"
SKIPPED for resource: aws_lb.webserver_alb
Suppress comment: No WAF since it is a demo- not good for production
File: /load-balancer.tf:53-72
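Those SKIPPED findings correspond to checkov:skip comments placed inside the affected resources in the Terraform source. As an illustration (the CIDR is inferred from the private addresses above, not copied from the demo), the VPC suppression looks something like:
resource "aws_vpc" "sandbox_vpc" {
  # checkov:skip=CKV2_AWS_11:Skipping logging to make permissions easier- not a generally good idea
  cidr_block = "10.8.0.0/16" # illustrative CIDR
}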
In many cases, the Bridgecrew documentation will describe how to fix the issue if you just search for the Checkov finding ID. But often those fixes will introduce new issues that need to be resolved. It isn't endless, but making sure you have covered all the recommended practices can be tedious, which is why we use tools to check our code and ensure we haven't missed anything.
When we are done, we can clean up the environment as usual. From our laptop:
$ terraform destroy
This is the final lesson, at least for now.
Requirements:

| Name | Version |
|---|---|
| terraform | >= 1.8.3 |
| aws | ~> 5.49.0 |
| http | ~> 3.3.0 |
| null | ~> 3.2.1 |
Providers:

| Name | Version |
|---|---|
| aws | ~> 5.49.0 |
| http | ~> 3.3.0 |
| null | ~> 3.2.1 |
Modules:

No modules.
Inputs:

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| aws_profile | Local AWS profile to use for AWS credentials | string | "default" | no |
| aws_region | AWS region to build in | string | n/a | yes |
| az_number | n/a | map | { | no |
| key_name | Name of an already-installed AWS keypair | string | n/a | yes |
| owner_email | Email address to tag resources with | string | n/a | yes |
| planet_az | Availability zone to run the demo in | string | "us-east-2a" | no |
Outputs:

| Name | Description |
|---|---|
| planets_url | URL for the planets demo |
| webserver_private_ips | Private IP addresses of the NGINX webservers |
| webserver_public_ips | Public IP addresses of the NGINX webservers |