slide

Validating ia c with terraform and git hub actions

Ned Bellavance
9 min read

Cover

This post is an accompaniment to my second appearance on the Azure DevOps Labs YouTube channel. My presentation focused on validating your Terraform code as part of a GitOps workflow. During the demonstration I ran into an unexpected policy violation from Checkov. I thought I would review what I presented in the demo, and the resolution of my policy violation. It was not what I expected!

Why Validate?

One thing that host, April Edwards, brought up during my presentation that I completely failed to communicate was the why of validation. I presumed the why was obvious, but she reminded me that not everyone has been in the IaC space long enough to grok why they should be validating their code or what validation even means.

When you are writing IaC using Terraform, or any other language, the point is to reliably deploy and manage infrastructure using software development practices. The first time you code something up in Terraform, it probably won’t deploy correctly. That’s okay! You’re not bad at this coding thing. Mistakes are common and inevitable, so software developers have created tools to check for common mistakes and catch them before they make their way into production. Starting with the humble linter, moving to more complex syntax and logic validation, and stretching all the way to static and dynamic code testing.

The goal of all this testing is to catch and resolve issues as early as possible in the development process. It’s much easier to fix something in your local code editor than trying to hot-patch in production while everything is on fire!

There are many kinds of tests and validation, but for my demo I covered three areas: formatting, syntax, and security.

Formatting

Properly formatted code doesn’t just look pretty, although I do find it deeply satisfying. It also makes it easier to read and catch common errors. When you have multiple developers all working on the same codebase with their own personal style of formatting, reading through the code can be a nightmare. Consistent formatting lowers the noise level and let’s you read through the code regardless of who wrote it. In the world of Terraform, this is accomplished with the terraform fmt command.

Syntax

There are two kinds of syntax that you can validate with Terraform code. One is basic language syntax, like remembering to close a curly brace in a configuration block. The linter in your code editor of choice should pick up on this type of error easily.

The second syntax error is more akin to logical issues, where you reference an attribute that doesn’t exist or have more than one resource with the same name label. You haven’t messed up the language syntax, but your code is still invalid. Terraform will pick up on these issues when you run a terraform plan or apply, but you can check beforehand using terraform validate.

Running validate will check both the language syntax and the logic of the code. You must run terraform init first, because validate will also inspect the syntax of providers and modules you use in the code. I make it a habit of running validate early and often when I’m developing my Terraform configurations. It’s much faster than running a plan and requires less inputs.

Security

When you are deploying infrastructure in an organization, they will likely have best practices and policies for how things should be configured. Those policies can be expressed in code and used to evaluate a Terraform configuration. Tools that look at the configuration code or a plan generated by the code are thought of as static code analysis tools. Dynamic code analysis tools will actually provision the infrastructure and test directly against what has been deployed. For the demonstration, I showed how you could use Bridgecrew’s Checkov static code analysis tool to check your Terraform code against their list of best practices for Terraform and Azure.

Checkov will flag common security issues, like having the remote desktop port 3389 open to the world or not enabling HTTPS on an Azure Web Application. The list of checks is [fairly exhaustive(https://www.checkov.io/5.Policy%20Index/terraform.html)] and growing everyday. In fact, it changes often enough that it caught me out during my demonstration!

The Demonstration

The goal of the demo was to show how formatting, validation, and security checks could be integrated into a GitOps style workflow with GitHub Actions. If you want to check out the code for yourself, go ahead and fork the repo and try it out!

The starting configuration has GitHub Actions triggers for commits to the non-default branch, pull requests on the default branch, and commits to the default branch.

When an engineer wants to update the Terraform code, they will create a new branch from the default branch and update the code. Then they will commit the change and push it up to GitHub. The commit to a non-main branch will trigger a GitHub action workflow that runs both terraform fmt and terraform validate against the code. If either of these tasks fails, then the code needs to be revised before a pull request can be created.

Once the changes are complete and the code has been properly formatted and validated, the next step is to create a pull request to merge the feature branch to the default branch. This triggers a task in GitHub actions which runs terraform plan and Checkov. The results of both are added as a comment to the pull request. The person reviewing the pull request can see what Terraform will change in the target environment by looking at the plan output, and verify that the code is following best practices by looking at the Checkov results. If the results are not satisfactory, they can ping the original engineer to make updates to their code to bring it into spec.

If the results look good, the pull request reviewer can merge the pull request, which effectively creates a commit on the default branch. This will trigger a task in GitHub Actions to run a terraform apply with the merged code. That should make the desired changes seen in the plan output in the target environment.

Whoops!

During the pull request review portion of my demo, I expected to have a single Checkov policy failure, specifically CKV_AZURE_14 which checks to make sure that all HTTP requests to the Web App are redirected to HTTPS. Then I was going to fix the issue in code and voila all the checks would pass. But of course that is not what happened! Imagine my horror when I looked at the results and there were two failures (CKV_AZURE_14 and CKV_GIT_4). Of course, when I had checked the demo the week before CKV_GIT_4 didn’t exist!

I think I kept my cool pretty well in the video - things going wrong during a demo is neither unexpected or new to me. But afterwards I wanted to know what this failure was and whether I needed to fix something in the code or add it to the list of ignored policies. The policy in question, CKV_GIT_4 refers to a check against secrets created in a GitHub repository by Terraform. Here’s the relevant code that was found to be in violation:

resource "github_actions_secret" "actions_secret" {
  for_each = {
    STORAGE_ACCOUNT     = azurerm_storage_account.sa.name
    RESOURCE_GROUP      = azurerm_storage_account.sa.resource_group_name
    CONTAINER_NAME      = azurerm_storage_container.ct.name
    ARM_CLIENT_ID       = azuread_service_principal.gh_actions.application_id
    ARM_CLIENT_SECRET   = azuread_service_principal_password.gh_actions.value
    ARM_SUBSCRIPTION_ID = data.azurerm_subscription.current.subscription_id
    ARM_TENANT_ID       = data.azuread_client_config.current.tenant_id
    TERRAFORM_VERSION   = var.terraform_version
  }

  repository      = var.github_repository
  secret_name     = each.key
  plaintext_value = each.value
}

In particular, the plaintext_value argument was considered insecure for use. Naturally, I went to the github_actions_secret resource to learn more about the argument and what alternatives there were. According to the docs, the plaintext_value argument is “(Optional) Plaintext value of the secret to be encrypted.” Hrm, that doesn’t sound terribly insecure. The value is in plaintext, but GitHub is going to encrypt it and send it securely with HTTPS right?

The alternative was the encrypted_value argument which is “(Optional) Encrypted value of the secret using the Github public key in Base64 format.” So in this case, the secret value is encrypted using the GitHub public key and encoded in Base64 before being sent to GitHub. Ostensibly, GitHub wouldn’t need to encrypt the value once it is received and could simply store it as is, using the private key later to decrypt the value once it is needed.

But how am I supposed to encrypt a value like the azuread_service_principal_password.gh_actions.value that is generated within the same Terraform configuration? Terraform does not have an encryption function, so I would need to have a null resource that invokes an local-exec provisioner to encrypt the value with a script and then Base64 encode it. Before I went any further, I went to look at the issues in the GitHub repo for the GitHub provider to see what other folks were doing. Sure enough there was an open issue about how to obtain the value for the encrypted_value argument. I added a comment asking if there was any benefit from a security perspective in using the encrypted_value over the plaintext_value.

The answer I received was that the only reason to use the encrypted_value argument would be for situations where you didn’t want the value to be stored in Terraform state unencrypted. In my case, the values are coming from the other resources in the same configuration, so they’ll be in the state data regardless. If I were going to pass a value as a variable into Terraform, I could encrypt it ahead of time so that it is never stored in plaintext in the state data.

I mentioned in the comment thread that the plaintext_value was being flagged as insecure by CKV_GIT_4, and one of the maintainers of the GitHub provider filed an issue in the Checkov repo to change the behavior of CKV_GIT_4. The check was adjusted to only trigger when a value is submitted as a variable and not from another resource/data source in the configuration and the issue was closed!

What’s the point of this story? I want to highlight a few things. I could have simply added CKV_GIT_4 to my list of ignored checks and moved on with my life, but digging deeper to understand the check and the Terraform resource enhanced my knowledge of GitHub secrets and security considerations around submitting data through Terraform. Because both the GitHub provider and Checkov are open source, I was able to bring up the issue and get it resolved quickly. The community around both Terraform and Checkov are incredibly helpful and diligent. The entire process took about a day! Now the check is more targeted and useful and my code no longer fails against it.

If you are using Terraform or Checkov, rest assured that you have the supported of an amazing community. They can help you conquer whatever challenges you encounter when trying to adopt IaC and automation for your organization.

Conclusion

Hopefully you enjoyed my presentation and I hope that you’ll take the demo for a spin on your own. I also hope that my little victory with CVK_GIT_4 inspires you to dig a little deeper when you encounter an issue in an open-source product. By filing an issue or commenting on an existing one, you can help make the tools we all use a little bit better!