Microsoft recently announced the general availability of OIDC authentication for GitHub Actions using Azure AD. Naturally, I immediately thought of how I could use this to remove static credentials from my GitHub Actions workflows that deploy Terraform configurations. I could use a service principal and OIDC for deployment of the Terraform configuration as well as the storage of state data in an Azure Storage account.
The actual process of using OIDC with Azure AD and Terraform will be the subject of an entirely different post. In this post, I wanted to explore how to use OIDC authentication with the `azurerm` backend, and some considerations depending on your state data storage structure.
If you’re just looking for the conclusion, here it is. To scope permissions at the container level when using OIDC authentication for the `azurerm` backend, enable both OIDC and Azure AD authentication using the `use_azuread_auth` and `use_oidc` arguments or the `ARM_USE_OIDC` and `ARM_USE_AZUREAD` environment variables. If you don’t, you will get an access error stating that the `ListKeys` action failed. You’re welcome.
The `azurerm` backend stores state data in an Azure storage blob in a container in a storage account. There are many ways you can authenticate to the Azure storage API to execute the necessary actions:

- Storage account access key
- SAS token
- Service principal with a client secret or certificate
- Managed identity
- Azure CLI credentials
- OIDC federated credentials
The OIDC option was introduced in a recent version of Terraform, since the backend code is part of the core Terraform binary and not part of a provider. To use OIDC authentication, you will need to configure the `azurerm` backend, either by including the information in the backend block or by setting environment variables. Here is an example backend block:
```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "sa-rg"
    storage_account_name = "storageaccountname"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
    use_oidc             = true
    subscription_id      = "00000000-0000-0000-0000-000000000000"
    tenant_id            = "00000000-0000-0000-0000-000000000000"
    client_id            = "00000000-0000-0000-0000-000000000000"
  }
}
```
You can replace some of the arguments with environment variables:
| Argument | Environment Variable |
|---|---|
| `subscription_id` | `ARM_SUBSCRIPTION_ID` |
| `tenant_id` | `ARM_TENANT_ID` |
| `client_id` | `ARM_CLIENT_ID` |
| `use_oidc` | `ARM_USE_OIDC` |
The rest of the arguments can be specified at run time when you initialize Terraform using the `-backend-config` option for each argument.
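As a minimal sketch (the file name and values are placeholders), you could put the non-secret settings in a partial configuration file and pass it at init time with `terraform init -backend-config=backend.hcl`, or pass each value individually with flags like `-backend-config="container_name=tfstate"`:

```hcl
# backend.hcl - partial backend configuration supplied at init time
# (placeholder values; pass with: terraform init -backend-config=backend.hcl)
resource_group_name  = "sa-rg"
storage_account_name = "storageaccountname"
container_name       = "tfstate"
key                  = "terraform.tfstate"
```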
One question you might ask is, how do I properly configure permissions on the storage account to adhere to the principle of least privilege? And the answer - as always - is it depends (TM). To better understand why it depends, you need to know what Terraform is doing when it leverages the `azurerm` backend.
Assuming you’re using a configuration block similar to what you see above, Terraform will take the following actions:

- Authenticate to Azure Resource Manager with the credentials you supplied
- Retrieve the access keys for the storage account
- Use one of those access keys to read, write, and lease the state blob
What I want to highlight here is that Terraform is going to retrieve the access keys for the storage account, and then use those keys to perform operations. This means two important things:

- Your identity needs the `ListKeys` permission on the storage account to get those keys
- Every operation after that uses the access keys, not your identity’s Azure IAM permissions

For those that are not familiar, access keys are the equivalent of having root on the storage account. Those keys can do anything on the account.
I was… not excited about Terraform needing effectively root access to the storage account to write data to a storage blob. That seems like overkill. In case you don’t believe me, you can check out the source code right here.
Let’s assume for a second you are using a separate storage account to house state data for each Terraform configuration. Well then, I suppose we don’t really have a problem. I’m still not wild about the godlike power Terraform has over the storage account, but it isn’t going to affect anything but its own configuration.
What if you wanted to use a single storage account to house state data for multiple instances of a Terraform configuration? Well now we have a possible problem. Each instance now has the permissions to overwrite or delete anything in the storage account, even if you are using different service principals for each instance. Your development environment service principal has the ability to delete production state data. That’s… bad. And you can’t restrict it with Azure IAM, because it will be using the access key, which doesn’t check Azure permissions.
What’s a sad Azure boy to do?
If you’re using a service principal or the Azure CLI to authenticate to your `azurerm` backend, then you will see the behavior I am describing. But there is another way! Actually there’s two:

- Use a SAS token
- Use Azure AD authentication
Since we are using OIDC, there is no option to generate a SAS token, but we can use Azure AD authentication! (I think the Azure Storage API may be using SAS tokens under the covers, but that’s just speculation). In that case, instead of trying to get the access keys for the storage account, the backend simply tries to access the container and blobs in the container. Instead of assigning our service principal rights to `ListKeys` on the storage account, we can narrow down the scope of permissions to the container level (the lowest level you can scope a permission on Azure Storage).
You can either add the following argument to your backend block:
```hcl
terraform {
  backend "azurerm" {
    use_azuread_auth = true
  }
}
```
Or set the environment variable `ARM_USE_AZUREAD` to `TRUE`.
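Putting the pieces together, a backend block that enables both OIDC and Azure AD authentication might look something like this sketch (all names and IDs are placeholders):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "sa-rg"
    storage_account_name = "storageaccountname"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"

    # Authenticate with OIDC and use Azure AD for data plane access
    use_oidc         = true
    use_azuread_auth = true

    subscription_id = "00000000-0000-0000-0000-000000000000"
    tenant_id       = "00000000-0000-0000-0000-000000000000"
    client_id       = "00000000-0000-0000-0000-000000000000"
  }
}
```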
It is now possible to use the same storage account for as many instances of state data as you want, with each instance residing in its own storage container. Assuming you’re using a different service principal for each environment (dev, stage, prod, etc.), you can assign narrowly scoped rights to each service principal that only allows access to the state data corresponding to that environment.
According to the HashiCorp docs, you need to grant `Storage Blob Data Owner` permissions to the service principal when you’re using Azure AD auth. In practice, I have found that `Storage Blob Data Contributor` is sufficient. We don’t need our service principal to alter permissions.
To sum up, enable OIDC and Azure AD authentication on the backend, and assign the service principal the `Storage Blob Data Contributor` role scoped to the container that will house state data.
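If you manage the storage account from a bootstrap Terraform configuration, the container-scoped role assignment might look something like this sketch. It assumes an `azurerm_storage_account.state` resource already exists, and the variable holding the service principal’s object ID is hypothetical:

```hcl
# Hypothetical bootstrap configuration: create the state container and grant
# a service principal access scoped to just that container.
resource "azurerm_storage_container" "tfstate" {
  name                 = "tfstate"
  storage_account_name = azurerm_storage_account.state.name
}

resource "azurerm_role_assignment" "tfstate_contributor" {
  # Scope the role to the container, not the whole storage account
  scope                = "${azurerm_storage_account.state.id}/blobServices/default/containers/${azurerm_storage_container.tfstate.name}"
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.deploy_principal_object_id # object ID of the service principal
}
```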
Two additional things of note: workspaces and SAS tokens.
The container level is the narrowest scope you can define for Azure IAM on a storage account. You cannot set individual permissions on storage blobs. Savvy readers might take note that if you’re using workspaces, this presents a problem. When using the `azurerm` backend, each workspace resides in the same container, with the workspace name added to the blob name. Whatever service principal you use for each workspace will have permissions to alter the state data for all workspaces in the container. That is less than ideal.
Of course, these days I would recommend against using workspaces for most situations. There are better patterns for managing multiple environments with the same configuration, most notably using branches in your source control for each environment. Each environment can use a different storage container and the problem is solved. Be on the lookout for a blog post and video about exactly that.
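As a rough sketch of that pattern (file and container names are just illustrative), each environment can carry its own partial backend configuration, selected at init time with something like `terraform init -backend-config=env/dev.backend.hcl`:

```hcl
# env/dev.backend.hcl - dev state lives in its own container
container_name = "tfstate-dev"
key            = "terraform.tfstate"

# env/prod.backend.hcl - prod state in a separate container with its own IAM scope
container_name = "tfstate-prod"
key            = "terraform.tfstate"
```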
In my humblest of opinions, I would prefer to use a SAS token over a service principal. The SAS token gives you an even higher level of control regarding access. Unlike Azure IAM permissions, you can scope a token to a specific storage blob, meaning you can use it with workspaces and prevent one workspace from accessing the state data of another.
The SAS token supports setting permissions, start and stop time, and source IP address as additional controls on the token. Given the sensitive nature of state data, I like the idea of being able to grant limited access for a short duration of time.
The downside is that you’ll need a portion of your automation workflow that can generate a SAS token before a Terraform run starts. There are plenty of example scripts that do exactly that, but it is another piece of your pipeline that you need to maintain.