
Using OIDC Authentication with the AzureRM Backend

Ned Bellavance
7 min read


Microsoft recently announced the general availability of OIDC authentication for GitHub Actions using Azure AD. Naturally, I immediately thought of how I could use this to remove static credentials from my GitHub Actions workflows that deploy Terraform configurations. I could use a service principal and OIDC for deployment of the Terraform configuration as well as the storage of state data in an Azure Storage account.

The actual process of using OIDC with Azure AD and Terraform will be the subject of an entirely different post. In this post, I wanted to explore how to use OIDC authentication with the azurerm backend, and some considerations depending on your state data storage structure.

TL;DR

If you’re just looking for the conclusion, here it is. To scope permissions at the container level when using OIDC authentication for the azurerm backend, enable both OIDC and Azure AD authentication using the use_azuread_auth and use_oidc arguments or the ARM_USE_OIDC and ARM_USE_AZUREAD environment variables.
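In a CI pipeline, that can be as simple as setting two environment variables before you run terraform init. A minimal sketch:

```shell
# Enable OIDC and Azure AD authentication for the azurerm backend
export ARM_USE_OIDC=true
export ARM_USE_AZUREAD=true
```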

If you don’t, you will get an access error that the ListKeys action failed. You’re welcome.

The AzureRM Backend Authentication

The azurerm backend stores state data in an Azure storage blob in a container in a storage account. There are many ways you can authenticate to the Azure storage API to execute the necessary actions:

  • Azure CLI
  • Service Principal
  • Azure AD
  • OIDC
  • SAS Token
  • MSI (Managed Service Identity)

The OIDC option was introduced in a recent version of Terraform, since the backend code is part of the core Terraform binary and not part of a provider. To use OIDC authentication, you will need to configure the azurerm backend, either by including the information in the backend block or by setting environment variables. Here is an example backend block:

terraform {
  backend "azurerm" {
    resource_group_name  = "sa-rg"
    storage_account_name = "storageaccountname"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
    use_oidc             = true
    subscription_id      = "00000000-0000-0000-0000-000000000000"
    tenant_id            = "00000000-0000-0000-0000-000000000000"
    client_id            = "00000000-0000-0000-0000-000000000000"
  }
}

You can replace some of the arguments with environment variables:

Argument          Environment Variable
----------------  --------------------
subscription_id   ARM_SUBSCRIPTION_ID
tenant_id         ARM_TENANT_ID
client_id         ARM_CLIENT_ID
use_oidc          ARM_USE_OIDC

The rest of the arguments can be specified at run time when you initialize Terraform using the -backend-config option for each argument.
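As a sketch, using the same placeholder values as the backend block above, passing a partial configuration at init time looks like this:

```shell
# Pass the remaining backend settings at init time (values are placeholders)
terraform init \
  -backend-config="resource_group_name=sa-rg" \
  -backend-config="storage_account_name=storageaccountname" \
  -backend-config="container_name=tfstate" \
  -backend-config="key=terraform.tfstate"
```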

Configuring Storage Account Permissions

One question you might ask is, how do I properly configure permissions on the storage account to adhere to the principle of least privilege? And the answer - as always - is it depends (TM). To better understand why it depends, you need to know what Terraform is doing when it leverages the azurerm backend.

Assuming you’re using a configuration block similar to what you see above, Terraform will take the following actions:

  1. Authenticate to Azure AD using OIDC and get a token
  2. Use the token to get a token from the Azure Storage API
  3. Use the Azure Storage API token to try and retrieve the access keys for the storage account
  4. Use one of the access keys to perform all subsequent operations on the storage account
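The key retrieval in step 3 is the same operation you'd perform with the Azure CLI, and it requires the Microsoft.Storage/storageAccounts/listKeys/action permission (account and group names here are placeholders):

```shell
# Roughly what the backend does in step 3: fetch the account's access keys
az storage account keys list \
  --resource-group sa-rg \
  --account-name storageaccountname
```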

What I want to highlight here is that Terraform is going to retrieve the access keys for the storage account, and then use those keys to perform operations. This means two important things:

  1. Terraform has access to do ANYTHING in that storage account because it has the access keys
  2. Terraform must have the ListKeys permission on the storage account to get those keys

For those that are not familiar, access keys are the equivalent of having root on the storage account. Those keys can do anything on the account.

I was… not excited about Terraform needing effectively root access to the storage account to write data to a storage blob. That seems like overkill. In case you don’t believe me, you can check out the source code right here.

Let’s assume for a second you are using a separate storage account to house state data for each Terraform configuration. Well then, I suppose we don’t really have a problem. I’m still not wild about the godlike power Terraform has over the storage account, but it isn’t going to affect anything but its own configuration.

What if you wanted to use a single storage account to house state data for multiple instances of a Terraform configuration? Well now we have a possible problem. Each instance now has the permissions to overwrite or delete anything in the storage account, even if you are using different service principals for each instance. Your development environment service principal has the ability to delete production state data. That’s… bad. And you can’t restrict it with Azure IAM, because it will be using the access key, which doesn’t check Azure permissions.

What’s a sad Azure boy to do?

Azure AD Authentication

If you’re using a service principal or the Azure CLI to authenticate to your azurerm backend, then you will see the behavior I am describing. But there is another way! Actually, there are two:

  • Shared Access Signature (SAS) Tokens - These are limited use tokens you can generate for a storage account. They can be scoped based on time, objects, source IP address, and access rights.
  • Azure AD Authentication - Selecting Azure AD authentication uses the Azure IAM permissions associated with the service principal to determine access rights.

Since we are using OIDC, there is no option to generate a SAS token, but we can use Azure AD authentication! (I think the Azure Storage API may be using SAS tokens under the covers, but that’s just speculation). In that case, instead of trying to get the access keys for the storage account, the backend simply tries to access the container and blobs in the container. Instead of assigning our service principal rights to ListKeys on the storage account, we can narrow down the scope of permissions to the container level (the lowest level you can scope a permission on Azure Storage).

You can either add the following argument to your backend block:

terraform {
  backend "azurerm" {
    use_azuread_auth = true
  }
}

Or set the environment variable ARM_USE_AZUREAD to true.

It is now possible to use the same storage account for as many instances of state data as you want, with each instance residing in its own storage container. Assuming you’re using a different service principal for each environment (dev, stage, prod, etc.), you can assign narrowly scoped rights to each service principal that only allows access to the state data corresponding to that environment.
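Putting the two flags together, a backend block for one environment might look like the following (all names and IDs are placeholders):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "sa-rg"
    storage_account_name = "storageaccountname"
    container_name       = "tfstate-dev" # one container per environment
    key                  = "terraform.tfstate"
    use_oidc             = true
    use_azuread_auth     = true
    subscription_id      = "00000000-0000-0000-0000-000000000000"
    tenant_id            = "00000000-0000-0000-0000-000000000000"
    client_id            = "00000000-0000-0000-0000-000000000000"
  }
}
```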

According to the HashiCorp docs, you need to grant Storage Blob Data Owner permissions to the service principal when you’re using Azure AD auth. In practice, I have found that Storage Blob Data Contributor is sufficient. We don’t need our service principal to alter permissions.

To sum up, enable OIDC and Azure AD authentication on the backend, and assign the service principal the Storage Blob Data Contributor role scoped to the container that will house state data.
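A sketch of that role assignment with the Azure CLI, scoped down to the container (the subscription ID, client ID, and resource names are placeholders):

```shell
# Grant Storage Blob Data Contributor on just the state data container
az role assignment create \
  --assignee "00000000-0000-0000-0000-000000000000" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/sa-rg/providers/Microsoft.Storage/storageAccounts/storageaccountname/blobServices/default/containers/tfstate"
```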

Additional Thoughts

Two additional things of note: workspaces and SAS tokens.

Workspaces

The container level is the narrowest scope you can define for Azure IAM on a storage account. You cannot set individual permissions on storage blobs. Savvy readers might take note that if you’re using workspaces, this presents a problem. When using the azurerm backend, each workspace resides in the same container, with the workspace name added to the blob name. Whatever service principal you use for each workspace will have permissions to alter the state data for all workspaces in the container. That is less than ideal.
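To illustrate, the azurerm backend names workspace blobs by appending env: plus the workspace name to the configured key, so every workspace's state lands in the same container:

```shell
# Blob names the azurerm backend produces per workspace, given key = "terraform.tfstate"
key="terraform.tfstate"
for ws in dev stage prod; do
  echo "${key}env:${ws}"
done
```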

Of course, these days I would recommend against using workspaces for most situations. There are better patterns for managing multiple environments with the same configuration, most notably using branches in your source control for each environment. Each environment can use a different storage container and the problem is solved. Be on the lookout for a blog post and video about exactly that.

SAS Tokens

In my humblest of opinions, I would prefer to use a SAS token over a service principal. The SAS token gives you an even higher level of control regarding access. Unlike Azure IAM permissions, you can scope a token to a specific storage blob, meaning you can use it with workspaces and prevent one workspace from accessing the state data of another.

The SAS token supports setting permissions, start and stop time, and source IP address as additional controls on the token. Given the sensitive nature of state data, I like the idea of being able to grant limited access for a short duration of time.

The downside is that you’ll need a portion of your automation workflow that can generate a SAS token before a Terraform run starts. There’s plenty of example scripts that do exactly that, but it is another piece of your pipeline that you need to maintain.
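As a sketch of that pipeline step (account and container names are placeholders; --as-user with --auth-mode login creates a user delegation SAS, which requires a signed-in identity with the appropriate data-plane rights, and the date syntax here is GNU date):

```shell
# Generate a short-lived SAS token scoped to the state container,
# then hand it to the azurerm backend via ARM_SAS_TOKEN
expiry=$(date -u -d "+30 minutes" "+%Y-%m-%dT%H:%MZ")
export ARM_SAS_TOKEN=$(az storage container generate-sas \
  --account-name storageaccountname \
  --name tfstate \
  --permissions racwl \
  --expiry "$expiry" \
  --as-user --auth-mode login \
  --output tsv)
```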