Replacing the template cloudinit config data source

Ned Bellavance
8 min read

I was updating some Terraform code as part of a consulting engagement when I came across an EC2 configuration that used the template_cloudinit_config data source to create the user_data sent to the instance. I knew that the template provider has been archived by HashiCorp and that the recommendation is to use the templatefile function, so I endeavoured to replace the template_cloudinit_config data source with templatefile. That is where I fell down a rabbit hole of the MIME format, cloud-init peccadilloes, and nested templates.

I thought I would write a post about my little adventure and the eventual workaround. If you don’t care and just want the answer, feel free to skip to the end or simply use the module I wrote to solve this problem.

The Template Provider

The template provider has been archived by HashiCorp in favor of the templatefile function. To understand why, you can check out a whole video I did on it, but I can quickly summarize here. The template provider has a data source called template_file which will render text based on a template and variable inputs. Since you’re using a provider, Terraform has to hand off the work to a provider plugin. The templatefile function does exactly the same thing, but because it’s a function, it is included in the Terraform binary.
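As a rough sketch of what that swap looks like, the snippet below renders the same template both ways. The startup-script.sh file and the name variable are borrowed from the example later in this post; the local value name startup_script is just an illustration.

```hcl
# Old approach: the template_file data source from the archived template provider.
data "template_file" "startup" {
  template = file("${path.module}/startup-script.sh")

  vars = {
    name = "Arthur"
  }
}

# New approach: the built-in templatefile function, no provider plugin involved.
locals {
  # "startup_script" is an illustrative name, not part of the original code.
  startup_script = templatefile("${path.module}/startup-script.sh", {
    name = "Arthur"
  })
}
```

The data source exposes its result as data.template_file.startup.rendered, while the function simply returns the rendered string.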

The evaluation and execution time for a function is much faster than a provider plugin. With the introduction of the templatefile function, the template_file data source was no longer required. However, there is another data source in the template provider that doesn’t have a comparable function, template_cloudinit_config.

To support folks who still want to use that data source, HashiCorp created the cloudinit provider with a single data source called cloudinit_config. Essentially, it functions exactly like the template_cloudinit_config data source, but it lives in a new provider that is actively maintained by HashiCorp.
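For reference, a minimal sketch of the cloudinit provider route might look like the following. The file names and variables mirror the example used later in this post, and the label "config" is an arbitrary choice.

```hcl
terraform {
  required_providers {
    cloudinit = {
      source = "hashicorp/cloudinit"
    }
  }
}

# Drop-in style replacement for template_cloudinit_config.
data "cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content = templatefile("cloud-init.yaml", {
      package_update  = "true"
      package_upgrade = "false"
    })
  }

  part {
    content_type = "text/x-shellscript"
    content = templatefile("startup-script.sh", {
      name = "Arthur"
    })
  }
}
```

The result is read from data.cloudinit_config.config.rendered, the same way the old data source exposed it.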

DIY Cloud-Init

But wait. If the templatefile function is faster than the template_file data source, wouldn’t the same be true for the template_cloudinit_config data source? Unfortunately, there is no templatecloudinit function, so how can I create the same thing using functions? First we need to understand what is being created by the template_cloudinit_config data source and recreate it.

MIME

The template_cloudinit_config data source creates a multipart MIME configuration for cloud-init. This is the moment I realized I was in for a yak-shaving expedition. What the hell is MIME? And what is multipart about it? And what does it have to do with cloud-init? MIME at least sounds familiar.

MIME is the Multipurpose Internet Mail Extensions standard, created for handling mail messages that use non-ASCII characters and for supporting attachments. As a former Exchange Admin I remember seeing MIME from time to time in various menus and dropdowns, but I never had to do anything with it.

Even though MIME was originally intended for email messages, it has been adapted for use in HTTP communication and the cloud-init standard. In addition to supporting different media types, MIME allows you to construct a single configuration that contains multiple parts, each with its own content type. So there we have it, multipart MIME. Now, what does that have to do with cloud-init? Further down the rabbit hole we go!

Cloud-init is an industry standard used to initialize compute instances in a cloud by reading in information like cloud metadata, user data, and vendor data. User data is provided by the client to initialize the system after the cloud metadata portion is complete.

User data that combines more than one content type must be in a multipart MIME format, and it can optionally be gzipped to keep the user-data content under the 16 KB limit that EC2 imposes. Cloud-init supports multiple content types in the MIME configuration including cloud-config, jinja2, and x-shellscript. If you don’t know what any of those are, don’t worry, the official cloud-init docs have you covered.

To sum up, multipart MIME is a format originally intended for email messages, but adapted for use by cloud-init to assist with configuring compute instances on first boot. The template_cloudinit_config data source creates a multipart MIME configuration. We need to understand the format to recreate it without the data source.

Multipart MIME Format

If I want to natively produce multipart MIME content using Terraform functions, I will need to know what the resulting content looks like. The general format is something like this:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=MIMEBOUNDARY

This is the beginning of the cloud-init, followed by a boundary delimiter for the next part.
--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/cloud-config
Mime-Version: 1.0

YAML for the cloud-config

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

Bash script to run
--MIMEBOUNDARY--

That should be pretty easy to replicate. I can use the templatefile function for each part of the MIME content and build it inline using standard Terraform constructs like for expressions.

The last thing to cover is the encoding. The template_cloudinit_config data source gives the option to compress with gzip and encode with base64. Fortunately, Terraform has a base64gzip function which will take care of that for me.
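Roughly speaking, the two flags on the data source map onto built-in functions like this. The payload string and the local names below are placeholders for illustration only.

```hcl
locals {
  # Placeholder content standing in for the full multipart MIME document.
  example_payload = "#cloud-config\npackage_update: true\n"

  # Equivalent of gzip = true and base64_encode = true:
  # compress with gzip, then Base64-encode the result.
  example_gzip_base64 = base64gzip(local.example_payload)

  # Equivalent of gzip = false and base64_encode = true:
  # Base64-encode the text without compressing it.
  example_base64_only = base64encode(local.example_payload)
}
```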

Building the MIME Content

Here’s the original code that uses the template_cloudinit_config data source:

data "template_file" "cloud_init" {
  template = file("cloud-init.yaml")

  vars = {
    package_update  = "true"
    package_upgrade = "false"
  }
}

data "template_file" "x_shellscript" {
  template = file("startup-script.sh")
  vars = {
    name = "Arthur"
  }
}

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = data.template_file.cloud_init.rendered
  }

  part {
    content_type = "text/x-shellscript"
    content      = data.template_file.x_shellscript.rendered
  }
}

We’re using the template_file data source twice and the template_cloudinit_config data source once. The goal is to replace all of those with native functions. First we need to build the parts for cloud-config and x-shellscript. Ideally, this should be extensible, so if someone wants to add more parts, it’s pretty easy to do so.

Each part needs a content type, the path to a template file, and the variables for that file. We can hold that information in a list of objects stored in a local value:

locals {
    cloud_init_parts = [
        {
            filepath = "cloud-init.yaml"
            content-type = "text/cloud-config"
            vars = {
                package_update  = "true"
                package_upgrade = "false"
            }
        },
        {
            filepath = "startup-script.sh"
            content-type = "text/x-shellscript"
            vars = {
                name = "Arthur"
            }
        }
    ]
}

We can add more parts by adding another object to the cloud_init_parts list. Next we need to render each part into the content used by the multipart MIME format. Using a local value and a for expression, each part can be stored in a list as a string.

locals {
    cloud_init_parts_rendered = [ for part in local.cloud_init_parts : <<EOF
--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: ${part.content-type}
Mime-Version: 1.0

${templatefile(part.filepath, part.vars)}
    EOF
    ]
}

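Finally, we need to put it all together with the header and footer of the format. I created a cloud-init.tpl file with a for directive that we can pass our rendered parts to. Only the closing side of the for directive carries a strip marker (~); a marker on the opening side would also consume the blank line that has to separate the top-level headers from the first boundary:

Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

%{ for part in cloud_init_parts ~}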
${part}
%{~ endfor ~}
--MIMEBOUNDARY--

Using a combination of the templatefile and base64gzip functions, we have the final product:

locals {
    cloud_init_gzip = base64gzip(templatefile("cloud-init.tpl", {cloud_init_parts = local.cloud_init_parts_rendered}))
}

Voila! The local value cloud_init_gzip can be used in place of the rendered content from the template_cloudinit_config data source. And I didn’t have to use any provider plugins to do it.
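To show where the value ends up, here is a minimal sketch of an EC2 instance consuming it. The resource label, the ami_id variable, and the instance type are assumptions for illustration and do not come from the original configuration.

```hcl
# Hypothetical input; the real configuration would supply its own AMI.
variable "ami_id" {
  type        = string
  description = "AMI to launch (placeholder for this sketch)"
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed size, pick whatever fits

  # base64gzip already returns Base64 text, so it goes in user_data_base64
  # rather than user_data to avoid it being treated as plain text.
  user_data_base64 = local.cloud_init_gzip
}
```

Cloud-init detects the gzip compression and decompresses the payload on the instance before processing the MIME parts.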

Results

You might be wondering if all this mucking about was worth it. I mean there is a perfectly good cloudinit provider. Why not just use that and call it a day? That’s fair! In part, I just wanted the challenge of doing it with native functions. But there are two other considerations here. First, we’ve removed a dependency on a plugin. That’s one less codebase we have to trust and pull on each terraform init.

Second, in theory using the native functions should be faster than the provider plugin. So, is it? Yes!

Here’s a run of the original code:

$ time terraform apply -auto-approve

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

real    0m4.598s
user    0m1.267s
sys     0m1.039s

And here’s a run of the updated code:

$ time terraform apply -auto-approve

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

real    0m0.197s
user    0m0.013s
sys     0m0.099s

So um, yeah. It’s a bit faster. Does saving ~4s matter in the grand scheme of things? Not at my scale. But imagine if you’ve got a large configuration that needs to render multiple template provider data sources on every run. And that configuration is baked into a CI/CD pipeline that runs every time someone opens a PR or makes a commit. The time savings could start to stack up!

Conclusion

Replacing the template_file data source with the templatefile function is a slam dunk in terms of simplicity and support. But getting rid of the template_cloudinit_config data source is less straightforward. While you could use the cloudinit provider, there’s an opportunity to save time and remove a dependency if you’re willing to do a little extra work. And I kinda did the extra work for you!

In fact, if you’d like to consume this as a module, you can do exactly that: https://registry.terraform.io/modules/ned1313/native/cloudinit/latest.

Of course that introduces a new dependency on an external module, so that’s entirely up to you. But at the very least, you’ll still get the performance improvements.