Infra as Code takes too much code

Introducing Stackattack - a high-level AWS component library for Pulumi

I’ve had a strong affinity for infra-as-code tools since I was first introduced to them. I started using Terraform early in my career, and quickly found that I loved the paradigm. The ability to declare the state I want things to be in and have a tool to carefully make the minimum set of changes needed to make that real never stops feeling magical. The elegance of the paradigm fascinated me so much that I even channeled my Covid restlessness into creating my own IaC tool called statey to understand the internals of how these tools worked more deeply.

At my current job at Anzen we’ve been managing 100% of our infrastructure with Pulumi from the very beginning, and while the experience has mostly been very positive, it’s also made me realize that this is way harder than it needs to be. Prior to Anzen, I’d mostly used IaC tools for fairly limited use cases, e.g. setting up integrations with third-party tools or managing database users, and I’d always found the process of getting it working to be very enjoyable. It always took a few tries to get things set up correctly, but I always learned some things along the way and didn’t mind the time investment. I’ve found through managing Anzen’s infrastructure that as the complexity of your infrastructure grows, this time investment quickly becomes very daunting. AWS’s services are designed in such a way that getting high-level things working end to end requires many, many resources connected together in very specific and often nuanced ways. On one hand, this is precisely the type of thing that IaC tools are designed to handle very well: they allow you to model these dependencies in a first-class way, and they’ll ensure that, given a correct configuration, you create, update, and destroy these complex infrastructure stacks (mostly) reliably. On the other hand, the hard part is always the first step: obtaining that working configuration in the first place.

If you’ve spent much time setting up infrastructure with Terraform or Pulumi, you’ll likely be familiar with the iteration needed to get that first working version. It requires cross-checking Terraform or Pulumi resource docs vs. AWS docs that only tell you how to do things in the console vs. AWS API docs vs. examples vs. any number of other resources around the internet. Example code for Terraform is more prevalent around the internet than for Pulumi, and certainly AI can help a lot these days, but there are so many minor nuances to AWS that it’s rare to get something working correctly without some elbow grease on your part. All too often you’ll spend a day applying your configuration again and again, only to be met with another error and have to dive back into the AWS docs to understand what you missed. Or maybe you do successfully apply your configuration, only to find that you did need to better understand AWS networking after all, because none of your resources seem to be able to talk to each other.

Some of this is inherent. I saw a comment on Reddit recently that I found pretty amusing, responding to a thread where the poster was talking about the pain they’d experienced using Pulumi at their current company:

Infrastructure is annoying by nature, terraform is just the bearer of bad news

The poster is largely frustrated by the lack of Pulumi examples around the internet compared to Terraform, which leaves them on their own to figure things out. I’d point out that for AWS, Pulumi and Terraform resources map essentially one-to-one, because Pulumi’s AWS provider is built on top of Terraform’s provider code, but even so, needing to translate HCL to the programming language of their choice is still a significant point of friction.

I find it interesting that this “cold start” problem for infrastructure code exists at all. Generally, when a lot of people need similar code to solve a problem, and that code takes a lot of work to create yourself, eventually someone writes a library. I don’t think it would be overstating things to say that the whole point of code is to be able to create abstractions that hide complexity. Libraries do exactly that: they take some area of functionality and provide simplified interfaces that allow their users to make use of that functionality without taking on the complexity cost in their own codebase. That leaves a question for IaC tools: why don’t more libraries exist to help us avoid putting together hundreds of lines of configuration code for common things like running services on an ECS cluster, setting up a VPC, or any of hundreds of other little things? Why does it always feel like I’m starting from scratch, or hoping to find a limited example that’s similar enough to what I’m looking for that I can copy and modify it without spending hours in the docs?

I could probably theorize some answers to these questions, but I’ll just get to the point: I’ve been working on exactly that. I’ve created a library called Stackattack that provides a set of working, high-level AWS components for Pulumi, making it easy to create things that should be easy but currently aren’t. If you want to run some containers on an ECS cluster, you can do that in a few dozen lines instead of a few hundred. Setting up a role that you can use to authenticate to AWS in GitHub Actions without storing any credentials takes just a couple of lines. There are a couple dozen different components available right now, and I plan to add more over time.
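
To give a sense of the kind of detail that is easy to get subtly wrong by hand, here is a minimal sketch of the trust policy behind a credential-free GitHub Actions role. This is not Stackattack's code or API: the function name `githubActionsTrustPolicy`, the account ID, and the repository name are all made up for illustration, and the sketch assumes an IAM OIDC provider for GitHub already exists in the account. It only builds the policy document; attaching it to a role is left out.

```typescript
// Shape of the IAM trust (assume-role) policy document built below.
interface TrustPolicy {
  Version: "2012-10-17";
  Statement: Array<{
    Effect: "Allow";
    Principal: { Federated: string };
    Action: "sts:AssumeRoleWithWebIdentity";
    Condition: {
      StringEquals: Record<string, string>;
      StringLike: Record<string, string>;
    };
  }>;
}

// Host of GitHub's OIDC issuer; it is also the suffix of the IAM OIDC provider ARN.
const GITHUB_OIDC_HOST = "token.actions.githubusercontent.com";

// Hypothetical helper (an assumption for illustration, not a library function):
// lets workflows in a single repository assume the role via OIDC.
function githubActionsTrustPolicy(accountId: string, repo: string): TrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: {
          Federated: `arn:aws:iam::${accountId}:oidc-provider/${GITHUB_OIDC_HOST}`,
        },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          // The token must be issued for AWS STS...
          StringEquals: { [`${GITHUB_OIDC_HOST}:aud`]: "sts.amazonaws.com" },
          // ...and must come from a workflow in the named repository (any branch or tag).
          StringLike: { [`${GITHUB_OIDC_HOST}:sub`]: `repo:${repo}:*` },
        },
      },
    ],
  };
}

// Placeholder account ID and repository.
const examplePolicy = githubActionsTrustPolicy("123456789012", "my-org/my-repo");
console.log(JSON.stringify(examplePolicy, null, 2));
```

Leaving out the `sub` condition would let workflows from any GitHub repository assume the role, which is exactly the sort of nuance a well-tested component can take care of for you.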

Although I’m distributing Stackattack as a library, I don’t see using the library as the only, or maybe even the “right”, way to consume these components. I really admire the approach shadcn took as a React component “library”: it’s distributed by copying its components into your codebase rather than being installed as a library. The reasoning provided for this approach is largely that your UI components should be 100% in your control. The same reasoning can be applied to your infrastructure: you should own it and be able to modify it however you see fit. For this reason, I see Stackattack as just as much a set of examples as it is a library. While Stackattack doesn’t provide a first-class way to “install” components by copying them into your codebase, it’s designed very specifically to allow for doing this on your own by simply copy/pasting the component code.

The primary aspect of the design that makes this workflow achievable is that it uses functions to implement components rather than using Pulumi’s ComponentResource. This means that you can start by using the function from the library, but any time you want to pull the ejector lever you can simply copy the code into your codebase and modify it without Pulumi wanting to recreate your resources or having to do any painful stack migrations or import operations. The trade-off is that Stackattack components can’t be used from languages other than TypeScript, the language they’re written in, even though Pulumi does provide tooling to do this for components built on ComponentResource.
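
The reason plain functions eject cleanly comes down to how Pulumi identifies resources. Each resource has a URN that includes its name, its type, and the types of its parents. The sketch below is a simplified model of that naming scheme in plain TypeScript, not real Pulumi code: the `urn` helper, the stack and project names, and the `somelib:index:StaticSite` type token are all assumptions made up for illustration.

```typescript
// Simplified model of a Pulumi URN, the identity Pulumi uses to match a resource
// in your program to a resource already recorded in state. Real URNs have the form:
//   urn:pulumi:<stack>::<project>::<parent types joined by "$">$<type>::<name>
function urn(
  stack: string,
  project: string,
  type: string,
  name: string,
  parentTypes: string[] = [],
): string {
  const qualifiedType = [...parentTypes, type].join("$");
  return `urn:pulumi:${stack}::${project}::${qualifiedType}::${name}`;
}

const BUCKET_TYPE = "aws:s3/bucket:Bucket";

// Function-style component imported from a library: the bucket has no wrapping parent.
function bucketFromLibrary(name: string): string {
  return urn("prod", "web", BUCKET_TYPE, name);
}

// The same function after being copy/pasted into your own codebase.
function bucketEjected(name: string): string {
  return urn("prod", "web", BUCKET_TYPE, name);
}

// ComponentResource-style: a child parented to the component carries the
// component's type token in its own URN.
function bucketInComponent(name: string, componentType: string): string {
  return urn("prod", "web", BUCKET_TYPE, name, [componentType]);
}

// Ejecting a function leaves the identity untouched.
console.log(bucketFromLibrary("assets") === bucketEjected("assets")); // true

// Dropping or renaming a wrapping component changes the identity.
console.log(
  bucketInComponent("assets", "somelib:index:StaticSite") === bucketEjected("assets"),
); // false
```

When a URN changes, Pulumi treats the old and new resources as unrelated and plans a delete and a create. Pulumi's `aliases` resource option can bridge such a change, but that is extra bookkeeping that the function-based approach avoids entirely.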

If you work with Pulumi, AWS, and TypeScript and you want to save a lot of time setting up working infrastructure, I encourage you to check out the components that Stackattack provides and see if they fit your needs before starting from scratch on your own. Each component includes a “View Source” link that you can use to see how it works and/or copy the code even if you don’t want to use the library directly. Like most of my projects, I initially made Stackattack to fit my own needs, but I think that reducing barriers to entry for managing infrastructure with IaC tools is a worthy cause, so I hope others can find it useful as well. If you want to learn more about Stackattack, you can check out the docs, which cover each component in detail, with realistic usage code, cost implications, and more. They also include general recommendations for using Pulumi to manage your infrastructure, such as how to separate your resources into stacks and how to choose a storage backend for your stack’s state.