I’m on the record as preferring declarative infrastructure as code (IaC) to imperative versions, such as the AWS CDK. I believe that declarative IaC has a lower total cost of ownership (TCO).
But while I prefer declarative to imperative, imperative IaC enables something I consider much worse: infrastructure as imperative programs that generate declarative IaC documents. Almost all imperative IaC frameworks work this way. There are two aspects to this that I consider particularly damaging.
First, these programs are generally not enforced to be deterministic, and when it’s not enforced, people are always going to find clever ways to solve their problems by externalizing state, and now your CI/CD process doesn’t have repeatable builds.
Is this fixable? Sure, if you stick to certain practices and processes, you can mitigate this problem, but as Forrest Brazeal says, CI/CD is the Conway’s Law-iest part of your stack, and your ability to accomplish those practices may be limited. So it’s better to stick to tooling that inherently has the properties you want, like determinism.
The second aspect is that the generative step is inevitably lossy. The primary reason people build these imperative frameworks is that they are unhappy with the constraints of the declarative language they decide to generate.
That’s actually not true — the primary reason is they just want tab completion — but the second most important reason is to allow constructions that aren’t possible otherwise. The reason I consider this to be a problem is that, when lossily flattening the abstractions built up in the imperative program, the developer’s mental models are lost, and not fully reflected in the deployed infrastructure.
We need our mental models, as expressed in the IaC we write, to be reified in the cloud, so that when something goes wrong, we can dive in and understand the deployed infrastructure in terms of the code we have written. The less context from the original that’s present, the less our tooling helps us understand what we’ve actually created. Metadata tags are not sufficient; we need proper cloud-side representation and API support.
An additional problem with the generative step is that in addition to being lossy, it often does not aim to produce human-consumable output. A regular refrain heard in CDK contexts is “CloudFormation is assembly code”.
How many C++ developers are adept at diving into assembly code to debug problems? Assembly code is not intended to be human-friendly. But for infrastructure as code, many people involved in the lifecycle of cloud resources are going to want to collaborate with the developer using CloudFormation.
Security folks aren’t going to want to review developer programs in a variety of languages across the organization, and given the lack of deterministic guarantees, they’re going to want to review exactly what’s being deployed. Operations personnel are in a similar situation. But if the generative step produces verbose templates that the developer can’t easily navigate, this collaboration is going to have a lot of friction.
When security points to a resource they have an issue with, it’s going to be harder to connect that with a remediation to the original program. Here again there are often suggestions that processes can mitigate this pain, but as before, that may not be possible. If IaC in a familiar language has value, a developer should be able to get most of the value without drawbacks without needing the whole organization to change.
It turns out there’s a way to address these problems.
Spoiler alert: it’s a project called InGraph. You should go read about it in their introductory blog post, but I want to add some color to the development process it went through, and where I see future value in the project.
Last fall, I gave a talk at Serverlessconf on this topic. One of the things I stressed was that with full parity between an imperative IaC system and a declarative system underlying it, the choice is then a matter of preference, without TCO pitfalls.
Some time later, I got a DM on Twitter from Farzad Senart asking if I’d like to look at a project for IaC in Python, based on the ideas I laid out in my talk. Obviously I was flattered, and agreed to see what they’d built. What they showed me blew my mind. They had not taken the route that everyone else had taken, building a library so that the user’s code became a program in the imperative language that generated declarative IaC. Instead, they hooked into the Python interpreter itself.
The user didn’t need to import anything; the interpreter understands how to map the AST, and the instantiated code, into a CloudFormation template. I’ve pointed out before that YAML isn’t a language — it’s a syntax. CloudFormation is a declarative programming language with YAML syntax.
What Farzad and his compatriot Lionel Suss had done was build a new IaC programming language with Python syntax! Everything an IDE can do with Python, it could do with these programs.
There was a lot of promise in the prototype I was shown. It didn’t draw the L1/L2 distinction that the CDK draws, where the design of abstractions departs from the representation of native CloudFormation resources; high-level abstractions use the same model.
Still, that initial version departed from what I hope to see in an ideal imperative IaC framework. For example, it still allowed nondeterministic programs. There were two features that I thought were really important.
The first is the ability to produce CloudFormation templates that have template parameters produces more parity between imperative and declarative; you are not producing a template that is an instantiation of your program for a specific stack, but rather transforming your program from an imperative language to a representation in the declarative language, which can then be used on its own.
This means that semantics of your program are not resolved at compile time, which is great when you want to create flexible templates that you can provide across your organization, for example in AWS Service Catalog.
The second is that your variable name for a given resource should become the logical id for that resource in the template. For me, this is the essence of the concept of reifying the developer’s mental models in the cloud. What you call that resource is what CloudFormation calls it too.
Over the next few months, Lionel and Farzad continued to evolve the framework, experimenting with different approaches. I was impressed by the speed and capability with which they could produce new features through hooking into the Python interpreter.
The framework started enforcing that the program was deterministic, and could be passed a JSON object at run time to set the value of parameters within the program.
But then they took it further: they created a feature where you could set a function parameter to turn into a template parameter only if was not given a value in the program. So the parts of your Python program that were not fully determined became not fully determined inside the output template!
Another very cool feature is that because they are running the interpreter, they know literally all the files you touch, and as part of a run, package up all of your source, the input parameters you’ve given, the output template, and the metadata about how all of it relates, and upload that all together to S3 as a build artifact that can be referenced, without you having to explicitly define anything other than the main Python file as the entry point.
They even demoed a Chrome extension that leveraged that artifact to let you go from the CloudFormation console from a resource back into your program’s source where it was defined.
The final piece of the puzzle, which is where I think they’ve really unlocked a special idea, is the following idea: what if you could only write Python that could be fully represented as a CloudFormation template? You’re now no longer writing in a different language. You’re writing CloudFormation with Python syntax instead of YAML syntax.
Obviously, this constrains the Python features you can use. For example, you can use string formatting, because that can be turned into
Fn::Sub in the template, but you can’t do numeric math, because that’s not possible in a template.
You might object, why would I want write Python with such constraints? It’s because your templates are literally your programs. You can look at the template and instantly know how your program was mapped to it. One reason I’m excited about this, and this might be counterintuitive, is that it’s a great way to demonstrate the most-needed features we should get support for in CloudFormation templates. It exposes the rough edges that we need solved in CloudFormation, rather than papering over these gaps on the client side.
For example, in an InGraph program you can build up abstractions, which use the same model as native CloudFormation resources. In the resource graph, they become subgraphs. CloudFormation doesn’t have a way of representing this nested structure, so it has to be flattened.
It’s represented to the best extent possible — the logical id of a resource is essentially the concatenation of its name with its containing resource’s name — but it demonstrates that CloudFormation needs to provide us ways to fully represent these subgraphs (since nested stacks do not perform this function, and have numerous shortcomings that people have documented).
Today, your InGraph program must be fully deterministic, and obey a subset of all deterministic Python features; you can write another Python program to nondeterministically and/or using any Python feature generate a JSON output, and use that as input to an InGraph program.
But I think there’s a cool future possibility here: you should be able to write Python code inside your program that goes beyond the constraints, as long as the execution graph is separable into the unconstrained and constrained parts. For example, numeric math should just be performed in the unconstrained part.
Then InGraph can snapshot the full state that comes out of the unconstrained part of the program, save it as an artifact, and then proceed to inject that state to resolve the constrained part that becomes the template. What you still wouldn’t be able to do is use unconstrained code that needed values that would only exist in the resolved template. So you couldn’t, say, have a for loop that used the output of a resource, since that output only exists at deployment time.
I’m excited about InGraph because while I believe the lowest TCO is still with declarative templates, at the point where we have a principled imperative tool that helps your program become fully represented cloud-side, it’s really just a matter of personal preference.
It has potential as a tool to make it easier for developers to learn CloudFormation — really learn CloudFormation, not stay arm’s length away from it — and be able to converse on the basis of their generated CloudFormation templates with others in their organization.
Finally, I’d like to give a plug to Farzad and Lionel as lifa.dev. Their level of expertise in this area has really impressed me. Over the course of the project, they built, each time basically from the ground up, several different versions, each one essentially a Domain Specific Language with Python syntax. This seems to me like a really useful skill.
Lionel and Farzad have created lifa.dev both as a home for their open-source developments and findings around AWS and as a way to provide their expertise in the field to companies worldwide. If you use AWS CloudFormation, you should check them out for assistance in your CI/CD journey.