AWS CloudFormation is an infrastructure graph management service — and needs to act more like it

CloudFormation should represent our desired infrastructure graphs in the way we want to build them

Ben Kehoe
Aug 4, 2019

What’s AWS CloudFormation?

As Richard Boyd says, CloudFormation is not a cloud-side version of the AWS SDK. Rather, CloudFormation is an infrastructure-graph management service.

But it’s not clear to me that CloudFormation fully understands this, and I think it should more deeply align with the needs that result from that definition.

Chief among these needs is that CloudFormation resources should be formed around the lifecycle of the right concepts in each AWS service — rather than simply mapping to the API calls provided by those services.

What’s the Issue?

For an example, let’s talk about S3 bucket notifications. If there’s a standard “serverless 101”, it’s image thumbnailing. Basic stuff, right? You have an S3 bucket, and you use bucket notifications to trigger a Lambda that will create the thumbnails and write them back to the bucket.

Any intro-to-serverless demo should show best practices, so you’ll put this in CloudFormation. The best practice for CloudFormation is to never explicitly name your resources unless you absolutely have to — so you never have to worry about name conflicts.

But surprise! You simply can’t trigger a Lambda from an S3 bucket that has a CloudFormation-assigned name. The crux of it is this:

  • Bucket notification configuration is only settable through the AWS::S3::Bucket resource, and bucket notifications check for permissions at creation time. If the bucket doesn’t have permission to invoke the Lambda, creation of that notification config will fail.
  • The AWS::Lambda::Permission resource that creates that permission requires the name of the bucket.
  • If CloudFormation is assigning the bucket name, it’s not available in the stack until the bucket (along with its notification configuration) is created.

Thus, you end up with a circular dependency. The AWS-blessed solution, described in several different places, is to hard-code an explicit bucket name on both the Bucket and Permission resources.
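To make the cycle concrete, here is a minimal Python sketch, not CloudFormation's actual resolver. It assumes a trimmed-down template written as a dict, with illustrative logical IDs (ThumbnailFunction, ImageBucket, InvokePermission), and the Lambda permission expressed as AWS::Lambda::Permission with a SourceArn derived from the bucket. It collects Ref, Fn::GetAtt, and DependsOn edges and tries to order the resources with the standard library's graphlib.

```python
from graphlib import CycleError, TopologicalSorter

# Trimmed-down template as a Python dict. Logical IDs are illustrative.
TEMPLATE = {
    "ThumbnailFunction": {
        "Type": "AWS::Lambda::Function",
        "Properties": {},  # code and role omitted for brevity
    },
    "ImageBucket": {
        "Type": "AWS::S3::Bucket",
        # The permission must exist before S3 validates the notification.
        "DependsOn": ["InvokePermission"],
        "Properties": {
            "NotificationConfiguration": {
                "LambdaConfigurations": [{
                    "Event": "s3:ObjectCreated:*",
                    "Function": {"Fn::GetAtt": ["ThumbnailFunction", "Arn"]},
                }]
            }
        },
    },
    "InvokePermission": {
        "Type": "AWS::Lambda::Permission",
        "Properties": {
            "Action": "lambda:InvokeFunction",
            "FunctionName": {"Ref": "ThumbnailFunction"},
            "Principal": "s3.amazonaws.com",
            # Built from the generated bucket name, so it needs the bucket first.
            "SourceArn": {"Fn::GetAtt": ["ImageBucket", "Arn"]},
        },
    },
}


def references(node):
    """Yield logical IDs referenced through Ref or Fn::GetAtt in a property tree."""
    if isinstance(node, dict):
        if "Ref" in node:
            yield node["Ref"]
        elif "Fn::GetAtt" in node:
            yield node["Fn::GetAtt"][0]
        else:
            for value in node.values():
                yield from references(value)
    elif isinstance(node, list):
        for item in node:
            yield from references(item)


def dependency_graph(resources):
    """Map each logical ID to the set of logical IDs it depends on."""
    graph = {}
    for logical_id, resource in resources.items():
        deps = set(resource.get("DependsOn", []))
        deps.update(references(resource.get("Properties", {})))
        graph[logical_id] = deps
    return graph


def creation_order(resources):
    """Return a valid creation order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(dependency_graph(resources)).static_order())
    except CycleError:
        return None


print(creation_order(TEMPLATE))  # None: the bucket and the permission wait on each other
```

Hard-coding the bucket name removes the SourceArn edge from the permission back to the bucket, which is why the workaround breaks the cycle.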

This isn’t necessary. If we look at the lifecycle of the pieces involved, we can see that the existence of the bucket should be decoupled from the settings of that bucket.

If we had an AWS::S3::BucketNotification resource that took the bucket name as a parameter, we could create the AWS::S3::Bucket first, and provide its name to both the BucketNotification and the Permission.
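As a rough sketch of why this helps, the dependency graph below includes the proposed notification resource as its own node. Keep in mind that AWS::S3::BucketNotification is hypothetical, the logical IDs are made up for illustration, and the edges are my reading of what each resource would need. With the notification split out, the standard library's graphlib finds a valid creation order.

```python
from graphlib import TopologicalSorter

# Each logical ID maps to the logical IDs it depends on.
# "Notification" stands for the hypothetical AWS::S3::BucketNotification.
GRAPH = {
    "ThumbnailFunction": set(),
    "ImageBucket": set(),  # the bucket no longer carries notification settings
    "InvokePermission": {"ThumbnailFunction", "ImageBucket"},
    "Notification": {"ImageBucket", "ThumbnailFunction", "InvokePermission"},
}

# static_order() raises CycleError on a circular graph; here it succeeds.
order = list(TopologicalSorter(GRAPH).static_order())
print(order)
```

The function and the bucket can be created in either order, then the permission, and the notification comes last, once the permission it is validated against exists.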

Despite this option, we’re still years into AWS explicitly punting on this issue and telling customers, in official communications, to just work around it.

What about Lambda?

Going back to infrastructure graph representation, let’s talk about Lambda. CloudFormation has traditionally managed the infrastructure onto which applications were deployed. But in a serverless world, the infrastructure is the application.

When I want to do a phased rollout of a new version of a Lambda function, I’m supposed to have a CodeDeploy resource in the same template as my function. I update the AWS::Lambda::Function resource, and CodeDeploy takes care of the phased rollout using a weighted alias, all while my stack is in the UPDATE_IN_PROGRESS state.

The infrastructure graph during the rollout, when two versions of the code are deployed at the same time, has no representation within CloudFormation — and that’s a problem.

What if I want this rollout to happen over an extended period of time? What if I want to deploy two versions of a Lambda function to exist alongside each other indefinitely?

The latter is literally impossible to achieve with a single CloudFormation template today. The AWS::Lambda::Version resource publishes what’s in $LATEST, which is what is set by AWS::Lambda::Function.

Instead, when we have phased rollouts, we should be speaking of deployments, decoupled from the existence of the function itself.

What we need is a resource like AWS::Lambda::Deployment that takes the function name, the code, and the configuration as parameters, publishes them as a new version, and exposes the resulting version number as an attribute.

Multiple of these resources could be included in the same template without conflicting, and your two deployments could then be wired to a weighted alias for phased rollout. Note: To do this properly, we’d need an atomic UpdateFunctionCodeAndConfiguration API call from the Lambda service.

In this way, CloudFormation could represent the state of the graph during a rollout, not just on either side of it.
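Here is one way such a template could look, sketched as a Python dict so it can be printed as JSON. AWS::Lambda::Deployment does not exist, so its property names (FunctionName, Code) and its Version attribute are assumptions made for illustration, as are the logical IDs, the artifact bucket, and the object keys. The AWS::Lambda::Alias part uses the existing RoutingConfig shape. The function resource itself is left out to keep the fragment short.

```python
import json


def deployment(function_ref, code_key):
    """Build a hypothetical AWS::Lambda::Deployment resource (not a real type)."""
    return {
        "Type": "AWS::Lambda::Deployment",  # hypothetical resource type
        "Properties": {
            "FunctionName": {"Ref": function_ref},
            # Property names and values below are assumed for illustration.
            "Code": {"S3Bucket": "example-artifacts", "S3Key": code_key},
        },
    }


def weighted_alias(function_ref, stable, canary, canary_weight):
    """Build an AWS::Lambda::Alias that splits traffic between two deployments."""
    return {
        "Type": "AWS::Lambda::Alias",
        "Properties": {
            "Name": "live",
            "FunctionName": {"Ref": function_ref},
            # "Version" is an assumed attribute of the hypothetical resource.
            "FunctionVersion": {"Fn::GetAtt": [stable, "Version"]},
            "RoutingConfig": {
                "AdditionalVersionWeights": [{
                    "FunctionVersion": {"Fn::GetAtt": [canary, "Version"]},
                    "FunctionWeight": canary_weight,
                }]
            },
        },
    }


# Two deployments of the same function live side by side in one template.
resources = {
    "DeploymentV1": deployment("ThumbnailFunction", "thumbnailer-v1.zip"),
    "DeploymentV2": deployment("ThumbnailFunction", "thumbnailer-v2.zip"),
    "LiveAlias": weighted_alias("ThumbnailFunction", "DeploymentV1", "DeploymentV2", 0.1),
}

print(json.dumps({"Resources": resources}, indent=2))
```

In this sketch, moving the rollout forward is just a template update that changes the weight, and removing DeploymentV1 afterwards is an ordinary resource deletion.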

What’s the So What?

The important notion here is that a resource’s create/update/delete lifecycle doesn’t need to be mapped directly to create/update/delete API calls. Instead, the resources for a service need to match the concepts that allow coherent temporal evolution of an infrastructure that uses the service.

When this is achieved, CloudFormation can adequately represent our desired infrastructure graphs in the way we want to build them, which will only become more critical as serverless/service-full architecture grows in importance.

Epilogue: New tools like the CDK look to build client-side abstractions on CloudFormation. In general, I’m not a fan of those approaches, for reasons that I won’t detail here. In any case, they will never be fully successful if CloudFormation doesn’t support the infrastructure graph lifecycles that those abstractions need to build upon.
