4 ultimate reasons to prefer AWS CDK over Terraform

There is an Italian version of this article; if you'd like to read it click here.

Over the past few months I have been using AWS CDK for some projects, and every time I started talking about it, someone would ask: why should I abandon the tool I am using and switch to CDK? What advantages does it offer?

I will not dwell on implementation details in this post; there are many useful resources to be found online, from tutorials for beginners to very advanced articles.

Instead, I want to summarise what I consider to be very interesting features of the framework.

I am a passionate advocate of Infrastructure as Code and have been using it extensively since the earliest versions of the tools that are established leaders in this field today. What you learn with experience is that there is no perfect tool that solves every problem or fits every occasion; there are tools that adapt to many different situations, and tools that are chosen for specific characteristics of the company you work for: its processes, the risks you are willing to accept, the problems you take on, and so on.

In order to explain the advantages (and limitations) I have found in CDK, it is necessary to take a step back and recall the characteristics of some of the most widely used Infrastructure as Code tools.

CloudFormation

CloudFormation is the Infrastructure as Code service of AWS. It has been available since 2011 (it seems like yesterday, but in cloud terms that is several geological eras ago), is free of charge, and uses data formats such as JSON and YAML (the latter since 2016, to the relief of many) to write templates that define the resources to be created on AWS. These templates are processed by the CloudFormation service, which creates the resources as described. If we want to change our infrastructure, we simply submit the modified template again.

Advantages

The unbeatable advantage of CloudFormation is its automatic rollback management. If my template contains errors, CloudFormation stops the infrastructure update and automatically returns to the previous state, i.e. to the last 'working' version of my template.

Limits

  1. Over the years, CloudFormation has evolved a great deal, gaining new features, cross-account usage and more... and yet, nobody loves it. At most, it is tolerated. Why? Because of the languages it uses. JSON and YAML are essentially data serialisation formats and work well with machines... less well with humans. They are certainly easy to read, but extremely tedious to write. Since they are not programming languages, they lack basic, practical mechanisms such as loops for repetitive operations: if I need to create 10 security groups, I have to list them all, one by one, without fail. If you have ever used CloudFormation, you know what I am talking about.

  2. It works exclusively on AWS.

Terraform

Terraform is an open-source tool from HashiCorp for Infrastructure as Code, initially released in 2014. It uses the declarative HashiCorp Configuration Language (HCL), which from the earliest releases seemed friendlier for writing infrastructure code. When a user runs Terraform against a given resource, Terraform performs CRUD actions via the cloud provider's API to reach the desired state. The code can be factored into modules, promoting reusability and maintainability.

Advantages

  1. Terraform manages external resources with 'providers'. Users interact with Terraform providers by declaring resources or using data sources; there are many providers maintained by both HashiCorp and the community, and AWS is one of them. The first advantage is therefore that it is a cross-platform tool.

  2. As HCL has evolved over the years, Terraform has gained several constructs that work as loops and cut down the repetitive writing of similar resources. For example, one of the most common constructs is to cycle through a list:

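# One ECR repository per entry in local.repo_list (defined elsewhere)
resource "aws_ecr_repository" "ecr_repo" {
  count                = length(local.repo_list)
  name                 = local.repo_list[count.index] # name taken from the list by position
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}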

but with the latest versions of Terraform, it is possible to use more complex constructs, such as extracting keys and values from a map to be used as required:

...
  dynamic "predicates" {
    for_each = [for k, v in each.value["sets"] : {
      set = v
    } if contains(keys(aws_waf_ipset.waf_ipset), v)]
    content {
      data_id = aws_waf_ipset.waf_ipset[predicates.value.set].id
      negated = false
      type    = "IPMatch"
    }
  }
...

Probably not the clearest code in the world, though still better than the endless lists of attributes in CloudFormation...

Limits

  1. The infamous state file! Terraform saves the state of the infrastructure in a JSON file that is updated at each execution. Keeping this file is extremely important because it is the "source of truth"; in fact, Terraform consults this file before each execution to establish the discrepancy between the desired state (i.e. the code we want to execute) and the current state, and from this comparison decides what action to take to close the gap. If the state file is lost, Terraform cannot tell that part of the infrastructure has already been created and will try to create everything from scratch.

    Furthermore, keeping all the code of a very large infrastructure together is bad practice, for several reasons: operational risk, shared management, handovers, and general maintainability of the code. Typically, each infrastructure 'stack' is created with blocks of code executed separately: this means that each stack has its own state file, and consequently preserving these state files, in the long run, with large teams and very large infrastructures, becomes a very important and delicate issue.

  2. No rollback management. The CRUD operations performed by Terraform, as I mentioned earlier, are sequential calls to the cloud provider's API; if a call fails in mid-execution for some reason... Terraform stops and leaves it to the user to clean up the half-applied changes. Not the best way to behave, especially in production environments.
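To make the first limit more concrete, here is a minimal sketch in plain Python of the comparison between the desired state and the recorded state. It is a toy model and not Terraform's actual implementation: the `plan` function and the resource names are hypothetical, invented only for illustration.

```python
def plan(desired: dict, recorded: dict) -> dict:
    """Compare the desired configuration with the recorded state and list the actions needed."""
    actions = {}
    for name, config in desired.items():
        if name not in recorded:
            actions[name] = "create"      # unknown to the state: must be created
        elif recorded[name] != config:
            actions[name] = "update"      # known, but configured differently
    for name in recorded:
        if name not in desired:
            actions[name] = "delete"      # in the state, no longer in the code
    return actions


# Hypothetical resources, for illustration only.
desired = {
    "ecr_repo_api": {"image_tag_mutability": "MUTABLE"},
    "ecr_repo_web": {"image_tag_mutability": "IMMUTABLE"},
}
recorded = {
    "ecr_repo_api": {"image_tag_mutability": "MUTABLE"},
    "ecr_repo_web": {"image_tag_mutability": "MUTABLE"},
    "ecr_repo_old": {"image_tag_mutability": "MUTABLE"},
}

print(plan(desired, recorded))  # normal run: state file available
print(plan(desired, {}))        # state file lost: nothing is known to exist
```

In the second call the state is empty, so both repositories are planned as creations even though, in this scenario, they already exist. That is exactly why losing the state file hurts.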

CDK

OK, finally to the point: what are the characteristics of CDK that make it preferable to the tools just mentioned? Personally, I see at least four! Let's look at them in order of importance.

Advantage #1: Rollback

CDK is a framework that, when executed, "synthesises" a CloudFormation template and then deploys it through CloudFormation. Consequently, it inherits all the positive features of CloudFormation and, in particular, the ability to roll back automatically to the previous state. In my opinion this is a very important feature, above all when changing existing stacks in a production environment. Rollback is a step too often underestimated... until something goes wrong.
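The difference between the two behaviours can be sketched with a toy model in plain Python. This is only an illustration of the idea, assuming a dictionary as the "infrastructure" and a `None` value as a stand-in for a failed API call; the `apply_changes` function and the resource names are hypothetical, and CloudFormation's real rollback is of course far more involved.

```python
def apply_changes(infra: dict, changes: list, rollback: bool) -> bool:
    """Apply (name, value) changes in order; optionally restore the previous state on failure."""
    snapshot = dict(infra)  # last known good state
    for name, value in changes:
        if value is None:  # stand-in for a provider API call that fails
            if rollback:
                infra.clear()
                infra.update(snapshot)  # go back to the last working version
            return False
        infra[name] = value
    return True


# The second change fails in mid-execution.
changes = [("queue", "v2"), ("function", None), ("bucket", "v2")]

without_rollback = {"queue": "v1", "function": "v1", "bucket": "v1"}
apply_changes(without_rollback, changes, rollback=False)
print(without_rollback)  # the queue has changed, the rest has not

with_rollback = {"queue": "v1", "function": "v1", "bucket": "v1"}
apply_changes(with_rollback, changes, rollback=True)
print(with_rollback)  # everything is back to the previous version
```

Without rollback, the run stops with the queue already updated and the other two resources untouched, and someone has to sort that out by hand. With rollback, the failed run leaves the infrastructure as it was before.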

Advantage #2: No state file

As I said, since the framework synthesises CloudFormation templates, managing the state of the infrastructure is left to CloudFormation itself, and there are no state files to manage. In addition, it is much easier to check the status of resources, because this can be done directly from the console of the AWS account. Given the risks I listed earlier regarding state file management, this is no small advantage.

Advantage #3: Friendly/familiar programming language

AWS CDK is available for the most popular languages: TypeScript, JavaScript, Python, Java, C#/.NET, and Go. There are no particular differences between these implementations: the choice can be based solely on the user's familiarity with one language or another. In my case, I used Python and my experience was pleasantly simple and smooth, thanks also to extremely comprehensive documentation and support for the main IDEs.

The use of an actual programming language also has the considerable advantage of being able to perform any type of operation not necessarily linked to CDK, such as requests to external APIs to retrieve information or notifications, manipulation of strings, files, JSON and so on... the limit is your imagination!
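As a small illustration of what a real programming language buys you, here is a sketch of the "10 security groups" case mentioned earlier, written with a loop. Note that this is plain Python building a CloudFormation-shaped dictionary, not the CDK API: the `security_group_template` function, the service names and the `vpc-EXAMPLE` identifier are all hypothetical placeholders.

```python
import json


def security_group_template(names: list, vpc_id: str) -> dict:
    """Build a CloudFormation-style template with one security group per name."""
    resources = {}
    for name in names:
        # Logical IDs must be alphanumeric, e.g. "service-1" -> "Service1SecurityGroup".
        logical_id = "".join(part.capitalize() for part in name.split("-")) + "SecurityGroup"
        resources[logical_id] = {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": f"Security group for {name}",
                "VpcId": vpc_id,
            },
        }
    return {"Resources": resources}


# Ten hypothetical services, one security group each.
names = [f"service-{i}" for i in range(1, 11)]
template = security_group_template(names, vpc_id="vpc-EXAMPLE")
print(json.dumps(template, indent=2))
```

A three-line loop replaces ten hand-written, almost identical resource blocks. With CDK the idea is the same, except that the loop creates constructs and the framework takes care of producing the template.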

Advantage #4: Automatic generation of IAM policies

Finally, there is hardly any need to write IAM roles and policies by hand. The framework, based on the relationships between the resources declared in the code, is able to automatically calculate the necessary permissions and create roles and policies itself, following the principle of assigning only the strictly necessary permissions.

This is by no means a trivial advantage: this mechanism ensures that you do not forget any permissions and, above all, that you do not assign more permissions than you need, either by mistake or out of haste.

Of course, it is always possible to add permissions that the framework is unable to calculate. For example, it may happen that a Lambda function is created that internally makes API calls to AWS services, in which case the Lambda code is not part of the CDK code and is therefore excluded from the 'calculation'. The permissions required by the function for its calls must therefore be added to the role that the CDK automatically creates.
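The mechanism can be sketched in a few lines of plain Python. This is a toy model loosely inspired by the grant-style methods that CDK constructs expose, not the framework's real code: the `Role`, `Bucket` and `Queue` classes, the ARNs and the simplified action lists are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    statements: list = field(default_factory=list)

    def allow(self, actions: list, resources: list) -> None:
        """Add one 'Allow' statement limited to the given actions and resources."""
        self.statements.append(
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resources}
        )

    def policy_document(self) -> dict:
        return {"Version": "2012-10-17", "Statement": self.statements}


@dataclass
class Bucket:
    arn: str

    def grant_read(self, role: Role) -> None:
        # Simplified action list: only what reading objects requires.
        role.allow(["s3:GetObject", "s3:ListBucket"], [self.arn, f"{self.arn}/*"])


@dataclass
class Queue:
    arn: str

    def grant_send(self, role: Role) -> None:
        role.allow(["sqs:SendMessage"], [self.arn])


# Hypothetical resources: a function role that reads a bucket and writes to a queue.
role = Role("report-function-role")
Bucket("arn:aws:s3:::example-reports").grant_read(role)
Queue("arn:aws:sqs:eu-west-1:111122223333:example-queue").grant_send(role)

# Permission the framework cannot infer: a call made inside the Lambda code itself.
role.allow(["dynamodb:GetItem"], ["arn:aws:dynamodb:eu-west-1:111122223333:table/example"])

print(role.policy_document())
```

The first two statements come from the declared relationships between resources; the third is the manual addition for the call hidden inside the function code. In all three cases the role receives specific actions on specific resources, never a wildcard.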

In addition to the security advantage, there is also an enormous time saving in developing infrastructure code. An example? The creation of a CodePipeline resource with its CodeCommit repository and CodeBuild stage required me to write about 500 lines of Terraform code; in CDK, the IAM part is about ten lines. Impressive.

Final considerations

AWS CDK is a tool that solves the problems of CloudFormation without losing its positive features, adding further advantages over other tools. Its greatest limitation, however, is that it can, of course, only be used on AWS.

There are other tools that use programming languages for writing infrastructure code and that can be used on other cloud providers: for example, Pulumi or CDK for Terraform (cdktf). However, these tools do not have the same advantages, as they still use API calls (so there is no rollback) and save the state of the infrastructure in special files that have to be managed.

The persistence of these limitations has always put me off the idea of changing Infrastructure as Code tools, because the change of habits, paradigm and especially code base did not seem worth it. AWS CDK, on the other hand, has such advantages that I would seriously consider abandoning other tools.

And what do you think? Have you tried AWS CDK? Would you consider switching tools in light of the advantages? Let me know in the comments!