Preamble
If you’ve been involved with cloud infrastructure in the past 5-10 years, you will have heard of Infrastructure as Code: the practice of defining and deploying your infrastructure through code templates rather than through a GUI in a cloud console or portal. Benefits of Infrastructure as Code include:
- Re-usable and re-deployable resource templates that you can spin up and tear down on demand
- Greater understanding of infrastructure configuration at a glance – rather than needing to navigate through heavy UI consoles
- Ability to embed infrastructure deployment into your CI/CD pipelines – allowing end to end continuous deployment from infrastructure to application
Template Languages
One of the main topics of debate when choosing an Infrastructure as Code tool is the language used to write templates. Terraform uses HashiCorp Configuration Language (HCL), a non-Turing-complete configuration language created by HashiCorp, whereas CloudFormation uses either JSON or YAML, depending on your preference. HCL supports some features of conventional programming languages, such as variables and loops, but this functionality is limited and has its quirks. JSON and YAML are data serialisation formats rather than programming languages, so resource imports and variables have to be handled by the CloudFormation engine rather than by the language itself.
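To make that last point concrete, here is a minimal Python sketch. The template, the BucketName parameter and the resolve_refs helper are all made up for illustration, and the helper is only a toy stand-in for the substitution CloudFormation performs, not a description of how CloudFormation is implemented. It shows that a plain JSON parser treats Ref as just another key, so something outside the language has to give it meaning.

```python
import json

# A tiny CloudFormation-style template (illustrative only).
TEMPLATE = """
{
  "Parameters": {
    "BucketName": {"Type": "String", "Default": "example-bucket"}
  },
  "Resources": {
    "Bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {"BucketName": {"Ref": "BucketName"}}
    }
  }
}
"""


def resolve_refs(node, params):
    """Toy resolver: replace {"Ref": name} with the matching parameter value."""
    if isinstance(node, dict):
        if set(node) == {"Ref"} and node["Ref"] in params:
            return params[node["Ref"]]
        return {key: resolve_refs(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(value, params) for value in node]
    return node


parsed = json.loads(TEMPLATE)

# To the JSON parser, the reference is just a nested object with a "Ref" key.
unresolved = parsed["Resources"]["Bucket"]["Properties"]["BucketName"]

# Only a separate pass (here, our toy resolver) turns it into a value.
params = {name: spec["Default"] for name, spec in parsed["Parameters"].items()}
resolved = resolve_refs(parsed["Resources"], params)

print(unresolved)
print(resolved["Bucket"]["Properties"]["BucketName"])
```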
What about other programming languages?
In the last few years, engineers have found that configuration languages such as HCL and data formats such as YAML and JSON are inflexible and force compromises when building dynamic infrastructure. They have been looking for ways to make Infrastructure as Code more accessible to developers who don’t want to learn another configuration language like HCL, or the complex variable system of CloudFormation. As a result, two tools have emerged that let you write Infrastructure as Code in popular general-purpose programming languages: Pulumi and AWS CDK. In this article, I’ll be focusing on the latter. Writing infrastructure in a general-purpose language brings several benefits:
- Full support for variables and functions in your IaC, even to the point of dynamically generating resources with functions
- Familiar syntax and data types – a string is a string and you can use any of the normal string functions and methods available in your language of choice to interact with it
- Rich type system – each resource is represented as a subtype of Construct which you can use in your other Constructs, for example the constructor for an aws-glue.Connection takes an ec2.Subnet object as an argument.
- Stack imports by reference the easy way – you can import and export resources between multiple stacks by reference rather than having to use the terse syntax of CloudFormation
So that’s great, right? Gone is the terseness of CloudFormation and the jank of HCL, I can now use whatever language I want (out of those with bindings) to manage my infrastructure. Move over CloudFormation, there’s a new tool in town.
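As a rough illustration of the first two benefits, the sketch below uses plain Python with no CDK dependency, so a dictionary stands in for the template that CDK would synthesise. The function names make_bucket and synth, the bucket naming scheme and the list of environments are all hypothetical. Real CDK code would instantiate Construct classes instead, but the idea of generating resources with ordinary functions, loops and string formatting is the same.

```python
import json


def make_bucket(env: str) -> dict:
    """Build one CloudFormation-style S3 bucket resource for an environment."""
    # Normal string formatting, no template-specific substitution syntax.
    properties = {"BucketName": f"example-data-{env}"}
    if env == "prod":
        # Ordinary control flow decides the configuration.
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    logical_id = f"DataBucket{env.capitalize()}"
    return {logical_id: {"Type": "AWS::S3::Bucket", "Properties": properties}}


def synth(environments: list) -> dict:
    """Generate one bucket per environment and collect them into a template."""
    resources = {}
    for env in environments:
        resources.update(make_bucket(env))
    return {"Resources": resources}


template = synth(["dev", "staging", "prod"])
print(json.dumps(template, indent=2))
```

Adding a fourth environment is a one-word change to the list, which is the kind of flexibility that is awkward to express in a static template.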
The open source community
Since its release by AWS in mid-2019, AWS CDK has been open source on GitHub, seeking input from the community to expand its features and iron out bugs. However, this strategy has met with mixed results. In many cases I have come across a bug in a core service such as VPC, only to find that a bug ticket for it has been open on GitHub for more than two years (see https://github.com/aws/aws-cdk/issues/6683).
Updates are also very frequent and occasionally breaking: functions are often deprecated in favour of updated versions (usually named v2), and native Constructs for new services are introduced at an alarming rate. Additionally, development on CDK v1 and CDK v2 happens side by side, and v2 (which is now the officially supported version) lacks support for many services that v1 supports.
To add to this, the open-source community has introduced a lot of extras, such as Python scripts, that run “under the hood” when deploying your CDK stacks, and these may add unnecessary complexity, which leads to longer deploy times. Speaking of which…
Anyone who has used AWS CDK will note how horribly, horribly slow it is. When you run cdk diff, everything is funnelled through the TypeScript codebase into CloudFormation templates by a process I can only describe as slow and chunky. Then the CloudFormation template diffs are checked or deployed as usual, which some would argue is also pretty slow. In some cases, additional calls to the AWS CLI are made when importing already-existing resources, and these are usually made every time you generate a new CloudFormation template. I’m hoping for more optimisations in the future, but chances are that you will find CDK takes up a considerable chunk of your CI/CD pipeline execution time just by running a diff on your infrastructure.
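Conceptually, the second half of that process boils down to comparing the freshly generated template with the one that is already deployed. The sketch below is a toy version of that comparison in plain Python. The diff_templates function and both sample templates are hypothetical, and this is not how cdk diff is actually implemented; it only shows the shape of the work that has to happen after every synthesis.

```python
def diff_templates(deployed: dict, synthesized: dict) -> dict:
    """Compare the Resources sections of two CloudFormation-style templates."""
    old = deployed.get("Resources", {})
    new = synthesized.get("Resources", {})
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Present in both templates, but with a different definition.
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# What is currently deployed (illustrative).
deployed = {"Resources": {
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {"VisibilityTimeout": 30}},
    "Topic": {"Type": "AWS::SNS::Topic"},
}}

# What the code generates now (illustrative).
synthesized = {"Resources": {
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {"VisibilityTimeout": 60}},
    "Bucket": {"Type": "AWS::S3::Bucket"},
}}

changes = diff_templates(deployed, synthesized)
print(changes)
```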
If you remember back to the start of the article, one of the reasons for using Infrastructure as Code was to be able to see your infrastructure configuration at a glance and to ensure your infrastructure is deployed exactly as you specify. IaC achieves this through declarative templates which are deployed exactly as-is. However, AWS CDK takes an approach that I like to call Automagic: it performs a lot of additional operations and creates additional resources (often by default, and sometimes without the option to turn it off) without telling you until you get to the final deploy confirmation. This means that, just as when creating resources via the GUI console, you often end up with resources that you didn’t ask for and that you can’t override.
Take for example the aws-stepfunctions module. As outlined in the API reference, CDK automatically adds all of the required execution permissions to the IAM Role you specify for the State Machine, and if you don’t specify a Role it creates one with the required permissions. You cannot disable this feature. Suppose, for instance, that you like to define all of your IAM Roles and Policies in a separate stack from your workload infrastructure and import them into your workload stacks, to reduce the horribly slow CloudFormation stack generation time and resource state checking. In that case, this feature will happily introduce a cyclic dependency error that you cannot get around, except by moving the IAM Role into the same stack as the State Machine you are defining.
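The sketch below models how such a cycle can arise, using plain Python rather than CDK itself. The stack names IamStack and WorkloadStack and the find_cycle helper are hypothetical. The second dependency map encodes my assumption about the mechanism: the permissions CDK adds to the Role reference resources in the workload stack, so the IAM stack ends up depending on the very stack that imports it.

```python
def find_cycle(deps: dict) -> list:
    """Return one dependency cycle as a list of stack names, or [] if none."""
    visiting, done = [], set()

    def visit(stack):
        if stack in done:
            return []
        if stack in visiting:
            # We came back to a stack that is still being explored: a cycle.
            return visiting[visiting.index(stack):] + [stack]
        visiting.append(stack)
        for dep in sorted(deps.get(stack, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(stack)
        return []

    for stack in sorted(deps):
        cycle = visit(stack)
        if cycle:
            return cycle
    return []


# What you intended: the workload stack imports a Role from the IAM stack.
intended = {"WorkloadStack": {"IamStack"}, "IamStack": set()}

# Assumed effect of the automatic permissions: the Role in the IAM stack now
# references resources that live in the workload stack.
effective = {"WorkloadStack": {"IamStack"}, "IamStack": {"WorkloadStack"}}

print(find_cycle(intended))
print(find_cycle(effective))
```

With both stacks depending on each other, there is no order in which they can be deployed, which is why moving the Role into the workload stack makes the error go away.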
This Automagic behaviour seems to be widespread across AWS CDK, and presents itself in ways that border on baffling at times. It creates not just IAM Roles but also Lambda functions, and sometimes even custom CloudFormation resources, and you need to dig through the source code to understand what they do. In short, you’re often getting more than you bargained for, and you no longer have full visibility over your infrastructure.
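One way to claw back some of that visibility is to check the generated template for resource types you never asked for. The sketch below is a minimal, hypothetical example in plain Python: the unexpected_resources helper, the logical IDs and the sample template are invented for illustration. In a real project you could load the synthesised template JSON instead (CDK writes these under the cdk.out directory by default).

```python
def unexpected_resources(template: dict, expected_types: set) -> dict:
    """Map logical IDs to types for resources whose type was not expected."""
    return {
        logical_id: resource["Type"]
        for logical_id, resource in template.get("Resources", {}).items()
        if resource["Type"] not in expected_types
    }


# Illustrative stand-in for a synthesised template.
synthesized = {"Resources": {
    "StateMachine": {"Type": "AWS::StepFunctions::StateMachine"},
    "StateMachineRole": {"Type": "AWS::IAM::Role"},
    "StateMachineRoleDefaultPolicy": {"Type": "AWS::IAM::Policy"},
    "HelperFunction": {"Type": "AWS::Lambda::Function"},
}}

# The only thing we explicitly declared in our code.
expected = {"AWS::StepFunctions::StateMachine"}

extras = unexpected_resources(synthesized, expected)
for logical_id, resource_type in sorted(extras.items()):
    print(f"{logical_id}: {resource_type}")
```

A check like this could run in a CI/CD pipeline and fail the build when the list of extras changes, although it only reports the surprise rather than preventing it.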
Overall, I think that AWS CDK is a very promising tool for bridging the gap between Infrastructure/DevOps engineers and Developers by allowing them to define infrastructure in languages such as TypeScript or Python. However, the tool is currently very immature, with the transition from v1 to v2 still ongoing, and issues such as speed, reliability and observability have a long way to go before many engineers will choose it as their primary tool for AWS Infrastructure as Code. I’ll definitely be watching this space and updating this article if any of these things improve over time, but I think we’ll be waiting a little while.
Author: Jaydon Hansen