$work, we have several Kubernetes clusters across different geographical and AWS regions. The reasons range from customer requirements, to our own desire to reduce operational “blast radius” issues that might come up. Our team has experience large outages before, and we try and build the smallest unit of deployment we possibly can for out platform.
Unfortunately, this brings with it new challenges, especially when it comes to running Kubernetes clusters. I’ve spoke extensively about this on this blog before, particularly regarding configuration management needs and the overhead that scaling out to multiple clusters brings.
As these clusters have become more utilized by application teams, a new consideration has arisen. Deploying regional applications has the same configuration complexity problems I’ve spoken about when it comes to the infrastructure management, and essentially the needs boil down to the same words we’re familiar with: configuration management.
I set out to try and make the task of deploying applications to regional clusters as easy as possible for our teams following the same philosophy frustrations I had before. The requirements were a bit like this:
- no templating languages (no helm!)
- continuous deployment made easy
- easy for developers to grasp - low barrier to entry
- abstract as much of configuration complexity away as possible
What I came up with works very nicely, and uses largely off the shelf tooling that you can replicate very easily.
This post is the first in what I hope will be a 2 part series of posts which covers the following topics:
- Generating regional/parameterised manifests using jkcfg
- Using Gitlab, Gitlab-CI, ArgoCD and GitOps to deploy to multiple clusters
Part 1: Generate your config
It’s the first step, but it’s also the hardest. How do you generate your YAML configuration for the different clusters?
Ksonnet uses jsonnet, which for us infrastructure people didn’t seem so bad (we used it in kr8) but there was very little desire for developers to actually learn this new and strange language for their application development needs. Luckily for me, it was deprecated just as I was trying to convince my developers otherwise, which meant I kept search for other solutions.
Helm is the defacto standard for this kind of thing but again, there were some confused questions when it came to the templating of YAML. I could sympathise with this, and at the time, Helm had some serious security problems with Tiller. Helm3 has largely addresses this, but I still can’t bring myself to template yaml.
Let’s generate some manifests
The jkcfg repo has some excellent examples you can use, and getting started was generally pretty straightforward.
Download the jk binary
Okay, so we’re ready to generate some manifests. A simple deployment might look like this:
Okay, we have a nice deployment manifest now, but how does this help me with different regions?
jkcfg supports “parameters” which can be passed either via a command line argument, or a file. This is similar to Helm’s values files which are evaluated at compile time. Using values in jkcfg is very straightforward.
You can then use this value inside your deployment manifest. Here’s the end result:
Once you’ve started using parameters, you probably don’t always want to use the same number of replicas. You can invoke the parameters in two ways. The first, and easiest, is on the command line:
Check the manifest now in
manifests/myapp.yaml: you’ll see we’ve set the replicas to 5!
The other way of overriding the parameters is using a parameters file. This can be YAML or JSON. Create a file called
params/myapp.yaml and populate it like so:
Then use it with jk like so:
Abstract, abstract, abstract
As I went through this journey, it became apparent there was a lot of code reuse across different services. Most of the services we build are using the same frameworks, and need a lot of similar configuration.
For example, every service we deploy to Kubernetes needs 4 basic things:
- A deployment spec
- With KIAM annotations
- With security contexts
- etc etc
- A service spec
- An ingress
- A configmap
Here’s the repo contents:
I’ll break down the
js files so we can get an idea of what this entails.
The main meat of the package is in
kube.js. Let’s take a look at this:
Obviously this is a lot more involved than our previous, very simple deployment from earlier. What’s worth noting though is that this is completely configurable by a
service object in our jkcfg configuration parameter. I’ve gone for an approach here as well where we load the configuration variable from a configmap, which is not managed by this module. That way, we can use a basic boilerplate module for most of the stuff we want to deploy, and we can have a pretty high degree of confidence that the deployments meet our standards.
This labels file is simply a way of ensuring we have the correct labels defined for all our resources. Here’s what it looks like:
Notice, we’re exporting this as a function. The “why” will become apparent later…
index.js where we export all this to be used:
We now have a very basic NPM package. We can push this to git or to the NPM registry and let people use. So how do we actually use it?
Using the package
Using this is pretty straightforward. First, add it as a dependency:
Then import it to be used in your jkcfg
Now we’ve imported it, the fun stuff starts. For your average person, you can generate the deployment, service and ingress with two files. First, pad out your
index.js like so:
Before we get too excited, we need to populate our params file:
As you can see, the service object does most of the work for us, and we can tailor it per region or per environment as needed.
At this stage, we’re ready to generate our manifests again. Let’s use the command from before:
Notice how we specify the version as a parameter outside the params file, simply because we expect this to be a dynamic value. I’ll talk more about this in my next post.
This should have generated manifests for you in
complex/manifests and now you’re ready to do some deployments!
Add a ConfigMap
The final part of this is the configuration. We’ve so far tried to build a jkcfg package that is as agnostic as possible, and can be reused across many different services and deployments.
The reality is though, you’re still going to need to add configuration data for the service. We can utilise what we’ve got here and add a configmap to the equation which can be customised per-deployment very easily.
index.js add the following:
Your end result should be this:
This will now generate a ConfigMap, but it’ll be empty. You need to add a
config object to your params file. It’ll end up looking like this:
And now we have a working deployment, with all the resources we might need to run a service.
You might be wondering at this point, this a lot of work! Why not just use a Helm chart? My personal thoughts about why I prefer this way are:
- Using Helm charts mean developers have to learn a new pattern, specifically Go templates. With this method, they can use a programming language which they will undoubtedly feel more familiar with
This concludes part 1 of my series. In part 2, I’ll talk a little bit about using the CI pipeline to generate these configs and push them to a deploy repo (specifically with Gitlab-CI) and then talk a little bit about using Argo do deployments.
All the code from this post can be found in the github repo if you want to take a look!
Stay tuned for the next post!
I’ve been building a Kubernetes based platform at $work now for almost a year, and I’ve become a bit of a Kubernetes apologist. It’s true, I think the technology is fantastic. I am however under no illusions about how difficult it is to operate and maintain. I read posts like this one earlier in the year and found myself nodding along to certain aspects of the opinion. If I was in a smaller company, with 10/15 engineers, I’d be horrified if someone suggested managing and maintaining a fleet of Kubernetes clusters. The operational overhead is just too high.
Despite my love for all things Kubernetes at this point, I do remain curious about the notion that “serverless” computing will kill the ops engineer. The main source of intrigue here is the desire to stay gainfully employed in the future - if we aren’t going to need OPS engineers in our glorious future, I’d like to see what all the fuss is about. I’ve done some experimentation in Lamdba and Google Cloud Functions and been impressed by what I saw, but I still firmly believe that serverless solutions only solve a percentage of the problem.
I’ve had my eye on AWS Fargate for some time now and it’s something that developers at $work have been gleefully pointing at as “serverless computing” - mainly because with Fargate, you can run your Docker container without having to manage the underlying nodes. I wanted to see what that actually meant - so I set about trying to get an app running on Fargate from scratch. I defined the success criteria here as something close-ish to a “production ready” application, so I wanted to have the following:
- A running container on Fargate
- With configuration pushed down in the form of environment variables
- “Secrets” should not be in plaintext
- Behind a loadbalancer
- TLS enabled with a valid SSL certificate
I approached this whole task from an infrastructure as code mentality, and instead of following the default AWS console wizards, I used terraform to define the infrastructure. It’s very possible this overcomplicated things, but I wanted to make sure any deployment was repeatable and discoverable to anyone else wanting to follow along.
All of the above criteria is generally achieveable with a Kubernetes based platform using a few external add-ons and plugins, so I’m admittedly approaching this whole task with a comparitive mentality - because I’m comparing it with my common workflow. My main goal was to see how easy this was with Fargate, especially when compared with Kubernetes. I was pretty surprised with the outcome.
AWS has overhead
I had a clean AWS account and was determined to go from zero to a deployed webapp. Like any other infrastructure in AWS, I had to get the baseline infrastructure working - so I first had to define a VPC.
I wanted to follow the best practices, so I carved the VPC up into subnets across availability zones, with a public and a private subnet. It occurred to me at this point that as long as this need was always there, I’d probably be able to find a job of some description. The notion that AWS is operationally “free” is something that has irked me for quite some time now. Many people in the developer community take for granted how much work and effort there is in setting up and defining a well designed AWS account and infrastructure. This is before we even start talking about a multi-account architecture - I’m still in a single account here and I’m already having to define infrastructure and traditional network items.
It’s also worth remembering here, I’ve done this quite a few times now, so I knew exactly what to do. I could have used the default VPC in my account, and the pre-provided subnets, which I expect many people who are getting started might do. This took me about half an hour to get running, but I couldn’t help but think here that even if I want to run lambda functions, I still need some kind of connectivity and networking. Defining NAT gateways and routing in a VPC doesn’t feel very serveless at all, but it has to be done to get things moving.
Run my damn container
Once I had the base infrastructure up and running, I now wanted to get my docker container running. I started examining the Fargate docs and browsed through the Getting Started docs and something immediately popped out at me:
Hold on a minute, there’s at least THREE steps here just to get my container up and running? This isn’t quite how this whole thing was sold to me, but let’s get started.
A task definition defines the actual container you want to run. The problem I ran into immediately here is that this thing is insanely complicated. Lots of the options here are very straightforward, like specifying the docker image and memory limits, but I also had to define a networking model and a variety of other options that I wasn’t really familiar with. Really? If I had come into this process with absolutely no AWS knowledge I’d be incredibly overwhelmed at this stage. A full list of the parameters can be found on the AWS page, and the list is long. I knew my container needed to have some environment variables, and it needed to expose a port. So I defined that first, with the help of a fantastic terraform module which really made this easier. If I didn’t have this, I’d be hand writing JSON to define my container definition.
First, I defined some environment variables:
Then I compiled the task definition using the module I mentioned above:
I was pretty confused at this point - I need to define a lot of configuration here to get this running and I’ve barely even started, but it made a little sense - anything running a docker container needs to have some idea of the configuration values of the docker container. I’ve previously written about the problems with Kubernetes and configuration management and the same problem seemed to be rearing its ugly head again here.
Next, I defined the task definition from the module above (which thankfully abstracted the required JSON away from me - if I had to hand write JSON at this point I’ve have probably given up).
I realised immediately I was missing something as I was defining the module parameters. I need an IAM role as well! Okay, let me define that:
That makes sense, I’d need to define an RBAC policy in Kubernetes, so I’m still not exactly losing or gaining anything here. I am starting to think at this point that this feels very familiar from a Kubernetes perspective.
At this point, I’ve written quite a few lines of code to get this running, read a lot of ECS documentation and all I’ve done is define a task definition. I still haven’t got this thing running yet. I’m really confused at this point what the value add is here over a Kubernetes based platform, but I continued onwards.
A service is partly how to expose the container to the world, and partly how you define how many replicas it has. My first thought was “Ah! This is like a Kubernetes service!” and I set about mapping the ports and such like. Here was my first run at the terraform:
I again got frustrated when I had to define the security group for this that allowed access to the ports needed, but I did so and plugged that into the network configuration. Then I got a smack in the face.
I need to define my own loadbalancer?
LoadBalancers Never Go Away
I was honestly kind floored by this, I’m not even sure why. I’ve gotten so used to Kubernetes services and ingress objects that I completely took for granted how easy it is to get my application on the web with Kubernetes. Of course, we’ve spent months building a platform to make this easier at $work. I’m a heavy user of external-dns and cert-manager to automate populating DNS entries on ingress objects and automating TLS certificates and I am very aware of the work needed to get these set up, but I honestly thought it would be easier to do this on Fargate. I recognise that Fargate isn’t claiming to be the be all and end-all of how to run applications - it’s just abstracting away the node management - but I have been consistently told this is easier than Kubernetes. I really was surprised. Defining a LoadBalancer (even if you don’t want to use Ingresses and Ingress controllers) is part and parcel of deploying a service to Kubernetes, and I had to do the same thing again here. It just all felt so familiar.
I now realised I needed:
- A loadbalancer
- A TLS certificate
- A DNS entry
So I set about making those. I made use of some popular terraform modules, and came up with this:
I’ll be completely honest here - I screwed this up several times. I had to fish around in the AWS console to figure out what I’d done wrong. It certainly wasn’t an “easy” process - and I’ve done this before - many times. Honestly, at this point, Kubernetes looked positively enticing to me, but I realised it was because I was very familiar with it. If I was lucky enough to be using a managed Kubernetes platform (with external-dns and cert-manager preinstalled) I’d really wonder what value add I was missing from Fargate. It just really didn’t feel that easy.
After a bit of back and forth, I now had a working ECS service. The final definition, including the service, looked a bit like this:
I felt like it was close at this point, but then I remembered I’d only done 2 of the required 3 steps from the original “Getting Started” document - I still needed to define the ECS cluster.
Thanks to a very well defined module, defining the cluster to run all this on was actually very easy.
What surprised me the most here is why I had to define a cluster at all. As someone reasonably familiar with ECS it makes some sense you’d need a cluster, but I tried to consider this from the point of view of someone having to go through this process as a complete newcomer - it seems surprising to me that Fargate is billed as “serverless” but you still need to define a cluster. It’s a small detail, but it really stuck in my mind.
Tell me your secrets
At this stage of the process, I was fairly happy I managed to get something running. There was however something missing from my original criteria. If we go all the way back to the task definition, you’ll remember my app has an environment variable for the password:
If I looked at my task definition in the AWS console, my password was there, staring at me in plaintext. I wanted this to end, so I set about trying to move this into something else, similar to Kubernetes secrets
The way Fargate/ECS does the secret management portion is to use AWS SSM (the full name for this service is AWS Systems Manager Parameter Store, but I refuse to use that name because quite frankly it’s stupid)
The AWS documentation covers this fairly well, so I set about converting this to terraform.
Specifying the Secret
First, you have to define a parameter and give it a name. In terraform, it looks like this:
Obviously the key component here is the “SecureString” type. This uses the default AWS KMS key to encrypt the data, something that was not immediately obvious to me. This has a huge advantage over Kubernetes secrets, which aren’t encrypted in etcd by default.
Then I specified another local value map for ECS, and passed that as a secret parameter:
A problem arises
At this point, I redeployed my task definition, and was very confused. Why isn’t the task rolling out properly? I kept seeing in the console that the running app was still using the previous task definition (version 7) when the new task definition (version 8) was available. This took me way longer than it should have to figure out, but in the events screen on the console, I noticed an IAM error. I had missed a step, and the container couldn’t read the secret from AWS SSM, because it didn’t have the correct IAM permissions. This was the first time I got genuinely frustrated with this whole thing. The feedback here was terrible from a user experience perspective. If I hadn’t known any better, I would have figured everything was fine, because there was still a task running, and my app was still available via the correct URL - I was just getting the old config.
In a Kubernetes world, I would have clearly seen an error in the pod definition. It’s absolutely fantastic that Fargate makes sure my app doesn’t go down, but as an operator I need some actual feedback as to what’s happening. This really wasn’t good enough. I genuinely hope someone from the Fargate team reads this and tries to improve this experience.
That’s a wrap?
This was the end of the road - my app was running and I’d met all my criteria. I did realise that I had some improvements to make, which included:
- Defining a cloudwatch log group, so I could write logs correctly
- Add a route53 hosted zone to make the whole thing a little easier to automate from a DNS perspective
- Fix and rescope the IAM permissions, which were very broad at this point
But honestly at this point, I wanted to reflect on the experience. I threw out a twitter thread about my experience and then spent the rest of the time thinking about what I really felt here.
What I realised, after an evening of reflection, was that this process is largely the same whether you’re using Fargate or Kubernetes. What surprised me the most was that despite the regular claims I’ve heard that Fargate is “easier” I really just couldn’t see any benefits over a Kubernetes based platform. Now, if you’re in a world where you’re building Kubernetes clusters I can absolutely see the value here - managing nodes and the control plane is just overhead you don’t really need. The problem is - most consumers of a Kubernetes based platform don’t have to do this. If you’re lucky enough to be using GKE, you barely even need to think about the management of the cluster, you can run a cluster with a single gcloud command nowadays. I regularly use Digital Ocean’s managed Kubernetes service and I can safely say that it was as easy as spinning up a Fargate cluster - in fact in some way’s it was easier.
Having to define some infrastructure to run your container is table stakes at this point. Google may have just changed the game this week with their Google Cloud Run product, but they’re massively ahead of everyone else in this field.
What I think can be safely said from this whole experience though is this: Running containers at scale is still hard. It requires thought, it requires domain knowledge, it requires collaboration between Operations and Developers. It also requires a foundation to build on - any AWS based operation is going to need to have some fundamental infrastructure defined and running. I’m very intrigued by the “NoOps” concept that some companies seem to aspire for. I guess if you’re running a stateless application, and you can put it all inside a lambda function and an API gateway you’re probably in a good position, but are we really close to this in any kind of enterprise environment? I really don’t think so.
Another realisation that struck me is that often the comparisons between technology A and technology B sometimes aren’t really fair, and I see this very often with AWS. The reality of the situation is often very different from the Jeff Barr blogpost. If you’re a small enough company that you can deploy your application in AWS using the AWS console and select all of the defaults, this absolutely is easier. However, I didn’t want to use the defaults, because the defaults are almost always not production ready. Once you start to peel back the layers of cloud provider services, you begin to realise that at the end of the day - you’re still running software. It still needs to be designed well, deployed well and operated well. I believe that the value add of AWS and Kubernetes and all the other cloud providers is it makes it much, much easier to run, design and operate things well, but it is definitely not free.
Arguing for Kubernetes
My final takeaway here is this: if you view Kubernetes purely as a container orchestration tool, you’re probably going to love Fargate. However, as I’ve become more familiar with Kubernetes, I’ve come to appreciate just how important it is as a technology - not just because it’s a great container orchestration tool but also because of its design patterns - it’s declarative, API driven platform. A simple thought that occurred to me during all of this Fargate process was that if I deleted any of this stuff, Fargate isn’t necessarily going to recreate it for me. Autoscaling is nice, not having to manage servers and patching and OS updates is awesome, but I felt I’d lost so much by not being able to use Kubernetes self healing and API driven model. Sure, Kubernetes has a learning curve - but from this experience, so does Fargate.
Despite my confusion during some of this process, I really did enjoy the experience. I still believe Fargate is a fantastic technology, and what the AWS team has done with ECS/Fargate really is nothing short of remarkable. My perspective however is that this is definitely not “easier” than Kubernetes, it’s just.. different.
The problems that arise when running containers in production are largely the same. If you take anything away from this post it should be this: whichever way you choose is going to have operational overhead. Don’t fall into the trap of believing that you can just pick something and your world is going to be easier. My personal opinion is this: If you have an operations team and your company is going to be deploying containers across multiple app teams - pick a technology and build processes and tooling around it to make it easier.
I’m certainly going to take the claims from people that certain technology is easier with a grain of salt from now on. At this stage, when it comes to Fargate, this sums up my feelings:
I made a statement during the talk which ignited some fairly fierce discussion both online, and at the conference:
To put this into my own words:
At some point, we decided it was okay for us to template yaml. When did this happen? How is this acceptable?
After some conversation, I figured it was probably best to back up my claims in some way. This blog post is going to try to do that.
The configuration problem
Once the applications and infrastructure you’re going to manage grows past a certain size, you inevitably end up in some form of configuration complexity hell. If you’re only deploying 1 or maybe 2 things, you can write a yaml configuration file and be done with it. However once you grow beyond that, you need to figure out how to manage this complexity. It’s incredibly likely that the reason you have multiple configuration files is because the $thing that uses that config is slightly different from its companions. Examples of this include:
- Applications deployed in different environments, like dev, stg and prod
- Applications deployed in different regions, like Europe or North American
Obviously, not all the configuration is different here, but it’s likely the configuration differs enough that you want to be able to differentiate between the two.
This configuration complexity has been well known for Operators (System Administrators, DevOps engineers, whatever you want to call them) for some years now. An entire discpline grew up around this in Configuration Management, and each tool solved this problem in their own way, but ultimately, they used YAML to get the job done.
My favourite method has always been hiera which comes bundled with Puppet. Having the ability to hierarchically look up the variables of specific config needs is incredibly powerful and flexible, and has generally meant you don’t actually need to do any templating of yaml at all, except perhaps for embedding Puppet facts into the yaml.
Did we go backwards?
Then, as our industries’ needs moved above the operating system and into cloud computing, we had a whole new data plane to configure. The tooling to configure this changed, and tools like CloudFormation and Helm appeared. These tools are excellent configuration tools, but I firmly believe we (as an industry) got something really, really wrong when we designed them. To examine that, let’s take a look at example of a helm chart taking a custom parameter
Helm charts can take external parameters defined by an
values.yaml file which you specify when rendering the chart. A simple example might look like this:
Let’s say my external parameter is simple - it’s a string. It’d look a bit like this:
That’s not so bad right? You just specify a value for
image in your values.yaml and you’re on your way.
The real problem starts to get highlighted when you want to do more complicated and complex things. In this particular example, you’re doing okay because you know you have to specify an image for a Kubernetes deployment. However, what if you’re working with something like an optional field? Well, then it gets a little more unwieldy:
Optional values just make things ugly in templating languages, and you can’t just leave the value blank, so you have to resort to ugly loops and conditionals that are probably going to bite you later.
Let’s say you need to go a step further, and you need to push an array or map into the config. With helm, you’d do something like this.
Firstly, let’s ignore the madness of having a templating function
toYaml to convert yaml to yaml and focus more on the whitespace issue here.
YAML has strict requirements and whitespace implementation rules. The following, for example, is not valid or complete yaml:
Generally, if you’re handwriting something, this isn’t necessarily a problem because you just hit backspace twice and it’s fixed. However, if you’re generating YAML using a templating system, you can’t do that - and if you’re operating above 5 or 10 configuration files, you probably want to be generating your config rather than writing it.
So, in the above example, you want to embed the values of
.Values.podAnnotations under the annotations field, which is indented already. So you’re having to not only indent your values, but indent them correctly.
What makes this even more confusing is that the go parser doesn’t actually know anything about YAML at all, so if you try to keep the syntax clean and indent the templates like this:
You actually can’t do that, because the templating system gets confused. This is a singular example of the complexity and difficulty you end up facing when generating config data in YAML, but when you really start to do more complex work, it really starts to become obvious that this isn’t the way to go.
Needless to say, this isn’t what I want to spend my time doing. If fiddling around with whitespace requirements in a templating system doing something it’s not really designed for is what suits you, then I’m not going to stop you. I also don’t want to spend my time writing configuration in JSON without comments and accidentally missing commas all over the shop. We (as an industry) decided a long time ago that shit wasn’t going to work and that’s why YAML exists.
So what should we do instead? That’s where jsonnet comes in.
JSON, Jsonnet & YAML
Before we actually talk about Jsonnet, it’s worth reminding people of a very important (but oft forgotten point). YAML is a superset of JSON and converting between the two is trivial. Many applications and programming languages will parse JSON and YAML natively, and many can convert between the two very simple. For example, in Python:
So with that in mind, let’s talk about Jsonnet.
Welcome to the church of Jsonnet
Jsonnet is a relatively new, little known (outside the Kubernetes community?) language that calls itself a data templating language. It’s definitely a good exercise to read and consume the Jsonnet design rationale page to get an idea why it exists, but if I was going to define in a nutshell what its purpose is - it’s to generate JSON config.
So, how does it help, exactly?
Well, let’s take our earlier example - we want to generate some JSON config specifying a parameter (ie, the image string). We can do that very very easily with Jsonnet using external variables.
Firstly, let’s define some Jsonnet:
Then, we can generate it using the Jsonnet command line tool, passing in the external variable as we need to:
Before, I noted that if you wanted to define an optional field, with YAML templating you had to define if statements for everything. With Jsonnet, you’re just defining code!
The output here, because our variable is null, means that we never actually populate resourceGroup. If you specify a value, it will appear:
Maps and parameters
Okay, now let’s look at our previous annotation example. We want to define some pod annotations, which takes a YAML map as its input. You want this map to be configurable by specifying external data, and obviously doing that on the command line sucks (you’d be very unlikely to specify this with Helm on the command line, for example) so generally you’d use Jsonnet imports to this. I’m going to specify this config as a variable and then load that variable into the annotation:
This might just be my bias towards Jsonnet talking, but this is so dramatically easier than faffing about with indentation that I can’t even begin to describe it.
The final thing I wanted to quickly explore, which is something that I feel can’t really be done with Helm and other yaml templating tools, is the concept of manipulating existing objects in config.
Let’s take our example above with the annotations, and look at the result file:
Now, let’s say for example I wanted to append a set of annotations to this annotations map. In any templating system, I’d probably have to rewrite the whole map.
Jsonnet makes this trivial. I can simply use the
+ operator to add something to this. Here’s a (poor) example:
The end result is this:
Obviously, in this case, it’s more code to this, but as your example get more complex, it becomes extremely useful to be able to manipulate objects this way.
We use all of these methods in kr8 to make creating and manipulating configuration for multiple Kubernetes clusters easy and simple. I highly recommend you check it out if any of the concepts you’ve found here have found you nodding your head.
TL;DR: - go here
I often spend time in my day job wishing I could implement $newtech. I’m lucky enough to be working on projects right now that many people would find exciting, interesting and challenging, however it’s often the case that I see something I’d like to try, but deploying it at $dayjob requires me to design for large scale and with security and compliance in mind.
When this happens, I generally try it out in my “homelab”. This might mean trying it in a cloud account (I’m particularly fond of DigitalOcean for this) but I also recently reinvested (I moved to another country last year, and had to sell my previous homelab equipment) in a very small homelab consisting of 3 mini PCs and a Dell T30 server, along with some UniFi.
My original intention was to blog about the journey, but I realised this might end up being more time consuming than I’d like, so with that in mind I decided that perhaps the best way to contribute knowledge back to the community was via Github.
Currently, the Org consists of 3 repos:
- tf-kubernetes-clusters - a repo containing simple terraform code for Kubernetes clusters for a wide variety of cloud providers. The intention here is to make launching a cluster easy and straightforward for testing purposes
- puppet-homelab - a Puppet control repo containing roles and profiles for my homelab. This could be used as a starting point for anyone wishing to build out a homelab, I’d encourage forking this and tailoring to your needs
- kr8-cluster-config - a repo containing configuration for kr8 which allows me to quickly and easily install components inside the Kubernetes clusters I build. As an example I have components like metallb which allow me to have Kubernetes LoadBalancers.
Some of the other tooling I’ve implemented includes:
- CoreDNS via a Puppet module which allows me to control my DNS infra
- Choria so I can quickly run tasks across the whole homelab
- external-dns via a kr8 component so I can automatically update DNS when deploying webapps on my homelab cluster
- cert-manager via a kr8-component for automated TLS on my homelab cluster
- consul via the Puppet module
In the near future, I plan on implementing other tech like:
- Vault for secret management
- eyaml encryption in Puppet
My hope is that doing this in the open can help other homelabbers learn about enterprise software, specifically DevOps related projects.
I encourage people to open issues in the repos, asking questions about how to implement things. Hopefully this can be my way to give back to the community.
Previous visitors to this blog will remember I wrote about configuration mgmt for Kubernetes clusters, and how the space was lacking. For those not familiar, the problem statement is this: it’s really hard to maintain and manage configuration for components of multiple Kubernetes clusters. As the number of clusters you have starts to scale, keeping the things you need to run in them (such as ingress controllers) configured and in sync, as well as managed the subtle differences that need to be managed across accounts and regions.
With that in mind, it’s my pleasure to announce that at my employer, Apptio we have tried to solve this problem with kr8. kr8 is an opinionated Kubernetes cluster configuration management tool, designed to be simple, flexible and use off the shelf tools where possible. This blog post details some of the design goals of kr8, as well as some of the benefits and a few examples.
The intention when making kr8 was to allow us to generate manifests for a variety of Kubernetes clusters, and give us the ability to template and override yaml parameters where possible. We took inspiration from a variety of different tools such as Kustomize, Kasane, Ksonnet and many others on our journey to creating a configuration management framework that is relatively simple to use, and follows some of the practices we’re used to as Puppet administrators.
Other design goals included:
- No templating engine
- Compatibility with existing deployment tools, like Helm
- Small binaries, with off the shelf tools used where needed
The end goal was to be able to take existing helm charts, or yaml installation manifests, then manipulate them to our needs. We chose to use jsonnet as the language of kr8 due to its data templating capabilities and ability to work with both JSON and YAML.
Terminology & Tools
kr8 itself is the only component of the kr8 framework that we wrote at Apptio. Its purposes are:
- Discover clusters in a hierarchical directory tree
- Discover components in a components directory
- Map components to clusters, using a cluster.jsonnet file
You can see the result of these purposes using a few of the tools in the kr8 binary, for example, listing clusters:
However, using the kr8 binary alone is probably not what you want to do. We bundle and use a variety of other tools with kr8 to achieve the ability to generate manifests for multiple clusters and deploy them.
Task does a lot of the heavy lifting for kr8. It is a task runner, much like Make but with a more flexible DSL (yep, it’s yaml/json!) and the ability to run tasks in parallel. We use Taskfiles for each component to allow us to build the component config. This gives us the flexibility to use rendering options for each component that make sense, whether it be pulling in a Helm chart or plain yaml. We can then input that yaml with kr8, and manipulate it with jsonnet code to add, modify the resulting kubernetes manifest. Alongside this, we use a taskfile to generate deployment tasks and to generate all components for a Task. This gives us the ability to execute lots of generate manifest jobs in relatively short periods of time.
We use Kubecfg to do the actual deployment of these manifests. Kubecfg gives us the ability to validate, diff and iteratively deploy Kubernetes manifests which
kubectl does not. The kubecfg jobs are generally inside Taskfiles at the cluster level.
Components are very similar to helm charts. They are installable resource collections for something you’d like to deploy to your Kubernetes clusters. A component has 3 minimal requirements:
params.jsonnet: contains configurable params for the component
Taskfile.yml: Instructions to render the component
- An installation source: This can be anything from a pure jsonnet file to a helm input values file. Ultimately, this needs to be able to generate some yaml
I intend to write many more kr8 blog posts and docs, detailing how kr8 can work, examples of different components and tutorials. Until then, take a look at these resources:
I want to thank Colin Spargo, who came up with the original concept of kr8 and how it might work, as well as contributing large amounts of the kr8 code. I also want to thank Sanyu Melwani, who had valuable input into the concept, as well as writing many kr8 components.
Finally, a thank you to our employer, Apptio, who has allowed us to spend time creating this tool to ease our Kubernetes deployment frustrations. If you’re interested in working on fun projects like this, we are hiring for remote team members