Serverless computing is all the rage at the moment, and why wouldn’t it be? The idea of deploying code without having to worry about anything like servers, or that pesky infrastructure everyone complains about seems pretty appealing. If you’ve ever used AWS lamdba or one of its related cousins, you’ll be able to see the freedom that triggering functions on events brings you.
The increase in excitement around serverless frameworks means that naturally, there’s been an increase in providers in the Kubernetes world. A quick look at the CNCF Landscape page shows just how many options there are to Kubernetes cluster operators.
Kubeless appealed to me specifically for a few reasons:
- Native Kubernetes resources (CRDs) for functions, meaning that standard Kubernetes deployment constructs can be used
- No external dependencies to get started
- Support for PubSub functions without having to manually bind to messages queues etc
- Lots of language support with the runtimes
To follow along here you’ll need:
- A working kubeless deployment, including the kubeless cli
- A working NATS cluster, perhaps using the NATS Operator as well the Kubeless NATS Trigger installed
In this walkthrough, I wanted to show you how easy it is to get Kubernetes events (in this case, pod creations) and then use kubeless to perform actions on them (like post to a slack channel).
I’m aware there are tools out there that already fulfill this function (ie events to slack) but I figured it was a good showcase of what can be done!
Publishing Kubernetes Events
Before you can trigger kubeless functions, you first need to have events from Kubernetes published to your NATS cluster.
To do this, I used the excellent kubeless python library
An easy way to do this is simply connect to the API using the in_cluster capabilities and then list all the pods, like so:
This simple script will log all the information for pods in all namespaces to stdout. It can be run on your local machine, give it a try!
The problem with this is that it’s just spitting information to stdout to test locally, so we need to publish this events to NATS. In order to do this, we’ll use the python aysncio-nats libarary
Now, your script has gotten much more complicated:
Okay, so now we have events being pushed to NATS. We need to fancy this up a bit, to allow for running in and out of cluster, as well as building a Docker image. The final script can be found here. The changes are to include a logger module, as well as argparse to allow for running in and out of the cluster, as well as make some options configurable.
You should now deploy this to your cluster using the provided deployment manifests, which also include the (rather permissive!) RBAC configuration needed for the deployment to be able to read pod information from the API.
This will install the built docker container to publish events to the NATS cluster configured earlier. If you need to, modify the environment variable
NATS_CLUSTER if you deployed your NATS cluster to another address.
Consuming Events with Kubeless functions
So now the events are being published, we need to actually do something with them. Let’s first make sure the events are coming in.
You should have the kubeless cli downloaded by now, so let’s create a quick example function to make sure the events are being posted.
As you can probably tell, this function just dumps any event sent to it and returns. So let’s try it out. With kubeless, let’s deploy it:
What’s happening here, exactly?
- runtime: specify a runtime for the function, in this case, python 3.6
- from-file: path to your file containing your function
- handler: this is the important part. A handler is the kubeless function to call when an event is received. It’s in the format
. . So in our case, our file was called test.py and our function was called dump, so we specify `test.dump`.
- namespace: make sure you specify the namespace you deployed kubeless to!
So you should now have a function deployed:
So now, we need to make this function be triggered by the NATS messages. To do that, we add a trigger:
What’s going on here?
- We create a trigger with name test
- function-selector: use labels created by kubeless function to select the function to run
- trigger-topc: specify a trigger topic of k8s_events (which is specified in the event publisher from earlier)
- Same namespace!
Okay, so now, let’s cycle the event publisher and test things out!
You should see something like this as an output log:
This is the modification event for the pod you just cycled. Awesome!
Publish the event to slack
Okay, so now you’ve got some events being shipped, it’s time to get a little bit more creative. Let’s publish some of these events to slack.
You can create a
slack.py with your function in, like so:
You’ll need to deploy your function using the kubeless binary:
The only thing you might be confused about here is the
--dependencies file. Kubeless uses this to determine which dependencies you need to install for the function runtime. In the python case, it’s a requirements.txt. You can find a working one in the related github repo linked to this post.
You’ll obviously need a slack org to try this out, and need to generate a slack token to get API access. However, now, once you cycle the events pod again (or, run another pod of course!) - you’ll now see these events pushed to slack!
Obviously this is a trivial example of using these functions, but the power of the event pipeline with kubeless is there to be seen. Anything you might need to happy when certain events happen in your Kubernetes cluster can be automated using this Kubeless event pipeline.
You can check out all the code, deployment and manifests for this post in the github repo that accompanies this post. Pull requests and feedback on my awful Python code are also welcome!
A few months back, I wrote an article which got a bit of interest around the issues configuring and maintaining multiple clusters, and keeping the components required to make them useful in sync. Essentially, the missing piece of the puzzle was that there was no cluster aware configuration management tool.
Internally, we created an excellent tool at
$work to solve this using jsonnet, which has been very nice because it’s meant we get to use actual code to solve this problem. The only issue is that the code we have to write in is relatively niche!
When Pulumi was announced back in June, I got very excited. After seeing the that Pulumi now supports Kubernetes natively, I wanted to dig in and see if it would help with the configuration management problem.
Before I speak about the Kubernete–specific components, I want to do a brief introduction into what Pulumi actually is.
Pulumi is a tool which allows you to create cloud resources using real programming languages. In my personal opinion, it is essentially terraform with full programming languages on top of it, instead of HCL.
Pulumi uses the concept of a stack to segregate things like environments or feature branches. For example, you may have a stack for your development environment and one for your production environment.
These stacks store their state in a similar fashion to terraform. You can either store the state in:
- The public pulumi web backend
- locally on the filesystem
There is also an enterprise hosted stack storage provider. Hopefully it’ll be possible to have an open source stack storage some time soon.
Components allow you to share reusable modules, in a similar manner to terraform modules. This means you can write reusable code and create boilerplate for regularly reused resources, like S3 buckets.
Finally, there’s configuration ability within Pulumi stacks and components. What this allows you to do is differentiation configuration in the code depending on the stack you’re using. You specify these configuration values like so:
pulumi config set name my_name
And then you can reference that within your code!
Kubernetes Configuration Management
If you remember back to my previous post, the issue I was trying to solve was being able to install components that every cluster needs (as an example, an ingress controller) to all the clusters but with often differing configuration values (for example, the path to a certificate arn in AWS ACM). Helm helps with this in that it allows you to specify values when installing, but then managing, maintaining and storing those values for each cluster becomes difficult, and applying them also becomes hard.
There are two main reasons Pulumi is helping here. Firstly, it allows you to differentiate kubernetes clusters within stacks. As an example, let’s say I have two clusters - one in GKE and one in Digital Ocean. Here you can see them in my kubernetes contexts:
I can initiate a stack for each of these clusters with Pulumi, like so:
Obviously, you’d repeat the stack and config steps for each cluster!
Now, if I want to actually deploy something to the stack, I need to write some code. If you’re not familiar with typescript (I wasn’t, until I wrote this) you’ll need to generate a
package.json and a
Fortunately, Pulumi automates this for us!
If you’re doing a Kubernetes thing, you’ll need to select kubernetes-typescript from the template prompt.
Now we’re ready to finally write some code!
Deploying a standard Kubernetes resource
When you ran
pulumi new, you got an
index.ts file. In it, it imports pulumi:
You can now write standard typescript and generate Kubernetes resources. Here’s an example nginx pod as a deployment:
Here you can already see the power that writing true code has - defining a constant for defaults and allowing us to use those values in the declaration of the resource means less duplicated code and less copy/pasting.
The real power comes when using the config options we set earlier. Assuming we have two Pulumi stacks,
and these stacks are mapped to different contexts, like so:
you can now also set different configuration options and keys which can be used in the code.
This will write out these values into a
Pulumi.<stackname>.yaml in the project directory:
and we can now use this in the code very easily:
Now, use Pulumi from your project’s root to see what would happen:
Obviously you can specify whicever stack you need as required!
Let’s verify what happened…
Okay, so as you can see here, I’ve deployed an nginx deployment to two different clusters, with two different images with very little effort and energy. Awesome!
Going further - Helm
What makes this really really awesome is that Pulumi already supports Helm charts!
In my previous post, I made the comment that Helm has lots of community supported charts which have done a whole load of configuration for you. However, Helm suffers from 2 main problems (in my humble opinion)
- Templating yaml with Go templates is extremely painful when doing complex tasks
- The helm chart community can be slow to merge pull requests. This means if you find an issue with a chart, you unfortunately have to fork it, and host it yourself.
Pulumi really helps and solves this problem. Let me show you how!
First, let’s create a Pulumi component using a helm chart:
And now preview what would happen on this stack:
As you can see, we’re generating the resources automatically because Pulumi renders the helm chart for us, and then creates the resources, which really is very awesome.
However, it gets more awesome when you see there’s a callback called
transformations. This allows you to patch and manipulate the generated resources! For example:
Of course, you can combine this with the configuration options from before as well, so you can override these as needed.
I think it’s worth emphasising that this improves the implementation time for any Kubernetes deployed resource dramatically. Before, if you wanted to use something other than helm to deploy something, you had to write it from scratch. Pulumi’s ability to import and them manipulate rendered helm charts is a massive massive win for the operator and the community!
I think Pulumi is going to change the way we deploy things to Kubernetes, but it’s also definitely in the running to solve the configuration management problem. Writing resources in code is much much better than writing yaml or even jsonnet, and having the ability to be flexible with your deployments and manifests using the Pulumi concepts is really exciting.
I’ve put the code examples from the blog post in a github repo for people to look at and improve. I really hope people try out Pulumi!
It’s been over a year since my last blog post, and since then I’ve been working on Kubernetes almost exclusively for $employer. During that time, I’ve noticed a growing need for something that many people in the DevOps/SRE/Sysadmin world take for granted. I wanted to come out of my blog post hiatus to discuss that need, and try and create a call to arms around this particular problem.
Back in the “old days” - way before I even got started in this game, a bunch of sysadmins decided that manually configuring their hosts was silly. Sysadmins at the time would configure their fleet of machines, whether it be large or small, using either a set of custom and hand crafted processes and runbooks, or maybe if they were very talented, a set of perl/bash or PHP scripts which would take weeks to figure out for anyone new to the org.
People got pretty tired of this, and a new suite of tools was born - configuration management. From CFEngine, to Puppet, to Chef, to Ansible, these tools came along and revolutionized our industry.
The driving forces behind this tooling was the need to know that machines looked the way you want them to, and also be able to describe that state with code. The Infrastructure as Code movement started with configuration management, and has evolved even further since then, to bring us tools like Terraform, Habitat and Vagrant.
What about Kubernetes?
So how does configuration management fit in with Kubernetes? The first thing you might be thinking is that deploying things to Kubernetes already solves the two problems above. Kubernetes is a declarative system, and you can simply define a deployment with a yaml config and Kubernetes takes care of the rest. You have the “code” there (although yaml is a poor imitation of code!) and then Kubernetes takes care of the rest. It will manage your process, ensure it converges and looks the way it should, and continue from there.
So what’s the problem?
The Kubernetes Abstraction Layer
When you think about what Kubernetes does for an organization, it’s worth thinking about what Kubernetes does for an engineering organization.
Before Kubernetes, the management layer generally existed above the Operating System. You would generally operate with individual machines when managing applications, and linking them together involved putting services on top of other operating systems which would allow to link those applications together. You probably automated this process with configuration management, but ultimately the operators worked at the operating system layer.
With Kubernetes hitting the marketplace, that abstraction layer has changed. To deploy an application, operators no longer need to install software on the OS. Kubernetes takes care of all that for you.
So scaling applications at the machine layer was managed by configuration management and automated. You had many machines, and you needed to make sure they all looked the same. However, with Kubernetes coming along, how do you do the same there?
So now we get to the meat of the problem. At $employer, we run multiple Kubernetes clusters across various “tiers” of infrastructure. Our Development environment currently has 3 clusters, for example.
Generally, we manage the installation of the components for the Kubernetes cluster (the kubelet, api server, controller manager, kube-proxy, scheduler) etc with Puppet. All of these components live at the operating system layer, so using a configuration management tool for the installation makes perfect sense.
But there are components of a Kubernetes cluster that need to be the same across all the clusters that are being managed. Some examples:
- Ingress Controllers. All clusters generally need an ingress controller of some kind, and you generally want it be the same across all your clusters so there are no surprises.
- Monitoring. Installing prometheus usually makes sense, and monitoring at the hardware layer is taken care of by DaemonSets. However, how do you make sure the prometheus config works the same across all clusters? Prometheus installations are designed to be small and modular, so you want to repeat this process over and over again.
- Kubernetes Dashboard. Generally there will be some users who want to see a picture, and the dashboard helps with that.
These are just some examples, but there are more. So, how do you do this?
We adopted Helm charts at $employer for a few reasons:
- The helm charts for bigger projects are generally well maintained, and have sane defaults out of the box.
- Helm allows you to do reasonable upgrades of things using helm releases.
- Helm’s templating system allows you to manipulate configuration values for cluster level naming etc.
The problem is that syncing helm releases across clusters isn’t really very easy. Therein lies the problem: how do I sync resources across Kubernetes cluster. Specifically, how do I sync helm releases across clusters? I want the same configuration but some slight changes in Helm values.
I believe this problem is unsolved. Here’s a few of the things I’ve tried.
The (unsatisfactory) solutions
Because my mind initially goes towards Puppet as configuration management, my first attempted solution used Puppet alongside Helm. I found the Puppetlabs Helm Module and got to work. I even refactored it and sent some pull requests to improve its functionality.
Unfortunately, I ran across a few issues, and these mainly related to Puppet’s lack of awareness of concepts higher than the OS, Puppet’s less than ideal management of staged upgrades, as well as some missing features in the module:
- Declaring helmcharts as Puppet defined types means they get initialized on all masters. This can cause issues and race conditions. The way Puppet determines if it should install a helm chart is by running a
helm lson the machine its operating on and finding the release. This isn’t ideal, because if a master gets out of sync with the cluster, you come across problems.
- The module itself has different defined types for helm chart installations and upgrades. You can’t just change the helm module version, or different values for the
values.yaml. This requires custom workflows for helm upgrades and makes it really difficult to manage across multiple installations.
- Sometimes when updating a helm release, you might want to do certain actions before updating the release, like switch a load balancer config. You can’t do this with Puppet, it’s a manual process and requires some knowledge of multi-stage deployments. Puppet isn’t great at this, so there has to be something else involved.
This is originally when my thoughts around this issue began to form. After coming to the conclusion that Puppet wouldn’t quite cut it, I began to look at alternatives..
My initial thought was to use ansible. Ansible has a native helm module and is (by their own admission) designed for multi-stage deployments from the get-go. Great!
Unfortunately, the reality was quite different.
- The module has at least two different issues open stating that the module is broken, and not ready: here and here. There is even a comment stating a user switch to the k8s_raw ansible module.
- Setting up and installing pyhelm, the module’s dependency, caused us a lot of issues on OS X. We had to run it in a docker container
- The ansible module relies on you configuring manual access to Tiller (helm’s orchestrator inside the k8s cluster) manually, which is not easy to do.
Once it was realised that the ansible helm module isn’t really ready for prime time, we reluctantly let Ansible go as an option. I’m hoping for this option to potentially improve in future, as I think personally this is the best approach
Terraform, by Hashicorp, was also considered as a potential option. It epitomizes the ideals of Infrastructure as Code by being both idempotent, as well as declarative. It interacts directly with the APIs of major cloud providers, and has a relatively short ramp-up/learning curve. Hashicorp also provides a well-written provider specifically for Kubernetes as well as a community written Helm provider and I was keen to give it a try. Unfortunately, we encountered issues that echoed the problems faced with Puppet:
- Terraform simply defines an endstate, and makes it so, similar to Puppet but at a different level. The multi stage deployments (ie do X, then do Y) aren’t really supported in Terraform which causes us problems when switching config.
- We have several clusters in AWS, but some outside AWS. Having to store state for non AWS resources in an S3 bucket or such like was annoying. Obviously it’s possible to store state elsewhere, but it’s not a simple process.
- By default, Terraform destroys and recreates resources you change. You can modify this behaviour using lifecycle but it’s mentality is very different than the Kubernetes/Helm concepts.
I was very excited by Ksonnet at one point. Ksonnet is a Heptio project that is designed to streamline deployments to Kubernetes. It has a bunch of really useful concepts, such as environments which allow you to tailor components to a unique cluster. This would have been perfect to run from a CI pipeline, unfortunately the issues were:
- Doesn’t support helm (yet). This means anything you install, you need to take all of the helm configuration that is figured out for you and write it again. A lot of overhead
- Has a steep learning curve. Jsonnet (which ksonnet uses under the hood) is a language itself, which takes some learning and isn’t well documented.
- Ksonnet’s multi cluster support still hasn’t really been figured out, so you need to manage the credentials manually and switch between them across clusters.
There were a few other concepts which looked ideal, but we didn’t try as they were missing key components.
kubed can sync configmaps across namespaces and even clusters. If only it could do the same with other resources, it would have been perfect!
helmfile allows you to create a list of helm charts to install, but at the time of looking at it, it didn’t really support templating for cluster differences, so it wasn’t considered. It now seems to support that, so maybe it’s time to reconsider it.
We briefly considered rewriting all the helm charts into standard ansible k8s_raw tasks, and putting the templating into the ansible variables. The amount of work to do that isn’t trivial, but we may have to go down that route.
So, as you can see here, this problem isn’t solved. In summary, the requirements seem to be a tool that is:
- Aware of multiple Kubernetes Clusters
- Supports templating
- Can perform operations in stages
- Understands helm charts and releases
- Is declarative
Unfortunately, I don’t believe this tool exists right now, so the search continues!
Day 3 of Kubecon! Before I begin, I have to make it clear that this was another day of frustration for me. As it was yesterday, all of the talks I really wanted to see were completely overflowing, and this was despite me making efforts to get to the talks well in advance.
The organisers did what they could to alleviate this, such as moving the deep dive track into the larger main conference room, but ultimately people paid vast sums of money to attend this conference, and I firmly believe they over subscribed the event and it was detrimental to the people attending. The building can hold X number of people, and it was obvious that this number of tickets were sold, but the smaller breakout rooms were very popular and ultimately it was impossible to take in the fantastic content because of the overcrowding.
I found the short Huawei talk very interesting. Firstly, the scale they’re operating at is inspirational, and the way they’ve solved those problems and the benefits they’ve reaped from implementing a cloud native approach is very impressive.
I was also intrigued by their in house/custom networking solution, iCAN. Always interesting to see how large enterprise innovate, given the resources at their disposal.
Scaling Kubernetes Users
Joe Beda of Heptio talked a little bit about the user experience of Kubernetes. Starting with the mantra of “Kubernetes sucks, like all software” is an interesting opening gambit from a co-founder of Kubernetes, but he backed this up with detailed examples of how the user experience can improve. One of my favourite thoughts of his was the idea that operators build software for people like us and this definitely ties with my experiences of Kubernetes being intimating to the average user.
Finally, Kelsey Hightower talked about Federation, and how he sees it being used in the Kubernetes landscape. As usual, Kelsey was a quote machine. Some of the more memorable ones:
- “Hybrid Cloud” means absolutely nothing!”
- “If you can run your business on one node, congratulations! You can go home and watch other people’s postmortems”
- “If you have 2 nodes, and you’re running Kubernetes, we’re going to have a conversation”
- “The result of configuration mgmt was devops, group therapy for inefficient tools”
After Kelsey was done cracking the audience up, there was a demo of ingress federation and the expected results. This was quite interesting to me, because one of the things I’ve been hoping to do is use federation to move workloads, but Kelsey’s argument is that you absolutely should not do that. It’s meant to be used to get an overview of the cluster as a whole, not magically move workloads around. As always, a very useful talk from Kelsey.
Using the k8s go client
Aaron Schlesinger did a fantastic demo of writing a custom thirdparty resource (in his case, a backup resource) using the golang API client. It was very informative, even for someone like myself who has written quite a few things that use the golang client. I highly recommend checking out the github repo to get an idea what was covered.
I half attended a storage talk by Björn Rabenstein detailing some of the ways of dealing with storage in Prometheus. This seemed interesting, but again it was difficult because of how full the room was. Essentially, what I took away from it was that storage in prometheus is not plug and play, and it’s worth reading the docs to ensure you’re doing it right.
The guys at kubermatic detailed k8sniff an interesting layer 3 TLS load balancer, that can terminate TLS down to the pod level. For those people offering kubernetes as a service to customers, this seems invaluable for separating traffic for different customers, however this isn’t a problem I’ve personally seen.
Consul & Ingress at Concur
Finally, I saw a fantastic talk about leveraging Consul & Ingress controllers to route traffic to pods. There was a discussion about the existing method, which used a custom load balancer endpoint and the pitfalls, and then porting that to a consul/ingress model. I thought this was quite interesting, but I also felt like you’d be losing some of the advantages of kubernetes by using consul to hit your pod endpoint IPs directly, such as the advantage of service IPs, as well as integrating your pod network to make it routable to external infrastructure. Nevertheless, it was very interesting to hear how a large company like Concur manages to sovle these complex problems.
Day 2 of KubeCon was absolutely jam packed! There were lots of tracks, so I won’t be able to cover everything that happened, but hopefully I can recap some of the stuff I found interesting.
One thing to note is that the Technical deep dive rooms were dramatically over subscribed, to the point where I walked out of some of them halfway through because the environment was unbearable. Maybe someone else will be able to recap those.
The kubernetes 1.6 announcement was done during the keynote, and we had a fantastic demo by Aparna Sinha of some of these features.
Pod Affinity/Anti Affinity
The pod affinity/anti affinity was shown off, which allows you to schedule workloads on certain nodes in your cluster. Anti affinity seemed particularly interesting to me, because it means you can say “if pod has label X, don’t schedule anything with the same label on a node” which is very powerful for environments where you need to ensure failure domains stay low. It also seems very obvious in terms of affinity, where you need to schedule workloads on certain nodes and ensure they stay there. As our master of ceremonies Kelsey Hightower would say - super dope!
Dynamic Storage Provisioning
Another thing that caught my eyes was the dynamic storage provisioning becoming a standard within 1.6. This has been around a while, and it basically goes off and creates volumes for you (EBS, GKE etc etc) and then creates a persistent volume claim for your pod to consume. Aparna did a demo of this and it really showed the power of this model when creating dynamic workloads. I’ve been doing this in the beta/alpha tree for a while, and I hope to have a blog post up soon showing this off for FlexVolumes and how you can extend it with out of tree provisioners!
Monzo & Linkerd
I attended a fantastically interesting talk co-chaired by Monzo and Buoyant to talk about how they worked together at Monzo. Monzo is incredibly interesting in that they built a bank (yes, you read that right, a fucking bank) powered by Kubernetes.
The story started with Monzo explaining how their previous architecture relied on infrastructure services like RabbitMQ to ensure high reliability, but this was failing them. They were very candid about a recent outage they had, which meant payment processing stopped. This seems like an incredible nightmare for a bank, if you can’t buy stuff because you can’t use your card, it can cause a real drop in trust. However, along came Linkerd to operate as a fabric in their k8s cluster which vastly increased their reliability.
There were several talks about service catalog maturing as an API, and this is a relatively new concept to me in kubernetes, but it’s incredibly interesting. After a great chat with the folks at the Deis booth, it made more sense.
Essentially, the service catalog will allow developers/applications to consume resources that may or may not live outside the API. The best example of this that I got during the day is as follows.
- You may or may not run your database outside your cluster.
- Your developers want to be able to provision and consume that resource, by creating a new database for their app and/or creating credentials to login
- Traditionally, you would have this be a manual process
- However, with a service catalog resource, you can make an API call, and it will provision the database and then you can bind to the resource, which will provide you with API generated credentials.
Needless to say, the potential here is mind boggling. With full kubernetes audit logging, and API driven provisioning of external components to the cluster, it becomes really obvious how you can create a programmable datacenter even outside of AWS/Google clouds.
The most up to date example of this is Steward by Deis. This is still in alpha state, but it’s definitely something I think needs to be considered when moving towards “Cloud native” workloads.
Some other fun tidbits I picked up today:
James Munnelly of Jetstack has created a very awesome keepalived cloud provider, which operates as an out of tree load balancer type, allowing those of us unlucky enough to be running bare metal based kubernetes clusters to use the load balancer service type.
The kubernetes job market is looking pretty healthy based on the job board
Having a grafana dashboard or the conference wifi network is pretty awesome