Published Mar 23, 2024 by Lee Briggs
Do you ever really think about how you get access to your Kubernetes control plane? Whatever mechanism you use to provision your cluster, you get a KUBECONFIG
and usually just go on your merry way to overcomplicating your infrastructure.
However, if you’ve ever looked at your KUBECONFIG
you’ll see you have a server address.
You can check the health of your cluster by doing the following:
curl -k $(kubectl config view --output jsonpath='{.clusters[*].cluster.server}')/healthz
Assuming everything is working as expected (and if it isn't, you should probably stop reading and go figure out what's wrong), it should return ok.
Have you ever stopped to think about why this just works from a networking perspective? More than likely, you didn't have to connect to a VPN or SSH into a bastion/jump host.
How do I know that? Well, because the vast majority of Kubernetes clusters are just hanging out on the public internet, without a care in the world.
If you search shodan.io for Kubernetes clusters you'll see there are almost 1.4 million clusters, readily accessible and open to the scary, scary world.
For reasons I can't quite understand, we've sort of collectively decided that it's okay to put our Kubernetes control planes on the public internet. At the very least, we've sort of decided it's okay to give them a public IP address - sure, you might add some security groups or firewall rules allowing only specific IP addresses, but the control plane is still reachable from the internet.
To put this into some sort of perspective, how many of the people reading this get a cold shudder when they think about putting their database on the public internet? Or a Windows server with RDP? Or a Linux server with SSH?
Established practice says this is generally not a good idea, and yet in order to make our lives easier, we've decided it's okay to give every person and their dog free rein to try and make our clusters theirs.
As soon as you put something in a private subnet in the cloud, you add a layer of complexity to the act of actually using it. You have to explain to everyone who needs access to it, including that developer who’s just joined the team, where it is and how to use it. You might use an SSH tunnel, or a bastion instance, or god forbid that VPN server someone set up years ago that nobody dares touch.
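For a flavour of what that complexity looks like, here's a sketch of reaching a private API server endpoint through a bastion host over an SSH tunnel (the bastion hostname and endpoint address are entirely made up):

# forward a local port through the bastion to the private API server endpoint
ssh -N -L 6443:10.0.12.34:443 ec2-user@bastion.example.com &

# point kubectl at the tunnel; the API server's certificate won't match 127.0.0.1,
# so you end up skipping TLS verification or fiddling with certificate SANs
kubectl --server=https://127.0.0.1:6443 --insecure-skip-tls-verify get nodes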
We sort of accept these hoops for things like databases because we rarely need to get into them except in an emergency, and we think it's okay to have to route through something else because the data in them is important enough to protect.
In addition to this, when cloud providers started offering managed Kubernetes services, most of them didn't even offer private control planes. It's only in the last few years that they've started offering this as a feature, and even then, it's not the default.
So the practice of putting a very important API on the internet has proliferated because it's just easier to do it that way. The real concern here is that we're one severe vulnerability away from having a bitcoin miner on every Kubernetes cluster on the internet.
I previously wrote about the Tailscale Kubernetes Operator and its ability to expose services running inside your Kubernetes cluster to your Tailnet, but it has another amazing feature:
It can act as a Kubernetes proxy for your cluster.
Well, let's say you provision a Kubernetes cluster in AWS. You decide that, in order to give yourself another layer of protection, you're going to make sure the control plane is only accessible within the VPC.
If you install the Tailscale Kubernetes operator and set the apiServerProxyConfig flag, it'll create a device in your Tailnet that proxies the Kubernetes API server, making it accessible to anyone on the Tailnet. This means that before you're able to use the cluster, you need to be connected to the Tailnet. All of that pain I mentioned previously with bastion hosts and networking just vanishes into thin air.
Let’s take it for a spin!
You’ll need to have your own Tailnet, and be connected to it. You can sign up for Tailscale and it’s free for personal use.
Once that's done, you'll need to make a slight change to your ACL. The Kubernetes operator uses Tailscale's tagging mechanism, so let's create a tag for the operator to use, and another tag for it to give to the client devices it registers:
"tagOwners": {
"tag:k8s-operator": ["autogroup:admin"], // allow anyone in admin group to own the k8s-operator tag
"tag:k8s": ["tag:k8s-operator"],
},
Then, register an OAuth client with the Devices write scope and the tag:k8s-operator tag.
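The Helm install later on expects the client ID and secret in environment variables, so stash them somewhere handy (the variable names here are simply what the install command below uses):

# credentials from the OAuth client you just created in the Tailscale admin console
export OAUTH_CLIENT_ID="<your-oauth-client-id>"
export OAUTH_CLIENT_SECRET="<your-oauth-client-secret>"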
Next, you'll need a cluster to put the operator in. There are lots of ways to do this; choose whichever you prefer. If you're a fan of eksctl, this page shows you how to create a fully-private cluster (a minimal sketch follows below).
You can of course use the defaults and try this out with a public cluster if you like, but I'm going to assume you're doing this because you want to make your cluster private.
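For what it's worth, here's a rough sketch of a fully-private cluster config for eksctl - the cluster name, region and node group sizing are placeholders, and the eksctl docs cover the caveats (VPC endpoints, no outbound internet access, and so on):

cat <<'EOF' > private-cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-private-cluster    # placeholder
  region: eu-central-1        # placeholder
privateCluster:
  enabled: true               # API server endpoint is private-only
managedNodeGroups:
  - name: default
    instanceType: t3.large
    desiredCapacity: 2
    privateNetworking: true   # nodes live in private subnets
EOF
eksctl create cluster -f private-cluster.yaml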
You may have to run the remaining steps from inside your actual VPC, because remember: any post-install steps that interact with the Kubernetes API server won't work from outside it. I leverage a Tailscale subnet router to make this easier; more on that later.
The easiest way to install the operator is with Helm. You'll need to add the Tailscale Helm repository:
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
Then install the operator with the OAuth credentials from the prerequisites step, and enable the API server proxy:
helm upgrade \
--install \
tailscale-operator \
tailscale/tailscale-operator \
--namespace=tailscale \
--create-namespace \
--set-string oauth.clientId=${OAUTH_CLIENT_ID} \
--set-string oauth.clientSecret=${OAUTH_CLIENT_SECRET} \
--set-string apiServerProxyConfig.mode="true" \
--wait
Now, if you look at your Tailscale dashboard, or use tailscale status, you should see a couple of new devices: the operator and a service just for the API server proxy.
You can now access your Kubernetes cluster from anywhere that has a Tailscale client installed, no faffing required.
You’ll need to configure your KUBECONFIG
using the following command:
tailscale configure kubeconfig <hostname>
This will set up your KUBECONFIG
to use the Tailscale API server proxy as the server for your cluster. If you examine your KUBECONFIG
you should see something like this:
apiVersion: v1
clusters:
- cluster:
    server: https://eks-operator-eu.tail5626a.ts.net
  name: eks-operator-eu.tail5626a.ts.net
contexts:
- context:
    cluster: eks-operator-eu.tail5626a.ts.net
    user: tailscale-auth
  name: eks-operator-eu.tail5626a.ts.net
current-context: eks-operator-eu.tail5626a.ts.net
kind: Config
users:
- name: tailscale-auth
  user:
    token: unused
You probably have some questions. Firstly, what's that unused token all about? What does auth mode mean in our operator installation? How does this work?
Well, if you run a basic kubectl
command now (and assuming you’re connected to your Tailnet) you’ll get something back, but it won’t help much:
kubectl get nodes
error: You must be logged in to the server (Unauthorized)
What's happened here? Well, the good news is, we've been able to route to our private Kubernetes control plane, but we're not sending any information about who we are. So let's make a small change to our Tailscale ACL in the Tailscale console. Add the following:
"grants": [
{
"src": ["autogroup:admin"], // allow any tailscale admin
"dst": ["tag:k8s-operator"], // to contact any device tagged with k8s-operator
"app": {
"tailscale.com/cap/kubernetes": [{
"impersonate": {
"groups": ["system:masters"], // use the `system:masters` group in the cluster
},
}],
},
},
],
By adding this grant, I’m assuming you’re an admin of your Tailnet.
Now, give that kubectl
command another try.
kubectl get nodes
NAME                                              STATUS   ROLES    AGE     VERSION
ip-172-18-183-129.eu-central-1.compute.internal   Ready    <none>   13h     v1.29.0-eks-5e0fdde
ip-172-18-187-244.eu-central-1.compute.internal   Ready    <none>   3d18h   v1.29.0-eks-5e0fdde
ip-172-18-42-205.eu-central-1.compute.internal    Ready    <none>   3d18h   v1.29.0-eks-5e0fdde
That’s more like it.
As you can see here, I've solved two distinct problems: routing to a control plane that isn't exposed to the internet, and authorizing myself to the cluster via Tailscale's impersonation grant rather than the cloud provider's IAM.
Now, if you're already happy with your current authorization mechanism, you can still use Tailscale's access mechanism to solve the routing problem. In this particular case, you'd install the operator in noauth mode and then use your cloud provider's existing mechanism to retrieve a token.
Modify your Tailscale Operator installation like so (note the noauth proxy mode):
helm upgrade \
--install \
tailscale-operator \
tailscale/tailscale-operator \
--namespace=tailscale \
--create-namespace \
--set-string oauth.clientId=${OAUTH_CLIENT_ID} \
--set-string oauth.clientSecret=${OAUTH_CLIENT_SECRET} \
--set-string apiServerProxyConfig.mode="noauth" \
--wait
Once that’s installed, if you run your kubectl
command again, you’ll see a different error message:
kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "jaxxstorm@github" cannot list resource "nodes" in API group "" at the cluster scope
The reason for this is fairly obvious: the EKS cluster I'm using has absolutely no idea who jaxxstorm@github is, because it uses IAM to authenticate me to the cluster.
So let’s modify our KUBECONFIG
to retrieve a token as EKS expects. We’ll modify the user
section to leverage an exec
directive - it should look a little bit like this:
apiVersion: v1
clusters:
- cluster:
    server: https://eks-operator-eu.tail5626a.ts.net
  name: eks-operator-eu.tail5626a.ts.net
contexts:
- context:
    cluster: eks-operator-eu.tail5626a.ts.net
    user: tailscale-auth
  name: eks-operator-eu.tail5626a.ts.net
current-context: eks-operator-eu.tail5626a.ts.net
kind: Config
users:
- name: tailscale-auth
  user:
    exec:
      apiVersion: "client.authentication.k8s.io/v1beta1"
      command: "aws"
      args: [
        "eks",
        "get-token",
        "--cluster-name",
        "kc-eu-central-682884c" # replace with your cluster name
      ]
      env:
        - name: KUBERNETES_EXEC_INFO
          value: '{"apiVersion": "client.authentication.k8s.io/v1beta1"}'
Now we're able to route to the Kubernetes control plane, and authenticate with it using the cloud provider's authorization mechanism.
A common question I get asked is: why not just use a subnet router in the VPC to route traffic to the private address of the control plane? I leveraged this mechanism myself when I installed the operator, because my control plane wasn't routable from the internet in the first place.
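For reference, a subnet router is just a Tailscale node running inside the VPC that advertises the VPC's routes to your Tailnet; a minimal sketch (the CIDR is illustrative - use your VPC's range) looks like this:

# on an instance inside the VPC (you'll also need to enable IP forwarding there)
sudo tailscale up --advertise-routes=172.18.0.0/16

# approve the advertised routes in the Tailscale admin console, then on your client:
sudo tailscale up --accept-routes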
This is a legitimate solution to the problem, and if you don't want to use the operator in auth
mode, keep living your life. However, one benefit you get by installing the operator and talking directly to it via your KUBECONFIG
is being able to use Tailscale’s ACLs to dictate who can actually communicate with the operator.
If you recall the grant from earlier, we were able to dictate who could impersonate users inside the cluster in auth mode, but with Tailscale's ACL system we can also be prescriptive about connectivity. Consider what happens if you remove the default, permissive ACL:
//{"action": "accept", "src": ["*"], "dst": ["*:*"]}, # commented out because hujson
Then define a group of users who can access the cluster, and add a more explicit ACL:
"groups": {
"group:engineers": ["jaxxstorm@github"],
},
// some other stuff here
"acls": [
// Allow all connections.
//{"action": "accept", "src": ["*"], "dst": ["*:*"]},
{
"action": "accept",
"src": ["group:engineers"],
"dst": ["tag:k8s-operator:443"],
},
],
I can be much more granular about my access to my cluster, and have a Zero Trust model for my Kubernetes control plane at the network level as well as the authorization level. Your information security team will love you for this.
When you provision the operator in the cluster, you can modify the tags it uses to segment your Tailnet even further; see the operator config in the Helm chart's values.yaml.
Just remember to ensure your OAuth client has the correct permissions to manage those tags!
If you have the operator installed in auth mode, you can scope access both at the network level (using the aforementioned tags) and via the Kubernetes RBAC system.
Let's say we want to give our engineers group access to only a single namespace called demo.
First, we'd create a ClusterRole (or Role) and then a RoleBinding scoped to that namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: engineers-clusterrole
rules:
  - apiGroups: ["", "apps", "batch", "extensions"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: engineers-rolebinding
  namespace: demo
subjects:
  - kind: Group
    name: engineers # note the group name here
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: engineers-clusterrole
  apiGroup: rbac.authorization.k8s.io
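Apply both of those with kubectl, using whatever admin access you still have (the filename is just wherever you saved the manifests):

kubectl apply -f engineers-rbac.yaml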
Then update our Tailscale ACL to modify the grants:
"grants": [
{
"src": ["group:engineers"],
"dst": ["tag:k8s-operator"],
"app": {
"tailscale.com/cap/kubernetes": [{
"impersonate": {
"groups": ["engineers"],
},
}],
},
},
],
Now, if I try to access the demo
namespace, I can do the stuff I need to do:
kubectl get pods -n demo
NAME                                     READY   STATUS    RESTARTS   AGE
demo-streamer-e3a170e4-85f4f7b88-gppcn   1/1     Running   0          16h
demo-streamer-e3a170e4-85f4f7b88-xc7gz   1/1     Running   0          16h
But not in the kube-system
namespace:
kubectl get pods -n kube-system
Error from server (Forbidden): pods is forbidden: User "jaxxstorm@github" cannot list resource "pods" in API group "" in the namespace "kube-system"
As always with the posts I write on here, I’m writing in my personal capacity, but obviously I’m a Tailscale employee with a vested interest in the success of the company. Do I want you to sign up for Tailscale and pay us money? You bet I do. Do I want you to get your Kubernetes clusters off the public internet even if you don’t want to sign up for Tailscale and pay us money? Yes.
I’ve always had an uneasy feeling about these public clusters, and I can’t help but feel we’re one RCE away from a disaster.
So now you know how easy it is to get your Kubernetes control plane off the internet, what are you waiting for?