As part of this deployment, I deployed a healthcheck using Consul’s TCP checks to verify that the puppetmaster was responding on its default port (8140). In Puppet, it looked like this:
The problem with this approach is that it’s a dumb check - the puppetmaster runs in a webserver and while the port might be open, what happens if the application is returning a 500 internal server error, for example?
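To make the limitation concrete, here is a rough sketch of what a TCP check amounts to. It is not Consul's implementation, just an approximation in bash; the function name `tcp_check` and the two-second timeout are my own choices for illustration.

```shell
#!/usr/bin/env bash
# Rough equivalent of a TCP check: succeed if the port accepts a connection.
# Nothing is sent or read, so the application behind the port is never exercised.
tcp_check() {
  local host="$1" port="$2"
  # /dev/tcp is a bash feature; timeout(1) stops a filtered port from hanging.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "TCP OK: ${host}:${port} accepts connections"
    return 0
  fi
  echo "TCP CRITICAL: ${host}:${port} refused or timed out"
  return 2
}

# Example: tcp_check localhost 8140
```

A webserver that answers every request with a 500 would still pass this, because the connection itself succeeds.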
In order to rectify this, I decided to make use of a Puppet HTTP API endpoint to query the status.
I must admit, I didn’t even know that Puppet had an HTTP API until recently. Looking through the docs brought up some gems, but the problem is that by default it’s pretty locked down - and rightly so. It’s a powerful API, and a Puppetmaster compromised via its API is a dangerous prospect.
While digging through the API docs, I found a nice status endpoint. However, when I queried it, I got a 403 access denied:
This seems easily fixable and extremely useful. In order to make this work, I made a quick change to the auth.conf:
This needs to go above the default policy in auth.conf, which looks like this:
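Ordering matters because Puppet works through auth.conf from top to bottom and applies the first rule that matches, so a status rule placed below the catch-all `path /` rule would never be reached. A rough way to sanity-check the ordering is sketched below. The helper name is hypothetical, and the patterns assume the status entry is written as a plain `path /status` line, which may not match your file exactly.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the /status rule comes before the
# catch-all "path /" rule in the given auth.conf.
status_rule_precedes_default() {
  conf="$1"
  # Line number of the first "path /status" rule (assumed form of the entry).
  status_line=$(grep -n -m1 -E '^path[[:space:]]+/status[[:space:]]*$' "$conf" | cut -d: -f1)
  # Line number of the first catch-all "path /" rule.
  default_line=$(grep -n -m1 -E '^path[[:space:]]+/[[:space:]]*$' "$conf" | cut -d: -f1)
  # No status rule at all counts as a failure.
  [ -n "$status_line" ] || return 1
  # With no catch-all rule present there is nothing to shadow the status rule.
  [ -n "$default_line" ] || return 0
  [ "$status_line" -lt "$default_line" ]
}

# Example: status_rule_precedes_default /etc/puppet/auth.conf && echo "ordering looks fine"
```

The `/etc/puppet/auth.conf` path in the example is also an assumption; use wherever your puppetmaster keeps the file.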
Now, when I try the curl command again, it works!
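For reference, a query of this kind can be sketched as follows. This is a minimal sketch, not necessarily the exact command I ran: the function names are made up, `no_key` is just a placeholder for the final path segment, and `production` is assumed as the environment.

```shell
#!/bin/sh
# Build the status URL. "no_key" is an arbitrary placeholder for the final path
# segment; the environment defaults to production.
puppet_status_url() {
  host="${1:-localhost}"
  environment="${2:-production}"
  echo "https://${host}:8140/${environment}/status/no_key"
}

# Query the endpoint, asking for PSON.
# -k skips certificate verification, which is only reasonable against localhost;
# pass --cacert with the Puppet CA certificate instead for anything else.
puppet_status() {
  curl -s -k -H 'Accept: pson' "$(puppet_status_url "$@")"
}

# Example: puppet_status localhost production
```

A healthy master should answer with a small PSON document that includes an `is_alive` field.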
Sweet, now we can make a proper healthcheck!
Because we set the auth.conf entry to auth any, it’s straightforward to query the API endpoint. I used the Nagios check_http plugin to get this looking nice. The command looks a bit like this:
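A sketch of such a command, wrapped in a function so it can be reused. The plugin path is an assumption (it varies by distribution), `no_key` is an arbitrary placeholder in the URL, and the function name is mine.

```shell
#!/bin/sh
# Plugin location varies by distribution; this path is an assumption.
CHECK_HTTP="${CHECK_HTTP:-/usr/lib64/nagios/plugins/check_http}"

puppet_status_check() {
  # -H/-p: host and port     -S: use SSL, since the puppetmaster serves HTTPS
  # -u: URL path to request  -k: extra request header
  # -s: string that must appear in the response body
  "$CHECK_HTTP" -H localhost -p 8140 -S \
    -u /production/status/no_key \
    -k 'Accept: pson' \
    -s 'is_alive'
}

# Example: puppet_status_check; echo "exit code: $?"
```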
Simply, we’re querying localhost on port 8140 and then providing an environment (production is my default environment). The Puppetmaster wants pson, so we send a PSON header, and then we check for the string is_alive. The output looks like this:
This is much, much better than our port check. If we get something other than a 200 OK HTTP code, we’re in trouble.
The original point of this post was to replace the Consul TCP check. In Puppet code, that looks like this:
We’ll now get an accurate and reliable healthcheck from our Consul check!
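Part of why this works so neatly, assuming the new check is registered as a Consul script check that runs check_http, is that Consul interprets a script's exit code in a way that lines up with the Nagios plugin convention. The mapping can be sketched like this (the function name is just for illustration):

```shell
#!/bin/sh
# Map a check script's exit code to the state Consul records for it.
consul_state_for_exit_code() {
  case "$1" in
    0) echo "passing" ;;   # Nagios OK
    1) echo "warning" ;;   # Nagios WARNING
    *) echo "critical" ;;  # Nagios CRITICAL (2), UNKNOWN (3), or anything else
  esac
}

# Example: consul_state_for_exit_code 2
```

So a CRITICAL from check_http shows up as a critical check in Consul without any wrapper script in between.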