Published Feb 17, 2016 by Lee Briggs
In my last post I wrote about service discovery for my Puppetmasters using Consul.
As part of this deployment, I deployed a healthcheck using Consul’s TCP checks to check that the puppetmaster was responding on its default port (8140). In Puppet, it looked like this:
::consul::check { 'puppetmaster_tcp':
  interval   => '60',
  tcp        => 'localhost:8140',
  notes      => 'Puppetmasters listen on port 8140',
  service_id => 'puppetmaster',
}
The problem with this approach is that it’s a dumb check - the puppetmaster runs in a webserver and while the port might be open, what happens if the application is returning a 500 internal server error, for example?
In order to rectify this, I decided to make use of a Puppet HTTP API endpoint to query the status.
I must admit, I didn’t even know that Puppet had an HTTP API until recently. Looking through the docs brought up some gems, but the problem is that by default it’s pretty locked down, and rightly so. It’s a powerful API, and a Puppetmaster compromised via the API is a dangerous prospect.
Access is managed in auth.conf, using directives such as allow and allow_ip.
While digging through the API docs, I found a nice status endpoint. However, when I queried it, I got an access denied error (a 403 Forbidden):
curl --cert /var/lib/puppet/ssl/certs/puppetmaster.example.com.pem --key /var/lib/puppet/ssl/private_keys/puppetmaster.example.com.pem --cacert /var/lib/puppet/ssl/ca/ca_crt.pem -H 'Accept: pson' 'https://puppetmaster.example.com:8140/production/status/test?environment=production'
Forbidden request: puppetmaster.example.com(192.168.4.21) access to /status/test [find] authenticated at :119
This seems easily fixable and extremely useful. In order to make this work, I made a quick change to the auth.conf:
# allow access to the status API call to test if the master is alive
path /status
auth any
method find
allow_ip 192.168.4.21,127.0.0.1
This needs to go above the default policy in auth.conf, which looks like this:
# deny everything else; this ACL is not strictly necessary, but
# illustrates the default policy.
path /
auth any
Now, when I try the curl command again, it works!
curl --cert /var/lib/puppet/ssl/certs/puppetmaster.example.com.pem --key /var/lib/puppet/ssl/private_keys/puppetmaster.example.com.pem --cacert /var/lib/puppet/ssl/ca/ca_crt.pem -H 'Accept: pson' 'https://puppetmaster.example.com:8140/production/status/test?environment=production'
{"is_alive":true,"version":"3.8.4"}
Sweet, now we can make a proper healthcheck!
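Before reaching for a plugin, here is a rough sketch of what such a check boils down to: look at the response body and see whether it reports "is_alive":true. The is_alive function name is hypothetical (it is not part of Puppet or Consul), and the plain substring match is a simplification rather than real JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Puppet or Consul): succeeds only when the
# status response body contains "is_alive":true. This is a plain substring
# match, not a JSON parser.
is_alive() {
    case "$1" in
        *'"is_alive":true'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample body copied from the curl output above.
body='{"is_alive":true,"version":"3.8.4"}'

if is_alive "$body"; then
    echo "puppetmaster is alive"
else
    echo "puppetmaster is NOT alive"
fi
```

A substring match like this is the same kind of test a string-matching HTTP check performs, which is why it is enough for a healthcheck even though it would be too fragile for general JSON handling.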
Because we set the auth.conf entry to auth any, it’s straightforward to query the API endpoint without presenting a client certificate. I used the Nagios check_http plugin to get this looking nice. The command looks a bit like this:
/usr/lib64/nagios/plugins/check_http -H localhost -p 8140 -u /production/status/test?environment=production -S -k 'Accept: pson' -s '"is_alive":true'
Put simply, we’re querying localhost on port 8140 over SSL (-S) and providing an environment (production is my default environment). The Puppetmaster wants pson, so we send an Accept: pson header (-k), and then we check that the response contains the string "is_alive":true (-s). The output looks like this:
HTTP OK: HTTP/1.1 200 OK - 312 bytes in 0.127 second response time |time=0.127082s;;;0.000000 size=312B;;;0
This is much, much better than our port check. If we get something other than a 200 OK HTTP code, we’re in trouble.
The original point of this post was to replace the Consul TCP check. In Puppet code, the new check looks like this:
::consul::check { 'puppetmaster_healthcheck':
  interval   => '60',
  script     => "/usr/lib64/nagios/plugins/check_http -H ${::fqdn} -p 8140 -u /production/status/test?environment=production -S -k 'Accept: pson' -s '\"is_alive\":true'",
  notes      => 'Checks the puppetmaster\'s status API to determine if the service is healthy',
  service_id => 'puppetmaster',
}
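The reason a Nagios plugin drops straight into a Consul script check is that the exit code conventions line up: Nagios plugins exit 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, while Consul treats 0 as passing, 1 as warning and anything else as critical. As a small sketch of that mapping (the consul_state function is a hypothetical name, purely for illustration):

```shell
#!/bin/sh
# Hypothetical helper: translate a check script's exit code into the state
# Consul would record for it (0 = passing, 1 = warning, anything else = critical).
consul_state() {
    case "$1" in
        0) echo "passing" ;;
        1) echo "warning" ;;
        *) echo "critical" ;;
    esac
}

# Walk through the four standard Nagios plugin exit codes.
for code in 0 1 2 3; do
    echo "exit $code -> $(consul_state "$code")"
done
```

So both a CRITICAL and an UNKNOWN result from the plugin would show up as a critical check in Consul.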
We’ll now get an accurate and reliable healthcheck from our Consul check!