Statsd (bucky) / graphite / grafana

graphite install
yum install -y gcc gcc-c++ libffi libffi-devel httpd24 httpd24-tools mysql-server mysql MySQL-python27 mod24_wsgi-python27 cairo-devel freetype* urw-fonts
pip install cairocffi pytz scandir

export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"
pip install --no-binary=:all: https://github.com/graphite-project/whisper/tarball/master
pip install --no-binary=:all: https://github.com/graphite-project/carbon/tarball/master
pip install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master

export GRAPHITE_ROOT=/opt/graphite

vim $GRAPHITE_ROOT/webapp/graphite/local_settings.py
add:
DATABASES = {
    'default': {
        'NAME': 'graphiteDB',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'graphite',
        'PASSWORD': '${PASSWORD}',
        'HOST': 'localhost',
        'PORT': '3306'
    }
}

sudo /etc/init.d/mysqld start
sudo /usr/bin/mysqladmin -u root password '${PASSWORD}'
mysql -u root -p${PASSWORD}
CREATE USER 'graphite'@'localhost' IDENTIFIED BY '${PASSWORD}';
GRANT ALL PRIVILEGES ON *.* TO graphite@'%' IDENTIFIED BY '${PASSWORD}';
GRANT ALL PRIVILEGES ON *.* TO graphite@'localhost' IDENTIFIED BY '${PASSWORD}';
FLUSH PRIVILEGES;
exit;
mysql -u graphite -p${PASSWORD}
create database graphiteDB;
exit;

PYTHONPATH=$GRAPHITE_ROOT/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb
PYTHONPATH=$GRAPHITE_ROOT/webapp django-admin.py collectstatic --noinput --settings=graphite.settings

cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite-vhost.conf
vim /etc/httpd/conf.d/graphite-vhost.conf
add:
<Directory /opt/graphite/static/>
<IfVersion < 2.4>
Order deny,allow
Allow from all
</IfVersion>
<IfVersion >= 2.4>
Require all granted
</IfVersion>
</Directory>

cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

cd /opt/graphite/conf
copy each of the following from its .example counterpart:
carbon.conf
storage-aggregation.conf
storage-schemas.conf
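Copying each one from its shipped example can be scripted; a small sketch (the helper name and the optional directory argument are mine, not part of graphite):

```shell
# Copy each shipped example config into place.
# Pass an alternate conf dir as $1; defaults to the stock /opt/graphite layout.
copy_graphite_confs() {
  local conf_dir="${1:-/opt/graphite/conf}"
  local f
  for f in carbon.conf storage-aggregation.conf storage-schemas.conf; do
    cp "$conf_dir/$f.example" "$conf_dir/$f" || return 1
  done
}
```

Call it as `copy_graphite_confs` (under sudo if /opt/graphite is root-owned).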

yum install collectd
sudo pip install bucky
sudo mkdir /etc/bucky
vim /etc/bucky/bucky.conf
add contents from https://github.com/trbs/bucky
add "/usr/share/collectd/types.db" to the types.db list in /etc/bucky/bucky.conf
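For reference, a minimal /etc/bucky/bucky.conf sketch; the option names below are taken from the trbs/bucky README and may differ between versions, so verify against the defaults that ship with your copy:

```
# /etc/bucky/bucky.conf -- minimal sketch; verify option names for your version
graphite_ip = "127.0.0.1"
graphite_port = 2003

collectd_ip = "127.0.0.1"
collectd_port = 25826

statsd_ip = "127.0.0.1"
statsd_port = 8125

# append the collectd types.db as described above
collectd_types = ["/usr/share/collectd/types.db"]
```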

Create init scripts for bucky and carbon. Here is the one for bucky:

#!/bin/bash
# bucky Init script for running the bucky daemon
#
#
# chkconfig: – 98 02
#
# description: some description
# processname: bucky

PATH=/usr/bin:/sbin:/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:$PATH
export PATH

lockfile='/var/lock/subsys/bucky'
pidfile='/var/run/bucky.pid'
bucky='/usr/local/bin/bucky'
config='/etc/bucky/bucky.conf'
logfile='/var/log/bucky/bucky.log'

RETVAL=0

# Source function library.
. /etc/rc.d/init.d/functions

# Determine if we can use the -p option to daemon, killproc, and status.
# RHEL < 5 can't.
if status | grep -q -- '-p' 2>/dev/null; then
    pidopts="-p $pidfile"
fi

start() {
    echo -n $"Starting bucky daemon: "
    $bucky $config >> $logfile 2>&1 &
    RETVAL=$?

    local PID=`pgrep -f "${bucky} ${config}"`
    echo $PID > ${pidfile}

    [ $RETVAL -eq 0 ] && (touch ${lockfile}; echo_success) || echo_failure
    echo

    return $RETVAL
}

stop() {
    echo -n $"Stopping bucky daemon: "
    killproc $pidopts $bucky
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f ${lockfile} ${pidfile}
}

restart() {
    stop
    start
}

rh_status() {
    status $pidopts $bucky
    RETVAL=$?
    return $RETVAL
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    condrestart|try-restart)
        rh_status_q || exit 0
        restart
        ;;
    status)
        rh_status
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|condrestart|try-restart|status}"
        exit 1
esac

exit $RETVAL

yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.2.0-1.x86_64.rpm

Pushin' and Poppin' like a pro in Bash

This was blatantly stolen from here.

Ever wish

$ cd -

took you back to the previous directory more than once?

The answer was so obvious once I saw it that I smacked my head and said “D’OH” out loud (causing several people around the office to give me that ‘wth!?’ look.)

Stick this in your .bashrc and be welcomed into the world of directory history:

function cd {
    if (("$#" > 0)); then
        if [ "$1" == "-" ]; then
            popd > /dev/null
        else
            pushd "$@" > /dev/null
        fi
    else
        pushd "$HOME" > /dev/null
    fi
}

I suppose that one could just type ‘pushd’ or ‘popd’ or alias those to shorter commands, but my muscle memory has simply chiseled cd (and ls for that matter) into stone.

Phantomjs & Quality Engineering

I’m not about to dismiss the need for selenium or other tools to test specific browsers, but when it comes to quickly getting an indication of how long it takes for a site to render and whether there are any errors, phantomjs is hard to beat. The script below opens a webpage in a specific viewport size, prints the title, reports the timing and captures the result as a png.

This is critical for testing responsive design, where the website renders differently at different screen sizes (break points). Having something like this configured and scheduled to run early for all of a project's templates makes it easy to track and identify when changes in latency or failures occur. As always, transparency is key, and having this executed automatically as part of the build process creates a small closed feedback loop that helps ensure quick turnaround in the event of a problem.


var page = require('webpage').create(),
    system = require('system'),
    t, address,
    swidth = '1366',
    sheight = '768';
if (system.args.length === 1) {
  console.log('Usage: test.js <address> optional( <screen width> <screen height> )');
  phantom.exit();
}
t = Date.now();
address = system.args[1];
if (system.args[2]) {
  swidth = system.args[2];
}
if (system.args[3]) {
  sheight = system.args[3];
}
page.onConsoleMessage = function (msg) {
  console.log('Page title is ' + msg);
};
page.onInitialized = function () {
  page.evaluate(function (swidth, sheight) {
    (function () {
      window.screen = {
        width: swidth,
        height: sheight
      };
    })();
  }, swidth, sheight);
};
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('FAIL to load the address');
  } else {
    t = Date.now() - t;
    console.log('The default user agent is ' + page.settings.userAgent);
    console.log('Loading time ' + t + ' msec');
    console.log(JSON.stringify(page.evaluate(function () { return window.screen })));
    page.render('images/test.png');
    page.evaluate(function () {
      console.log(document.title);
    });
  }
  phantom.exit();
});

Taken a step further you can then calculate the average render time for an entire site backed by a CMS by tracking outbound links from a given entry point (the homepage).

Pulling the links off the page looks like this:

function getLinks() {
  var links = document.querySelectorAll('li a');
  return Array.prototype.map.call(links, function(aLink) {
    return aLink.getAttribute('href');
  });
}

Using this method also creates a report of all the pages for a website weighted by outbound links. For high content sites that are backed by a CMS this means that I now have a current list of pages that is representative of inbound traffic to the website. This ends up being a critical piece of performance testing (in addition to any transactional elements).

Jmeter JTL Reporting

When I first migrated a previous employer off Loadrunner and onto Jmeter several years ago, two things happened quickly. In a big WIN, we jumped from 30 to 500 test cases (in under a year) since we were able to provide unlicensed access to the system. In a big FAIL, we no longer had Loadrunner's reports to use at meetings and show users after the fact. We quickly automated reporting, which went through a number of iterations.

Now that I am in my new role I needed to rebuild a similar framework and reporting became even more important since it was for external customers rather than internal.

I realized others may need a single report from time to time and that the reporting can be a big hurdle in migrating technologies. The end result is JTL Report Creator.

This will take a JTL (XML output) and process it into an aggregate report table with zoomable graphs (using jqPlot) for Latencies, Count, and Percent Failed. Since I am using my godaddy e-mail as a relay, I am limited to 250 reports per day. This means all JTLs will queue and be processed every 5 minutes, and then a link will be sent to your e-mail. Here is a simple example of the output.

This script requires that you save in the XML format AND that you have the following attributes:
“save success”, s, Success flag (true/false)
“save elapsed time”, t, Elapsed time (milliseconds)
“save label”, lb, Label
“save timestamp”, ts, timeStamp (milliseconds since midnight Jan 1, 1970 UTC)

Only latencies of successful results are reported and averaged.
Please submit your file as a .jtl
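As a sketch of the aggregation those attributes make possible (the real JTL Report Creator adds jqPlot graphs on top of this; the function name and sample data here are mine):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def aggregate_jtl(xml_text):
    """Aggregate a JTL (XML) by label: count, percent failed,
    and average elapsed time of successful samples only."""
    stats = defaultdict(lambda: {"count": 0, "failed": 0, "elapsed": []})
    root = ET.fromstring(xml_text)
    for node in root:  # <httpSample>/<sample> elements
        s = stats[node.get("lb")]
        s["count"] += 1
        if node.get("s") == "true":
            s["elapsed"].append(int(node.get("t")))  # only successes averaged
        else:
            s["failed"] += 1
    report = {}
    for label, s in stats.items():
        avg = sum(s["elapsed"]) / len(s["elapsed"]) if s["elapsed"] else None
        report[label] = {
            "count": s["count"],
            "pct_failed": 100.0 * s["failed"] / s["count"],
            "avg_elapsed_ms": avg,
        }
    return report

sample = """<testResults version="1.2">
  <httpSample t="120" s="true" lb="home" ts="1354216764000"/>
  <httpSample t="200" s="true" lb="home" ts="1354216765000"/>
  <httpSample t="999" s="false" lb="home" ts="1354216766000"/>
</testResults>"""

print(aggregate_jtl(sample))
```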

Mail Relays using python

I recently needed to send e-mail from my home computer in a python script. Unfortunately, I use Comcast, and they have decided that anyone who does this is a spammer, so they have blocked all outbound communication over port 25. Big thanks to Lars for pointing me in the right direction!

In order to get around this both gmail and godaddy provide relay services.

Here is the solution:
Create the message

import smtplib

sender = 'ghoti@cultureofqualityengineering.com' # you will need to use your email of course
password = 'MyPassword'
receiver = 'user@example.com' # the submitted email address

message = """From: Geoff <""" + sender + """>
To: To Person <""" + receiver + """>
Subject: Kapow!

Hey thanks for using this service!
and other interesting text
"""

Send the message

try:
  session = smtplib.SMTP('smtp.gmail.com', 587) # for gmail
  #session = smtplib.SMTP_SSL('smtpout.secureserver.net', 465) # for godaddy use this line instead
  session.ehlo()
  session.starttls() # gmail only
  session.ehlo() # gmail only
  session.login(sender, password)
  session.sendmail(sender, receiver, message)
  session.quit()
except smtplib.SMTPException:
  print "Error: unable to send email"

Advanced SSH Tunneling with the ProxyCommand Directive

So if you’re anything like me, you have at least 2 or 3 Linux or OS X machines at home with SSH enabled. I frequently find that I need to access some of the data on those machines from remote locations, and scp doesn’t work too well through a NAT / Masquerading router. You can just poke holes through your firewall until it looks like Swiss cheese, or you can Get Serious(tm), setup your own DHCP and Gateway server, then use the ProxyCommand directive in your .ssh/config.

First, pick a gateway machine, one that will be left online at all times, and install the ISC DHCP server (not the client) using your favorite package manager or tarball.

Next, edit the dhcpd.conf file (usually in /etc somewhere) and configure the subnet section to match the network of your NAT Router. For example, my NAT router is 192.168.2.1, and gives out 192.168.2.0/24 addresses.  Here’s what it would look like:

subnet 192.168.2.0 netmask 255.255.255.0 {
  range 192.168.2.10 192.168.2.254;
  option routers 192.168.2.1;
}

In the same file, add some sane DNS servers in the options (I’m using the public Google DNS servers in this example):

option domain-name-servers 8.8.8.8, 8.8.4.4;

The last change to this file involves getting the MAC addresses of the devices you'll want to SSH into and giving them static IPs OUTSIDE the range you provided above. Mine are all the single digit addresses between 3-9:

host bob  {
  hardware ethernet 11:11:11:11:11:11;
  fixed-address 192.168.2.3;
}
host joe {
  hardware ethernet 22:22:22:22:22:22;
  fixed-address 192.168.2.4;
}
host jim {
  hardware ethernet aa:aa:aa:aa:aa:aa;
  fixed-address 192.168.2.5;
}
host jim-wifi {
  hardware ethernet aa:aa:aa:aa:aa:ab;
  fixed-address 192.168.2.6;
}

Notice I’ve given the wifi and wired adapters their own entries.

Now, set up a static IP on your DHCP server (192.168.2.2), shut off DHCP on your router, and test it out. After renewing DHCP on your devices, you should see the fixed addresses showing up on the devices you configured, and higher numbered 192.168.2.10-254 on unconfigured devices (phones, tablets, your Windows laptop, etc.) Here's a map of the configuration so far:

192.168.2.1      - Router / Default Gateway 
192.168.2.2      - SSH Gateway & DHCP server
192.168.2.3-9    - Fixed-address hosts
192.168.2.10-254 - Dynamic hosts
192.168.2.255    - Broadcast address

On the DHCP/Gateway server, setup a simple /etc/hosts file with these IP addresses, host names and a common suffix, such as ‘.home’.

192.168.2.3 bob.home
192.168.2.4 joe.home
192.168.2.5 jim.home
192.168.2.6 jim-wifi.home

Lastly, create or edit a .ssh/config file on the remote device / laptop to handle these names:

Host *.home
  User myusername
  ProxyCommand ssh -v MY_EXTERNAL_IP -p 22 nc %h %p

Now you should be able to ‘ssh bob.home’ without any more typing, and scp to/from bob.home without using the gateway server as an intermediate step, or poking a hole in your firewall.

I’d recommend setting up public key authentication, and disabling password authentication.  You can also set it up to listen on a high-number port (remember to change the -p option in the ProxyCommand) but that’s only security through obscurity.
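For that hardening, the relevant lines in the gateway's sshd_config look like the following (install your public key first, e.g. with ssh-copy-id, or you'll lock yourself out):

```
# /etc/ssh/sshd_config on the SSH gateway
PubkeyAuthentication yes
PasswordAuthentication no
# optional high-number port; mirror it with -p in the ProxyCommand
#Port 22022
```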

If your IP changes frequently, you can also look into Dynamic DNS services to give yourself a hostname rather than use your external IP address directly. DynDNS is no longer free, but there are other services out there if you want to look.

JMXTrans Setup (Monitoring a proprietary Java Application)

I recently had the need to improve monitoring of a proprietary java application.  After poking around at the options available to me I found that there was extensive monitoring available in the administration console, but that it was a challenge to scrape, parse, or export.  Luckily the internal monitoring all seemed to be provided by mbeans which I quickly verified by attaching to the app with jconsole.

It's at this point I set about writing my own mbean poller, and while working on some challenges I was referred to JMXTrans by these guys (thanks to Lars). While I am eager to try their javaagent solution as well, I initially set up the poller:

This entry was almost pointless as the installation docs are great!!!!

Making sure that mbeans are exposed is easy. For a NON-production server that is BEHIND a firewall, add the following parameters to the JVM startup:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only="false"
-Dcom.sun.management.jmxremote.port=
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname="the public hostname"

I still need to set up jmx authentication, so more on that soon…

RPM install the latest stable version from: https://github.com/jmxtrans/jmxtrans/downloads

$ sudo rpm -Uhv https://github.com/downloads/jmxtrans/jmxtrans/jmxtrans-20121016.145842.6a28c97fbb-0.noarch.rpm


$ cd /var/lib/jmxtrans/
$ vim server_to_poll.json

And then populate it. This is an example for writing to graphite. It is really helpful to connect via jconsole and browse the mbeans there so you can get the "obj" and "attr" information cleanly:


{
    "servers" : [ {
        "port" : "",
        "host" : "",
        "alias" : "",
        "queries" : [ {
            "obj" : "java.lang:type=Memory",
            "resultAlias": "Memory",
            "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage", "ObjectPendingFinalizationCount" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=Threading",
            "resultAlias": "Threads",
            "attr" : [ "DaemonThreadCount", "PeakThreadCount", "CurrentThreadCpuTime", "CurrentThreadUserTime", "ThreadCount", "TotalStartedThreadCount" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=GarbageCollector,name=Copy",
            "resultAlias": "GCCopy",
            "attr" : [ "CollectionCount", "CollectionTime" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=GarbageCollector,name=MarkSweepCompact",
            "resultAlias": "GCCMS",
            "attr" : [ "CollectionCount", "CollectionTime" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } ]
    } ]
}

${mygraphiteport} and ${mygraphitehost} are not shell variables for substitution. Instead, they are system properties specified in

$ sudo vim /etc/sysconfig/jmxtrans
EDIT: export JMXTRANS_OPTS="-Dmygraphiteport= -Dmygraphitehost="

Setting up Graphite (alpha docs)

The latest releases of graphite have some nice new shiny features thanks to these guys.  Once I get some feedback I will edit the below to make it suitable for the official documentation.  In the meantime:

Installing the latest version of graphite (alpha documentation)

Prep Work

If you do not have the below packages, install them from the yum repos:

$ sudo yum install -y python python-devel python-setuptools gcc python-devel httpd mysql-server MySQL-python python-zope-interface python-twisted python-rrdtool python-memcached wget git pycairo python-ldap mod_wsgi dejavu-sans-fonts

If you do not have pip:

$ sudo /usr/bin/easy_install-2.6 pip

Install Django (note: 1.5 does not work out of the box, so explicitly install 1.4)

$ sudo /usr/bin/easy_install-2.6 Django==1.4

If you do not have the below packages:

$ sudo /usr/bin/pip-2.6 install pytz django-tagging txamqp Twisted

pyparsing never installs right from pip for me (it fails silently), so do it the hard way:

$ wget http://sourceforge.net/projects/pyparsing/files/latest/download -O pyparsing-latest.zip
$ unzip pyparsing-latest.zip
$ cd pyparsing-*; sudo python setup.py install

If you are using the alpha release (YES) you need ceres:

wget https://github.com/graphite-project/ceres/tarball/master -O ceres-latest.tar.gz
tar -xzf ceres-latest.tar.gz
$ cd graphite-project-ceres-*; sudo python setup.py install

If py2cairo doesn't install from yum, or yum didn't install the right version (good as of 6.4), get it the hard way:

wget http://cairographics.org/releases/py2cairo-1.10.0.tar.bz2
tar -xjf py2cairo-1.10.0.tar.bz2
$ cd py2cairo-1.10.0; sudo python setup.py install

If you run into issues with Twisted:

Download Twisted from
http://twistedmatrix.com/trac/
tar -xjf
$ sudo python setup.py install

Installing Graphite:
Clone the repos into the right place

$ cd /opt/
$ sudo git clone https://github.com/graphite-project/graphite-web.git
$ sudo git clone https://github.com/graphite-project/carbon.git
$ sudo git clone https://github.com/graphite-project/whisper.git

verify that you got all of your dependencies:

$ pushd graphite-web;sudo python check-dependencies.py;popd
All necessary dependencies are met.
All optional dependencies are met.

Make sure you see both lines above.

install the services:

$ pushd whisper; sudo python setup.py install; popd
$ pushd carbon; sudo python setup.py install; popd
$ pushd graphite-web; sudo python setup.py install;popd

copy the example configs:

$ pushd /opt/graphite/conf; sudo cp carbon.conf.example carbon.conf; sudo cp storage-schemas.conf.example storage-schemas.conf; popd
$ sudo cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite-vhost.conf
$ sudo cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

If testing this locally you can now add graphite to your /etc/hosts:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 graphite
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 graphite

If you are doing an actual setup, it is expected you will know how to configure name resolution for real.

SQLite never works for me, so here are the MySQL instructions:

$ sudo /etc/init.d/mysqld start
$ sudo /usr/bin/mysqladmin -u root password 'NewPassword'
$ mysql -u root -pNewPassword
CREATE USER 'graphite'@'localhost' IDENTIFIED BY 'AnotherNewPassword';
GRANT ALL PRIVILEGES ON *.* TO graphite@'%' IDENTIFIED BY 'AnotherNewPassword';
GRANT ALL PRIVILEGES ON *.* TO graphite@'localhost' IDENTIFIED BY 'AnotherNewPassword';
FLUSH PRIVILEGES;
exit;
$ mysql -u graphite -pAnotherNewPassword
create database graphiteDB;
exit;

Copy the example settings and configure

$ sudo cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py
$ sudo vim /opt/graphite/webapp/graphite/local_settings.py

edit the appropriate section to

DATABASES = {
    'default': {
        'NAME': 'graphiteDB',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'graphite',
        'PASSWORD': 'GraphitePassword',
        'HOST': 'localhost',
        'PORT': '3306'
    }
}

sync the DB

$ cd /opt/graphite/webapp/graphite
$ sudo python manage.py syncdb

Answer yes to creating a Django auth user and set it accordingly.

start the carbon cache

$ sudo /opt/graphite/bin/carbon-cache.py start

Unless someone wants to update this to support a proper SELinux config, set SELinux to permissive:

$ sudo setenforce 0
$ sudo vim /etc/sysconfig/selinux
EDIT to: SELINUX=permissive

change permissions and start the webserver

$ sudo chown -R apache /opt/graphite/
$ sudo /etc/init.d/httpd start

How to submit adhoc Graphite data

Python

import socket
host = 'graphite.server.com'
port = 2003
sock = socket.socket()
sock.connect((host, port))
sock.send(data)
sock.close()
Perl
use IO::Socket::INET;
my $sock = new IO::Socket::INET( PeerAddr => 'graphite.server.com', PeerPort => '2003', Proto => 'tcp');
$sock or die "no socket: $!";
$sock->send($data);
close($sock);
Where data is:
metric.name.object value epochtimestamp (newline)
metric.name.object2 value epochtimestamp (newline)
Note that two values sent within the same precision will overwrite each other, keeping the most recent one. I.e. if your precision is 1 minute and you send
thing.one.count 5 1354216764
and 10 seconds later
thing.one.count 2 1354216774
Graphite will report 2 for 1:19 on the day I wrote this (as opposed to 7).
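Building that payload is just string formatting. A minimal sketch (the helper name, metric names, and timestamp are made up for illustration):

```python
import time

def graphite_lines(metrics, now=None):
    """Format (metric, value) pairs as carbon plaintext lines: 'name value epoch'."""
    ts = int(now if now is not None else time.time())
    return "".join("%s %s %d\n" % (name, value, ts) for name, value in metrics)

data = graphite_lines([("thing.one.count", 5), ("thing.two.count", 7)], now=1354216764)
print(data)
```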
Remember to reach out to your friendly graphite admin before sending any data.

AppDynamics and Capistrano Tricks

We stood up a new performance environment last week, and lessons learned (plus some assistance) let us get it running in record time. We did have a series of one-off tasks (which will get added into the kick-off process) that needed solving now... Capistrano to the rescue!

Except, of course, the original definition of the environment plus all the changes and add-ons were in Jira somewhere. We set off to scrape together the complete server list until we remembered that all of our hosts were instrumented with AppDynamics (and incidentally GLU or even Graphite can be used for this if they have app/server info).

# Capfile using AppDynamics
require 'rubygems'
require 'json'
require 'open-uri'

default_run_options[:pty] = true
set :sshkey, "id_rsa"
ssh_options[:keys] = [File.join(ENV["HOME"], ".ssh", "#{sshkey}")]

doc = open('http://appdynamics.server.com/controller/rest/applications/new_perf_env/nodes?&output=JSON', :http_basic_authentication => ['User@Customer1', 'Password']).read
parsed = JSON.parse(doc)
mname = Hash.new
parsed.each do |hash|
  if hash.has_key? 'machineName'
    mname[hash['machineName']] = 1
  end
end
mname.each do |name, value|
  server "#{name}", :server
end

desc "get hostname as root for no reason whatsoever"
task :hostname, :max_hosts => 100, :on_error => :continue do
  run "#{sudo} hostname"
end