2 suggestions and 1 recommendation for implementing DevOps

Before getting into the suggestions for how to implement DevOps, it is necessary to answer the age-old question: what is the purpose of DevOps?

“In an ideal state DevOps allows business features to be delivered quickly and continuously while maintaining 100% uptime.”

It always comes down to the bu$ine$$. All of us know that downtime costs real money and delaying features costs potential money. Many companies have even prioritized one or the other at different points in their maturity cycle. The goal of DevOps is to minimize both costs.

While automation is often a focus of DevOps, and I believe that a lazy admin is a good admin, I said nothing about automation in my definition. That wasn't an oversight: automation is a means to DevOps goals, not an end. I also said nothing about developers or operations in this description, since the roles are largely irrelevant if that goal is met. Here are 3 suggestions for bringing your organization into the light of DevOps.

Suggestion 1:
Lock your developers and operations teams into the same room and buy them enough beer to keep them happy with each other.

Ok, it seems silly at first, but this is how some larger companies have done it. Each development team has 1 or 2 SRE members join, and each literal DevOps team is focused on a particular business deliverable. This moves operations from a shared service to a team deliverable and gives all members a sense of ownership of the code as well as how it functions in production. From everything I have heard it has been a wildly successful brute-force approach at several companies. Larger companies that have taken this approach seem to have a leg up on most of us: they can attract a level of talent most companies cannot, and they have the resources to expand each development team by an additional 1-2 people.

Suggestion 2:
Have your CEO look into the mirror and say “DevOps” three times.

While the ghost of DevOps won't appear to slay your evil over-the-wall methodologies, getting top-down support will put people on the path to figuring it out. A leader's vision dedicated to moving in this direction can produce a successful DevOps strategy. I would be remiss not to point out its flaws if done in isolation, though. With a description as vague as my own, or any other DevOps description, it can be difficult to break down and measure progress on the path to DevOps. At larger organizations it can be an even greater challenge if some groups are happy with the status quo.

Suggestion 3:
What’s measured is managed!

There are 4 core DevOps metrics:

  • Mean Time To Detection (MTTD)
  • Mean Time To Resolution (MTTR)
  • Mean Time Between Incidents (MTBI)
  • Mean Time Between Releases (MTBR)

By measuring these for your team's deliverables, several conclusions can usually be drawn that promote the path to DevOps. For instance, if your MTBI is low as a result of impacts during changes, then you probably need to prioritize improvements to your release process or architecture to remove that downtime. Maintenance windows be damned, without zero-downtime deployments you will never be able to meet the goals of DevOps. If you aren't able to decrease your MTBR, then automation of deployments and testing will give a big boost. MTTR will be decreased when problems occur by providing accurate alerting and having well-defined process, ownership and escalation. Keeping MTTD low ensures that monitoring is added as part of new features rather than after the fact.

Measuring these mean-time metrics per application will also identify hot spots in your application stack. Using these 4 metrics you can effectively score your applications and the risk they pose, not based on the potential impacts, but rather on your organization's track record of delivering them. This will remove barriers for applications that have a history of being successful in prod and ensure the right oversight is available when an application needs it.
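As an illustration of how little data these metrics actually require, here is a sketch in Python. It assumes you already have lists of incident and release timestamps; the tuple layout is made up for the example, not a standard schema.

```python
from statistics import mean

def mean_time_metrics(incidents, releases):
    """Compute the four DevOps mean-time metrics, all in seconds.

    incidents: list of (detected, started, resolved) epoch-second tuples
    releases:  sorted list of release epoch seconds
    """
    starts = sorted(s for _, s, _ in incidents)
    return {
        # How long problems go unnoticed after they begin
        "MTTD": mean(d - s for d, s, _ in incidents),
        # How long problems take to resolve once they begin
        "MTTR": mean(r - s for _, s, r in incidents),
        # Gap between consecutive incident starts
        "MTBI": mean(b - a for a, b in zip(starts, starts[1:])),
        # Gap between consecutive releases
        "MTBR": mean(b - a for a, b in zip(releases, releases[1:])),
    }

# Two incidents roughly a day apart, three daily releases
incidents = [(110, 100, 400), (90110, 90100, 90400)]
releases = [0, 86400, 172800]
print(mean_time_metrics(incidents, releases))
```

Plug in your real incident tracker and deployment log data and the four numbers fall out for free.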

When release processes need to be improved, code and infrastructure needs to be re-architected, monitoring needs to be refined and success is visibly measured it almost always leads to development and operations working together and DevOps becomes a byproduct of the business goals.

Jmeter JTL Reporting

When I first migrated a previous employer off LoadRunner and onto JMeter several years ago, 2 things happened quickly. In a big WIN! we jumped from 30 to 500 test cases (in under 1 year) since we were able to provide unlicensed access to the system. In a big FAIL! we no longer had LoadRunner's reports to use at meetings and show users after the fact. We quickly automated reporting, which went through a number of iterations.

Now that I am in my new role I needed to rebuild a similar framework, and reporting became even more important since it is for external customers rather than internal ones.

I realized others may need a single report from time to time and that the reporting can be a big hurdle in migrating technologies. The end result is JTL Report Creator.

This will take a JTL (XML output) and process it into an aggregate report table with zoomable graphs using jqPlot for Latencies, Count, and Percent Failed. Since I am using my GoDaddy e-mail as a relay I am limited to 250 reports per day. This means all JTLs will queue and be processed every 5 minutes, and then a link will be sent to your e-mail. Here is a simple example of the output.

This script requires that you save in the XML format AND that you have the following attributes:
“save success”, s, Success flag (true/false)
“save elapsed time”, t, Elapsed time (milliseconds)
“save label”, lb, Label
“save timestamp”, ts, timeStamp (milliseconds since midnight Jan 1, 1970 UTC)

Only latencies of successful results are reported and averaged.
Please submit your file as a .jtl
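For anyone building their own version of this, here is a rough sketch of the aggregation step in Python, assuming a JTL saved in XML format with the attributes above. This is not the actual JTL Report Creator code, just the core idea.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def aggregate_jtl(path):
    """Aggregate a JMeter XML JTL: per label, total samples, percent failed,
    and average elapsed time (ms) of successful samples only."""
    stats = defaultdict(lambda: {"count": 0, "failed": 0, "latencies": []})
    for _, elem in ET.iterparse(path):
        if elem.tag in ("sample", "httpSample"):
            row = stats[elem.get("lb", "unlabeled")]
            row["count"] += 1
            if elem.get("s") == "true":
                row["latencies"].append(int(elem.get("t", 0)))
            else:
                row["failed"] += 1
            elem.clear()  # keep memory flat on large result files
    return {
        label: {
            "count": row["count"],
            "pct_failed": 100.0 * row["failed"] / row["count"],
            "avg_ms": (sum(row["latencies"]) / len(row["latencies"])
                       if row["latencies"] else None),
        }
        for label, row in stats.items()
    }
```

Using iterparse instead of loading the whole tree matters once JTLs grow to hundreds of thousands of samples.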

Mail Relays using python

I recently needed to send e-mail from my home computer in a Python script. Unfortunately I use Comcast, and they have decided that anyone who does this is a spammer, so they have blocked all outbound communication over port 25. Big thanks to Lars for pointing me in the right direction!

In order to get around this, both Gmail and GoDaddy provide relay services.

Here is the solution:
Create the message

import smtplib

sender = 'ghoti@cultureofqualityengineering.com' #you will need to use your email of course
password = 'MyPassword'
receiver = 'them@example.com' #the submitted email address (placeholder)

message = """From: Geoff <""" + sender + """>
To: To Person <""" + receiver + """>
Subject: Kapow!

Hey thanks for using this service!
and other interesting text
"""

Send the message

try:
  session = smtplib.SMTP('smtp.gmail.com', 587) #for gmail
  #session = smtplib.SMTP_SSL('smtpout.secureserver.net', 465) #for godaddy use this line instead
  session.ehlo()
  session.starttls() #gmail only; comment out for godaddy
  session.ehlo() #gmail only; comment out for godaddy
  session.login(sender, password)
  session.sendmail(sender, receiver, message)
  session.quit()
except smtplib.SMTPException:
  print("Error: unable to send email")

Advanced SSH Tunneling with the ProxyCommand Directive

So if you’re anything like me, you have at least 2 or 3 Linux or OS X machines at home with SSH enabled. I frequently find that I need to access some of the data on those machines from remote locations, and scp doesn’t work too well through a NAT / masquerading router. You can just poke holes through your firewall until it looks like Swiss cheese, or you can Get Serious(tm), set up your own DHCP and gateway server, then use the ProxyCommand directive in your .ssh/config.

First, pick a gateway machine, one that will be left online at all times, and install the ISC DHCP server (not the client) using your favorite package manager or tarball.

Next, edit the dhcpd.conf file (usually in /etc somewhere) and configure the subnet section to match the network of your NAT router. For example, my NAT router is 192.168.2.1 and gives out 192.168.2.0/24 addresses. Here’s what it would look like:

subnet 192.168.2.0 netmask 255.255.255.0 {
  range 192.168.2.10 192.168.2.254;
  option routers 192.168.2.1;
}

In the same file, add some sane DNS servers in the options (I’m using the public Google DNS servers in this example):

option domain-name-servers 8.8.8.8, 8.8.4.4;

The last change to this file involves getting the MAC addresses of the devices you’ll want to SSH into and giving them static IPs OUTSIDE the range you provided above. Mine are all the single-digit addresses between 3-9:

host bob {
  hardware ethernet 11:11:11:11:11:11;
  fixed-address 192.168.2.3;
}
host joe {
  hardware ethernet 22:22:22:22:22:22;
  fixed-address 192.168.2.4;
}
host jim {
  hardware ethernet aa:aa:aa:aa:aa:aa;
  fixed-address 192.168.2.5;
}
host jim-wifi {
  hardware ethernet aa:aa:aa:aa:aa:ab;
  fixed-address 192.168.2.6;
}

Notice I’ve given the wifi and wired adapters their own entries.

Now, set up a static IP on your DHCP server (192.168.2.2), shut off DHCP on your router, and test it out. After renewing DHCP on your devices, you should see the fixed addresses showing up on the devices you configured, and higher-numbered 192.168.2.10-254 addresses on unconfigured devices (phones, tablets, your Windows laptop, etc.). Here’s a map of the configuration so far:

192.168.2.1      - Router / Default Gateway 
192.168.2.2      - SSH Gateway & DHCP server
192.168.2.3-9    - Fixed-address hosts
192.168.2.10-254 - Dynamic hosts
192.168.2.255    - Broadcast address

On the DHCP/gateway server, set up a simple /etc/hosts file with these IP addresses, host names and a common suffix, such as ‘.home’.

192.168.2.3 bob.home
192.168.2.4 joe.home
192.168.2.5 jim.home
192.168.2.6 jim-wifi.home

Lastly, create or edit a .ssh/config file on the remote device / laptop to handle these names:

Host *.home
  User myusername
  ProxyCommand ssh -v MY_EXTERNAL_IP -p 22 nc %h %p

Now you should be able to ‘ssh bob.home’ without any more typing, and scp to/from bob.home without using the gateway server as an intermediate step, or poking a hole in your firewall.

I’d recommend setting up public key authentication and disabling password authentication. You can also set sshd to listen on a high-numbered port (remember to change the -p option in the ProxyCommand), but that’s only security through obscurity.
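For reference, that hardening boils down to a few sshd_config directives on the gateway; the port number here is just an example, and you’ll need to restart sshd after editing:

```
# /etc/ssh/sshd_config (fragment)
Port 2222
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
```

Make sure your key actually works before turning off password authentication, or you’ll be walking over to the console.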

If your IP changes frequently, you can also look into Dynamic DNS services to give yourself a hostname rather than use your external IP address directly. DynDNS is no longer free, but there are other services out there if you want to look.

Get a bigger hammer!

Get a bigger hammer!

So after getting back into StarCraft and playing some Heart of the Swarm, I had a little problem with my CPU overheating. While the water cooler (Corsair H60, working great) was on order, I “solved” the problem by getting a bigger fan. That is my dog Boo wondering why I was moving her sleeping quarters around.

JMXTrans Setup (Monitoring a proprietary Java Application)

I recently had the need to improve monitoring of a proprietary Java application. After poking around at the options available to me I found that there was extensive monitoring available in the administration console, but that it was a challenge to scrape, parse, or export. Luckily the internal monitoring all seemed to be provided by mbeans, which I quickly verified by attaching to the app with jconsole.

It’s at this point I set about writing my own mbean poller, and while working through some challenges I was referred to JMXTrans by these guys (thanks to Lars). While I am eager to try their javaagent solution as well, I initially set up the poller:

This entry is almost pointless, as the installation docs are great!

Making sure that mbeans are exposed is easy. For a NON-production server that is BEHIND a firewall, add the following parameters to startup:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only="false"
-Dcom.sun.management.jmxremote.port=
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname="the public hostname"

I still need to set up jmx authentication, so more on that soon…

RPM install the latest stable version from: https://github.com/jmxtrans/jmxtrans/downloads

$ sudo rpm -Uhv https://github.com/downloads/jmxtrans/jmxtrans/jmxtrans-20121016.145842.6a28c97fbb-0.noarch.rpm


$ cd /var/lib/jmxtrans/
$ vim server_to_poll.json

And then populate it. This is an example for writing to Graphite. It is really helpful to connect via jconsole and browse the mbeans there, so you can get the “obj” and “attr” information cleanly:


{
    "servers" : [ {
        "port" : "",
        "host" : "",
        "alias" : "",
        "queries" : [ {
            "obj" : "java.lang:type=Memory",
            "resultAlias": "Memory",
            "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage", "ObjectPendingFinalizationCount" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=Threading",
            "resultAlias": "Threads",
            "attr" : [ "DaemonThreadCount", "PeakThreadCount", "CurrentThreadCpuTime", "CurrentThreadUserTime", "ThreadCount", "TotalStartedThreadCount" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=GarbageCollector,name=Copy",
            "resultAlias": "GCCopy",
            "attr" : [ "CollectionCount", "CollectionTime" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } , {
            "obj" : "java.lang:type=GarbageCollector,name=MarkSweepCompact",
            "resultAlias": "GCCMS",
            "attr" : [ "CollectionCount", "CollectionTime" ],
            "outputWriters" : [ {
                "@class" : "com.googlecode.jmxtrans.model.output.GraphiteWriter",
                "settings" : {
                    "port" : "${mygraphiteport}",
                    "host" : "${mygraphitehost}"
                }
            } ]
        } ]
    } ]
}

${mygraphiteport} and ${mygraphitehost} are not shell variables for substitution. Instead they are Java system properties, specified in

$ sudo vim /etc/sysconfig/jmxtrans
EDIT: export JMXTRANS_OPTS="-Dmygraphiteport= -Dmygraphitehost="

Setting up Graphite (alpha docs)

The latest releases of graphite have some nice new shiny features thanks to these guys.  Once I get some feedback I will edit the below to make it suitable for the official documentation.  In the meantime:

Installing the latest version of graphite (alpha documentation)

Prep Work

If you do not have the below packages, install them via yum:

$ sudo yum install -y python python-devel python-setuptools gcc httpd mysql-server MySQL-python python-zope-interface python-twisted python-rrdtool python-memcached wget git pycairo python-ldap mod_wsgi dejavu-sans-fonts

If you do not have pip:

$ sudo /usr/bin/easy_install-2.6 pip

Install Django (note 1.5 does not work OOB and pip will not install previous versions)

$ sudo /usr/bin/easy_install-2.6 Django==1.4

If you do not have the below packages:

$ sudo /usr/bin/pip-2.6 install pytz django-tagging txamqp Twisted

pyparsing never installs right from pip for me (it fails transparently), so do it the hard way:

$ wget http://sourceforge.net/projects/pyparsing/files/latest/download -O pyparsing-latest.zip
$ unzip pyparsing-latest.zip
$ cd pyparsing-*
$ sudo python setup.py install

If you are using the alpha release (YES) you need ceres:

$ wget https://github.com/graphite-project/ceres/tarball/master -O ceres-latest.tar.gz
$ tar -xzf ceres-latest.tar.gz
$ cd graphite-project-ceres-*
$ sudo python setup.py install

If py2cairo doesn’t install the right version from yum, get it the hard way (good as of 6.4):

$ wget http://cairographics.org/releases/py2cairo-1.10.0.tar.bz2
$ tar -xjf py2cairo-1.10.0.tar.bz2
$ cd py2cairo-1.10.0
$ sudo python setup.py install

If you run into issues with Twisted:

Download Twisted from http://twistedmatrix.com/trac/
$ tar -xjf Twisted-<version>.tar.bz2
$ cd Twisted-<version>
$ sudo python setup.py install

Installing Graphite:
Clone the repos into the right place

$ cd /opt/
$ sudo git clone https://github.com/graphite-project/graphite-web.git
$ sudo git clone https://github.com/graphite-project/carbon.git
$ sudo git clone https://github.com/graphite-project/whisper.git

verify that you got all of your dependencies:

$ pushd graphite-web;sudo python check-dependencies.py;popd
All necessary dependencies are met.
All optional dependencies are met.

Make sure you see the output above.

install the services:

$ pushd whisper; sudo python setup.py install; popd
$ pushd carbon; sudo python setup.py install; popd
$ pushd graphite-web; sudo python setup.py install;popd

copy the example configs:

$ pushd /opt/graphite/conf; sudo cp carbon.conf.example carbon.conf; sudo cp storage-schemas.conf.example storage-schemas.conf; popd
$ sudo cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite-vhost.conf
$ sudo cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

If testing this locally you can now add graphite to your /etc/hosts:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 graphite
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 graphite

If you are doing an actual setup, it is expected you will know how to configure DNS.

SQLite never works for me, so here are the MySQL instructions:

$ sudo /etc/init.d/mysqld start
$ sudo /usr/bin/mysqladmin -u root password 'NewPassword'
$ mysql -u root -pNewPassword
CREATE USER 'graphite'@'localhost' IDENTIFIED BY 'AnotherNewPassword';
GRANT ALL PRIVILEGES ON *.* TO graphite@'%' IDENTIFIED BY 'AnotherNewPassword';
GRANT ALL PRIVILEGES ON *.* TO graphite@'localhost' IDENTIFIED BY 'AnotherNewPassword';
FLUSH PRIVILEGES;
exit;
$ mysql -u graphite -pAnotherNewPassword
create database graphiteDB;
exit;

Copy the example settings and configure

$ sudo cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py
$ sudo vim /opt/graphite/webapp/graphite/local_settings.py

edit the appropriate section to

DATABASES = {
    'default': {
        'NAME': 'graphiteDB',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'graphite',
        'PASSWORD': 'AnotherNewPassword',
        'HOST': 'localhost',
        'PORT': '3306'
    }
}

sync the DB

$ cd /opt/graphite/webapp/graphite
$ sudo python manage.py syncdb

Answer yes to creating a Django auth superuser and set the credentials accordingly.

start the carbon cache

$ sudo /opt/graphite/bin/carbon-cache.py start

Unless someone wants to update this to support SELinux configuration, set it to permissive:

$ sudo setenforce 0
$ sudo vim /etc/sysconfig/selinux
EDIT to: SELINUX=permissive

change permissions and start the webserver

$ sudo chown -R apache /opt/graphite/
$ sudo /etc/init.d/httpd start

How to submit ad hoc Graphite data

Python

import socket

host = 'graphite.server.com'
port = 2003
data = 'thing.one.count 5 1354216764\n' # see the format below
sock = socket.socket()
sock.connect((host, port))
sock.sendall(data.encode())
sock.close()
Perl
use IO::Socket::INET;
my $sock = new IO::Socket::INET( PeerAddr => 'graphite.server.com', PeerPort => '2003', Proto => 'tcp');
$sock or die "no socket: $!";
$sock->send($data);
close($sock);
Where data is:

metric.name.object value epochtimestamp (newline)
metric.name.object2 value epochtimestamp (newline)

Note that 2 values sent within the same precision will overwrite with the most recent one. I.e., if your precision is 1 minute and you send

thing.one.count 5 1354216764

and 10 seconds later

thing.one.count 2 1354216774

Graphite will report 2 for 1:19 on the day I wrote this (as opposed to 7). Remember to reach out to your friendly graphite admin before sending any data.
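If you want a tiny helper to keep that line format straight, something like this works (metric name taken from the example above):

```python
import time

def graphite_line(metric, value, timestamp=None):
    """Format one Graphite plaintext-protocol line: 'metric value epochtimestamp\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now"
    return '%s %s %d\n' % (metric, value, timestamp)

print(graphite_line('thing.one.count', 5, 1354216764))
```

Join a batch of these lines and send them over one socket connection rather than reconnecting per metric.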

Fish Tank

I picked out a nice fishtank for my birthday present with my wife’s blessing.  Here are the pics all set up!


Miles is 3ft tall now to give you an idea of size.

AppDynamics and Capistrano Tricks

We stood up a new performance environment last week; lessons learned and some outside assistance allowed us to get it up and running in record time. We did have a series of one-off tasks (that will get added into the kick-off process) that we needed to solve now… Capistrano to the rescue!

Except, of course, the original definition of the environment plus all the changes and add-ons were scattered in Jira somewhere. We set off to scrape together the complete server list until we remembered that all of our hosts were instrumented with AppDynamics (incidentally, GLU or even Graphite can be used for this if they have app/server info).

# Capfile using AppDynamics
require 'rubygems'
require 'json'
require 'open-uri'

default_run_options[:pty] = true
set :sshkey, "id_rsa"
ssh_options[:keys] = [File.join(ENV["HOME"], ".ssh", "#{sshkey}")]

doc = open('http://appdynamics.server.com/controller/rest/applications/new_perf_env/nodes?&output=JSON',
           :http_basic_authentication => ['User@Customer1', 'Password']).read
parsed = JSON.parse(doc)
mname = Hash.new
parsed.each do |hash|
  if hash.has_key? 'machineName'
    mname[hash['machineName']] = 1
  end
end
mname.each do |name, value|
  server "#{name}", :server
end

desc "get hostname as root for no reason whatsoever"
task :hostname, :max_hosts => 100, :on_error => :continue do
  run "#{sudo} hostname"
end