Using nagios4 as a babysitter for your environment

Among the many monitoring solutions available, Nagios is one of the most widely used. This article briefly describes how to set up some useful monitoring tasks:

Defining custom checks

In /etc/nagios4/commands.cfg, you can define a custom command. Let’s do this with an HTTP check that fetches a URL and matches the response against a regex:

define command {
 command_name check_http_host_regex
 command_line $USER1$/check_http -H $HOSTADDRESS$ -p $ARG1$ -u $ARG2$ -r $ARG3$ -t30
}

Call this check from a service definition in your /etc/nagios4/$cfg_dir/your_host.cfg:

define service {
 use generic-service
 host_name your.host.name
 service_description nginx_http_result
 is_volatile 0
 notification_options c,r
 check_command check_http_host_regex!80!/foo.css!your_regex
}

After that, you can run nagios4 -v /etc/nagios4/nagios.cfg to verify that your configuration is valid.
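Nagios is not limited to the bundled plugins: any executable that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) can serve as a check command. The following is a minimal sketch of the decision the regex check above makes for a response body. The function name evaluate_body is an assumption made for illustration; it is not part of Nagios or check_http.

```python
import re

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_body(body, pattern):
    """Return (exit_code, status_line) for a response body and a regex.

    Hypothetical helper for illustration only.
    """
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        # A broken pattern is a configuration problem, not a service failure.
        return UNKNOWN, "UNKNOWN - invalid regex: %s" % exc
    if regex.search(body):
        return OK, "OK - pattern %r found" % pattern
    return CRITICAL, "CRITICAL - pattern %r not found" % pattern


# Example: check a CSS snippet for an expected rule.
code, line = evaluate_body("body { color: red; }", r"color:\s*red")
print(code, line)
```

A real plugin would print the status line and hand the code to sys.exit(), so that Nagios can map it to the service state.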

Machine monitoring with OpenTSDB

(Partially updated in July 2016)

Inspired by a nice presentation at http://de.slideshare.net/oliverhankeln/opentsdb-metrics-for-a-distributed-world, I wanted to set up an OpenTSDB environment on my machine to replace the old Munin monitoring that I’m still using and fighting with.

The following guide describes the steps for setting up OpenTSDB monitoring on an Ubuntu machine.

A word on disk space

According to the SlideShare presentation referenced above, each data point consumes less than 3 bytes on disk if compressed and less than 40 bytes if uncompressed.

With these numbers in mind, you can estimate how much data you will gather over the next year(s).
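As a rough sketch of such an estimate, the following snippet multiplies the number of time series by the data points each one produces per year. The 1000 series and the 15 second interval are assumptions chosen for illustration; only the 3 and 40 byte bounds come from the presentation.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Upper bounds per data point, taken from the presentation referenced above.
BYTES_COMPRESSED = 3
BYTES_UNCOMPRESSED = 40


def yearly_bytes(num_series, interval_seconds, bytes_per_point):
    """Upper bound for the bytes written per year."""
    points_per_series = SECONDS_PER_YEAR // interval_seconds
    return num_series * points_per_series * bytes_per_point


# Assumed workload: 1000 time series, one data point every 15 seconds.
compressed = yearly_bytes(1000, 15, BYTES_COMPRESSED)
uncompressed = yearly_bytes(1000, 15, BYTES_UNCOMPRESSED)
print("compressed:   %.1f GB" % (compressed / 1e9))
print("uncompressed: %.1f GB" % (uncompressed / 1e9))
```

For this assumed workload, the bound is about 6.3 GB per year with compression and about 84.1 GB without it.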

Installing HBase

Follow https://hbase.apache.org/book/quickstart.html, download the latest binary (at the time of writing: hbase-1.2.2-bin.tar.gz) and install it, e.g. in /opt:

clorenz@machine:~/Downloads $ cd /opt
clorenz@machine:/opt $ tar -xzvf ~/Downloads/hbase-1.2.2-bin.tar.gz
clorenz@machine:/opt $ ln -s hbase-1.2.2 hbase

Next, edit conf/hbase-site.xml:

<configuration>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>127.0.0.1</value>
 </property>
 <property>
  <name>hbase.rootdir</name>
  <value>file:///opt/hbase</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/opt/zookeeper</value>
 </property>
</configuration>

If you already have a running ZooKeeper instance, you must instruct HBase not to start its own ZooKeeper. To do so, add the following configuration to conf/hbase-site.xml:

<property>
 <name>hbase.cluster.distributed</name>
 <value>true</value>
</property>

And in conf/hbase-env.sh set the following line:

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

In any case, whether or not you run your own ZooKeeper, continue by setting JAVA_HOME in conf/hbase-env.sh:

...
export JAVA_HOME=/opt/java8
...

Make sure that your local hostname resolves properly. Ideally, a ping looks like this:

clorenz@machine:/opt/hbase $ ping machine
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.042 ms
^C

Finally, start hbase by running

clorenz@machine:/opt/hbase $ ./bin/start-hbase.sh

With ps -ef | grep -i hbase, you can check that your HBase instance is running properly:

clorenz@machine:/opt/hbase/logs $ ps -ef | grep -i hbase
root 31701 2795 0 14:37 pts/2 00:00:00 bash /opt/hbase-0.98.9-hadoop2/bin/hbase-daemon.sh --config /opt/hbase-0.98.9-hadoop2/bin/../conf internal_start master
root 31715 31701 44 14:37 pts/2 00:00:07 /opt/java7/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/opt/hbase-0.98.9-hadoop2/bin/../logs -Dhbase.log.file=hbase-root-master-ls023.log -Dhbase.home.dir=/opt/hbase-0.98.9-hadoop2/bin/.. -Dhbase.id.str=root -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start

Congratulations: You’ve finished your first step. Let’s take the next one:

Installing OpenTSDB

First, download the latest source code from GitHub:

clorenz@machine:/opt/git $ git clone https://github.com/OpenTSDB/opentsdb.git
Cloning into 'opentsdb'...
remote: Counting objects: 5518, done.
remote: Total 5518 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (5518/5518), 27.09 MiB | 6.39 MiB/s, done.
Resolving deltas: 100% (3704/3704), done.
Checking connectivity... done.

Next, build a Debian package:

clorenz@machine:/opt/git/opentsdb (master)$ sh build.sh debian

If you encounter an error (e.g. ./bootstrap: 17: exec: autoreconf: not found), you are probably missing the prerequisite packages. Be sure to install at least the following ones:

  • dh-autoreconf
  • gnuplot

If the Debian build went well, you can install the package:

clorenz@machine:/opt/git/opentsdb (master)$ sudo dpkg -i build/opentsdb-2.2.1-SNAPSHOT/opentsdb-2.2.1-SNAPSHOT_all.deb

Initial preparations for OpenTSDB

Before you can run OpenTSDB, you have to create the HBase tables:

clorenz@machine:/opt/git/opentsdb (master)$ env COMPRESSION=GZ HBASE_HOME=/opt/hbase ./src/create_table.sh

At least in the beginning, it is also helpful to let OpenTSDB create metrics automatically. To do so, set the following line in /etc/opentsdb/opentsdb.conf:

tsd.core.auto_create_metrics = true

Starting OpenTSDB

sudo service opentsdb start

When you access http://localhost:4242/ you will see the OpenTSDB GUI.

Now it’s time to start gathering data. We’ll use TCollector for the most basic data:

Installing TCollector

Again, we fetch the source code from GitHub:

clorenz@machine:/opt/git $ git clone https://github.com/OpenTSDB/tcollector.git

Let’s configure tcollector to use our own OpenTSDB instance by adding a single line to /opt/git/tcollector/startstop:

TSD_HOST=localhost

Starting tcollector is pretty easy:

clorenz@machine:/opt/git/tcollector (master *)$ sudo ./startstop start

It’s done!

Now you can view your very first graph in the interface by selecting a time frame and the metric df.bytes.free.
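The same data can also be fetched without the GUI through the HTTP API of OpenTSDB 2.x (the /api/query endpoint). The following is a minimal sketch under the assumption that the TSD listens on localhost:4242 as set up above. The helper names build_query_url, latest_points and fetch are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(metric, start="1h-ago", aggregator="sum",
                    host="localhost", port=4242):
    """Build an /api/query URL for one metric (hypothetical helper)."""
    params = urlencode({"start": start, "m": "%s:%s" % (aggregator, metric)})
    return "http://%s:%d/api/query?%s" % (host, port, params)


def latest_points(series_list):
    """Map each returned series (keyed by its tags) to its newest point."""
    latest = {}
    for series in series_list:
        dps = series.get("dps", {})
        if not dps:
            continue
        # The keys of "dps" are timestamps encoded as strings.
        newest = max(dps, key=int)
        key = tuple(sorted(series.get("tags", {}).items()))
        latest[key] = (int(newest), dps[newest])
    return latest


def fetch(metric):
    """Query a running TSD; this needs the OpenTSDB instance from above."""
    with urlopen(build_query_url(metric), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_query_url("df.bytes.free"))
```

Calling latest_points(fetch("df.bytes.free")) against a running instance should then return the most recent value per series.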

Writing custom collectors

Every collector writes one or more lines in the following format:

metric timestamp value tag1=data1 tag2=data2

Within the collectors subdirectory of your tcollector installation, there are numerically named subdirectories that denote how often a collector is executed. A directory name of 0 stands for “runs indefinitely, like a daemon”; values greater than zero stand for “runs every n seconds”.

With that in mind, it is fairly easy to write your own collectors, as in the following example. Note that collectors do not necessarily have to be written in Python; you can use basically any language.

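#!/usr/bin/python
# Counts the recorded WAV files and reports the total as one data point.
import glob
import sys
import time

# tcollector helper library; requires the tcollector directory on PYTHONPATH.
from collectors.lib import utils


def main():
    ts = int(time.time())
    yesterdayradio = glob.glob('/home/clorenz/data/wav/yesterdayradio-*')

    # Output format: metric timestamp value tag=value
    # The parenthesised print works with both Python 2 and Python 3.
    print("wavfiles.total %d %d type=yesterdayradio" % (ts, len(yesterdayradio)))

    sys.stdout.flush()


if __name__ == "__main__":
    sys.stdin.close()
    sys.exit(main())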

You can test your collector by executing it on the shell:

PYTHONPATH=/opt/git/tcollector /usr/bin/python /opt/git/tcollector/collectors/300/mystuff.py

Pretty straightforward, isn’t it?
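If a collector emits several metrics, it can help to build the lines in one place. The following sketch formats a data point in the line format described above and rejects tokens that contain whitespace, since whitespace would break the space-separated format. The function format_datapoint is a hypothetical helper, not part of tcollector.

```python
import time


def format_datapoint(metric, value, tags=None, timestamp=None):
    """Return one collector output line: metric timestamp value tag=value ...

    Hypothetical helper for illustration; tags are sorted so that the
    output is deterministic.
    """
    if timestamp is None:
        timestamp = int(time.time())
    tag_tokens = ["%s=%s" % (key, tags[key]) for key in sorted(tags or {})]
    tokens = [metric, str(int(timestamp)), str(value)] + tag_tokens
    for token in tokens:
        if not token or any(ch.isspace() for ch in token):
            raise ValueError("invalid token in data point: %r" % token)
    return " ".join(tokens)


print(format_datapoint("wavfiles.total", 7, {"type": "yesterdayradio"}))
```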

If you generated wrong data for some reason, you can delete it. Beware that this command is very dangerous: since the resolution is about one hour, the “1h-ago” parameter in the following command effectively means “now”:

/usr/share/opentsdb/bin $ sudo ./tsdb scan --delete 2h-ago 1h-ago sum wavfiles.total type=*

Find more about manipulating the raw collected data at http://opentsdb.net/docs/build/html/user_guide/cli/scan.html

Let’s now polish the whole installation with a nicer frontend to get a real dashboard:

Installing StatusWolf as frontend

Since the standard GUI of OpenTSDB is a little raw, it’s a good idea to install an alternative. The best one at the moment (not only visually, but also in terms of features such as anomaly detection) is StatusWolf. Installing StatusWolf takes only a few steps:

  • install apache2
  • install libapache2-mod-php5
  • install php5-mysql
  • install php5-curl
  • ensure that mod_rewrite is working:
    sudo a2enmod rewrite
    sudo a2enmod actions
    sudo service apache2 restart
  • download StatusWolf and install it into /opt
  • install pkg-php-tools
  • install composer ( https://getcomposer.org/download/ ):
    sudo mkdir -p /usr/local/bin
    sudo chown clorenz /usr/local/bin
    curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
  • install mysql-server (remember the root user password for later creation of the database)
  • Follow the StatusWolf setup instructions (be sure to remove all comment lines in the JSON configuration!)
  • Ensure that all files belong to the www-data user:
    clorenz@machine:/opt $ sudo chown -R www-data StatusWolf
  • Create /etc/apache2/sites-available/statuswolf.conf:
    Listen 9653
    <VirtualHost your.host.name:9653>
     ServerRoot /opt/StatusWolf
     DocumentRoot /opt/StatusWolf
     <Directory /opt/StatusWolf>
     Order allow,deny
     Allow from all
     Options FollowSymLinks
     AllowOverride All
     Require all granted
     </Directory>
    </VirtualHost>
  • Link this file into /etc/apache2/sites-enabled
  • Ensure that in /etc/apache2/mods-available/php5.conf, the php_admin_flag is disabled:
    # php_admin_flag engine Off
  • Create a user in the database (please use different values unless you want to create a security hole!):
    mysql statuswolf -u statuswolf -p
    INSERT INTO auth VALUES('statuswolf',MD5('statuswolf'),'Statuswolf User');
    insert into users values(2,'statuswolf','ROLE_SUPER_USER','mysql');

Re-using an old laptop as monitoring screen

This article describes how to set up an old laptop with a monitor connected to it as a two-screen monitoring dashboard.

We want to install Ubuntu 14.04 LTS on the laptop, assuming that it no longer has a built-in CD-ROM drive. So we need to prepare a USB stick first:

1. Download initrd.gz and linux from http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/installer-i386/current/images/netboot/ubuntu-installer/i386/

2. Format the stick (assuming it is /dev/sdb):

mkdosfs /dev/sdb1

3. Transform the stick into a bootable device:

syslinux /dev/sdb1

4. Mount the stick and copy the two previously downloaded files initrd.gz and linux into the root directory of the stick

5. Create (again in the root of the stick) a file syslinux.cfg with the following contents:

default linux
append initrd=initrd.gz video=vesa:ywrap,mtrr vga=788

6. Download your desired ISO image (e.g. ubuntu-14.04-desktop-i386.iso ) and copy it onto the stick

7. Install a master boot record on the stick with

install-mbr /dev/sdb

 

Now, let your laptop boot from the USB stick and follow the installation dialogue.

Be sure to enable automatic security updates if you don’t plan to check for updates manually.

After setting up the machine, there’s not much left to do except install a few additional packages, such as

  • openssh-server
  • Google Chrome
  • xdotool

Then, enable auto-login for your user and disable any kind of screensaver.

Now, write a small shell script in ~/bin/monitoring:

#!/bin/bash

google-chrome --start-fullscreen --window-position=0,0 --window-size=1680,1050 --app=http://your.monitoring.page.one.html &
sleep 10
xdotool key F11
sleep 5
google-chrome --window-position=1680,0 --window-size=1680,1050 --app=http://your.monitoring.page.two.html &
sleep 5
xdotool windowfocus --sync
xdotool getactivewindow windowmove 1680 0
xdotool key F11

and add that script to the list of autostart programs (“Startup Applications” in the Dash) in Unity.

As a further enhancement, you can add a cron job that shuts down the laptop at the end of your working day. This means that you have to power on the laptop manually on each working day, but thanks to the startup script, the windows are placed automatically and there’s nothing else you have to do by hand.

Application monitoring with JMX and Jolokia

(or: It’s the inner values that count)

Remember the last time your application was all green in your monitoring suite, but you got complaints because it did not do what it was expected to? Or have you ever been in a situation where you wanted to measure what your application does without going through megabytes of log files? Do you need some KPI-based monitoring? Don’t want to reinvent the wheel?

For any of these cases, the following monitoring approach, which combines standard Java JMX with Jolokia as an HTTP bridge, is a good fit.

First, let’s take a look at JMX, the Java Management Extensions:

JMX is a Java API for resource management. It is a standard from the early days (JSR 3: JMX API, JSR 160: JMX Remote API), and both specifications have been part of the Java platform since Java 5. A merge of the two APIs into JSR 255 (the JMX API version 2.0) was planned but never finalized. Basically, JMX consists of three layers: the Instrumentation Layer (the MBeans), the Agent Layer (the MBean Server) and the Distributed Layer (connectors and management clients).

Although you can use JMX for managing virtually everything (even services), we concentrate here on using JMX for monitoring purposes only. The same goes for MBeans.

What are MBeans?

Generally speaking, MBeans are resources (e.g. a configuration, a data container, a module, or even a service) with attributes and operations on them. Everything else, like notifications or dynamic structures, is out of scope for us now.

Technically, an MBean is a class that implements an interface and follows a naming convention: the interface name is the class name plus “MBean” at the end:

class MyClass implements MyClassMBean

Now, let’s create a sample counting MBean:

// File MyEventCounterMBean.java: the management interface
package my.monitoring;

public interface MyEventCounterMBean {
  public long getEventCount();
  public void addEventCount();
  public void setEventCount(long count);
}

// File MyEventCounter.java: the implementation
package my.monitoring;

public class MyEventCounter implements MyEventCounterMBean {
  public static final String OBJECT_NAME="my.monitoring:type=MyEventCounter";
  private long eventCount=0;

  @Override
  public long getEventCount() {
    return eventCount;
  }

  @Override
  public void addEventCount() {
    eventCount++;
  }

  @Override
  public void setEventCount(long count) {
    // Assign to the eventCount field; the class has no field named "count".
    this.eventCount = count;
  }
}

Before we can use the bean, we have to make it available by registering it with the MBean server. The MBean server acts as a registry for MBeans, where each MBean is registered under its unique object name. Such an object name consists of two parts: a domain and a number of key properties. The domain can be seen as the package name of the bean, and one of the key properties, the type, is its class name. If you use the “name” property, it denotes one of its attributes.

So, for our example above, the ObjectName would be:

my.monitoring:type=MyEventCounter
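To make the two parts visible, here is a small sketch that splits such a name into its domain and its key properties. It is a simplification for illustration only: the helper split_object_name is hypothetical and ignores quoted values and wildcard patterns, which real object names may contain.

```python
def split_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, key_properties).

    Simplified, hypothetical helper: no support for quoted values or patterns.
    """
    domain, separator, properties = object_name.partition(":")
    if not separator or not properties:
        raise ValueError("missing key properties in %r" % object_name)
    key_properties = {}
    for item in properties.split(","):
        key, equals, value = item.partition("=")
        if not key or not equals:
            raise ValueError("malformed key property %r" % item)
        key_properties[key] = value
    return domain, key_properties


domain, props = split_object_name("my.monitoring:type=MyEventCounter")
print(domain, props)
```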

In every JVM, there’s at least one standard MBean server, the PlatformMBeanServer, which can be reached via

MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

In theory, you could use more than one MBean server per JVM, but normally, using only the PlatformMBeanServer is sufficient.

Next step: Accessing MBeans

To access our MBean, we can either use Spring and its magic, or we do it manually.

Doing it manually looks like this:

We have to register our bean once, e.g. in an init method:

MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
ObjectName myEventCounterName = new ObjectName(MyEventCounter.OBJECT_NAME);
MyEventCounter myEventCounter = new MyEventCounter();
mbs.registerMBean(myEventCounter, myEventCounterName);

And for every access, we have to retrieve it from the MBeanServer so that we can invoke the methods:

MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
ObjectName myEventCounterName = new ObjectName(MyEventCounter.OBJECT_NAME);
mbs.invoke(myEventCounterName, "addEventCount", null, null);

Have you seen the second argument of the invoke method? It’s the name of the operation you want to invoke. If you want to pass arguments, you pass their values as an object array in the third argument and their signature as a string array in the fourth. The signature has to match the declared parameter type, which is long for setEventCount, e.g.

mbs.invoke(myEventCounterName, "setEventCount", new Object[] {number}, new String[] {long.class.getName()});

If we’re lucky and our whole application is managed by Spring, it’s sufficient to work with configuration and annotations only.

The MBean needs to be annotated as @Component and @ManagedResource with the object name as parameter:

@Component
@ManagedResource(objectName="my.monitoring:type=MyEventCounter")

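and the attributes need a @ManagedAttribute on their getters, while operations such as addEventCount() need a @ManagedOperation:

@Override
@ManagedAttribute
public long getEventCount() {
  return eventCount;
}

@Override
@ManagedOperation
public void addEventCount() {
  eventCount++;
}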

In your Spring configuration, besides the <context:component-scan> tag, you need one additional line for exporting the MBeans:

<context:mbean-export/>

And classes that want to use the bean just have to inject it with the @Autowired annotation:

@Autowired
private MyEventCounterMBean myEventCounterMBean;

Accessing MBeans from outside, using Jolokia

Of course, you can access your MBeans with jconsole, but a more elegant and more firewall-friendly way is to use an HTTP bridge, which allows you to access the MBeans over HTTP. That’s where Jolokia joins the game.

Jolokia is a JMX-JSON-HTTP bridge, which allows access to your MBeans over HTTP and returns their attributes in JSON. Nice, isn’t it? Besides that, it allows bulk requests for improved performance, has a security layer to restrict access and is really easy to install.

If you want to monitor a webapp that runs inside Tomcat, all you need to do is deploy the Jolokia agent webapp (available as a .war file) into your Tomcat.

For a standalone Java application, just add the Jolokia JVM agent (which in fact acts as an internal HTTP server) as a javaagent in your start script:

java -javaagent:$BASE_DIR/libs/jolokia-jvm-1.2.1-agent.jar=port=9999,host=*

And if you build with gradle, apply the following line to your build.gradle:

runtime (group:"org.jolokia", name:"jolokia-jvm", classifier:"agent", version:"1.2.1")

Helpers – jmx4perl

Now that we can access our MBeans from outside, it would be nice to have a tool to read the values on the command line. The best tool for that is jmx4perl, which is available on GitHub at https://github.com/rhuss/jmx4perl

The installation is reminiscent of the good old Perl days with CPAN. If you’ve never worked with CPAN, just install jmx4perl according to the documentation and confirm all questions.

Now, let’s get an overview of all available MBeans:

jmx4perl http://your.application.host:9999/jolokia list

And if you want a dedicated bean, run:

jmx4perl http://your.application.host:9999/jolokia read my.monitoring:type=MyEventCounter

The output mirrors the JSON returned by Jolokia and will look like this:

{
 EventCount => 234,
 Name => 'MyEventCounter'
}

And finally, if you just need one attribute, run:

jmx4perl http://your.application.host:9999/jolokia read my.monitoring:type=MyEventCounter EventCount

In that case, you’ll get nothing but the value as a result.
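If you prefer to talk to Jolokia directly, the same read is a plain HTTP GET on <base URL>/read/<MBean name>/<attribute>, and the JSON reply carries the result in its value field together with a status code. The following is a minimal sketch under the assumption that the agent runs as configured above. The helper names read_url, extract_value and read_attribute are made up for illustration, and MBean names that contain slashes would need Jolokia’s own escaping, which is not handled here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def read_url(base_url, mbean, attribute=None):
    """Build the URL of a Jolokia read request (hypothetical helper)."""
    url = "%s/read/%s" % (base_url.rstrip("/"), quote(mbean, safe=":=,"))
    if attribute is not None:
        url += "/" + quote(attribute)
    return url


def extract_value(response):
    """Return the value of a decoded Jolokia reply or raise on an error."""
    if response.get("status") != 200:
        raise RuntimeError("Jolokia error: %s" % response.get("error", "unknown"))
    return response["value"]


def read_attribute(base_url, mbean, attribute):
    """Read one attribute; this needs a reachable Jolokia agent."""
    with urlopen(read_url(base_url, mbean, attribute), timeout=10) as reply:
        return extract_value(json.loads(reply.read().decode("utf-8")))


print(read_url("http://your.application.host:9999/jolokia",
               "my.monitoring:type=MyEventCounter", "EventCount"))
```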

Let’s go!

With these tools, you can monitor virtually everything inside your application. All you have to do now is provide the data (you, as the developer of your application, know best what exactly should be monitored) and monitor it with Nagios, OpenTSDB or whatever you want. All of these tools can access, process and monitor the data, either directly or with helpers like jmx4perl.
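To close the loop to Nagios, a value such as the EventCount read above only needs to be compared with thresholds and reported as a status line plus exit code. The following sketch shows that last step. The helper check_threshold and the threshold values are assumptions made for illustration; the part after the pipe follows the usual Nagios performance data layout label=value;warn;crit.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(label, value, warning, critical):
    """Return (exit_code, status_line) for a value that should stay low.

    Hypothetical helper for illustration only.
    """
    if value >= critical:
        state, code = "CRITICAL", CRITICAL
    elif value >= warning:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    line = "%s - %s=%s | %s=%s;%s;%s" % (
        state, label, value, label, value, warning, critical)
    return code, line


# Assumed thresholds: warn at 500 events, go critical at 1000.
code, line = check_threshold("EventCount", 234, 500, 1000)
print(line)
```

A plugin built around this would print the line and exit with the returned code, and the value itself could come from jmx4perl or from a direct HTTP request to Jolokia.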