afs-monitor

A distributed system is one that stops you from getting any work done when a machine you've never even heard of crashes.

Leslie Lamport

Warning

This package is orphaned. Although I believe it is still useful, I no longer use AFS and am no longer maintaining this collection of scripts. If you would like to pick up maintenance of this package, please feel free. Contact me if you would like this page to redirect to its new home.

Description

This is a collection of Nagios plugin scripts for monitoring various aspects of AFS servers. They all follow the Nagios standards for check_* scripts (although they don't support a few features) and use exit statuses that give Nagios the right information. They don't use any Nagios-specific libraries, however, and should be suitable for use with other monitoring systems such as mon. Any monitoring system that understands remote checks for services should be able to use these scripts with some adaptation.
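As an illustration of the exit status convention mentioned above, here is a minimal Python sketch (the scripts themselves are Perl, and this is not their code). The numeric values are the standard Nagios plugin statuses; the helper names format_result and finish are hypothetical.

```python
import sys

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def format_result(service, status, message):
    """Build the one-line summary that a monitoring system displays."""
    return f"{service} {LABELS[status]} - {message}"


def finish(service, status, message):
    """Print the summary line and exit with the matching status code.

    Hypothetical helper: a monitoring system that understands Nagios-style
    checks only needs the exit status and the first line of output.
    """
    print(format_result(service, status, message))
    sys.exit(status)
```

A check would end with a call such as finish("AFS", WARNING, "1 partition over threshold"), which any system that reads exit statuses can interpret without Nagios libraries.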

check_afs_quotas checks either a single volume or all volumes on a server or server partition for quota usage and reports errors or warnings if the used space is over a configurable threshold.

check_afs_space uses vos partinfo to check the available space on each partition on a file server. It reports a critical error if the percentage used is above a configurable threshold (90% by default) and a warning if it is above a lower configurable threshold (85% by default).
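The threshold logic described for check_afs_space can be sketched roughly as follows. This is an illustrative Python sketch, not the script's Perl code. It assumes vos partinfo prints lines of the form "Free space on partition /vicepa: N K blocks out of total M"; that format, and the names percent_used and classify, are assumptions made for the example.

```python
import re

# Assumed shape of one line of `vos partinfo` output.
PARTINFO_RE = re.compile(
    r"Free space on partition (\S+?):\s+(\d+) K blocks out of total (\d+)"
)


def percent_used(line):
    """Return (partition, percent used) for one line, or None if unparseable."""
    match = PARTINFO_RE.search(line)
    if match is None:
        return None
    partition = match.group(1)
    free, total = int(match.group(2)), int(match.group(3))
    if total == 0:
        return None
    return partition, 100.0 * (total - free) / total


def classify(percent, warning=85.0, critical=90.0):
    """Map a usage percentage to a status using the documented defaults."""
    if percent > critical:
        return "CRITICAL"
    if percent > warning:
        return "WARNING"
    return "OK"
```

Both thresholds are parameters here because the text describes them as configurable; the comparison is strict, matching "above" in the description.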

check_afs_bos runs bos status on a file server or volume location server and scans the output, making sure that all commands are running normally and the file server isn't salvaging. If it sees any output it doesn't expect from bos status, it reports that output in an alert.

check_afs_rxdebug runs rxdebug against a file server and looks for any client connections that are in the state "waiting for a thread." This indicates client connections that are blocked waiting for a file server thread. We've found this to be a reliable test for detecting serious file server performance problems. It reports a critical error if the count of such connections is above a configurable level (8 by default) and a warning if it is above a lower configurable threshold (2 by default).
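A rough Python sketch of this check, for illustration only (it is not the script's Perl code). It assumes the rxdebug output contains a summary line of the form "N calls waiting for a thread"; that format and the function names are assumptions for the example.

```python
import re

# Assumed summary line in rxdebug output, e.g. "3 calls waiting for a thread".
WAITING_RE = re.compile(r"(\d+) calls waiting for a thread")


def blocked_calls(output):
    """Return the number of calls waiting for a thread, or None if not found."""
    match = WAITING_RE.search(output)
    return int(match.group(1)) if match else None


def classify_blocked(count, warning=2, critical=8):
    """Apply the documented default thresholds (warning 2, critical 8)."""
    if count is None:
        return "UNKNOWN"
    if count > critical:
        return "CRITICAL"
    if count > warning:
        return "WARNING"
    return "OK"
```

Returning UNKNOWN when the line is missing is a design choice of this sketch: output that cannot be parsed should not be reported as healthy.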

check_afs_udebug runs udebug against a ubik service (vlserver, ptserver, kaserver, or buserver) and makes sure that it is in a reasonable state. It checks that there is a sync site for the service and, when there is, that the sync site believes that the recovery state is 1f (indicating that all of the slaves have the same version of the database).
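The recovery state test can be sketched in Python as follows. This is a simplified illustration, not the script's Perl code. It looks only at the udebug output from a single host and assumes that output contains the phrases "I am sync site" and "Recovery state XX"; those strings and the function name check_ubik are assumptions for the example. Locating the sync site when the queried host is not it is left out.

```python
import re


def check_ubik(output):
    """Return (status, message) for the udebug output of one host."""
    # Assumed marker printed by a host that currently is the sync site.
    if "I am sync site" not in output:
        return "UNKNOWN", "queried host is not the sync site"
    match = re.search(r"Recovery state (\w+)", output)
    if match is None:
        return "UNKNOWN", "no recovery state in output"
    state = match.group(1).lower()
    if state == "1f":
        # 1f: all database servers hold the same version of the database.
        return "OK", "recovery state 1f"
    return "CRITICAL", f"recovery state is {state}, expected 1f"
```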

These scripts were written by Xueshan Feng, Neil Crellin, Quanah Gibson-Mount, and Russ Allbery. Many modifications to the scripts were based on work by Steve Rader.

Requirements

These scripts are primarily intended to be run from inside Nagios, but can also be run separately via any other monitoring system or even manually. They are all written in Perl. The minimum version required varies by script, but Perl 5.006 is sufficient to run all of them.

They use the standard AFS utilities to do their work, so you need to have rxdebug, udebug, bos, and vos available. Obviously, since these scripts monitor AFS, you want to have copies of these utilities on local disk rather than relying on copies out of AFS. You may want to change the logic at the top of the scripts to search for a suitable version; by default, it expects to find them in /usr/bin or /usr/local/bin.
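The search for a local copy of each utility might look like the following Python sketch. It is an illustration rather than the logic in the scripts. The two directories are the defaults named above; the function name find_tool is hypothetical.

```python
import os

# Default locations named in the text; both are on local disk, not in AFS.
DEFAULT_DIRS = ["/usr/bin", "/usr/local/bin"]


def find_tool(name, dirs=DEFAULT_DIRS):
    """Return the first executable called `name` in `dirs`, or None."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, find_tool("vos") returns the full path of the first match or None. Sites that install the utilities elsewhere can pass their own list of directories.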

The scripts all have default timeouts, and check_afs_space and check_afs_rxdebug have default thresholds that you may want to change. You may also want to look at the regexes for acceptable bos status lines (for example, if you want to get paged whenever your file server has a core file, you will want to modify the regex to not filter that out).

check_afs_quotas and check_afs_space will use Number::Format, if available, to format sizes with IEC 60027 prefixes.

Download

The distribution:

afs-monitor 2.4, released 2013-01-13 (download, PGP signature)

An archive of older releases is also available.

A Debian package (as nagios-plugins-afs) is available from my personal repository.

afs-monitor is maintained using the Git version control system. To check out the current development tree, clone:

    git://git.eyrie.org/afs/afs-monitor.git

You can also browse the current development source.

Documentation

User documentation:

Developer documentation:

License

Copyright 2003, 2004, 2005, 2006, 2010, 2011, 2013 The Board of Trustees of the Leland Stanford Junior University

These programs are free software; you may redistribute them and/or modify them under the same terms as Perl itself. This means that you may choose between the two licenses that Perl is released under: the GNU GPL and the Artistic License. Please see your Perl distribution for the details and copies of the licenses.

Last spun 2022-02-06 from thread modified 2014-08-10