Blog Archives

Recover from failed Dell perc raid5 logical disk

We encountered a failed logical disk on a Dell Perc SAS controller. After a quick review we discovered that two disks out of the four configured for RAID5 had failed. This event triggered the Perc controller to put the logical disk offline. Now what…

Everyone knows that when using a raid5 distributed partity with 4 disks the maximum redundancy is losing 1 disk. With two failed disks data loss is usually inevitable. SO, if this is also the case with your machine, please realize your chances of  recovering are slim. This article will not magically increase the chances you have on recovering. The logic of the Dell Perc SAS controller actually might.

First off, I will not accept any responsibility for damage done by following this article. Its content is intended to offer the troubleshooting engineer an possible solution path. Key knowledge is needed to interpret your situation correctly and with that the applicability of this article.

TIP: Save any data still available to you in a read-only state.
(If you have read only data, this article does not apply to you!)

What do you need?

Obviously you need to have two replacement disks available.
You also need to have a iDRAC (Dell remote access card) or some other means to access the systemlog.
You need to have physical access to the machine (to replace the disks and review the system behavior)

 

What to do?

Our specific setup:
 Controller  0
 -Logical volume 1, raid5
 + Disk 0:2     Online
 + Disk 0:3     Failed
 + Disk 0:4     Online
 + Disk 0:5     Failed

The chance both problematic disks 0:3 and 0:5 failed simultaneously is near to zero. What I mean to say by this is that disks 0:3 and 0:5 will have failed in a specific order. This means that the disk who failed first will have ancient data on it. In order to make an recovery attempt we need actual and not historical data. To this end we first need to identify the disk that failed first. This will be the disk that we will be replacing shortly.

Identifying the order in which the disks failed
Luckily most Dell machines ship with a Dell Remote Access Card (Drac). HP and other vendors have similar solutions. The iDRAC keeps an system log. In this log the iDRAC will report any system events. This also goes for the events triggered by the Perc SAS controler. Enter the iDRAC interface during boot <CTR+?> and review the eventslog. Use the timestamps in this log to identify the first disk that failed. Below an example of the Log output:

fotoIn our case, disk 0:5 failed prior to disk 0:3. Be absolutely sure that you identify the correct disk. We want the most current data to be used for a rebuild. If this is for any reason historical data, you will end up with corrupted data. Write the disk number of the disk that failed first on a piece of paper. This is the disk that needs to be replaced with a new one. This could be a stressful situation (for your manager), so be mindful that a stressed manager chasing you gut could confuse you. You do not want to mix the disks up, so keep checking your paper and do not second guess but check if your not sure.

Exit the IDRAC interface and reboot the machine and enter the Perc controller, usually <CTR+?> during boot. Note that the controller also reports the logical volume  being offline. If this is the case, enter the Physical Disks (PD) page (CTR+N in our case). Also note here that disks 0:3 and 0:5 are in a failed state. Select disk 0:3 and force this disk online using the operations menu (F2 in our case)  and accept the warning. DO NOT SELECT or ALTER THE DISK WITH HISTORICAL DATA (0:5)!!!

Now physically replace disk 0:5 with the spare you have available. If all is well, you should notice that the controller is automatically starting a rebuild (LEDS flashing fanatically).  Review your lcd-screen and note that disk 0:5 is now in a rebuilding state. Most controllers let you review the progress. On our controller the progress was hidden on the next page of the disk details in the physical disk (PD) page, which was reachable using the tab key. Wait for the controller to finish. (This can take quite a while, clock the time between % and muliply that with 100 then divide that with 60 to get the idea. Get a coffee or good night sleep).

Once the controller is finished it will in most cases place the replaced disk 0:5 in an OFFLINE state and the forced online disk (0:3) back in FAILED state. Now use the operations menu to force DISK 0:5 (rebuild disk) online and note the logical volume becoming available in a degraded state. Reboot the machine and wait for the OS to boot.

All done?

Well the logical volume should be available to the OS. This doesnt mean there is any readable data left on the device. Usually this will become apparent during OS boot. Most operating systems will perform a quick checkdisk during mount. Most errors will be found there. One of two things can happen:

1) Your disk is recovered but unclean and will be cleaned by the OS after which the boot will be successful or…
2) the disk is corrupted beyond the capabilities of a basic scan disk.

In the latter case you might want tot attempt additional repair steps and perform a OS partition recovery. In most cases, if this is your scenario, the chance you will successfully recover the data is very slim.

I hope you, like me, successfully recovered your disk.
(Thanks to the failure imminent detection and precaution functions the Dell Perc controller implement)

Advertisements

Extract all content to disk from a SPS2007 content DB using PHP

Today I ran into a problem. We needed to migrate a huge amount of data from an old SharePoint 2007 content database without the availability of the MOSS front-end. All i had was the database and a corrupted sharepoint install that wasnt going to help me allot.

To overcome this problem I decided to write a little PHP application that would do this task for me. I allready had WAMP setup on my desktop, so i figured this to be the quickest route. Then i figured, maybe other people face this problem as well. So here it is, the code, and some helpers to get you going.


<?php
/**
* @name           : index.php - MSSql Content connector
* @Author        : Chris Gralike
* @version        :
* @copyright     : WETFYWTDWI - what ever ** you want to do with it, no guarantees 🙂
*  This script ONLY READS the database tables, so dont give it more permissions 🙂
*/

// What to search for in the directory structure.
$search = '';
// Where too put the files
$createdir = './Downloaded';
// What server too connect to.
$ServerName = 'amisnt05.amis.local';
// Database connection parameters.
$connectionInfo = array('Database' => 'MOSS_PROD_WSS_Content_WebApp02',
'UID' => 'php_login',
'PWD' => 'welcome12345678');
// This can be a very long task to complete, so disable the timelimit.
set_time_limit(0);
// Create a connection
$conn = sqlsrv_connect($ServerName, $connectionInfo)
or die( print_r( sqlsrv_errors(), true));

// The SQL statment to query the AllDocs tables.
$tsql = "SELECT dbo.AllDocs.Id,
dbo.AllDocs.SetupPath,
dbo.AllDocs.LeafName,
dbo.AllDocs.DirName,
dbo.AllDocs.SetupPath,
dbo.AllDocs.Extension,
dbo.AllDocs.ExtensionForFile,
dbo.AllDocStreams.Id as StreamId,
dbo.AllDocStreams.Content
FROM dbo.AllDocs
RIGHT OUTER JOIN dbo.AllDocStreams ON dbo.AllDocs.Id = dbo.AllDocStreams.Id
WHERE AllDocs.DirName LIKE '%{$search}%'
AND AllDocs.SetupPath IS NULL
AND AllDocs.Extension != ''
";
// The result set
$result = sqlsrv_query($conn, $tsql);

// Process the results
while($row = sqlsrv_fetch_array($result,  SQLSRV_FETCH_ASSOC)){
// When create is true, then it will create the folders in
// in the foreach
$create = false;
$dirptr = $createdir;

// Find the folders and recreate them starting from the searchstring.
$folders = explode('/', $row['DirName']);
foreach ($folders as $val){
if($val == $search || $create == true || empty($search)){
$create = true;
$dirptr .= '/'.$val;
if(!is_dir($dirptr)){
mkdir($dirptr);
echo "INFO: created $dirptr <br/>";
}else{
echo "WARN: skipping $dirptr allready exists. <br />";
}
}
}

// Recreate the file
$filepath = $dirptr.'/'.$row['LeafName'];
if(!is_file($filepath)){
touch($filepath);
}
if($fp = fopen($filepath,'w')){
fwrite($fp, $row['Content']);
echo "INFO: file {$row['LeafName']} written. <br />";
}else{
echo "ERROR: file {$row['LeafName']} could not be written in $filepath. <br />";
}
fclose($fp);
}
// Close the database connection.
sqlsrv_close($conn);
?>

Simply configure the first vars in the script and run the file. It might take a huge while before you get some output.

TIP: Use the $search to narrow down the query a bit.
It searches the DirName (I.e. Site\DocLib\Folder\SubFolder\)

The output will look like this.

INFO: created ./Downloaded/SearchCenter
INFO: created ./Downloaded/SearchCenter/Pages
INFO: file facetedsearch.aspx written.
WARN: skipping ./Downloaded/SearchCenter allready exists.
WARN: skipping ./Downloaded/SearchCenter/Pages allready exists.
INFO: file resultskeyword.aspx written.

ANY THOUGHTS, OR NEED SOME HELP?
Then please leave a comment 🙂

WARNING!: YOU NEED THE Microsoft MSSQL DRIVER FOR PHP, not the old php equivalent.
here are some tips on where to get it. My version was php 5.3.8

First off, mssql isnt supported out of the box anymore. when using PHP 5.2 and up, you need to get the Microsoft for PHP driver. Check this site for more information : http://sqlsrvphp.codeplex.com/

Its a bit of an hassle ill give you that.
Challenge: I needed 1.5hours to find the correct Lib,Install,Coding info and get it working.

Basically it requires you to download the native client, the drivers and an correct update of the php.ini your wamp instance is using.

Tip: Use <?php phpinfo() ?> to find the right version for your PHP compilation.
Search for : PHP Extension Build : API20090626,TS,VC9

DIFFERENT SHAREPOINT VERSION?!

Be sure to verify the SQL query inside the $tsql=” var and alter it accordingly. The other part should be pretty straight forward.

Check_iostat.pl version 0.9.7

Previous version and detaild information
https://sysengineers.wordpress.com/2010/02/05/check_iostat-for-nagios-version-0-9-5/

Application Goal:
Check disk IO using a simple perl script and possibly the counters allready available in the linux /proc/ directory. The program must be able to run under Nagios and should return a ‘Nagios’ correct syntax to the prompt. This way the plugin should be usable for Linux users to monitor their Disk IO within Nagios. The program should also report back performance information in the ‘Nagios’ compatible syntax so that graphing is available in the ‘Nagios’ / Cacti / Centreon add-ons.

Changes
Fixed lacking devision in validated summed results
Added sprintf functions rounding the validated numbers with 2 dimentions
Fixed some declaration problems
Added -debug switch to enable the debugging options
Coded arround the used math::Complex functions and removed the module
Improved the debugging output (readability)
Small fixes in de syntax

Installation
1. Copy the code below to your clipboard (ctr+c or the copy-button inside the code field)
2. Logon to the linuxbox with the nagios or nrpe client.
3. browse to the plugin-dir usually cd /usr/local/nagios/libexec/
4. Create a new file vi ./check_iostat.pl
5. Press insert and paste the code inside (when using putty paste is done with a rightmouse click)
6. save the file (esc > : > rq > enter, when using vi)
7. give the file execute rights using chmod +x ./check_iostat
8. make nagios the owner chown nagios:nagios ./check_iostat
9. Test it using updatedb& ./check_iostat -d sd -dbu -dbuw 70 -dbuc 88 -kbs -kbsw 10000 -kbsc 50000 -p

Next configure a nagios command or use nrpe and have fun 🙂

#!/usr/bin/perl
# Written by Chris Gralike @ AMIS.
# Perl based check command to fetch and report the
# TPS (transactions per second) and IO wait times.
# Plugin uses iostat for opperation.
# Verion 0.9.7
#
# Changes post 0.9.7 >> 28-05-2010
# Line 298...303 Prevent devision by zerro else exit because no data was collected  - Bug reported by Epiq.
# Line 380       Correction of a type that prevented pref data of r/s w/s from being printed. - Bug reported by Epiq.
# Line 40        Added dm as possible device input used by the linux LVM. - Suggested by Epiq.
# Line 269       Extended the device if/pragmatch validation to match more devices - Added by Jean Ventura.
#
###########################
use Switch;
use warnings;

my($numArgs,
   $debug,	# Print debugging information.
   $DevType,	# Used to match a certain devicetype from the resulting IOstat rows.
   $IOBIN,	# Is used to store a path to the iostat binairy for execution.
   $Samples,	# Used to store the initial Samples returned by iostat.
   @SampleRows, # Used to store the rows generated by the splitted samples.
   $firstseen,  # Used to keep track of the found devices (IOstat might return a set of devices i.e sda, sdb, sdc etc.)
   $Items,      # Used in the foreach to store the row being parsed.
   @cols,       # Used to store the columns in a row after an split.
   $dev,	# Used to create a symbolic link to dynamicly create a var.
   $rqm,
   $val,
   $devtypes,
   $rws, $rws_warn, $rws_crit,
   $kbs, $kbs_warn, $kbs_crit,
   $awt, $awt_warn, $awt_crit,
   $svc, $svc_warn, $svc_crit,
   $devices, $itd,$v1,$v2,$v3,
   $v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,
   $dbu, $dbu_warn, $dbu_crit
   );

# Preparing to collect the dangerious user input.
# Here is a list of known device types. Please add any device you would like to monitor..
$devtypes=";sd;hd;dm;";
$numArgs = $#ARGV + 1;
$critical_global = 0;
$warning_global = 0;

if($numArgs gt '0'){
	for($i=0;$i<$numArgs;$i++){
	# Process our command line arguments and do some basic testing.
	# Could be make human save in the future.
	switch ($ARGV[$i]) {
		# Enable debugging.
		case '-debug'{
				$debug = 1;
			     }
		# Handle device type
		case '-d'    {
				$val=$ARGV[$i+1];
		 		if( (index($devtypes, $val)) gt '-1'){
					$DevType=$val;
					$i++;
		  		}else{
					print "Ivalid Disktype found. Typo?\n"; exit 1;
		  		}
			     }
		# Do we need to check rqm?
		case '-rqm'  { $rqm='1'; }
		# What is the warning treshold?
		case '-rqmw' {
				$val=$ARGV[$i+1];
				# Is the value nummeric?
				if($val=~m/[0-9]*/){
					$rqm_warn= int $val;
					$i++;
				}else{
				# Possible type?
					print "Non Numeric value used in rqmw, typo? \n"; exit 1;
				}
			     }
		case '-rqmc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rqm_crit= int $val;
					$i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rqmc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check rws?
		case '-rws'  { $rws='1'; }
		case '-rwsw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsw, typo? \n"; exit 1;
                                }
			     }
		case '-rwsc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check kbs?
		case '-kbs'  { $kbs='1'; }
		case '-kbsw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsw, typo? \n"; exit 1;
                                }
			     }
		case '-kbsc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check awt?
		case '-awt'  { $awt='1'; }
		case '-awtw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtw, typo? \n"; exit 1;
                                }
			     }
		case '-awtc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check svc?
		case '-svc'  { $svc='1'; }
		case '-svcw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcw, typo? \n"; exit 1;
                                }
			     }
		case '-svcc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check dbu?
		case '-dbu'  { $dbu='1'; }
		case '-dbuw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuw, typo? \n"; exit 1;
                                }
			     }
		case '-dbuc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuc, typo? \n"; exit 1;
                                }
			     }
		# performance data. Might make the string human unreadable.. ow well, no loss there <img src="https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley">
		case '-p'    { $prf='1'; }
		# Print full messages per device overview.
		#case '-m'    { $fms='1'; }
		# Most used help switches.
		case '--help'{ USAGE(); }
		case '-h'    { USAGE(); }
	}
	}
	# Check if the basic requirements are met.
	if(!($DevType) ||  !(($rqm || $rws || $kbs || $awt || $svc || $dbu))){
		print "Minimal requiremens, device and checktype are not met\n";
		USAGE();
	}
}else{
	# No input was given. Show the Usage();
	USAGE();
}

# Locate the IOBin binairy needed to fetch the stats.
chomp($IOBIN=`which iostat`);

if( !(-f $IOBIN) || !(-x $IOBIN)){
	print "A working iostat command is needed for this script to work \n";
	print "Also make sure the sysstat service is running! /etc/init.d/sysstat \n";
	exit 1;
}

if($DevType){
	# IF IOStat is found, lets collect some data.
	chomp($Samples=`$IOBIN -d -x -k 1 5 | grep $DevType`);
}else{
	print "Please select a valid devicetype \n";
	exit 1;
}

# Break the samples up in lines so we can evaluate them.
# The first set of samples are avg. values counted from boot.
# We need to discard them and collect the remaining samples.
@SampleRows=split(/\n/, $Samples);

# Firstseen is used to track de device names and to skip the first itteration of iostat that contains
# avg stats counted from system boot time, stats we cant use here sadly <img src="https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley">
$firstseen='';
$CT=0;
$Devices=0;

# Print Debugging information
if($debug){
	    print "### Data collected ###\n";
	    print "dev|rrqm|wrqm|r/s|w/s|rKB/s|wKB/s|rq-sz|qu-sz|await|svctm|util%|\n";
}

foreach $Items (@SampleRows){
	 # Break the latter up in usable columns.
        @cols=split(/\s+/,$Items);
	 # We only want a certain device type. So lets match what we know.
	 # Disks usualy have a prefix for example (scsi = sd) its set using
	 # the DevType var.
         if($cols[0]=~m/$DevType-[0-9]$/ || $cols[0]=~m/$DevType[a-z]$/ || $cols[0]=~m/$DevType[a-z][0-9]$/){

		$dev=$cols[0];
		if((rindex $firstseen, $dev) gt '-1'){
			#Declare new $$
			#my @$dev;
			# Store the collected data in the correct (dynamic) vars.
			$$dev[1]+=$cols[1];	# rrqm/s   Read Requests Merged per Second.
			$$dev[2]+=$cols[2];	# wrqm/s   Write Requests Merged per Second.
			$$dev[3]+=$cols[3];	# r/s      Number of read requests issued per Second
			$$dev[4]+=$cols[4];	# w/s      Number of write requests issued per Second
			$$dev[5]+=$cols[5];	# rKB/s    Number of Kilobytes read per Second.
			$$dev[6]+=$cols[6];	# wKB/s    Number of Kilobytes written per Second.
			$$dev[7]+=$cols[7];	# Avgrq-sz Avarage size (in sectors) of the issued requests.
			$$dev[8]+=$cols[8];	# Avgqu-sz Avarage Queue length of the requests issued.
			$$dev[9]+=$cols[9];	# Await	  Avarage wait time in ms for IO requests to be served.
			$$dev[10]+=$cols[10];	# svctm    Avarage service time in ms for IO requests that where issued.
			$$dev[11]+=$cols[11];   # %util    Precentage of CPU time during IO requests (bandwidth util), saturation at 90~100%
			$CT++; # Add a new itteration to the count
			# Print some debugging vars if requested. to show the data is collected.
			if($debug){
				    print "$dev|$cols[1]|$cols[2]|$cols[3]|$cols[4]|$cols[5]|$cols[6]|$cols[7]|$cols[8]|$cols[9]|$cols[10]|$cols[11]|\n";
			}
		}else{
			$Devices++;
			$firstseen.="$dev;";
		}
	}
}

# Prevent $itd (itterations / disk) from becomming zerro and exit when no devices are found.
# on line 299.
if($Devices > 0){
	$itd = ($CT / $Devices);
}else{
	print "No performance data was captured. Please check if the device name is correct\n"; exit 1;
}

# Print debugging information
if($debug){
	print "###Devices Counted###\n";
	print "Number of devices : $Devices\n";
	print "Number of itterations per device : $itd\n";
	print "Total Number of Itterations : $CT\n";
}
# Lets collect the device information from the firstseen var
# and start processing it for some perf check/data
# Lets also recycle some previously used vars for this.
@cols=split(/;/,$firstseen);
foreach $Items (@cols){
	# Items now contains the devicenames needed to access the data again.
	# We now need to check them against some basic tresholds
	# Print a nice table with the calculated values when we are in debug.

  	# Print debugging information
	if($debug){
		   print "### Counted Values ###\n";
		   print "$Items|$$Items[1]|$$Items[2]|$$Items[3]|$$Items[4]|$$Items[5]|$$Items[6]|$$Items[7]|$$Items[8]|$$Items[9]|$$Items[10]|$$Items[11]|\n";
        }

	#What do we want to check against a treshold?
	#First check the selection if any.
	if($rqm || $rws || $kbs || $awt || $svc || $dbu){

		#Set the counts to zerro
		$critical_state='0';
		$warning_state='0';
		$ok_state='0';
		# Devide
		$round="%.2f";
		$v1 = sprintf($round, ($$Items[1] / $itd));
        $v2 = sprintf($round, ($$Items[2] / $itd));
		$v3 = sprintf($round, ($$Items[3] / $itd));
        $v4 = sprintf($round, ($$Items[4] / $itd));
		$v5 = sprintf($round, ($$Items[5] / $itd));
        $v6 = sprintf($round, ($$Items[6] / $itd));
		$v7 = sprintf($round, ($$Items[7] / $itd));
        $v8 = sprintf($round, ($$Items[8] / $itd));
		$v9 = sprintf($round, ($$Items[9] / $itd));
        $v10 = sprintf($round, ($$Items[10] / $itd));
        $v11 = sprintf($round, ($$Items[11] / $itd));

		# Requests Merged per second.
		if($rqm){
			# Critical
			if(($v1 >= $rqm_crit) || ($v2 >= $rqm_crit)){
				$critical_state+='1';
			# Warning?
			}elsif(($v1 >= $rqm_warn) || ($v2 >= $rqm_warn)){
				$warning_state+='1';
			# Ok
			}else{
				$ok_state+='1';
			}
			# Add the counters to the performance vars
			$perf.="$Items-rrqm/s=$v1; $Items-wrqm/s=$v2;";
		}
		# Reads / Writes per second.
		if($rws){
			if(($v3 >= $rws_crit) || ($v4 >= $rws_crit)){
	                    	$critical_state+='1';
             		# Warning?
               	 	}elsif(($v3 >= $rws_warn) || ($v4 >= $rws_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			# Add the counters to the performance var.
			$perf.="$Items-r/s=$v3; $Items-w/s=$v4; ";
		}
		# KB Read/Writes per second.
		if($kbs){
			if(($v5 >= $kbs_crit) || ($v6 >= $kbs_crit)){
	                        $critical_state+='1';
	                # Warning?
       	         	}elsif(($v5 >= $kbs_warn) || ($v6 >= $kbs_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                       	 	$ok_state+='1';
                	}
			$perf.="$Items-rKB/s=$v5; $Items-wKB/s=$v6; ";
		}
		# Avarage wait time
		if($awt){
			if(($v9 >= $awt_crit)){
                        	$critical_state+='1';
                	# Warning?
                	}elsif(($v9 >= $awt_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-await=$v9; ";
		}
		# Avarage service time issuing time
		if($svc){
			if($v10 >= $svc_crit){
       	                	$critical_state+='1';
                	# Warning?
                	}elsif($v10 >= $svc_warn){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-svctm=$v10; "
		}
		# Disk bandwidth Utilization
		if($dbu){
			if($v11 >= $dbu_crit){
                	        $critical_state+='1';
              	  	# Warning?
               	 	}elsif($v11 >= $dbu_warn){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-util=$v11%; ";
		}
	}else{
		print "At least select a value to measure..\n";
		exit 1;
	}
	# Print Debugging information about the validated values.
	if($debug){ print "### validated Devisions ###\n";
                    print "$Items|$v1|$v2|$v3|$v4|$v5|$v6|$v7|$v8|$v9|$v10|$v11|\n";
	}

	# Create a messages var.
	$mgs.="$Items=O:$ok_state,W:$warning_state,C:$critical_state; ";

	# Track the global state (1 crit 1 warn)
	if(($critical_state gt '0') || ( $critical_global gt '0')){
		$critical_global+='1';
	}
	if(($warning_state gt '0') || ( $warning_global gt '0')){
		$warning_global+='1';
	}
}

# Compose a nice nagios output.
if($critical_global >= '1'){
	print "CRITICAL:";
	$exit=2;
}elsif($warning_global >= '1'){
	print "WARNING:";
	$exit=1;
}else{
	print "OK:";
	$exit=0;
}
# Print the remainder, the most important data was processed.
print $mgs; if($prf){ print "|$perf"; } print "\n";
exit $exit;

###Subroutines
sub USAGE{
	print "
                Usage : $0 -d [Dev] [options]

		-p Print performance data about the measured samples.

		-d {grep string used on IOstat}
		examples;
                 sd     #All scsi devices.
                 hd     #All Cdrom devices.
		 sda	#Only device sda

                [Available Measurement Options]
                -rqm -rqmw val -rqmc val        # read/write merged             [#]
                -rws -rwsw val -rwsc val        # read/write per second.        [s]
                -kbs -kbsw val -kbsc val        # KBs read/written per second.  [s]
                -awt -awtw val -awtc val        # Avarage IO wait time.         [ms]
                -svc -svcw val -svcc val        # Avarage service IO wait time. [s]
                -dbu -dbuw val -dbuc val        # Disk utilization              [%]\n";
        exit 1;
}

check_iostat / check_io for nagios Version 0.9.5

Oke here is the first working version of check_iostat.pl

NEW Version available HERE

https://sysengineers.wordpress.com/2010/05/27/check_iostat-pl-version-0-9-7

Why this version?
Well to my big surprise I was not able to find any satisfying check plugin for diskio that didnt need all kinds of additional software or undocumented perl plugins. What i really wanted was a simple check plugin that would only require the default sysstat daemon allready present on allot of our systems (Oracle DB requirement).

Because no one seemed to have written such a plug-in I decided to write one myself. And here is the result.

How to get started?
Copy paste the source into a check_iostat.pl file within nagios/libexec.

The following needs to be on your Linux machine.
sysstat must be installed!
iostat must be available!
perl (base) should be installed!

What can you do?
1. Let us know on what distro you have tested this code! And or what changes you made to make it work.
2. Tell me if you spotted any improvements in the code.
3. Tell me if there are any bugs.

What more?
Questions? Different requirements? Improvements? Code cleanup?
Please let me know…

Tested with?
PERL
v5.8.8
v5.8.3

LINUX
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)
Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)
SUSE LINUX Enterprise Server 9 (i586)

Note!
the code is still a bit messy but functional. Keep an eye out here for updates. I will be cleaning up this code further.

Rgrds,
Chris.

Code below…
Read the rest of this entry

Phase one, Check IOStat for Nagios.

Full Working version can be found here

/* CHECK OUT THE LINK AT THE TOP OF THIS POST 😉 */
/* OR CHECK THE SOURCE DIRECTLY HERE ;               
     http://technology.amis.nl/download/iostat/

    THIS SCRIPT REQUIRES THE SYSSTAT PACKAGE (IOSTAT)
*/

Thanks, Rgrds,

altering check_disk_smb nagios v3 to add performance data.

Open a linux console (putty?)
type the following command;
vim /usr/local/nagios/libexec/check_disk_smb

In the file scroll down to rule “187” it should be located right below ”

my ($mountpt) = "<a>\\\\$host\\$share</a>;"

Add the following bulk of text by using “Shift + I” to enter the insert mode.

Then you need to make sure the data is output on the prompt as soon as nagios executes this command. The returned string is created a bit lower in the file. Search for the following.

if ((($warn_type eq "P") &amp;&amp; (100 - $capper) &lt; $warn) || (($warn_type eq "K") &amp;&amp; ($avail_bytes &gt; $warn))) {
                $answer = "Disk ok - $avail ($capper%) free on $mountpt \n";
        } elsif ((($crit_type eq "P") &amp;&amp; (100 - $capper) &lt; $crit) || (($crit_type eq "K") &amp;&amp; ($avail_bytes &gt; $crit))) {
                $state = "WARNING";
                $answer = "WARNING: Only $avail ($capper%) free on $mountpt\n";
        } else {
                $state = "CRITICAL";
                $answer = "CRITICAL: Only $avail ($capper%) free on $mountpt\n";
        }

Replace it with something like this…

if ((($warn_type eq "P") &amp;&amp; (100 - $capper) &lt; $warn) || (($warn_type eq "K") &amp;&amp; ($avail_bytes &gt; $warn))) {
                $answer = "Disk ok - $avail ($capper%) free on $mountpt | total=$total; used=$used; free=$avail;\n";
        } elsif ((($crit_type eq "P") &amp;&amp; (100 - $capper) &lt; $crit) || (($crit_type eq "K") &amp;&amp; ($avail_bytes &gt; $crit))) {
                $state = "WARNING";
                $answer = "WARNING: Only $avail ($capper%) free on $mountpt | total=$total; used=$used; free=$avail;\n";
        } else {
                $state = "CRITICAL";
                $answer = "CRITICAL: Only $avail ($capper%) free on $mountpt | total=$total; used=$used; free=$avail;\n";
        }

effectivly we added this as performance output for nagios or any other iterp. like centreon…
| total=$total; used=$used; free=$avail;

Have fun!

 

my ($total) = ($1*$2)/1024;
 my ($used) = int($total-$avail);
        if (int($total / 1024) &gt; 0){
                $total = int($total / 1024);
                if (int($total /1024) &gt; 0){
                        $total = (int(($total / 1024)*100))/100;
                        $total = $total . "G";
                }else{
                        $total = $total . "M";
                }
        }else{
                $total = $total . "K";
        }
        if (int($used / 1024) &gt; 0){
                $used = int($used / 1024);
                if (int($used /1024) &gt; 0){
                        $used = (int(($used / 1024)*100))/100;
                        $used = $used . "G";
                }else{
                        $used = $used . "M";
                }
        }else{
                $used = $used . "K";
        }