Module: check_mk
Branch: master
Commit: 518a623ed8f06257b4d900fcdbee070d20372517
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=518a623ed8f062…
Author: Lars Michelsen <lm(a)mathias-kettner.de>
Date: Wed Jan 12 22:52:57 2011 +0100
Added new check df.trend to analyze df growth within given timeranges
---
checkman/df.trend | 78 +++++++++++++++++++++++++++++++
checks/df | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 211 insertions(+), 0 deletions(-)
diff --git a/checkman/df.trend b/checkman/df.trend
new file mode 100644
index 0000000..8bdd86d
--- /dev/null
+++ b/checkman/df.trend
@@ -0,0 +1,78 @@
+title: Check filesystem space usage trends within a given timerange
+agents: linux, windows, aix, solaris, vms
+author: Lars Michelsen <lm(a)mathias-kettner.de>
+license: GPL
+distribution: check_mk
+description:
+ This check measures the disk usage trend within a given timerange.
+ Thresholds can be used to check if the trend is within a given range
+ and the disk space is enough for a given amount of time.
+
+ The check takes a snapshot of the disk usage and compares each later
+ measurement with it until the snapshot is older than the given timerange.
+ Then the snapshot is refreshed.
+ When setting up the check for a filesystem it needs one interval of
+ the given timerange to get an average value for this time. Until this
+ is done the check might be too sensitive.
+
+ The check tries to calculate the time left until the disk is full. This
+ calculation is based on the average bytes per second within the given
+ timerange and assumes linearly growing disk usage. If that assumption does
+ not fit your data, these checks can be disabled by setting the levels to {None}.
+
+item:
+ The mount point of the filesystem (UNIX) or the drive
+ letter in upper case followed by a colon (Windows).
+
+examples:
+ # Set default parameters to:
+ # 1. analyze the last 24 hours
+ # 2. growth of 1% within the given timerange raises a WARNING
+ # 3. growth of 2% within the given timerange raises a CRITICAL
+ # 4. less than 10 days left (linear growth) raises a WARNING, never a CRITICAL
+ df_trend_default_levels = (24 * 60 * 60, 1.0, 2.0, 10, None)
+
+ # Exclude temporary backup from inventory
+ inventory_df_exclude_mountpoints = [ "/mnt/backup" ]
+
+ # Exclude certain filesystems from being inventorized at all
+ inventory_df_exclude_fs = [ "iso9660", "romfs" ]
+
+perfdata:
+ Three variables: the growth within the timerange in MB ({trend_mb}) and in
+ percent of the filesystem size ({trend_perc}), and the estimated seconds
+ left until the disk is full ({sec_left}).
+
+inventory:
+ df.trend supports inventory. All filesystems the agent reports
+ will be inventorized except mount points listed in
+ {inventory_df_exclude_mountpoints} and filesystem types
+ listed in {inventory_df_exclude_fs}. The Windows agent
+ only reports fixed disks. The Linux agent reports filesystems
+ that have a size and are not of type smbfs, tmpfs, cifs or nfs.
+
+[parameters]
+ # 1. timerange to look at
+ # 2. warning grow percent
+ # 3. critical grow percent
+ # 4. warning days left (based on linear growing)
+ # 5. critical days left (based on linear growing)
+timerange (int): The timerange to analyze (in seconds). Defaults to 86400.
+warning_perc (float): Growth in percent of the volume size within the given
+ timerange to raise a {WARNING} state. Can be set to {None} to disable this check.
+ Defaults to 1.0.
+critical_perc (float): Growth in percent of the volume size within the given
+ timerange to raise a {CRITICAL} state. Can be set to {None} to disable this check.
+ Defaults to 2.0.
+warning_days (int): Days left, assuming linear growth as measured in the given
+ timerange, below which a {WARNING} state is raised. Can be set to {None} to disable this check.
+ Defaults to {10} days.
+critical_days (int): Days left, assuming linear growth as measured in the given
+ timerange, below which a {CRITICAL} state is raised. Can be set to {None} to disable this check.
+ Defaults to {None}.
+
+[configuration]
+inventory_df_exclude_fs (list of strings): List of filesystem types to exclude from inventory
+inventory_df_exclude_mountpoints (list of strings): List of mount points to exclude from inventory
+df_trend_default_levels (int, float, float, int, int): Default levels for filesystem trends detected by inventory. This variable is preset to {(86400, 1.0, 2.0, 10, None)}.
+
diff --git a/checks/df b/checks/df
index a809e35..1555b66 100644
--- a/checks/df
+++ b/checks/df
@@ -101,3 +101,136 @@ def inventory_df(checkname, info):
return inventory
check_info['df'] = (check_df, "fs_%s", 1, inventory_df)
+
+
+
+# New check: df.trend, which checks the amount of space
+# that has been allocated in a given time range
+#
+# Author: Lars Michelsen <lm(a)mathias-kettner.de>
+
+df_trend_default_levels = (86400, 1.0, 2.0, 10, None)
+
+# This stores the current value and compares it with the value
+# recorded up to "backlog" seconds earlier. If there is no value
+# of that age yet (check is too new) the oldest known value is used.
+def get_trend(itemname, this_time, this_val, backlog):
+
+    # first call: take current value
+    if not itemname in g_counters:
+        g_counters[itemname] = (this_time, this_val)
+
+        if opt_dont_submit:
+            return 0, this_val
+        raise MKCounterWrapped(itemname, 'Counter initialization')
+
+    # Get previous value and time difference
+    last_time, last_val = g_counters.get(itemname)
+    timedif = this_time - last_time
+
+    # Only refresh the snapshot when it is older than "backlog" seconds
+    if this_time - last_time > backlog:
+        print "get_trend: Saving new val"
+        g_counters[itemname] = (this_time, this_val)
+
+    return timedif, (this_val - last_val)
+
+def inventory_df_trend(checkname, info):
+    inventory = []
+    for line in info:
+        try:
+            fs_type = line[1]
+            size_kb = int(line[2])
+            if size_kb == 0 or line[5] == '-':
+                continue # exclude filesystems without size
+            item = " ".join(line[6:]).replace('\\', '/') # Windows \ is replaced with /
+
+            # exclude some filesystem types and some items
+            if fs_type not in inventory_df_exclude_fs and item not in inventory_df_exclude_mountpoints:
+                inventory.append((item, 'df_trend_default_levels'))
+        except ValueError,e:
+            sys.stderr.write("Invalid plugin output '%s'\n" % (line,))
+            pass # ignore e.g. entries for /proc, etc. if plugin sends any
+
+    return inventory
+
+# FIXME: There is some duplicate code with the df check. Maybe move to include file
+def check_df_trend(item, params, info):
+    # df outputs seven columns:
+    # DEVICE FS-TYPE SIZE(KB) USED(KB) AVAIL(KB) USED(%) MOUNTPOINT
+    # The mount point may contain spaces (seen on VMWare volumes)
+
+    used_list = [ l for l in info if " ".join(l[6:]).replace('\\','/') == item ]
+
+    if len(used_list) == 0:
+        return (3, "UNKNOWN - %s missing or not a partition" % item )
+    used = used_list[0] # might be listed twice. We take the first occurrence
+
+    # In some rare cases the item may contain a space (happened on ESX).
+    if len(used) > 7:
+        used = used[0:6] + [ " ".join(used[6:]) ]
+
+    if len(used) != 7 or used[5][-1] != '%':
+        return (3, "UNKNOWN - Invalid output from agent (%s)" % (' '.join(used),))
+
+    bytes_total = saveint(used[2]) * 1024
+    bytes_used = bytes_total - (saveint(used[4]) * 1024)
+
+    # Get trend within the given timerange. When there is no information
+    # for the whole range yet, use the oldest available data
+    try:
+        timedif, trend = get_trend("df.trend.%s" % item, time.time(), saveint(bytes_used), params[0])
+    except MKCounterWrapped, e:
+        return (3, "UNKNOWN - Initialized value -> Skipping check result")
+
+    # Check thresholds: percentage trend
+    perc_used = 100 * (bytes_used / float(bytes_total))
+    perc_trend = 100 * (trend / float(bytes_total))
+    status = 0
+    status_txt = ''
+    if params[2] and perc_trend > params[2]:
+        status = 2
+        status_txt = ' (grew more than %s%%)' % params[2]
+    elif params[1] and perc_trend > params[1]:
+        status = 1
+        status_txt = ' (grew more than %s%%)' % params[1]
+
+    # Format the timedif
+    (hours, seconds) = divmod(timedif, 3600)
+    (minutes, seconds) = divmod(seconds, 60)
+    timedif_txt = '%02d:%02d:%02d' % (int(hours), int(minutes), int(seconds))
+
+    perfdata = [
+        ( 'trend_mb', '%.2fMB' % (trend / 1048576.0), '', '', 0, '%.2f' % (bytes_total / 1048576.0)),
+        ( 'trend_perc', '%.2f%%' % perc_trend, params[1], params[2], 0, 100 ),
+    ]
+
+    output = "%s trend is %s (%.2f%%) for last %s%s" % \
+        (item, get_bytes_human_readable(trend), perc_trend, timedif_txt, status_txt)
+
+    # Calculate the time left until the free space is used up, assuming linear growth
+    sec_left = -1
+    if trend > 0 and timedif > 0:
+        bytes_per_sec = trend / float(timedif)
+        sec_left = (bytes_total - bytes_used) / bytes_per_sec
+
+        if params[4] and params[4] * 86400 > sec_left:
+            status = 2
+            status_txt = ' (CRIT: less than %d days left)' % params[4]
+        elif params[3] and params[3] * 86400 > sec_left:
+            if status < 1:
+                status = 1
+            status_txt = ' (WARN: less than %d days left)' % params[3]
+
+        # Format the output
+        (days, seconds) = divmod(sec_left, 86400)
+        (hours, seconds) = divmod(seconds, 3600)
+        (minutes, seconds) = divmod(seconds, 60)
+        output += ' - %d days, %d hours, %d minutes, %d seconds left%s' % \
+            (int(days), int(hours), int(minutes), int(seconds), status_txt)
+
+    perfdata += [('sec_left', sec_left, params[3], params[4])]
+
+    return (status, '%s - %s' % (nagios_state_names[status], output), perfdata)
+
+check_info['df.trend'] = (check_df_trend, "Disk Usage Trend %s", 1, inventory_df_trend)
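
The time-left estimate described in the man page can be sketched in isolation. This is a minimal illustration, not the code from the commit: it is written for Python 3, the function names `estimate_seconds_left` and `days_left_state` are hypothetical, and it assumes the remaining time should be derived from the free space (total minus used) divided by the average growth rate.

```python
def estimate_seconds_left(bytes_total, bytes_used, trend_bytes, timedif_sec):
    """Estimate the seconds until the filesystem is full.

    Assumes linear growth: trend_bytes were added within timedif_sec.
    Returns None when usage is not growing or no time has passed.
    """
    if trend_bytes <= 0 or timedif_sec <= 0:
        return None
    bytes_per_sec = trend_bytes / float(timedif_sec)
    return (bytes_total - bytes_used) / bytes_per_sec


def days_left_state(sec_left, warn_days, crit_days):
    """Map the estimate to a Nagios state (0 OK, 1 WARNING, 2 CRITICAL).

    A level of None disables that threshold, mirroring the
    (timerange, warn_perc, crit_perc, warn_days, crit_days) tuple.
    """
    if sec_left is None:
        return 0
    if crit_days is not None and sec_left < crit_days * 86400:
        return 2
    if warn_days is not None and sec_left < warn_days * 86400:
        return 1
    return 0


GIB = 1024 ** 3
# 100 GiB volume, 90 GiB used, grew by 2 GiB during the last 24 hours
left = estimate_seconds_left(100 * GIB, 90 * GIB, 2 * GIB, 86400)
state = days_left_state(left, 10, None)
```

With the default levels (10, None) a filesystem that would fill up in about five days yields a WARNING, and no CRITICAL is ever raised by the days-left thresholds.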
Module: check_mk
Branch: master
Commit: 01fa19b637aacac35301111d78a103d736bbe529
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=01fa19b637aaca…
Author: Mathias Kettner <mk(a)mathias-kettner.de>
Date: Wed Jan 12 08:10:46 2011 +0100
Updated bug entries
---
.bugs/95 | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/.bugs/95 b/.bugs/95
new file mode 100644
index 0000000..3c436fa
--- /dev/null
+++ b/.bugs/95
@@ -0,0 +1,12 @@
+Title: Improve output of relative timestamps (since, ago, in)
+Component: multisite
+Benefit: 1
+State: open
+Cost: 1
+Date: 2011-01-12 08:09:36
+Class: feature
+
+Sometimes it is not clear whether a time is in the past or in the future,
+for example for "last check" and "next scheduled check". The latter
+can happen to be in the past. We might need to add 'in', 'since'
+or 'ago'.
Module: check_mk
Branch: master
Commit: 5aed77a5dfd50bd120c5cbd641f4ae24207a4d01
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=5aed77a5dfd50b…
Author: Mathias Kettner <mk(a)mathias-kettner.de>
Date: Wed Jan 12 08:03:24 2011 +0100
Updated bug entries
---
.bugs/94 | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/.bugs/94 b/.bugs/94
new file mode 100644
index 0000000..eb8aa6c
--- /dev/null
+++ b/.bugs/94
@@ -0,0 +1,19 @@
+Title: View filter that generates Livestatus filters 1:1
+Component: multisite
+Benefit: 3
+State: open
+Cost: 2
+Date: 2011-01-12 07:59:58
+Class: feature
+
+To allow arbitrarily complex filter expressions, it would be great to have
+a general filter with which arbitrary Livestatus filters can be set.
+That is, simply a textarea into which Filter:, And: and Or: can be entered
+directly. For security reasons, however, some parsing is needed here. For
+one thing, lines may only begin with Filter:, And: and Or:. For another,
+the Livestatus syntax should be validated as far as possible. Alternatively,
+a test query could check whether the filter is syntactically correct.
+The annoying part is when error messages show up in the Nagios log or when a
+persistent Livestatus connection breaks. So the parsing should be as clean
+as possible. Testing for the existence of certain columns would also be
+possible, but of course takes more effort (e.g. via the table columns).
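
The line-prefix check requested in this entry could look roughly like the following. This is only a sketch under stated assumptions: it is written for Python 3, `validate_filter_lines` is a hypothetical helper that is not part of multisite, and it covers just the prefix rule plus two basic sanity checks (a non-empty Filter: expression, an integer argument for And: and Or:). It does not validate column names or operators.

```python
ALLOWED_HEADERS = ("Filter:", "And:", "Or:")


def validate_filter_lines(text):
    """Return the cleaned filter lines or raise ValueError.

    Hypothetical helper: accepts only lines that start with
    Filter:, And: or Or: and rejects everything else.
    """
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Drop empty lines: a blank line would terminate a Livestatus query
            continue
        if not line.startswith(ALLOWED_HEADERS):
            raise ValueError("line must start with Filter:, And: or Or:: %r" % line)
        header, _, argument = line.partition(":")
        argument = argument.strip()
        if header == "Filter" and not argument:
            raise ValueError("empty filter expression")
        if header in ("And", "Or") and not argument.isdigit():
            raise ValueError("%s: expects a number of filters: %r" % (header, line))
        lines.append(line)
    return lines


checked = validate_filter_lines("Filter: state = 0\n\nFilter: state = 1\nOr: 2\n")
```

Rejecting unknown headers up front keeps user input from injecting other query headers; the syntax of the filter expressions themselves would still need the further checks the entry mentions.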