Bugfix for checkpoints in the future
Module: check_mk
Branch: master
Commit: a2a035b3283099b51cb9fdea324b41d5d7950782
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=a2a035b3283099…
Author: Mathias Kettner <mk(a)mathias-kettner.de>
Date: Tue Apr 14 15:36:13 2015 +0200
#1905 FIX: oracle_recovery_status: Bugfix for checkpoints in the future
---
.werks/1905 | 13 +++++++++++++
ChangeLog | 1 +
checkman/oracle_recovery_status | 5 ++++-
checks/oracle_recovery_status | 16 +++++++++++++---
4 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/.werks/1905 b/.werks/1905
new file mode 100644
index 0000000..51ac4ae
--- /dev/null
+++ b/.werks/1905
@@ -0,0 +1,13 @@
+Title: oracle_recovery_status: Bugfix for checkpoints in the future
+Level: 1
+Component: checks
+Compatible: compat
+Version: 1.2.7i1
+Date: 1427882680
+Class: fix
+
+The check resulted in a CRITICAL state when the last checkpoint time was in
+the future. This can happen after a daylight saving time change or when the
+time on the database server is changed after the start of the instance.
+This has been corrected. The check now results in a WARNING state when the last
+checkpoint is in the future, and the time offset is displayed in the output.
diff --git a/ChangeLog b/ChangeLog
index 9b471eb..6fe6419 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -315,6 +315,7 @@
* 2110 FIX: netapp_api_aggr: check did not take configured levels when using Nagios
* 1954 FIX: fileinfo / fileinfo.groups: Fixed discovery function for fileinfo groups
and equalize agent output of fileinfo agents...
* 1904 FIX: mk_oracle: added processes check to ASM...
+ * 1905 FIX: oracle_recovery_status: Bugfix for checkpoints in the future...
* 2111 FIX: hitachi_hnas_volume: fix for cases when size information of volumes is
not available
* 2190 FIX: jolokia_metrics.gc: Fixed exception in check if no warn/crit levels are
defined
* 2192 FIX: check_notify_count: Fix exception in PNP template in case of explicit
email addresses...
diff --git a/checkman/oracle_recovery_status b/checkman/oracle_recovery_status
index 5f42a03..bc2ffbc 100644
--- a/checkman/oracle_recovery_status
+++ b/checkman/oracle_recovery_status
@@ -10,7 +10,7 @@ description:
of an archived redolog will move the checkpoint of the database. This
is a usual way to monitor the apply of archived redologs on a Physical
Standby Database without Data Guard. This is normally needed for
- Standard Edition Databases. There is an imporatnt difference between
+ Standard Edition Databases. There is an important difference between
Primary and Standby Database. Primary will only create WARNINGs, when
the level of levels is exceeded. A value for CRITICAL is only usable
when the Database is in Physical Standby Mode.
@@ -25,6 +25,9 @@ description:
monitored. The rule describes the maximum time for an active backup
mode for a datafile until a WARN or CRIT is generated.
+ If the last checkpoint time is in the future, the check generates a WARN
+ and displays a hint to check the time on the host system.
+
{Important Information}
There is no checkpoint created if a database is in backup mode. It
is expected, that the checkpoint time could also reach a configured
diff --git a/checks/oracle_recovery_status b/checks/oracle_recovery_status
index e98e34f..b1482e2 100644
--- a/checks/oracle_recovery_status
+++ b/checks/oracle_recovery_status
@@ -41,7 +41,7 @@ def inventory_oracle_recovery_status(info):
def check_oracle_recovery_status(item, params, info):
state = 0
offlinecount = 0
- oldest_checkpoint_age = -1
+ oldest_checkpoint_age = None
oldest_backup_age = -1
backup_count = 0
@@ -75,16 +75,26 @@ def check_oracle_recovery_status(item, params, info):
if datafilestatus == 'ONLINE':
checkpoint_age = int(checkpoint_age)
- oldest_checkpoint_age = max(oldest_checkpoint_age, checkpoint_age)
+ if oldest_checkpoint_age is None:
+ oldest_checkpoint_age = checkpoint_age
+ else:
+ oldest_checkpoint_age = max(oldest_checkpoint_age, checkpoint_age)
else:
offlinecount += 1
if itemfound == True:
infotext = "%s database" % (database_role.lower())
- if oldest_checkpoint_age == -1:
+ if oldest_checkpoint_age is None:
infotext += ", no online datafiles found(!!)"
state = 2
+
+ elif oldest_checkpoint_age <= -1:
+ # we found a negative time for last checkpoint
+ infotext += ", oldest checkpoint is in the future %s(!), check the time on the server" \
+ % get_age_human_readable(int(oldest_checkpoint_age)*-1)
+ state = 1
+
else:
infotext += ", oldest Checkpoint %s ago" \
% (get_age_human_readable(int(oldest_checkpoint_age)))
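
The effect of replacing the -1 sentinel with None can be sketched outside of Check_MK as follows. This is a simplified illustration, not the check itself: summarize_checkpoint_ages and format_age are hypothetical names, format_age only stands in for get_age_human_readable, and the level handling from params is left out, so any non-negative age is reported as OK here.

```python
def format_age(seconds):
    # Hypothetical stand-in for Check_MK's get_age_human_readable().
    return "%d sec" % seconds


def summarize_checkpoint_ages(ages):
    """Return (state, infotext) for the checkpoint ages of ONLINE datafiles.

    ages: list of checkpoint ages in seconds; negative means "in the future".
    Simplification: configured warn/crit levels are not evaluated.
    """
    oldest = None  # None means "no online datafile seen yet"
    for age in ages:
        if oldest is None:
            oldest = age
        else:
            oldest = max(oldest, age)

    if oldest is None:
        return 2, "no online datafiles found(!!)"
    if oldest <= -1:
        # Negative age: the checkpoint lies in the future, report WARN.
        return 1, ("oldest checkpoint is in the future %s(!), "
                   "check the time on the server" % format_age(-oldest))
    return 0, "oldest Checkpoint %s ago" % format_age(oldest)


if __name__ == "__main__":
    print(summarize_checkpoint_ages([]))
    print(summarize_checkpoint_ages([-3600, -10]))
    print(summarize_checkpoint_ages([-3600, 120]))
```

With the old sentinel, max(-1, checkpoint_age) stayed at -1 whenever all ages were negative, so the check could not tell "no online datafiles" apart from "checkpoint in the future". Using None as the initial value keeps the two cases separate.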