ID: 8279
Title: Do not delete but postpone notifications while host/service is out of its notification period
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i3
Previously, notifications were dropped entirely if the object in question
was out of its notification period; now they are postponed. Note: a contact
who is out of his or her notification period will still not get the
notification later. Also, when a timeperiod is used as a condition in a
notification rule, the notification will not be postponed - the rule will
simply fail to match.
ID: 8278
Title: RPM packages created on SLES11SP3 should now work on CentOS 6
Component: agents
Level: 1
Class: Bug fix
Version: 1.2.7i2
When you create RPM packages with the agent bakery on a modern RPM based
system and install them on an older RPM based system, the compression
algorithm in the created RPMs was not always compatible with older RPM
versions. This has been fixed by forcing the compression to Gzip.
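For reference, the same effect can be reproduced manually when building
RPMs. A hedged sketch of the relevant rpm macro (the macro name is standard
rpmbuild; the compression level is just an example):

```shell
# ~/.rpmmacros (or via rpmbuild --define on the command line):
# force the classic gzip payload so that older rpm versions
# (e.g. on CentOS 6) can install the package. "w9.gzdio" means
# gzip at compression level 9.
%_binary_payload w9.gzdio
```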
ID: 8277
Title: mk_logwatch: Agent bakery for Linux now allows shipping just plugin without logwatch.cfg
Component: Checks & agents
Level: 1
Class: New feature
Version: 1.2.7i2
ID: 8275
Title: Alert Handlers - execute automatic actions upon state changes of hosts and services
Component: cmc
Level: 3
Class: New feature
Version: 1.2.7i2
Check_MK now supports executing automatic actions (scripts) upon the
state change of a host or service. This is similar to Nagios "Event Handlers"
but has a much more flexible configuration and other advantages.
Everything starts with a state change of a host or service. It does not
matter whether this change is "soft" - i.e. the maximum number of check
attempts has not yet been reached. It simply matters that the state has
changed from one of OK/WARN/CRIT/UNKNOWN to another.
Whenever this happens, a new global rule chain of <i>Alert Handler Rules</i>
is processed. Each rule that matches calls an external script of
your choice. Typically people want to restart services, trigger garbage
collections of Java virtual machines or do similar things.
Please note that some people insist that monitoring should not try to repair
things or actively <i>change</i> things in any other way. Whether you share
this opinion or not is your own decision. Alert handlers do not limit
what exactly you can do with them. But you have been warned.
Compared with the Rule Based Notifications (RBN), alert handlers differ
in some important ways:
LI:Notifications are being suppressed during downtimes, outside of the notification period, when the host is down and other situations. Alerts are never suppressed.
LI:Notifications cannot be triggered by soft state changes - alert handlers can.
LI:Alert handlers only work with the Check_MK Micro Core. If you need Nagios or Icinga please use the Event Handlers of those cores.
LI:Alert handling rules do not allow cancelling. All matching rules are being executed.
LI:As long as no alert rule is defined the alert handling mechanism is deactivated in the core.
Note: this is just the first implementation of the Alert Handlers. Next steps
will be the introduction of error tracking, notifications tied to alert actions,
even more flexible conditions, a system for secure remote execution and much
more. Stay tuned!
H2:Setting up Alert handlers
To set up alert handlers you first need to create the script that should
be called. It can be written in any programming language - most people
will use a simple BASH script. It must be installed in
<tt>~/local/share/check_mk/alert_handlers</tt> and made executable.
The script is provided with all information about the alert via environment
variables beginning with <tt>ALERT_</tt> - very similar to a notification
script. A good start for testing is the following script:
F+:~/local/share/check_mk/alert_handlers/foo
#!/bin/bash
env | grep ALERT_ | sort > /tmp/alert.out
F-:
This will dump all variables of the alert into the file <tt>/tmp/alert.out</tt>.
When specifying this script in an alert handler rule, simply enter <tt>foo</tt>.
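A slightly more realistic sketch - hedged, since the exact variable names
are chosen here by analogy with the <tt>NOTIFY_*</tt> variables of
notification scripts, and the restarted service is just an example:

```shell
#!/bin/bash
# Hypothetical alert handler: react only when the service "NTP" of a
# host goes CRIT. ALERT_WHAT, ALERT_SERVICEDESC, ALERT_SERVICESTATE and
# ALERT_HOSTNAME are assumed by analogy with the NOTIFY_* variables.
handle_alert() {
    if [ "$ALERT_WHAT" = "SERVICE" ] \
       && [ "$ALERT_SERVICEDESC" = "NTP" ] \
       && [ "$ALERT_SERVICESTATE" = "CRIT" ]; then
        # Example action - replace with whatever repair step you need:
        echo "restarting ntpd on $ALERT_HOSTNAME"
    fi
}

handle_alert
```

As with <tt>foo</tt> above, the file only needs to be executable and placed
in the alert handlers directory.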
For debugging it is useful to set <i>Alert handler log level</i> to <i>Full dump
of all variables</i> and <i>Logging of the alert processing</i> to <i>on</i>
in the global settings. You will then find information in <tt>~/var/log/cmc.log</tt>
and <tt>~/var/log/alerts.log</tt>.
ID: 8274
Title: Fix time range in "Export as PDF" from availability pages
Component: Reporting & Availability
Level: 2
Class: Bug fix
Version: 1.2.7i2
Instead of the time range selected in the availability options, the
time range from the sidebar snapin was used. This has been fixed.
ID: 8273
Title: Recurring Scheduled Downtimes - adhoc via command and also via rule set
Component: cmc
Level: 3
Class: New feature
Version: 1.2.7i2
Check_MK now supports <i>recurring scheduled downtimes</i> (or simply
<i>recurring downtimes</i>). Let's assume that you have a couple of servers
that are rebooted once a week always at the same time. You surely do not
want any notifications about that, you also do not want to have these hosts
displayed as "problems" in the problem views.
Up to now the only useful tool in such situations was the notification
period. But that way you can just suppress notifications - not the problem
display. Also setting up notification periods requires configuration
permissions (WATO).
The new recurring downtimes create normal scheduled downtimes for you on
a regular basis. This is a direct enhancement of the downtimes that already
exist. They now have a new field where you can specify an interval in which
the downtime should be repeated. Everything is then handled by the monitoring
core (CMC) and no external cron job is involved. This has a few advantages:
LI:No cron job is needed
LI:The recurring downtimes are visible in the <i>Downtimes</i> view - even if they are currently not active
LI:Recurring downtimes can be set and removed by using the existing downtime commands
You can create recurring downtimes in two ways:
H2:Using commands
The easiest way is to just use the same commands as for creating one-time
downtimes. The command box now has a new option <i>Repeat this downtime
on a regular base every ____</i>. You can choose between <i>hourly</i>,
<i>daily</i> and <i>weekly</i>. If you e.g. create a downtime for 12:34 on
a Monday and select <i>weekly</i>, then this downtime will be repeated every
Monday from now on. Changes in daylight saving time are compensated for,
so the time of day (12:34) remains valid both in and out of DST.
Such downtimes behave exactly like one-time downtimes - with the single
difference that they are not deleted when they end, but shifted to the
next interval instead.
H2:Using rule sets
There is a second way to create recurring downtimes: two new WATO rule sets
called <i>Recurring downtimes for hosts</i> and <i>Recurring downtimes for
services</i>. Using these you can base the downtimes on WATO folders and host
tags. While this could also be done by selecting hosts and services via the GUI
and applying commands, it has one advantage: you can specify downtimes
for objects that <i>do not exist yet</i>. If you create a recurring downtime
for all servers with the tag <i>Windows</i>, then Windows hosts that
are added at a later time will automatically get that recurring downtime as well.
Recurring downtimes that have been created via a rule cannot be deleted by
the operator via a command, of course. All downtime views have a new column
<i>Origin</i> that shows you whether a downtime exists due to a <i>command</i>
or due to the <i>configuration</i>.
ID: 8269
Title: Improved error handling of edge cases in SNMP bulk walk processing
Component: inline-snmp
Level: 1
Class: Bug fix
Version: 1.2.7i3
In some cases SNMP based checks could cause exceptions like "'NoneType' object has no attribute 'startswith'",
which might be caused by an invalid or incomplete SNMP answer provided by the used NetSNMP libraries.
Check_MK now handles this situation more robustly by dropping the single invalid result and keeping
the other ones.
ID: 8270
Title: Fixed error handling case during service discovery leading to SNMP timeouts
Component: inline-snmp
Level: 1
Class: Bug fix
Version: 1.2.7i3
On some devices the service discovery resulted in timeouts, caused by
different behaviour of Inline-SNMP compared to the classic SNMP
implementation. This has been fixed.
ID: 8271
Title: mknotifyd: various optimizations for avoiding duplicate notifications
Component: Notifications
Level: 1
Class: Bug fix
Version: 1.2.7i2
The notification spooler has been changed in a few places in order to avoid
duplicate notifications in situations with bad network connections:
LI:The time a connection took to be established is now output in the check
LI:When no connection is established, logging about spool files is quieter
LI:Heartbeat checking now considers <i>any</i> received data as heartbeat
LI:Heartbeat checking now accounts for the internal computation time
LI:Logging about duplicate acknowledgements has been repaired.
The latter two changes avoid false heartbeat alarms in situations where a
connection is actually still valid.