ID: 8274
Title: Fix time range in "Export as PDF" from availability pages
Component: Reporting & Availability
Level: 2
Class: Bug fix
Version: 1.2.7i2
Previously, the PDF export used the time range from the sidebar snapin
instead of the time range selected in the availability options. This has been fixed.
ID: 8275
Title: Alert Handlers - execute automatic actions upon state changes of hosts and services
Component: cmc
Level: 3
Class: New feature
Version: 1.2.7i2
Check_MK now supports automatic actions (scripts) to be executed upon the
state change of a host or service. This is similar to Nagios "Event Handlers"
but has a much more flexible configuration and other advantages.
At the beginning there is a state change of a host or service. It does not
matter whether this change is "soft" - i.e. the maximum number of check
attempts has not yet been reached. All that matters is that the state has
changed from one of OK/WARN/CRIT/UNKNOWN to another.
Whenever this happens, a new global rule chain of <i>Alert Handler Rules</i>
is processed. Each matching rule calls an external script of
your choice. Typically people want to restart services, trigger garbage
collections of Java virtual machines or do similar things.
Please note that some folks insist that monitoring should not try to repair
things or by any other means actively <i>change</i> things. Whether you share
this opinion or not is your own decision. Alert handlers do not limit you
in what you exactly do with them. But you have been warned.
When comparing alert handlers with Rule Based Notifications (RBN), there
are some important differences:
LI:Notifications are being suppressed during downtimes, outside of the notification period, when the host is down and other situations. Alerts are never suppressed.
LI:Notifications cannot be triggered by soft state changes.
LI:Alert handlers only work with the Check_MK Micro Core. If you need Nagios or Icinga please use the Event Handlers of those cores.
LI:Alert handling rules do not allow cancelling. All matching rules are being executed.
LI:As long as no alert rule is defined the alert handling mechanism is deactivated in the core.
Note: this is just the first implementation of the Alert Handlers. Next steps
will be the introduction of error tracking, notifications tied to alert
actions, even more flexible conditions, a system for secure remote execution
and much more. Stay tuned!
H2:Setting up Alert handlers
To set up alert handlers you first need to create the script that should
be called. This can be written in any programming language - most people
will use a simple BASH script. It must be installed in
<tt>~/local/share/check_mk/alert_handlers</tt> and made executable.
The script is provided with all information about the alert via environment
variables beginning with <tt>ALERT_</tt> - very similar to a notification
script. A good start for testing is the following script:
F+:~/local/share/check_mk/alert_handlers/foo
#!/bin/bash
env | grep ALERT_ | sort > /tmp/alert.out
F-:
This will dump all variables of the alert into the file <tt>/tmp/alert.out</tt>.
When specifying this script in an alert handler rule, simply enter <tt>foo</tt>.
For debugging it is useful to set <i>Alert handler log level</i> to <i>Full dump
of all variables</i> and <i>Logging of the alert processing</i> to <i>on</i>
in the global settings. You will find the information in <tt>~/var/log/cmc.log</tt>
and <tt>~/var/log/alerts.log</tt>.
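As a slightly more realistic sketch, the following handler reacts only when a specific service goes CRIT. Note that the exact variable names used here (<tt>ALERT_WHAT</tt>, <tt>ALERT_HOSTNAME</tt>, <tt>ALERT_SERVICEDESC</tt>, <tt>ALERT_SERVICESTATE</tt>) are assumptions modelled on the <tt>NOTIFY_</tt> variables of notification scripts - check the variable dump from the <tt>foo</tt> script above for the names your version actually provides.

```shell
#!/bin/bash
# Hypothetical alert handler sketch. The ALERT_* variable names below are
# assumptions based on the NOTIFY_* notification variables; verify them
# against the dump in /tmp/alert.out before relying on them.
handle_alert() {
    if [ "${ALERT_WHAT}" = "SERVICE" ] && [ "${ALERT_SERVICESTATE}" = "CRIT" ]; then
        case "${ALERT_SERVICEDESC}" in
            "NTP Time")
                # Placeholder action - e.g. restart the daemon via SSH:
                # ssh "root@${ALERT_HOSTNAME}" /etc/init.d/ntp restart
                echo "would restart ntp on ${ALERT_HOSTNAME}"
                ;;
        esac
    fi
}
handle_alert
```

Install it as e.g. <tt>~/local/share/check_mk/alert_handlers/restart_ntp</tt>, make it executable and refer to it as <tt>restart_ntp</tt> in your alert handler rule.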
ID: 8273
Title: Recurring Scheduled Downtimes - adhoc via command and also via rule set
Component: cmc
Level: 3
Class: New feature
Version: 1.2.7i2
Check_MK now supports <i>recurring scheduled downtimes</i> (or simply
<i>recurring downtimes</i>). Let's assume that you have a couple of servers
that are rebooted once a week, always at the same time. You surely do not
want any notifications about that, and you also do not want these hosts
displayed as "problems" in the problem views.
Up to now the only useful tool in such situations was the notification
period. But that only suppresses notifications - not the problem
display. Also, setting up notification periods requires configuration
permissions (WATO).
The new recurring downtimes create normal scheduled downtimes for you on
a regular basis. This is a direct enhancement of the already existing
downtimes: they now have a new field where you can specify an interval at
which the downtime should be repeated. Everything is then handled by the
monitoring core (CMC) and no external cron job is involved. This has a few
advantages:
LI:No cron job is needed
LI:The recurring downtimes are visible in the <i>Downtimes</i> view - even if they are currently not active
LI:Recurring downtimes can be set and removed by using the existing downtime commands
You can create recurring downtimes in two ways:
H2:Using commands
The easiest way is to just use the same commands as for creating one-time
downtimes. The command box now has a new option <i>Repeat this downtime
on a regular base every ____</i>. You can choose between <i>hourly</i>,
<i>daily</i> and <i>weekly</i>. If you e.g. create a downtime for 12:34 on
Monday and select <i>weekly</i> then this downtime will be repeated on every
Monday from now on. Changes in the daylight saving time are compensated,
so the time of day (12:34) will be valid in and out of DST.
Such downtimes behave exactly like one-time downtimes - with the single
difference that they are not being deleted when they end but shifted to the
next interval instead.
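Under the hood these commands use the normal external command interface of the core. As a hedged sketch, this is what the classic one-time variant looks like; the recurrence interval from the command box is an additional option on top of this mechanism, and its exact external command syntax is not part of this werk, so it is not shown here. Host name, author and comment are illustrative.

```shell
#!/bin/bash
# Sketch: building the classic SCHEDULE_HOST_DOWNTIME external command
# for a fixed one-time host downtime. Host, author and comment are
# illustrative placeholders.
NOW=$(date +%s)
END=$(( NOW + 3600 ))   # fixed downtime of one hour
CMD="COMMAND [${NOW}] SCHEDULE_HOST_DOWNTIME;myhost;${NOW};${END};1;0;3600;cmkadmin;Weekly reboot"
# In an OMD site this would be sent to the core via the Livestatus socket:
#   echo "$CMD" | unixcat ~/tmp/run/live
echo "$CMD"
```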
H2:Using rule sets
There is a second way to create recurring downtimes: two new WATO rule sets
called <i>Recurring downtimes for hosts</i> and <i>Recurring downtimes for
services</i>. Using these you can base the downtimes on WATO folders and host
tags. While this could also be done by selecting hosts and services via the GUI
and applying commands, it has one advantage: you can specify downtimes
for objects that <i>do not exist yet</i>. If you create a recurring downtime
for all servers with the tag <i>Windows</i>, then Windows hosts that are
added at a later time will automatically get that recurring downtime as well.
Recurring downtimes that have been created via a rule cannot, of course, be
deleted by the operator via a command. All downtime views have a new column
<i>Origin</i> that shows you whether a downtime exists due to a <i>command</i>
or due to <i>configuration</i>.
ID: 8266
Title: Added download page for the builtin agents/plugins to agent bakery page
Component: agents
Level: 2
Class: New feature
Version: 1.2.7i3
The default agents and plugins can now be downloaded directly from the agent bakery page.
Use the button "Builtin Agents" to navigate to these files.
ID: 8258
Title: Fixed issue where wrong checks were discovered for SNMP devices
Component: inline-snmp
Level: 2
Class: Bug fix
Version: 1.2.7i3
This issue could lead to wrong services being discovered, which then resulted
in services with UNKNOWN states or crashing checks.
You will need to reinventorize affected hosts after updating.
ID: 8261
Title: Fixed using cached agent data during regular checks
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i3
During runtime of the CMC it could happen that the data gathered
by the Check_MK helpers was not up-to-date, because they were using
cached data during checks. This only happened when the
"Check_MK Discovery" service was activated, which regularly checks for
new services to be monitored.
Once the "Check_MK Discovery" service was executed for one host
in a helper process, this problem occurred for all further checks
performed by this helper.
ID: 8255
Title: Agent configs containing a customized "agent path" were not visible in the GUI
Component: agents
Level: 2
Class: Bug fix
Version: 1.2.7i3
Baked agents which used the rule "Installation paths for agent files (Linux, UNIX)"
were baked correctly and available on the filesystem, but were not shown in the GUI.
ID: 8257
Title: Fixed crashing of core when rendering a lot of performance graphs in a short time
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i3
The core could crash when one or more users requested a lot of performance
graphs in a short time. This was caused by non-thread-safe code in the
rrdtool library used by the core.
ID: 8243
Title: Handle and display information about cached agent data in GUI
Component: cmc
Level: 2
Class: New feature
Version: 1.2.7i1
Check_MK now keeps track of the original age of information sent by the
monitoring agents. This is useful if certain agent sections are not generated
anew each time the agent is called but are cached either locally on the
agent or on the Check_MK server (using the <tt>persist</tt> section option).
The sections in question need to declare the age
and lifetime of the cached data in the section header,
e.g. <tt><<<foobar:cached(1431431239:86400)>>></tt>.
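For illustration, a local agent plugin that produces such a cached section could look roughly like this. The section name <tt>foobar</tt>, the cache file path and the placeholder payload are purely illustrative; only the <tt>cached(created:lifetime)</tt> header syntax comes from this werk.

```shell
#!/bin/bash
# Illustrative sketch of an agent plugin that caches its own output and
# declares creation time and lifetime in the section header.
# Section name, cache path and payload are placeholders.
CACHE_FILE="${CACHE_FILE:-/tmp/foobar.cache}"
LIFETIME=86400   # validity of the cached data in seconds

now=$(date +%s)
if [ ! -e "$CACHE_FILE" ] || \
   [ $(( now - $(stat -c %Y "$CACHE_FILE") )) -ge "$LIFETIME" ]; then
    # Placeholder for the actual (expensive) data collection
    echo "data line" > "$CACHE_FILE"
fi

# The header lets Check_MK compute the staleness of the data precisely
echo "<<<foobar:cached($(stat -c %Y "$CACHE_FILE"):$LIFETIME)>>>"
cat "$CACHE_FILE"
```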
Check_MK parses this information and hands it over to the Check_MK Micro Core.
The core uses that for a precise computation of the staleness and also makes
it available for the Multisite GUI.
The GUI in turn uses the point of time where the cached information has been
created as the time of the <i>last check</i> instead of the last execution of
Check_MK. In the details of a service you can see both the age and lifetime
of the cached data.
When using Nagios or another core, the new information from the agents
is simply ignored and nothing changes.
ID: 8222
Title: You can now have the timeline bar and/or the detailed timeline rendered in a report
Component: Reporting & Availability
Level: 2
Class: New feature
Version: 1.2.7i1
The report element <i>Availability table</i> now also allows showing the graphical
timeline bar and/or the detailed list of time spans (events) that make up the
availability of the object in question. If the table contains more than one
object (host, service), the timeline bars and details are repeated for every
single object.