ID: 8302
Title: New page for downloading the baked monitoring agent of a host
Component: agents
Level: 2
Class: New feature
Version: 1.2.7i3
There is now a new page for every host where you can download its Check_MK
agent. This page has one link for each supported operating system (as WATO
cannot know which agent package type you need).
Note: By default the download of baked monitoring agents is only possible
for users with the Administrator role. This is because agents may contain
confidential data such as passwords. If you like, you can allow normal
users to download agents either just for hosts they are a contact for or
even for all hosts.
ID: 8301
Title: cmcdump: New tool for offline mirroring satellite sites into a central site
Component: cmc
Level: 3
Class: New feature
Version: 1.2.7i3
The CMC now has a new replication mechanism for mirroring the state
of satellite monitoring sites into a central site. This is much like
<tt>livedump</tt> for the Nagios core but is much more powerful.
In order to set this up you need to call <tt>cmcdump -C > cmc.config</tt>
on the remote site and transfer that file to the central site into
<tt>etc/check_mk/conf.d/yourfile.mk</tt>. This will dump the configuration
of all hosts and services. Afterwards activate the updated configuration
with <tt>cmk -O</tt>. You need to repeat this from time to time so that
your central site stays up to date.
In a much shorter interval (e.g. once per minute) you call <tt>cmcdump >
cmc.state</tt> on the same remote site. This can easily be done with a cron
job. Transfer that file to the central site via any mechanism you like
(scp, http, rsync, ...) and read it into the core there with:
C+:
OM:unixcat < cmc.state tmp/run/live
C-:
This will update the core's complete state of all hosts and services that
are contained in the dump. The transferred state will correctly reflect the
following variables:
LI:The actual state (PEND, OK, WARN, ...)
LI:The plugin output
LI:The long (multiline) plugin output
LI:The performance data
LI:Whether the object is flapping and the current level of flappiness
LI:The time of the last check execution
LI:The time of the last state change
LI:The check execution time
LI:The check latency
LI:The number of the current check attempt
LI:Whether the current state is hard or soft
LI:Whether a problem has been acknowledged
LI:Whether the object is currently in a scheduled downtime
On the central site this will be handled almost - but not entirely - like
a check execution. One difference is that no notifications will be
sent. But performance data is processed and graphs are created. Also
the monitoring log is updated and availability data can be computed.
Depending on the synchronization interval of your data transfer, the latter
might not be 100% precise, however.
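A minimal transfer script for the remote site could look like the following sketch - the central host name <tt>central</tt> and the target path are pure examples, and any copy mechanism works just as well:

```shell
#!/bin/sh
# Sketch of a state transfer script on the remote (satellite) site,
# meant to be run e.g. once per minute from cron. The host "central"
# and the target path are examples only.
cmcdump > "$OMD_ROOT/tmp/cmc.state"
scp -q "$OMD_ROOT/tmp/cmc.state" central:tmp/cmc.state
```

On the central site a cron job can then feed the received file into the core with <tt>unixcat</tt> as shown above.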
The tool <tt>cmcdump</tt> is in your path and can be executed directly.
Call <tt>cmcdump --help</tt> for details on how to use it.
ID: 8299
Title: Fix impossible deletion of your own copy of the default report
Component: Reporting & Availability
Level: 2
Class: Bug fix
Version: 1.2.7i3
The reporting module thought that other reports depend on your copy of the
default report and therefore refused to delete it - even though the original
is still available as a builtin report.
ID: 8297
Title: Fixed problem where objects stayed in downtime even if downtimes were removed
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i3
Due to a bug while saving the state of current downtimes it could happen that removing
a downtime did not reset the host/service to the state "not in downtime". The blue
moon icon stayed at the host - even if there was no downtime left. This has been fixed.
The fix can only handle new downtimes automatically, however. If you still have sticky
blue moons then please set and remove downtimes for those objects. The moon will then vanish.
ID: 8295
Title: Automatically create backup of state file if state shrinks by more than 5% in size
Component: cmc
Level: 2
Class: New feature
Version: 1.2.7i3
If you make some unlucky misconfiguration (like disabling all services
instead of just a few) then you will lose the current state and
downtimes of all affected objects - even if you recreate them again later.
That is because, from the point of view of the core, vanished objects are
expected to be gone forever and their state is dropped.
In such situations the CMC now automatically creates a copy <tt>state.1</tt>
of your state file before creating a new one in its place. Previous backups
are shifted to <tt>state.2</tt>, <tt>state.3</tt> and so on. Up to 30 backups
are kept. These backups are <i>never</i> restored automatically
at any time. If you would like to get back to your old state, you can copy
these files manually:
C+:
OM:omd stop cmc
OM:cd var/check_mk/core
OM:mv state state.away
OM:cp state.1 state
OM:omd start cmc
C-:
This backup always happens if the new state file is at least 5%
smaller than the old one.
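The 5% rule can be pictured with this little sketch (not the actual CMC implementation; the function name is made up):

```python
# Simplified sketch of the backup condition - not the actual CMC code.
# A backup of the previous state file is kept whenever the newly written
# state file is at least 5% smaller than the old one.
def needs_backup(old_size, new_size):
    """Return True if the state file shrank by 5% or more."""
    return new_size <= old_size * 0.95

# A shrink from 1000 to 940 bytes (6%) triggers a backup,
# a shrink to only 960 bytes (4%) does not.
```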
Note: The state in this file contains the current OK/WARN/CRIT states, the plugin
output, the next scheduled check executions, notifications, downtimes, comments
and the like. It does <i>not</i> contain historic performance data
or events. But these do not get lost in the cases mentioned above.
ID: 8296
Title: More precise availability handling of removed and readded objects
Component: Reporting & Availability
Level: 2
Class: New feature
Version: 1.2.7i3
The CMC now better detects objects that have been removed and readded later on to
the monitoring. It does this by logging messages of the type <tt>VANISHED HOST</tt>
or <tt>VANISHED SERVICE</tt> to the monitoring history file and processing these
messages when computing the availability timelines.
ID: 8293
Title: Alert handlers: optionally process every single check execution, inline alert handlers
Component: cmc
Level: 2
Class: New feature
Version: 1.2.7i3
The new alert handlers can now optionally be called at every check
execution. This is a possible way to bring check results into some external
system (like a database). Please note that this can consume a lot of CPU
resources and slow down the monitoring if the alert handler cannot process
the data fast enough.
In order to implement more performant alert handlers, these can now optionally
be written as Python functions that are called without creating a new
process. Such an inline alert handler has the following structure:
F+:/omd/sites/mysite/local/share/check_mk/alert_handlers/foo
#!/usr/bin/python
# Inline: yes
# Do some basic initialization (optional)
def handle_init():
    log("INIT")
# Called at shutdown (optional)
def handle_shutdown():
    log("SHUTDOWN")
# Called at every alert (mandatory)
def handle_alert(context):
    log("ALERT: %s" % context)
F-:
You need to define at least <tt>handle_alert</tt>. The argument <tt>context</tt> is a dictionary
with keys like <tt>"HOSTNAME"</tt>. You can use the function <tt>log(...)</tt>, which will write
some diagnostic text into <tt>var/log/alerts.log</tt> - marked with the name of your alert handler.
The two global variables <tt>omd_root</tt> and <tt>omd_site</tt> are set to the home directory
and to the name of the OMD site.
It is allowed to use <tt>import</tt> commands in your handler.
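For illustration, such a handler could condense the alert context into one log line. This is a hypothetical sketch: the context keys used here follow the usual monitoring macros, and the <tt>log()</tt> stub only stands in for the real function that the CMC provides at runtime:

```python
def log(text):
    # Stub for standalone reading; inside the CMC, log() writes
    # to var/log/alerts.log instead.
    pass

def format_alert(context):
    # Condense the alert context into one semicolon-separated record.
    # The keys are examples of the usual monitoring macros.
    return ";".join([
        context.get("HOSTNAME", ""),
        context.get("SERVICEDESC", ""),
        context.get("SERVICESTATE", ""),
    ])

# Called at every alert (mandatory)
def handle_alert(context):
    log("ALERT: %s" % format_alert(context))
```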
Note that in the second line you need to put <tt># Inline: yes</tt>. That way Check_MK knows
that the alert handler should be loaded as an inline Python function and not run as a script.
Note also that after each change to an inline alert handler you need to restart the
CMC:
C+:
OM:omd restart cmc
C-:
ID: 8294
Title: Errors in WATO configuration do not prevent the core from being restarted anymore
Component: cmc
Level: 2
Class: New feature
Version: 1.2.7i3
Previously, when you had an error in your monitoring configuration - like
duplicate hosts or services, missing timeperiods, invalid cluster nodes or
parents or various other situations - Check_MK could not create a valid
configuration for the core and thus you could not activate any changes.
This has been changed. All these errors are now only warnings. A working
configuration for the core is now always created. WATO displays the
warnings on the <i>Activate Changes</i> page.