ID: 8070
Title: Fix notification type for the end of downtimes
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i6
When a downtime ends, the CMC used to set the NOTIFICATION_TYPE to
<tt>DOWNTIMESTOPPED</tt>, while Nagios uses <tt>DOWNTIMEEND</tt>. This
led to an invalid notification classification in the rule-based
notifications, so that instead of the type <i>Downtime Start or End</i>
a normal alert type was assumed. This has been fixed.
ID: 8065
Title: Safely direct any verbose or diagnostic output from check_mk or checks to cmc.log
Component: cmc
Level: 2
Class: New feature
Version: 1.2.6b1
If any check outputs something to stdout or stderr (which it should not do in
production mode) or if a function in the check_mk code (such as inline SNMP)
outputs data to stdout or stderr, then this will now safely be logged to the
CMC log with the classification "warning". This will allow easier debugging
and error handling. Also the logfile <tt>cmc-helper.log</tt> will no longer
be created. Messages go into <tt>cmc.log</tt> instead.
ID: 8055
Title: Change authorization method for host/service groups to loose
Component: config
Level: 2
Class: incomp
Version: 1.2.5i5
Up to now the CMC adopted the behaviour of Nagios when it came to the
authorization for seeing host and service groups. Nagios lets a user see a
host group only if they are a contact for <b>every</b> host in that group.
That leads to anomalies, however: the details of a host show its groups
nevertheless, and some views effectively display a host group simply by
listing each host together with the group it is contained in.
Both in normal Livestatus and with the CMC you can change the behaviour.
In CMC this is done with <i>Authorization settings</i> in the global
settings.
<b>Note:</b> the new default setting is now <i>loose</i>. If you want
the previous behaviour back, please change the setting in the global
settings.
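The difference between the two modes can be illustrated with a minimal
sketch. The function <tt>may_see_group</tt> and its parameters are
hypothetical and not part of the CMC. The sketch assumes that <i>loose</i>
means that being a contact for at least one member of the group is sufficient:

```python
def may_see_group(user, group_members, contacts_of, mode="loose"):
    """Return True if `user` may see a group with the given member hosts.

    `contacts_of` maps a host name to the set of its contacts.
    All names here are illustrative only.
    """
    is_contact = [user in contacts_of.get(host, set()) for host in group_members]
    if mode == "strict":
        # Nagios-like behaviour: contact for every host in the group.
        return all(is_contact)
    # Assumed loose behaviour: contact for at least one host in the group.
    return any(is_contact)


contacts = {"h1": {"alice"}, "h2": {"bob"}}
print(may_see_group("alice", ["h1", "h2"], contacts, "strict"))
print(may_see_group("alice", ["h1", "h2"], contacts, "loose"))
```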
ID: 8057
Title: Avoid SmartPING timeouts in situation of high disk I/O
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i5
In situations where the operating system is under high disk I/O load and
many processes run into disk wait, the Micro Core sometimes seemed
to register PING timeouts. That was simply due to the fact that
the core itself was delayed and thought the PING packets had not
arrived in time.
This has now been fixed: when checking the PING timeout, the CMC
takes the time since the last check into account. It now assumes
that all PING packets have arrived directly at the beginning of
the current interval, not at the end.
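One possible reading of this rule, as a rough sketch only: the silence of a
host is measured up to the beginning of the current check interval instead
of up to the (possibly delayed) current time. The function names and
parameters are hypothetical and do not reflect the actual CMC code:

```python
def naive_timed_out(last_packet_at, now, timeout):
    # Old style: a delayed core inflates the apparent silence of the host.
    return now - last_packet_at > timeout


def adjusted_timed_out(last_packet_at, interval_start, timeout):
    # Assumed new style: only count the silence up to the beginning of
    # the current interval, so a stalled core does not cause a timeout.
    return interval_start - last_packet_at > timeout


# Last packet seen at t=100, previous check at t=105, and the core was
# stalled by disk wait until t=120. The timeout is 10 seconds.
print(naive_timed_out(100, 120, 10))
print(adjusted_timed_out(100, 105, 10))
```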
ID: 8053
Title: Fix sporadic invalid OK status for active checks
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i4
In some very rare cases the CMC misinterpreted the check result status of active
checks as OK, even though the status was WARN, CRIT or UNKNOWN. This was due
to an invalid byte offset and has been fixed.
ID: 8054
Title: Reschedule checks if next check would be too far in future after config change
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i5
If, due to a configuration change (in the check period or in the check interval), the
next scheduled check of a host or service would be too far in the future, then it
is now rescheduled to the correctly expected time.
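For the case of a shortened check interval the idea can be sketched as
follows. This is only an illustration under the assumption that "too far in
the future" means later than one new interval from now. The function
<tt>reschedule</tt> is hypothetical, and check periods are not covered:

```python
def reschedule(next_check, now, check_interval):
    """Pull a scheduled check forward if it lies beyond one interval from now.

    All times are in seconds. Illustrative only, not the CMC implementation.
    """
    latest_allowed = now + check_interval
    return min(next_check, latest_allowed)


# The interval was changed from 1 hour to 1 minute at t=1000.
print(reschedule(next_check=4600, now=1000, check_interval=60))
```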
ID: 8052
Title: Speed up availability queries by new caching (disabled by default)
Component: Livestatus
Level: 3
Class: New feature
Version: 1.2.5i4
The Check_MK Micro Core now has an alternative implementation of the
Livestatus table <tt>statehist</tt>. This table is the basis for all
availability computations. In the current implementation, which is still
the only one available when using the Nagios core, all historic logfiles
that cover the query range have to be evaluated for each query. Despite
caching this can mean an intense effort in CPU and IO usage. If you have a
larger number of hosts and services, then a query for a larger time frame
could take minutes.
The new implementation needs to be enabled in the global settings
for the Check_MK Micro Core: <i>In-memory cache for availability data
(experimental)</i>. You also have to configure a time range. This limits how
long into the past you can do availability queries. The default setting is
two years.
During the start of The Core all historic log files for that time range are
parsed into a very efficient in-memory database so that future availability
queries do not need any disk IO or logfile parsing. The cache is automatically
updated when new alerts happen. Please also note that The Core is not
restarted during normal operation and activation of changes, so the cache
is only invalidated when you reboot your server or do a software update
of Check_MK.
The parser can process 500,000 messages per second and more, so if your disk
IO is fast enough even parsing a large history does not take longer than
a couple of minutes. This is done in the background and does not prevent
The Core from working or queries from being answered. Even availability
queries are being answered while the cache is still being built up. If the
queried time range is already in the cache then the query can immediately
be processed. Otherwise it waits for the cache to be ready.
When it comes to timeperiod definitions the new implementation has a
different behaviour: It reflects later changes in the definitions of your
timeperiods. This is convenient when you want to work with service periods
for your availability queries. The classical implementation evaluates the
<tt>TIMEPERIOD TRANSITION</tt> entries in your logfiles. The new one directly
takes the current definitions into account and computes them for the time
range in the past.
<b>Note:</b> As of today this implementation is still highly <i>experimental</i>
and might not only produce wrong results, but might also crash your core.
ID: 8050
Title: Use the same problem id through all notifications of a problem
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i3
When a host or service goes from OK into a hard non-OK state, then
a new problem ID is generated. When the state then changes between several
different non-OK states and finally goes back to OK, then the same
problem ID is reused in all notifications. That way a matching of
notifications in an external system is possible. Previously each state
change created a new problem ID.
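The new behaviour can be pictured as a small state machine. The class
<tt>ProblemTracker</tt> below is a hypothetical illustration and not the
CMC implementation. It assumes that IDs are simple counters and that the
recovery notification carries the ID of the problem that just ended:

```python
import itertools


class ProblemTracker:
    """Hands out one problem ID per OK -> non-OK -> OK cycle (illustrative)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.problem_id = None

    def on_hard_state(self, state):
        """Return the problem ID to use in the notification for this state."""
        if state == "OK":
            # Recovery: report the ID of the ending problem, then forget it.
            ended, self.problem_id = self.problem_id, None
            return ended
        if self.problem_id is None:
            # First non-OK state after OK: a new problem begins.
            self.problem_id = next(self._ids)
        return self.problem_id


tracker = ProblemTracker()
print([tracker.on_hard_state(s) for s in ["WARN", "CRIT", "UNKNOWN", "OK", "CRIT"]])
```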
ID: 8051
Title: Smart PING now uses same layout as vanilla PING
Component: cmc
Level: 2
Class: New feature
Version: 1.2.5i3
The Smart PING of the Micro Core now by default creates exactly the same type of
packets that a plain command line <tt>ping</tt> does. This has the disadvantage of
creating larger packets than necessary, but has the advantage of being compatible
with more firewalls. Some of those tend to regard ICMP ECHO REQUEST packets without
payload as some bogus attack and drop them.
You can re-enable PING packets without payload via a new global option for
<tt>main.mk</tt>:
F+:main.mk
cmc_smartping_omit_payload = True
F-:
This is also available via WATO in the global settings of the Micro Core.