ID: 8076
Title: New option cmk --convert-rrds for converting existing RRDs
Component: config
Level: 2
Class: New feature
Version: 1.2.5i6
This new command line option for the <tt>cmk</tt> tool changes existing
RRD databases to match the configuration that is done via the WATO rulesets
<i>Configuration of RRD databases of hosts</i> and <i>Configuration of RRD
databases of services</i>. Previously, changes to these rules applied only to
newly created RRDs.
You can restrict the conversion to one or several hosts:
C+:
OM:cmk -v --convert-rrds myhost1 myhost2
myhost1:
HOST
- rta....uptodate
- pl....uptodate
- rtmax....uptodate
- rtmin....uptodate
Postfix Queue
- length....uptodate
- size....uptodate
CPU utilization
- user....converted, 376 KB -> 40 KB
- system....converted, 376 KB -> 40 KB
- wait....converted, 376 KB -> 40 KB
[...]
C-:
If you do not specify any hostname, then <b>all</b> RRDs will
be converted.
<b>Note:</b> this new option uses a completely new feature of RRDTool,
which has been sponsored by Mathias Kettner: RRDTool can now change the
internal structure of RRDs on the fly. This makes it possible, for example,
to change the time range for which data is kept, or the precision at which it is stored.
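In RRDTool terms, the time range and the precision are properties of the round robin archives (RRAs) inside an RRD: each RRA keeps a fixed number of rows, and each row consolidates a fixed number of primary data points of length <tt>step</tt>. The following Python sketch computes the resolution, the covered time span and an approximate data size from these parameters. The <tt>RRA</tt> class and the helper functions are hypothetical illustrations, not part of Check_MK or RRDTool, and the size estimate ignores the RRD header.

```python
from dataclasses import dataclass


@dataclass
class RRA:
    """One round robin archive of an RRD (hypothetical helper type)."""
    cf: str            # consolidation function, e.g. "AVERAGE", "MIN", "MAX"
    pdp_per_row: int   # number of primary data points consolidated into one row
    rows: int          # number of rows the archive keeps


def rra_resolution(step: int, rra: RRA) -> int:
    """Seconds represented by a single row of the archive."""
    return step * rra.pdp_per_row


def rra_timespan(step: int, rra: RRA) -> int:
    """Seconds of history the archive holds before old rows are overwritten."""
    return rra_resolution(step, rra) * rra.rows


def rra_data_bytes(rra: RRA, num_ds: int) -> int:
    """Approximate archive data size: one 8-byte double per row and data source.
    The RRD header is ignored."""
    return rra.rows * num_ds * 8


if __name__ == "__main__":
    step = 60  # seconds between two updates
    for rra in (RRA("AVERAGE", 1, 2880), RRA("AVERAGE", 5, 2016), RRA("MAX", 60, 8760)):
        print(f"{rra.cf:8} resolution {rra_resolution(step, rra):>5} s, "
              f"history {rra_timespan(step, rra) / 86400:g} days, "
              f"~{rra_data_bytes(rra, 1) // 1024} KB per data source")
```

Reducing the number of rows shrinks both the history that is kept and the file, which is one way a conversion can produce size reductions like those in the example output above.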
<b>Note 2:</b> this feature uses an <b>experimental</b> version of
RRDTool. Please make a backup of your RRDs before trying this out.
ID: 8077
Title: New option --split-rrds for --convert-rrds, converts PNP storage type
Component: config
Level: 2
Class: New feature
Version: 1.2.5i6
Check_MK now has a new built-in function for converting legacy-style PNP RRDs
from storage type <tt>SINGLE</tt>, which had been the default for many years,
to <tt>MULTIPLE</tt>, which has been the default for about three years now.
<tt>SINGLE</tt> means that all metrics of one host or service are stored
in a single round robin database, whereas with <tt>MULTIPLE</tt> each RRD
contains only a single data source.
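The difference between the two storage types can be pictured as going from one RRD file per service to one file per metric. The Python sketch below derives the file names for both layouts. It assumes the PNP4Nagios naming scheme in which <tt>SINGLE</tt> yields e.g. <tt>CPU_utilization.rrd</tt> and <tt>MULTIPLE</tt> yields e.g. <tt>CPU_utilization_user.rrd</tt>, with spaces replaced by underscores (PNP4Nagios cleans some further characters, which this sketch ignores). Please verify the scheme against your own <tt>var/pnp4nagios/perfdata</tt> directory; the function names are hypothetical.

```python
from __future__ import annotations

import posixpath


def _clean(name: str) -> str:
    """Assumed file name cleaning: spaces become underscores (simplified)."""
    return name.replace(" ", "_")


def single_rrd_path(host_dir: str, service: str) -> str:
    """SINGLE: one RRD file holding all data sources of the service."""
    return posixpath.join(host_dir, _clean(service) + ".rrd")


def multiple_rrd_paths(host_dir: str, service: str, metrics: list[str]) -> dict[str, str]:
    """MULTIPLE: one RRD file per metric, each containing a single data source."""
    base = _clean(service)
    return {m: posixpath.join(host_dir, f"{base}_{_clean(m)}.rrd") for m in metrics}


if __name__ == "__main__":
    host_dir = "var/pnp4nagios/perfdata/myhost1"
    print(single_rrd_path(host_dir, "CPU utilization"))
    for metric, path in multiple_rrd_paths(host_dir, "CPU utilization",
                                           ["user", "system", "wait"]).items():
        print(f"{metric:7} -> {path}")
```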
Performance tests have revealed that, contrary to what one might guess,
<tt>MULTIPLE</tt> is not significantly slower. But it has the advantage
that new data sources can be added on the fly. This is often needed when new
versions of Check_MK introduce new metrics. For that reason Check_MK only
fully supports storage type <tt>MULTIPLE</tt>. If you use the Check_MK
Micro Core, you <b>have</b> to convert to <tt>MULTIPLE</tt> if you do
not want to lose your historic metrics, because the CMC does not support
<tt>SINGLE</tt> at all.
Converting RRDs, which essentially means splitting them up, can be done with the utility
<tt>lib/pnp4nagios/rrd_convert.pl</tt> shipped with PNP4Nagios, but that is a bit
clumsy to use and very slow. If you have thousands of hosts, the conversion
can take many days.
For that reason Check_MK can now do the splitting into multiple RRDs during the
RRD conversion. This is not only simpler for you, it is also
much faster, because it uses the native C code of the recent RRDTool. This
is how to do the conversion, assuming that you are using Nagios as your
monitoring core:
<ol>
<li>Make a backup of the directory <tt>var/pnp4nagios/perfdata</tt>.</li>
<li>Stop the <tt>npcd</tt>. This avoids RRD updates while the conversion is in progress:
C+:
OM:omd stop npcd
C-:
</li>
<li>Start the conversion. The option <tt>-v</tt> selects verbose output:
C+:
OM:cmk -v --convert-rrds --split-rrds
C-:
</li>
<li>Edit the file <tt>etc/pnp4nagios/process_perfdata.cfg</tt> and
change the storage type:
F+:etc/pnp4nagios/process_perfdata.cfg
RRD_STORAGE_TYPE = MULTIPLE
F-:
</li>
<li>Start the <tt>npcd</tt> again:
C+:
OM:omd start npcd
C-:
</li>
</ol>
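Step 4 can also be scripted. The following Python sketch sets <tt>RRD_STORAGE_TYPE</tt> in <tt>process_perfdata.cfg</tt>, replacing an active setting or appending one if none exists; commented-out lines are left untouched. It is an illustration only: the function names are hypothetical, and it assumes the simple <tt>KEY = VALUE</tt> line format shown in step 4.

```python
import re

# Matches an active (not commented out) RRD_STORAGE_TYPE line and
# captures everything up to and including the "=" and following blanks.
_STORAGE_RE = re.compile(r"^([ \t]*RRD_STORAGE_TYPE[ \t]*=[ \t]*)\S+", re.MULTILINE)


def set_storage_type(cfg_text: str, storage_type: str = "MULTIPLE") -> str:
    """Return cfg_text with RRD_STORAGE_TYPE set to storage_type."""
    new_text, count = _STORAGE_RE.subn(lambda m: m.group(1) + storage_type, cfg_text)
    if count == 0:
        # No active setting found: append one at the end of the file
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"RRD_STORAGE_TYPE = {storage_type}\n"
    return new_text


def update_cfg_file(path: str, storage_type: str = "MULTIPLE") -> None:
    """Rewrite the configuration file in place (make a backup first)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_storage_type(text, storage_type))
```

For example, calling <tt>update_cfg_file("etc/pnp4nagios/process_perfdata.cfg")</tt> from the site directory after step 3 and before starting <tt>npcd</tt> again would perform step 4.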
<b>Notes:</b>
<ul>
<li>You can specify host names as arguments to <tt>cmk --convert-rrds</tt>. The conversion
will then only be done for these hosts. But if you start <tt>npcd</tt> again when only
some of the hosts have been converted, then hosts that do not match the storage type
in <tt>process_perfdata.cfg</tt> will lose their RRD data.</li>
<li>The splitting and the conversion to the RRD configuration that is set up via WATO
with the rulesets <i>Configuration of RRD databases of hosts</i> and
<i>Configuration of RRD databases of services</i> are done at the same time. There
is no way to just split. But the <i>Default</i> configuration of PNP4Nagios and of
WATO for the RRDs is the same, so if you have changed neither, you will essentially
just split.</li>
<li>Check_MK does <b>not create any backups of any files!</b> Failed modifications, an unfortunate RRD
configuration or software bugs can lead to data loss. Make sure you have a backup.</li>
</ul>
ID: 8074
Title: Avoid delaying notifications for WARN/CRIT changes during periodic notifications
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i6
When you used periodic notifications and a hard state change occurred, e.g. from WARN to CRIT,
then the notification for this state change was delayed until the next periodic
notification.
This has been fixed. Any change to another non-OK state is now notified immediately.
Furthermore, a configured first notification delay is <b>not</b> applied in that case.
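The fixed behaviour can be summarized as a small decision rule. The Python sketch below is a simplified, hypothetical model (the state constants and the function name are made up for illustration; this is not the CMC implementation): during an ongoing problem, a hard change to a different non-OK state is notified at once, without waiting for the next periodic notification and without the first notification delay, while a new problem still honours that delay. Recoveries are deliberately left out.

```python
from typing import Optional

# Service states as used by Nagios and the CMC
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


def notification_time(now: float, old_state: int, new_state: int,
                      first_delay: float) -> Optional[float]:
    """Return when the notification for a hard state change is sent.

    Recoveries (changes to OK) are not modelled and return None.
    """
    if new_state == OK:
        return None
    if old_state == OK:
        # New problem: a configured first notification delay applies
        return now + first_delay
    if old_state != new_state:
        # Change between non-OK states (e.g. WARN -> CRIT): notify immediately.
        # Before the fix this waited for the next periodic notification.
        return now
    # No state change, so no state change notification
    return None
```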
ID: 8070
Title: Fix notification type for the end of downtimes
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i6
When a downtime ended, the CMC used to set the NOTIFICATION_TYPE to
<tt>DOWNTIMESTOPPED</tt>, while Nagios uses <tt>DOWNTIMEEND</tt>. This
led to a wrong classification in the rule based
notifications: instead of the type <i>Downtime Start or End</i>,
a normal alert type was assumed. This has been fixed now.
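Why a single wrong keyword mattered becomes clear from a classification that matches notification types exactly. The Python sketch below uses the downtime-related notification types that Nagios emits; the category names and the <tt>classify</tt> function are hypothetical and only mimic the behaviour described above.

```python
# Downtime-related notification types as emitted by Nagios
DOWNTIME_TYPES = {"DOWNTIMESTART", "DOWNTIMEEND", "DOWNTIMECANCELLED"}


def classify(notification_type: str) -> str:
    """Map a notification type to a (hypothetical) rule category."""
    if notification_type in DOWNTIME_TYPES:
        return "downtime"  # corresponds to "Downtime Start or End"
    # Anything not recognized falls through to a normal alert
    return "alert"
```

With the old CMC value <tt>DOWNTIMESTOPPED</tt> the exact lookup fails and the notification ends up in the alert branch, which is the misclassification fixed here.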
ID: 8065
Title: Safely direct any verbose or diagnostic output from check_mk or checks to cmc.log
Component: cmc
Level: 2
Class: New feature
Version: 1.2.6b1
If any check outputs something to stdout or stderr (which it should not do in
production mode), or if a function in the check_mk code (such as inline SNMP)
outputs data to stdout or stderr, then this is now safely logged to the
CMC log with the classification "warning". This allows easier debugging
and error handling. Also, the logfile <tt>cmc-helper.log</tt> is no longer
created. Messages go into <tt>cmc.log</tt> instead.
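The idea of turning stray output into log warnings can be sketched in Python with <tt>contextlib.redirect_stdout</tt> and <tt>contextlib.redirect_stderr</tt>. This is an illustration of the technique under stated assumptions, not the CMC's actual mechanism; the function and logger names are hypothetical.

```python
import contextlib
import io
import logging
import sys


@contextlib.contextmanager
def output_to_log(log: logging.Logger):
    """Capture anything written to sys.stdout/sys.stderr inside the block
    and re-emit each non-empty line as a warning in the given logger."""
    out, err = io.StringIO(), io.StringIO()
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            yield
    finally:
        # stdout lines are logged before stderr lines; interleaving is not preserved
        for stream, buf in (("stdout", out), ("stderr", err)):
            for line in buf.getvalue().splitlines():
                if line.strip():
                    log.warning("%s: %s", stream, line)


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    with output_to_log(logging.getLogger("cmc")):
        print("debug output from a check")
        print("something went wrong", file=sys.stderr)
```

Note that this only captures writes through Python's <tt>sys.stdout</tt> and <tt>sys.stderr</tt>. Output that C extensions or child processes write directly to file descriptors 1 and 2 would require redirection at the descriptor level, e.g. with <tt>os.dup2</tt>.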
ID: 8055
Title: Change authorization method for host/service groups to loose
Component: config
Level: 2
Class: incomp
Version: 1.2.5i5
Up to now the CMC adopted the behaviour of Nagios when it came to the
authorization for seeing host and service groups. Nagios lets a user see a
host group only if he is a contact for <b>every</b> host in that group. That
leads to anomalies, however: in the details of a host you can see
the group nevertheless, and some of the views effectively display a host group
just by printing out the host together with the group it is contained in.
Both in normal Livestatus and with the CMC you can change this behaviour.
In the CMC this is done with <i>Authorization settings</i> in the global
settings.
<b>Note:</b> the new default setting is now <i>loose</i>, i.e. a user sees a
group if he is a contact for at least one of its members. If you want
the previous behaviour back, please change this setting in the global settings.
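The two authorization modes can be contrasted in a few lines of Python. The sketch assumes that <i>strict</i> requires being a contact for every member (the Nagios behaviour described above) and that <i>loose</i> requires being a contact for at least one member, as with the Livestatus <tt>group_authorization</tt> option; the function name and data layout are hypothetical.

```python
from __future__ import annotations


def can_see_group(user: str, members: list[str],
                  contacts: dict[str, set[str]], mode: str = "loose") -> bool:
    """Decide whether user may see a host group.

    strict: user must be a contact for every member (Nagios behaviour)
    loose:  being a contact for at least one member is enough
    """
    is_contact = [user in contacts.get(host, set()) for host in members]
    if mode == "strict":
        return all(is_contact)  # note: all() is True for an empty group
    if mode == "loose":
        return any(is_contact)
    raise ValueError(f"unknown authorization mode: {mode!r}")
```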
ID: 8057
Title: Avoid SmartPING timeouts in situation of high disk I/O
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i5
In situations where the operating system is under high disk I/O load and
many processes run into disk wait, the Micro Core sometimes seemed
to register PING timeouts. That was simply due to the fact that
the core itself was delayed and thought the PING packets had not
arrived in time.
This has now been fixed by taking into account the time since the last
check when evaluating the PING timeout: the CMC now assumes that all PING
replies arrived directly at the beginning of the current interval, not at its end.
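The fix can be read as the following simplified model, written in Python. It is a hypothetical illustration of the idea described above (function names and parameters are made up), not the CMC's SmartPING code, and it only covers replies that the core reads during its current pass: such a reply arrived somewhere between the previous pass and now, so a core delayed by disk I/O should not attribute the whole delay to the network.

```python
def assumed_arrival(sent_at: float, last_check: float, now: float,
                    fixed: bool = True) -> float:
    """Time at which a reply read during the current pass is assumed to have arrived.

    The core only knows the reply arrived between the previous pass (last_check)
    and now; it cannot have arrived before the request was sent.
    """
    if fixed:
        return max(last_check, sent_at)  # beginning of the current interval
    return now                           # end of the interval (old behaviour)


def ping_timed_out(sent_at: float, last_check: float, now: float,
                   timeout: float, fixed: bool = True) -> bool:
    """True if the assumed round trip time exceeds the timeout."""
    return assumed_arrival(sent_at, last_check, now, fixed) - sent_at > timeout
```

For example, with a request sent at 0.0 s, a previous pass at 0.5 s and a core that only wakes up again at 5.0 s, a 2 s timeout fires under the old assumption but not under the new one.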
ID: 8053
Title: Fix sporadic invalid OK status for active checks
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i4
In some very rare cases the CMC misinterpreted the check result status of active
checks as OK, even if the status would be WARN, CRIT or UNKNOWN. This was due
to an invalid byte offset. This has been fixed.
ID: 8054
Title: Reschedule checks if next check would be too far in future after config change
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.5i5
If, due to a configuration change (of the check period or the check interval), the
next scheduled check of a host or service would be too far in the future, then it
is now rescheduled to the correctly expected time.
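A minimal Python sketch of such a rescheduling rule follows. It assumes that "too far in the future" means later than one check interval from now and that the correctly expected time is <tt>now + check_interval</tt>; check periods are ignored, and the function name is hypothetical.

```python
def reschedule_after_config_change(now: float, next_check: float,
                                   check_interval: float) -> float:
    """Pull a next check that lies too far in the future back to now + check_interval."""
    latest_expected = now + check_interval
    if next_check > latest_expected:
        # e.g. the interval was reduced from one day to one minute: do not wait a day
        return latest_expected
    return next_check
```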