ID: 12908
Title: Add predefined cluster modes for all services
Component: Checks & agents
Level: 1
Class: New feature
Version: 2.1.0i1
This werk changes the behaviour of (some) services on clusters.
Affected services will go to UNKNOWN. To fix this, users must
explicitly select the cluster mode they wish to use.
This can be done using the ruleset
<i>"Aggregation options for clustered services"</i>.
All services written against the old API are affected, as are a few
of the modern plugins (see below for a list of the latter).
Since this is the second time the behaviour of clustered services
changes (1.6 to 2.0 and 2.0 to 2.1), we provide an overview.
Note that we must consider four types of plugins here:
Plugins programmed against the old API (referred to as 'legacy')
or the new API ('modern'), and plugins developed with their
behaviour on clusters explicitly considered ('native cluster mode')
or not.
<b>In Checkmk 1.6 and earlier</b> all plugins (now legacy) can be
configured to run on a cluster. For plugins without a native
cluster implementation, the behaviour is unspecified.
They simply operate on the concatenated output of all nodes,
which may or may not result in the desired behaviour (or even crash).
<b>In Checkmk 2.0</b> the behaviour for legacy plugins is unchanged.
Modern plugins can only be run on a cluster if they natively
implement a cluster mode.
Otherwise the service will be in a permanent WARNING state,
telling the user to change their configuration.
<b>In Checkmk 2.1</b> even legacy plugins are no longer run
on clusters in this unspecified manner.
By default, all services on a cluster are run in the native
mode (or issue a WARNING if it does not exist).
If the plugin in question does not support a native cluster
mode, you can use the ruleset
<i>"Aggregation options for clustered services"</i> to
select one of three other aggregation modes (<i>"failover"</i>,
<i>"worst"</i>, <i>"best"</i>), where the results of the
individual nodes are aggregated in a predetermined way.
For a description of the available modes please refer to the
help of the mentioned ruleset.
The native cluster mode should be documented in the plugin's
man page.
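As a rough illustration of how the three predefined modes could combine the states of the individual nodes, consider the following minimal sketch. The <tt>State</tt> enum, the <tt>aggregate</tt> function, the severity ranking and the failover rule (more than one reporting node yields at least WARN) are assumptions made for this example, not Checkmk's implementation; the ruleset's help remains the authoritative description.

```python
from enum import IntEnum
from typing import Mapping


class State(IntEnum):
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


# Assumed severity ranking for this sketch: OK < WARN < UNKNOWN < CRIT.
_SEVERITY = {State.OK: 0, State.WARN: 1, State.UNKNOWN: 2, State.CRIT: 3}


def aggregate(mode: str, node_states: Mapping[str, State]) -> State:
    """Combine the per-node states of one clustered service (illustrative only)."""
    if not node_states:
        return State.UNKNOWN  # no node delivered a result
    states = list(node_states.values())
    worst = max(states, key=_SEVERITY.__getitem__)
    if mode == "worst":
        return worst
    if mode == "best":
        return min(states, key=_SEVERITY.__getitem__)
    if mode == "failover":
        # Assumption: only one node is expected to report at a time,
        # so several reporting nodes raise the result to at least WARN.
        if len(states) > 1:
            return max(worst, State.WARN, key=_SEVERITY.__getitem__)
        return worst
    raise ValueError(f"unknown cluster mode: {mode}")
```

With these assumptions, <tt>aggregate("best", ...)</tt> stays OK as long as a single node is OK, while <tt>aggregate("worst", ...)</tt> reports the most severe node state.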
As a result some of the native implementations have been removed,
as they re-implemented one of the other aggregation modes (only
with fewer options).
These are the affected plugins and the cluster modes they
were implicitly operating on:
<ul>
<li>apache_status: <i>failover</i></li>
<li>cmk_site_statistics: <i>failover</i></li>
<li>f5_bigip_vcmpguests: <i>worst</i></li>
<li>infoblox_node_services: <i>best</i></li>
<li>infoblox_services: <i>best</i></li>
<li>job, local: The cluster behaviour was previously configurable
for these plugins. Choose the mode according to your previous
configuration. The options in the plugin-specific rulesets
are now ignored.</li>
<li>livestatus_status: <i>failover</i></li>
<li>mssql_counters_cache_hits, mssql_counters_file_sizes,
mssql_counters_locks, mssql_counters_locks_per_batch,
mssql_counters_page_life_expectancy,
mssql_counters_pageactivity, mssql_counters_sqlstats,
mssql_counters_transactions: <i>worst</i></li>
<li>mssql_datafiles, mssql_transactionlogs: <i>failover</i></li>
<li>netscaler_sslcertificates: <i>worst</i></li>
<li>sap_hana_diskusage, sap_hana_ess, sap_hana_events,
sap_hana_instance_status, sap_hana_memrate,
sap_hana_proc, sap_hana_replication_status: <i>best</i></li>
</ul>
<b>Note to developers:</b> Every plugin you develop will have the
predefined cluster modes ready to use -- there is nothing you have to
do. If none of the three modes <i>failover</i>, <i>worst</i> or
<i>best</i> suits your needs, you can implement your own <i>native</i>
mode using the <tt>cluster_check_function</tt>. Please refer to the
Sphinx documentation for details.
ID: 12996
Title: Performance graphs always use default color styles
Component: Multisite
Level: 1
Class: Bug fix
Version: 2.1.0i1
Before Checkmk 2.0.0 users could configure the background, foreground
and canvas colors of the performance graphs. This was rarely useful, and
since users can switch between themes that each have a fixed set of styles,
custom colors looked out of place after a theme change.
Although the option to set these colors was removed, the web renderer could
still read them from the configuration. This was particularly annoying on
modified service views that contain performance graphs and had saved their
pre-2.0 default colors, which do not match the new color themes in version 2.0.0.
This werk enforces Checkmk's color themes for all performance graphs and
discards any previous custom setup.
ID: 13118
Title: <tt>sentry_pdu, raritan_pdu_plugs</tt>: Fix parameter handling
Component: Checks & agents
Level: 1
Class: Bug fix
Version: 2.1.0i1
The check plugins <tt>sentry_pdu</tt> and <tt>raritan_pdu_plugs</tt> both
monitor the status of PDUs. In the discovery phase, both plugins store the
current states of the discovered PDUs. The subsequently computed check
results are based on a comparison of the discovered states with the current
states of the PDUs. Neither plugin handled unknown states properly.
Furthermore, users have the option to explicitly configure target states for
the PDUs. In this case, the user-configured states are preferred over the
discovered states. This mechanism was broken for <tt>sentry_pdu</tt>.
Finally, when updating Checkmk from 1.6 to 2.0 and also during updates within
2.0, the states discovered by the plugin <tt>sentry_pdu</tt> were lost.
To fix these issues, users have to re-discover the corresponding services
"Plug ...". Note that for <tt>raritan_pdu_plugs</tt>, depending on the
current configuration, a re-discovery might not be necessary, but we
recommend it preemptively.
ID: 12822
Title: AWS Lambda performance: added metrics for cold starts and init duration
Component: Checks & agents
Level: 1
Class: New feature
Version: 2.1.0i1
The following additional metrics will be monitored by the AWS Lambda Performance check:
* init duration
* number of cold starts
ID: 13097
Title: Make host search more flexible on parent search
Component: Multisite
Level: 1
Class: Bug fix
Version: 2.1.0i1
If you searched for hosts using the "Parents" pattern, results were only
found if all parents of a host had been entered.
From now on, one parent entry is enough to find related hosts.
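The difference between the two matching rules can be sketched as follows. This is only an illustration of "all parents" versus "at least one parent"; the function names and the set-based comparison are assumptions for this example and do not reflect the actual filter code.

```python
from typing import Iterable


def matches_all_parents(host_parents: Iterable[str], searched: Iterable[str]) -> bool:
    """Old rule (as described above): every parent of the host must be entered."""
    parents = set(host_parents)
    return bool(parents) and parents <= set(searched)


def matches_any_parent(host_parents: Iterable[str], searched: Iterable[str]) -> bool:
    """New rule: a single entered parent is enough to find the host."""
    return not set(host_parents).isdisjoint(searched)
```

For a host with the hypothetical parents <tt>switch1</tt> and <tt>switch2</tt>, searching for <tt>switch1</tt> alone fails under the old rule and succeeds under the new one.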
ID: 13136
Title: SLA View: Displaced timelines
Component: Multisite
Level: 1
Class: Bug fix
Version: 2.1.0i1
The timelines in SLA views were displaced.
The Availability timeline was shifted to the right,
and none of the other timelines lined up with the result row.
ID: 12851
Title: Folder/Host permissions: Users could add groups but not remove them afterwards
Component: Setup
Level: 1
Class: Bug fix
Version: 2.1.0i1
Users with the "User" role who are able to manage a folder in "Hosts &
folders" can add and remove permissions of contact groups on a folder. These
users should only be able to add / remove contact groups they are a member of.
However, in previous releases a user could add a group to the permitted
groups of a folder without being a member of it. When the user then tried to
remove the group, this was denied because the user is not a member of this group.
The logic has now been changed to provide consistent behaviour: Whenever
a user tries to add OR remove a contact group, it is verified that
the user is a member of that group. The user can now only add OR remove
groups they are a member of.
Groups which are permitted on this folder and are not modified by the user are
not relevant in this situation.
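The rule described above can be sketched as a check on the groups that actually change. The function name and the use of plain sets of group names are assumptions for this illustration, not the Setup code itself.

```python
from typing import Iterable


def validate_group_changes(
    user_groups: Iterable[str],
    old_groups: Iterable[str],
    new_groups: Iterable[str],
) -> None:
    """Reject the change if an added OR removed group is not one of the user's groups."""
    # Symmetric difference: groups that were added or removed by this edit.
    changed = set(old_groups) ^ set(new_groups)
    forbidden = changed - set(user_groups)
    if forbidden:
        raise PermissionError(
            "not a member of: " + ", ".join(sorted(forbidden))
        )
```

Groups that are permitted on the folder but left untouched never appear in <tt>changed</tt>, so they do not affect the result.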
ID: 11814
Title: smart check: Evaluate command timeouts as counter
Component: Checks & agents
Level: 1
Class: Bug fix
Version: 2.1.0i1
Previously, the metric "Command_Timeout" of the smart check was evaluated in the following manner:
If the counter deviated from the discovered value, the check went {CRIT}.
This led to a lot of false positives, as the counter may also increase after a simple reboot, which does not indicate a fault.
The rate, however, gives a better evaluation of the state.
As of this werk, we allow all rates below 100 command timeouts per hour.
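To illustrate the rate-based evaluation, here is a minimal sketch that derives an hourly rate from two counter readings. The function names and the way readings and timestamps are passed in are assumptions for this example; only the threshold of 100 per hour is taken from the text above.

```python
MAX_TIMEOUTS_PER_HOUR = 100.0  # threshold named in this werk


def timeouts_per_hour(
    prev_count: int, prev_time: float, count: int, time: float
) -> float:
    """Rate of command timeouts per hour between two readings (times in seconds)."""
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return (count - prev_count) * 3600.0 / elapsed


def rate_is_acceptable(rate: float) -> bool:
    """Rates below the threshold are allowed."""
    return rate < MAX_TIMEOUTS_PER_HOUR
```

A counter that grows by 50 within one hour yields a rate of 50 per hour and is accepted, regardless of the value stored at discovery time.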
ID: 12928
Title: fileinfo_groups, sap_hana_fileinfo_groups: migration to the new check API
Component: Checks & agents
Level: 1
Class: New feature
Version: 2.1.0i1
This werk is incompatible if you have enforced services for fileinfo_groups
or sap_hana_fileinfo_groups. In that case you have to go to the enforced
service parameters and add group patterns for file grouping.
Nothing has changed in the check logic.
ID: 11812
Title: esx multipath: Skip devices without LUN ID
Component: Checks & agents
Level: 1
Class: Bug fix
Version: 2.1.0i1
ESX vSphere may return multipath devices without LUN ID (e.g. "local marvel processor").
This led to mixed-up service items for esx_vsphere_hostsystem.multipath, as the parsing continued without raising an exception.
As a user you may notice this when you see a path as service item (instead of a LUN).
Devices without a LUN ID are now skipped and the parsing is fixed.
As the discovery may have found wrong items, you may need to perform a
rediscovery in case you're affected by this issue.
SUP-7220