ID: 5538
Title: Improved performance when processing a large amount of piggyback data
Component: Core & setup
Level: 1
Class: New feature
Version: 1.5.0i2
When Check_MK has to handle a large amount of piggyback data (many piggybacked
hosts from many piggyback source hosts, several hundred to thousands),
its performance could degrade during regular monitoring. This was caused
by overly expensive housekeeping logic that was executed too often.
The mechanism has now been changed to work like this:
<ul>
<li>During regular monitoring, piggyback data is no longer removed from the disk.</li>
<li>New piggyback data is written to disk when communicating with the source host.</li>
<li>When monitoring piggybacked hosts, outdated piggyback data found on the
disk is filtered out.</li>
<li>A dedicated housekeeping cron job, executed daily at 00:10 from the site's crontab,
removes outdated piggyback data. This job mainly serves to free up tmpfs
space; the outdated stored data is no longer read by the monitoring.</li>
</ul>
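The separation between filtering at read time and deleting in a periodic job can be pictured with the following minimal Python sketch. It is not the Check_MK implementation: the function names, the assumed layout (one file per source host inside a directory) and the <tt>max_age</tt> parameter are hypothetical and chosen only for illustration.

```python
import os
import time


def fresh_piggyback_files(piggyback_dir, max_age, now=None):
    """Return the files in piggyback_dir that are at most max_age seconds old.

    Hypothetical helper: outdated files are only skipped here, never deleted,
    so the monitoring path stays cheap.
    """
    now = time.time() if now is None else now
    fresh = []
    for name in sorted(os.listdir(piggyback_dir)):
        path = os.path.join(piggyback_dir, name)
        if now - os.stat(path).st_mtime <= max_age:
            fresh.append(path)
    return fresh


def cleanup_piggyback_files(piggyback_dir, max_age, now=None):
    """Delete outdated files and return their paths.

    Hypothetical helper: meant to be called from a separate periodic job,
    not during regular monitoring.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(piggyback_dir)):
        path = os.path.join(piggyback_dir, name)
        if now - os.stat(path).st_mtime > max_age:
            os.remove(path)
            removed.append(path)
    return removed
```

In this sketch the monitoring path only calls <tt>fresh_piggyback_files</tt>, while <tt>cleanup_piggyback_files</tt> would be triggered by the daily cron job.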
ID: 5411
Title: Windows agent: handle WMI timeouts
Component: Checks & agents
Level: 1
Class: Bug fix
Version: 1.5.0i2
All sections depending on WMI (Windows Management Instrumentation)
queries suffered from periodic freezes, typically 18 to 20 minutes
apart. During a freeze, the Windows agent did not deliver any output for
some of its WMI-dependent sections (e.g. ps, uptime, dotnet_clrmemory,
wmi_cpuload, msexch and wmi_webservices), and the corresponding checks
reported errors of the type "Missing agent sections...". Various
strategies have been tried before to cope with these periodic WMI
problems. Werk #4008 introduced a timeout of 10s in order to prevent
the agent from blocking completely if a WMI query freezes. However,
this led to the described problem: agent output was missing entirely
whenever a WMI query got no response within 10s. Moreover, several WMI
queries each waiting 10s one after another led to periodically long
execution times of the Windows agent.
This werk introduces a new strategy for coping with the periodic
freezing of WMI queries. The timeout is reduced from 10s to 2.5s
per query, reducing the total execution time of the Windows agent
by approximately 75% when the problem occurs. Upon a WMI timeout,
the Windows agent reports the timeout in its output so that the affected
checks can handle it by setting their state to UNKNOWN. Normally,
the check returns to OK the next time the agent is contacted, when
the WMI freeze is most likely over.
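The 75% figure follows directly from the ratio of the two timeouts. The short Python sketch below illustrates this under two assumptions that are not stated in the werk: the queries run one after another, and every one of them hits the timeout. The function name and the count of six frozen queries are hypothetical.

```python
def worst_case_wait(frozen_queries, timeout_seconds):
    """Total time spent waiting if each frozen query runs into the timeout.

    Assumes the queries are executed sequentially (an assumption made for
    this illustration only).
    """
    return frozen_queries * timeout_seconds


# Hypothetical example: six WMI queries freeze during one agent run.
old_wait = worst_case_wait(6, 10.0)  # timeout from werk #4008
new_wait = worst_case_wait(6, 2.5)   # timeout introduced by this werk
reduction = 1 - new_wait / old_wait  # fraction of waiting time saved
```

The ratio does not depend on the number of frozen queries, since it only reflects 2.5s being a quarter of 10s.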
There seems to be a connection between the WMI freezes and the Windows
service WMI Performance Adapter. https://lokna.no/?p=1430 suggests
setting the startup type of this service to automatic, which ensures
that the service is running. Without this, the WMI Performance Adapter
seems to get started periodically when WMI is queried. Testing with the
WMI Performance Adapter service running has shown clear signs of
improvement: WMI queries froze less frequently, although the freezes
did not disappear completely.
ID: 5537
Title: Icon categories can now be customized
Component: WATO
Level: 1
Class: New feature
Version: 1.5.0i2
The icon categories that can be used for managing custom icons in WATO can
now be configured using a new global setting "Administration tool (WATO) > Icon categories".
ID: 5520
Title: rstcli, rstcli.pdisks: Fix broken parse function
Component: Checks & agents
Level: 1
Class: Bug fix
Version: 1.5.0i2
Some unexpected lines in the corresponding agent section could cause
both the check and the discovery to crash. This has been fixed
for all known cases.
ID: 5436
Title: Fixed logical problem with SNMP check interval rule
Component: Core & setup
Level: 1
Class: Bug fix
Version: 1.5.0i1
The rule was previously configured for each individual check. This was not really
correct, because the SNMP data is fetched for main checks and their sub checks
together. This means that you cannot define individual SNMP check intervals for
these checks.
The ruleset "Check intervals for SNMP checks" has now been changed to work with
"section names" instead of individual checks.
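As a rough illustration of why the interval belongs to the section rather than to the individual check, the following Python sketch maps a main check and its sub checks to one shared section name. It assumes that sub checks are named <tt>main.sub</tt>; the helper names, the check names and the interval value are hypothetical and not taken from the Check_MK code.

```python
def section_name_of(check_name):
    """Return the section a check belongs to.

    Assumes sub checks are named "<section>.<sub>", so the part before the
    first dot identifies the section shared by a main check and its sub checks.
    """
    return check_name.split(".")[0]


def interval_for_check(check_name, section_intervals, default=None):
    """Look up the configured interval via the section name of the check."""
    return section_intervals.get(section_name_of(check_name), default)


# Hypothetical configuration: one interval per section, not per check.
section_intervals = {"example_snmp": 300}
```

With this lookup, <tt>example_snmp</tt> and <tt>example_snmp.status</tt> necessarily resolve to the same interval, which matches the fact that their data is fetched together.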
ID: 5535
Title: Check_MK hosts can now use multiple agents
Component: Core & setup
Level: 2
Class: New feature
Version: 1.5.0i2
It is now possible to configure multiple Check_MK agents for a single host.
With this change, you can now configure e.g. an ESX vCenter to use the ESX special agent
together with the regular Check_MK agent installed on the monitored host.
All existing hosts keep their existing configuration after an update. New hosts
also use only a single Check_MK agent, following the already existing logic where
either a) the Check_MK agent is contacted via TCP or b) a configured data source program
(special agent or other program invocation) is used.
The new feature can be enabled by changing the host attribute (on hosts or folders)
"Check_MK Agent" to e.g. "Contact Check_MK agent and all datasource programs". This
makes Check_MK use all data sources matching this host instead of just picking
the first matching one. There is also an alternative option, "Use all enabled datasources",
which can be used to execute only the data sources matching the host.
On the way to this change we have changed several previously existing things:
<ul>
<li>The host tag group <tt>agent</tt> has been split into multiple tag groups to be
more flexible.</li>
<li>The tag groups <tt>ping</tt> and <tt>snmp</tt> have been added and provide the options
which were previously available in the single <tt>agent</tt> attribute.</li>
<li>All these tag groups are treated as <i>builtin</i> tag groups defined by Check_MK
(they can no longer be modified).</li>
<li>Existing configurations of hosts/folders will be translated seamlessly into the new
format.</li>
<li>During updates, your site will only apply the changes above if you have an unmodified
<tt>agent</tt> tag group. If you have modified it in any way, these changes will not
be applied and you won't be able to use the features introduced with this werk. You will then
have to clean up your local changes. Once you delete your local tag group "agent", the
builtin one will be used automatically.</li>
<li>The <i>Edit host</i> dialog has been split up into more independent sections. The new ones
are <i>Address</i> and <i>Data sources</i>, which better visualize the relation between the different
attributes.</li>
</ul>
<i>Please note:</i> If you use Web-API calls to create or modify hosts or folders
and set attributes that were changed by this werk, you may have to adapt your API calls.