ID: 7202
Title: Elasticsearch Monitoring
Component: Checks & agents
Level: 2
Class: New feature
Version: 1.6.0i1
A special agent and first checks to support monitoring of Elasticsearch cluster, nodes and indices.
With this werk it is possible to monitor the cluster and node states of an Elasticsearch instance.
LI: elasticsearch_cluster_health: State of the cluster (e.g. green/yellow/red), name, number of nodes, number of data nodes
LI: elasticsearch_cluster_health.shards: Number of shards in different states
LI: elasticsearch_cluster_health.tasks: Pending tasks, Task max waiting, Time out of tasks
LI: elasticsearch_nodes: Total virtual memory, CPU usage, Number of open file descriptors of nodes
For each check it is possible to set parameters in the associated WATO rule.
Checks for cluster, index and shard statistics will follow soon.
Thanks to Fabian Binder for creating the initial version of the agent and checks!
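The green/yellow/red cluster states mentioned above map naturally onto monitoring states. A minimal sketch of that mapping (not the shipped special agent; the field names follow Elasticsearch's documented /_cluster/health response):

```python
import json

# Sketch only: map the "status" field of an Elasticsearch
# /_cluster/health response to a monitoring state (0=OK, 1=WARN, 2=CRIT).
STATUS_TO_STATE = {"green": 0, "yellow": 1, "red": 2}

def cluster_health_state(payload):
    """Return (state, summary) for a cluster health JSON document."""
    health = json.loads(payload)
    status = health.get("status", "unknown")
    summary = "Name: %s, Status: %s, Nodes: %s, Data nodes: %s" % (
        health.get("cluster_name"),
        status,
        health.get("number_of_nodes"),
        health.get("number_of_data_nodes"),
    )
    # Unknown status values map to 3 (UNKNOWN)
    return STATUS_TO_STATE.get(status, 3), summary
```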
ID: 6969
Title: Fixed recurring flexible downtimes.
Component: cmc
Level: 2
Class: Bug fix
Version: 1.6.0i1
Previously, the combination of "recurring" and "flexible" was broken for
downtimes. Such downtimes remained armed even outside their intended time
window, so that the next occurring problem simply triggered the start of
the downtime and no notification was sent for the problem.
An example of such a scenario: A flexible host downtime was configured to
happen every day between 02:00 and 03:00 for 2 hours. Everything was OK
between 02:00 and 03:00, but at 08:15 the host went DOWN. This started the
downtime, lasting until 10:15, which is of course totally unintended: The
problem did not happen between 02:00-03:00, so the downtime should not start
and normal problem processing should be done, including notifications etc.
This has been fixed, so the recurring flexible downtimes are working as
intended now. If you update your installation, there is nothing more you
have to do, all downtimes automatically work correctly after that. If you
do not want to update yet, you should delete your recurring flexible
downtimes for now and add new recurring fixed downtimes instead as a
workaround until you update.
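The intended semantics from the example above can be sketched as follows. This is an illustration of the rule, not CMC source code; the function name is chosen for the example:

```python
from datetime import datetime, time

# Illustrative sketch: a flexible recurring downtime may only start if the
# triggering problem begins inside the configured daily time window.
def may_start_flexible_downtime(problem_start, window_start, window_end):
    """Return True if the problem begins within today's downtime window."""
    return window_start <= problem_start.time() <= window_end

# The scenario from the werk: window 02:00-03:00, host goes DOWN at 08:15,
# so the downtime must not start.
down_at = datetime(2019, 1, 7, 8, 15)
```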
ID: 7326
Title: "Update DNS cache" action is now really cleaning up the cache
Component: WATO
Level: 2
Class: Bug fix
Version: 1.6.0i1
As written in the documentation
(https://mathias-kettner.com/cms_wato_hosts.html), Check_MK keeps an
internal DNS cache for the hosts that have no static IP address configured
in Check_MK:
---
With the host name method Check_MK uses cached data in order to minimise
repeated DNS requests during an Activate Changes – which is very important for
accelerating the activation procedure. Furthermore, the cache ensures that a
changed configuration can still be activated if the DNS stops working.
The catch is that Check_MK doesn't automatically notice the change to an
address in DNS. For this reason, in the host details there is the button which
deletes the entire DNS cache and forces a new resolution at the next Activate
changes. This file is found under ~/var/check_mk/ipaddresses.cache in your
instance, by the way. Deleting this file has the same effect as the button as
described above.
---
The problem was that previous versions did not really delete the entire cache,
but only updated it. We have now changed this to make the cache invalidation
work as intended.
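As the quoted documentation notes, deleting ~/var/check_mk/ipaddresses.cache has the same effect as the button. A minimal sketch of such a full invalidation (the helper name is chosen for the example; the path is relative to the site directory):

```python
import os

# Sketch: remove the DNS cache file entirely so the next Activate Changes
# resolves all host names afresh.
def clear_dns_cache(omd_root):
    cache_path = os.path.join(omd_root, "var/check_mk/ipaddresses.cache")
    try:
        os.unlink(cache_path)
    except OSError:
        pass  # no cache file present - nothing to invalidate
    return cache_path
```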
ID: 6987
Title: Event history: Fix incomplete information when using time filters (e.g. Recent Events view)
Component: Event Console
Level: 2
Class: Bug fix
Version: 1.6.0i1
An encoding issue caused queries as used by the event history to skip files with non-ASCII characters
in the first line, so that events archived in that file's time range were omitted. This has been fixed.
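The pitfall can be illustrated as follows: decoding a line strictly raises on unexpected bytes, which made the query skip the whole file. This is a simplified sketch of the general pattern, not the actual Event Console code:

```python
# Sketch: decode a history file's first line with a fallback instead of
# letting a UnicodeDecodeError discard the whole file (and every event
# archived in its time range).
def decode_first_line(raw):
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte, so this never fails
        return raw.decode("latin-1")
```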
ID: 6968
Title: Fixed CMC crashes caused by a race condition
Component: Core & setup
Level: 2
Class: Bug fix
Version: 1.6.0i1
Activating a new configuration during a long-running query for graph data
could lead to a CMC crash because of an internal race condition. This has
been fixed.
ID: 7201
Title: ServiceNow: Notification plugin
Component: Notifications
Level: 2
Class: New feature
Version: 1.6.0i1
Check_MK now supports an integration with ServiceNow.
You can create, update and close incidents for host and service problems.
Please see the inline help of the notification method rule within WATO for detailed options.
CMK-1167
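For orientation, incident creation against ServiceNow typically goes through its Table API. This is an illustrative sketch, not the shipped plugin; the instance name and field mapping are assumptions for the example:

```python
import json
import urllib.request

# Sketch: build a POST request for the ServiceNow Table API endpoint
# /api/now/table/incident (authentication omitted for brevity).
def build_incident_request(instance, short_description):
    url = "https://%s.service-now.com/api/now/table/incident" % instance
    body = json.dumps({"short_description": short_description}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    return request
```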
ID: 7226
Title: LXC: Add basic support for Linux containers
Component: Checks & agents
Level: 2
Class: New feature
Version: 1.6.0i1
The agent now detects when it is executed in a Linux container
context and changes its behaviour depending on the environment.
In containers the agent currently does the following:
<ul>
<li>lxc_container_cpu: This check is used instead of the normal CPU
utilization check</li>
<li>zfs filesystems are not excluded for the df section anymore</li>
<li>kernel section is not processed, because it's the host system's
kernel the agent reports information for</li>
<li>drbd section is not processed for the same reason</li>
<li>lnx_thermal section is not processed for the same reason</li>
</ul>
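One common way to detect an LXC context is to inspect PID 1's environment. The agent itself is a shell script; this Python sketch only illustrates the heuristic, and the marker value is an assumption for the example:

```python
# Sketch: inside an LXC container, PID 1's environment (readable from
# /proc/1/environ as NUL-separated entries) usually carries container=lxc.
def is_lxc_container(proc1_environ):
    """proc1_environ: raw bytes of /proc/1/environ."""
    return b"container=lxc" in proc1_environ.split(b"\x00")
```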
ID: 7224
Title: Backup/Restore: Fix vanishing files terminating a backup
Component: Site Management
Level: 2
Class: Bug fix
Version: 1.6.0i1
The "omd backup" mechanism recursively backups all site related
files of a Check_MK site. When files in a site directory, that
is currently being processed vanish, this could lead to failed
backups with errors like this:
C+:
Site backup failed: Traceback (most recent call last):
File "/omd/versions/1.5.0p8.cee/bin/omd", line 4553, in
command_function(args, command_options)
File "/omd/versions/1.5.0p8.cee/bin/omd", line 3711, in main_backup
backup_site_to_tarfile(fh, tar_mode, options)
File "/omd/versions/1.5.0p8.cee/bin/omd", line 3686, in backup_site_to_tarfile
backup_site_files_to_tarfile(tar, options)
File "/omd/versions/1.5.0p8.cee/bin/omd", line 3556, in backup_site_files_to_tarfile
tar.add(g_sitedir, g_sitename, exclude=filter_files)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2032, in add
recursive, exclude, filter)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2032, in add
recursive, exclude, filter)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2032, in add
recursive, exclude, filter)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2032, in add
recursive, exclude, filter)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2032, in add
recursive, exclude, filter)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 2009, in add
tarinfo = self.gettarinfo(name, arcname)
File "/omd/versions/1.5.0p8.cee/lib/python2.7/tarfile.py", line 1881, in gettarinfo
statres = os.lstat(name)
OSError: [Errno 2] No such file or directory: '/omd/sites/xyz/var/pnp4nagios/perfdata/hastenichgesehn/Interface_BG4_2OG_VLAN_421.xml.new'
C-:
After this change the backup continues in such a situation and
simply skips the vanished file.
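The fix amounts to tolerating ENOENT during archiving. A simplified sketch of that pattern (not the actual omd code):

```python
import errno
import tarfile

# Sketch: tolerate files that vanish between the directory listing and the
# stat call instead of aborting the whole backup.
def add_ignoring_vanished(tar, path, arcname):
    try:
        tar.add(path, arcname, recursive=False)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # only "No such file or directory" is tolerated
```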
ID: 7185
Title: Custom service attributes can now be configured
Component: WATO
Level: 2
Class: New feature
Version: 1.6.0i1
It is now possible to configure arbitrary custom attributes for all services,
just like it was already possible for hosts and users in previous versions.
The configuration procedure works the same way: you first define an
attribute, and can then refer to it and assign it to services. Custom
service attributes are defined using the Global Setting "Custom service
attributes".
Once you have defined a custom service attribute, you can assign it to a
collection of services using the equally named ruleset "Custom service attributes".
In general, you should keep the number of rules low. To support this,
the rule is structured so that you can select several custom attributes in
each rule.
The custom service attribute will be named <tt>_[ID]</tt> in the core
configuration and can be gathered using the Livestatus column
<tt>custom_variables</tt> using the <tt>[ID]</tt>. The custom service
attributes are available to notification scripts as environment variables
named <tt>SERVICE_[ID]</tt>.
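Such an attribute can be read back via Livestatus, for instance with a query like the one built below. Here <tt>EXAMPLE</tt> stands for a hypothetical attribute ID, and the list filter on <tt>custom_variable_names</tt> follows the usual Livestatus semantics (an assumption for illustration, not taken from the werk):

```python
# Sketch: build a Livestatus query that lists services carrying a given
# custom attribute (a "_EXAMPLE" core variable appears as "EXAMPLE" in the
# custom_variable_names list).
def build_custom_attribute_query(attribute_id):
    return (
        "GET services\n"
        "Columns: host_name description custom_variables\n"
        "Filter: custom_variable_names >= %s\n" % attribute_id
    )
```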
ID: 7100
Title: mk_sap_hana: Refactored plugin; Deprecated old checks; Add new checks
Component: Checks & agents
Level: 2
Class: New feature
Version: 1.6.0i1
Deprecated checks are:
LI: sap_hana_filesystem
LI: sap_hana_full_backup
LI: sap_hana_mem
LI: sap_hana_process_list
LI: sap_hana_version
New checks are:
LI: sap_hana_backup
LI: sap_hana_data_volume
LI: sap_hana_diskusage
LI: sap_hana_ess
LI: sap_hana_events
LI: sap_hana_license
LI: sap_hana_memrate
LI: sap_hana_proc
LI: sap_hana_replication_status
LI: sap_hana_status
In order to make the SAP HANA checks work you have to install
the agent plugin {{mk_sap_hana}} on your clients.
If you have installed the old agent plugin then you have to
replace it with the new agent plugin and perform a rediscovery
of these hosts.
The old check plugins are not discoverable any more and show
the message "WARN - This check is deprecated. Please rediscover the
services of that host."