Module: check_mk
Branch: master
Commit: abdc673c713cb11d6877699dec3ee33014db7113
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=abdc673c713cb1…
Author: Lars Michelsen <lm(a)mathias-kettner.de>
Date: Mon Jul 23 08:05:48 2018 +0200
6358 FIX Fixed stale services on cluster nodes
When using Check_MK clusters it could happen that some of the services
on a node went stale and remained in this state. In this situation you
can find multiple messages in the cmc.log when using the CEE:
2018-07-21 12:47:52 [5] [Check_MK helper 4644] started, commandline: /omd/sites/beta/bin/cmk --keepalive
2018-07-21 12:48:42 [2] [Check_MK helper 4644] ignoring check result for lxclu1 / Filesystem /boot: no such service
2018-07-21 12:48:43 [4] [Check_MK helper 4644] restarting dead helper
2018-07-21 12:48:43 [5] [Check_MK helper 4644] exited normally
The issue was triggered because the cluster nodes reported service check
results for services which are assigned to the cluster using the "Clustered
services" rule set.
The problem could only happen when services of one check type were assigned to
both the cluster and a cluster node, for example when at least one Filesystem
service is assigned to the node and at least one Filesystem service is
assigned to the cluster.
This regression was introduced with 1.5.0b7 (Werk #5814).
Change-Id: Ibddc6e489c21b8664164d1437878f7021186f740
---
.werks/6358 | 29 +++++++++++++++++++++++++++++
cmk_base/checking.py | 3 +++
2 files changed, 32 insertions(+)
diff --git a/.werks/6358 b/.werks/6358
new file mode 100644
index 0000000..84c6395
--- /dev/null
+++ b/.werks/6358
@@ -0,0 +1,29 @@
+Title: Fixed stale services on cluster nodes
+Level: 2
+Component: core
+Class: fix
+Compatible: compat
+Edition: cre
+State: unknown
+Version: 1.6.0i1
+Date: 1532171683
+
+When using Check_MK clusters it could happen that some of the services
+on a node went stale and remained in this state. In this situation you
+can find multiple messages in the cmc.log when using the CEE:
+
+2018-07-21 12:47:52 [5] [Check_MK helper 4644] started, commandline: /omd/sites/beta/bin/cmk --keepalive
+2018-07-21 12:48:42 [2] [Check_MK helper 4644] ignoring check result for lxclu1 / Filesystem /boot: no such service
+2018-07-21 12:48:43 [4] [Check_MK helper 4644] restarting dead helper
+2018-07-21 12:48:43 [5] [Check_MK helper 4644] exited normally
+
+The issue was triggered because the cluster nodes reported service check
+results for services which are assigned to the cluster using the "Clustered
+services" rule set.
+
+The problem could only happen when services of one check type were assigned to
+the cluster and the cluster node. For example in case you have at least one
+Filesystem service assigned to the node and at least one Filesystem service
+assigned to the cluster.
+
+This regression was introduced with 1.5.0b7 (Werk #5814).
diff --git a/cmk_base/checking.py b/cmk_base/checking.py
index a7362ba..8598ca8 100644
--- a/cmk_base/checking.py
+++ b/cmk_base/checking.py
@@ -197,6 +197,9 @@ def _do_all_checks_on_host(sources, hostname, ipaddress, only_check_plugin_names
         if only_check_plugin_names != None and check_plugin_name not in only_check_plugin_names:
             continue
+        if belongs_to_cluster and hostname != config.host_of_clustered_service(hostname, description):
+            continue
+
         success = execute_check(multi_host_sections, hostname, ipaddress, check_plugin_name, item, params, description)
         if success:
             num_success += 1
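
The guard added to cmk_base/checking.py in the diff above can be sketched in isolation roughly as follows. This is a minimal illustration, not the Check_MK implementation: the CLUSTERED mapping, the node name "node1", the local host_of_clustered_service() stand-in and the services_to_check() helper are all hypothetical. The sketch assumes that the real config.host_of_clustered_service() returns the node's own name when a service is not clustered, which is what the comparison in the patch suggests. It is written for Python 3.

```python
# Hypothetical stand-in for the "Clustered services" rule set: maps
# (node, service description) to the cluster that owns the service.
CLUSTERED = {("node1", "Filesystem /boot"): "lxclu1"}


def host_of_clustered_service(hostname, description):
    """Return the owning cluster, or the node itself if not clustered.

    Assumption: this mirrors the contract of the real
    config.host_of_clustered_service(); it is not the actual function.
    """
    return CLUSTERED.get((hostname, description), hostname)


def services_to_check(hostname, belongs_to_cluster, table):
    """Yield only the check table entries that this host should execute."""
    for check_plugin_name, item, params, description in table:
        # Same condition as the patch: a cluster node skips every service
        # that is assigned to its cluster instead of to the node.
        if belongs_to_cluster and hostname != host_of_clustered_service(hostname, description):
            continue
        yield check_plugin_name, item, params, description


# Two services of the same check type, one kept on the node and one
# assigned to the cluster: the situation described in the werk.
table = [
    ("df", "/", {}, "Filesystem /"),
    ("df", "/boot", {}, "Filesystem /boot"),
]

kept = [entry[3] for entry in services_to_check("node1", True, table)]
print(kept)  # ['Filesystem /']
```

With the guard in place, the node no longer submits a result for "Filesystem /boot", so the core has no reason to log "no such service" for it.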