Module: check_mk
Branch: master
Commit: 3113d9985d346e3380d9a7a81f476b6f11c80fd2
URL: http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=3113d9985d346e…
Author: Lars Michelsen <lm(a)mathias-kettner.de>
Date: Mon Jul 24 16:59:19 2017 +0200
5038 FIX Datasource programs: Prevent zombie processes in case of timeouts
When datasource programs (e.g. special agents or SSH command line calls) are used
to fetch agent data from hosts, these programs could be left behind as zombie
processes after a timeout. Check_MK now handles this situation and cleans up
these processes.
Details: When the program execution takes too long, Check_MK sends a SIGTERM to the
process group of the executed program. After sending the signal, Check_MK now
waits for the process to finish.
Change-Id: I9f2dbfa0839f8089bcc86bf67ec270aedf7adf3a
---
.werks/5038 | 18 ++++++++++++++++++
cmk_base/agent_data.py | 1 +
2 files changed, 19 insertions(+)
diff --git a/.werks/5038 b/.werks/5038
new file mode 100644
index 0000000..8e17bb7
--- /dev/null
+++ b/.werks/5038
@@ -0,0 +1,18 @@
+Title: Datasource programs: Prevent zombie processes in case of timeouts
+Level: 1
+Component: core
+Class: fix
+Compatible: compat
+Edition: cre
+State: unknown
+Version: 1.5.0i1
+Date: 1500907925
+
+When datasource programs (e.g. special agents or SSH command line calls) are used
+to fetch agent data from hosts, these programs could be left behind as zombie
+processes after a timeout. Check_MK now handles this situation and cleans up
+these processes.
+
+Details: When the program execution takes too long, Check_MK sends a SIGTERM to the
+process group of the executed program. After sending the signal, Check_MK now
+waits for the process to finish.
diff --git a/cmk_base/agent_data.py b/cmk_base/agent_data.py
index 456bf5d..5183715 100644
--- a/cmk_base/agent_data.py
+++ b/cmk_base/agent_data.py
@@ -812,6 +812,7 @@ def get_agent_info_program(commandline):
         # On timeout exception try to stop the process to prevent child process "leakage"
         if p:
             os.killpg(os.getpgid(p.pid), signal.SIGTERM)
+            p.wait()
         raise
     except Exception, e:
         raise MKAgentError("Could not execute '%s': %s" % (exepath, e))
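
The pattern this change applies (signal the process group, then wait on the child so the kernel can release its process table entry) can be sketched in isolation. The following is a minimal Python 3 illustration, not the Check_MK code: the helper name run_with_timeout is hypothetical, and it assumes the timeout comes from subprocess's own timeout argument rather than from Check_MK's timeout handling.

```python
import os
import signal
import subprocess


def run_with_timeout(commandline, timeout):
    """Run a shell command line and return (exit code, stdout).

    Hypothetical helper for illustration only; not part of Check_MK.
    """
    p = subprocess.Popen(
        commandline,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # Put the child into its own session and process group so that
        # killpg() below reaches the child and its descendants only.
        start_new_session=True,
    )
    try:
        stdout, _stderr = p.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Ask the whole process group to terminate ...
        os.killpg(os.getpgid(p.pid), signal.SIGTERM)
        # ... and collect the child's exit status. Without this call the
        # terminated child stays in the process table as a zombie.
        p.wait()
        raise
    return p.returncode, stdout
```

Note that in this sketch p.wait() blocks until the child has actually exited, so a program that ignores SIGTERM would stall the caller; a more defensive variant might follow up with SIGKILL after a grace period.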