ID: 5275
Title: cmk-update-agent: Fix run-as-plugin mode (Regression since 1.5.0b1)
Component: agents
Level: 2
Class: Bug fix
Version: 1.6.0i1
Due to an erroneous call to a non-existent method, the Agent Updater crashes when run as a plugin.
As a result, automatic updates do not work with Agent Updater 1.5.0b1. Manual updates still work, so it is possible to replace the broken Agent Updater with a fixed one by performing a manual update to a newly baked agent.
ID: 5970
Title: Cascading livestatus proxy is now possible
Component: Livestatus Proxy
Level: 2
Class: New feature
Version: 1.5.0b2
It is now possible to cascade livestatus proxy configurations. This comes in
handy when building cascaded distributed GUI (status GUI) setups.
The feature has been built for a scenario like this:
<ul>
<li>A distributed setup where you have remote sites that are not directly
reachable</li>
<li>These remote sites are only reachable through a single "location master" site</li>
<li>You use the "location masters" for configuration of all the related sites</li>
<li>The central site is only used as a central operating site (overview, reporting)
and not for configuration</li>
</ul>
To get a cascading setup, configure Check_MK like this:
<ul>
<li>Location master: Create one site in "Distributed configuration" for each
local site. Configure the connection parameters to use the Livestatus Proxy.
Enable the new option to open a TCP port for this connection and enter a TCP port
that is currently not in use on the local machine (e.g. 6560).</li>
<li>Central viewer site: Create one site in "Distributed configuration" for each
remote site. Configure it to use the Livestatus Proxy. Set the destination IP address
to the IP address of the "Location master" server and set the TCP port to the
port you configured for the site in the previous step.</li>
</ul>
After this you should be able to connect to your cascaded remote sites through
the Livestatus Proxy of the "Location master".
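To verify such a cascaded connection you can send a plain Livestatus query to the
TCP port opened by the Livestatus Proxy on the "Location master". The following is a
minimal sketch; the host name and the port 6560 are only placeholders taken from the
example above and have to be adjusted to your setup.
<pre>
#!/usr/bin/env python
# Minimal sketch: send a Livestatus query through the TCP port opened
# by the Livestatus Proxy on the "Location master".
# "location-master.example.com" and 6560 are placeholders - adjust them.
import socket

def livestatus_query(host, port, query):
    sock = socket.create_connection((host, port), timeout=10)
    try:
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal the end of the query
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")
    finally:
        sock.close()

# A non-empty answer to "GET status" shows that the cascaded remote
# site is reachable through the "Location master".
print(livestatus_query("location-master.example.com", 6560,
                       "GET status\nColumns: program_version\n\n"))
</pre>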
ID: 5958
Title: Introduce docker monitoring with Check_MK
Component: Checks & agents
Level: 2
Class: New feature
Version: 1.5.0i4
With this change we prepare Check_MK for monitoring docker environments out
of the box. The new checks work on different layers (node, container).
The docker monitoring is currently available through the linux agent. To get
a docker node monitored, it should be enough to simply deploy the agent on
the node as usual. Check_MK will find all relevant checks automatically.
The agent on the node will iterate over all containers and execute the
Check_MK agent in the context of the container. In case an agent is already
installed in the container, the agent of the container will be used. Otherwise
the node's agent is executed in the context of the container.
In case you need specific agent plugins executed in the container, you can
add them to the container image together with the agent, just like you would
do for regular hosts.
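The following is an illustrative sketch of this behaviour (not the actual agent code,
which is part of the shell script based linux agent): the node lists its running
containers and runs a Check_MK agent inside each of them, preferring an agent that is
already installed in the container. The agent path used here is an assumption for
illustration only.
<pre>
#!/usr/bin/env python
# Illustrative sketch of the behaviour described above - not the real
# agent implementation. The agent path inside the container is an
# assumption for illustration only.
import subprocess

def running_containers():
    # "docker ps -q" prints the short IDs of all running containers.
    return subprocess.check_output(["docker", "ps", "-q"]).decode("utf-8").split()

def agent_output_from_container(container_id):
    try:
        # Prefer an agent that is already installed inside the container.
        return subprocess.check_output(
            ["docker", "exec", container_id, "/usr/bin/check_mk_agent"])
    except subprocess.CalledProcessError:
        # Otherwise the node's agent would be executed in the container's
        # context instead (omitted in this sketch).
        return b""

for container_id in running_containers():
    # The container output is wrapped as piggyback data named after the
    # short container ID (see the piggyback transport described below).
    print("<<<<%s>>>>" % container_id)
    print(agent_output_from_container(container_id).decode("utf-8"))
    print("<<<<>>>>")
</pre>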
By default the docker container specific parts are transported via piggyback
from the node to the Check_MK server. This means that you will have to create
hosts in your Check_MK setup that use the short container ID as their name.
For the docker container hosts please use the following configuration:
<ul>
<li>Set the "IP address family" to "No IP" for only processing the piggyback data.</li>
<li>Set the docker node as parent.</li>
<li>Enable HW/SW inventory for the node and the containers</li>
</ul>
The manual (or scripted) configuration of these hosts will be necessary with
1.5. Check_MK 1.6 will solve this automatically in a more elegant way.
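A minimal sketch of such a scripted host creation via the WATO web API could look like
the following. The URL, the automation credentials, the target folder and the attribute
names/values ("tag_address_family", "parents") are assumptions for illustration and
should be verified against your installation.
<pre>
#!/usr/bin/env python
# Minimal sketch, assuming the WATO web API of the Check_MK site.
# URL, credentials, folder and attribute names/values are illustrative
# assumptions - verify them against your installation before use.
import json
import requests  # third-party library

SITE_URL = "http://cmkserver/mysite/check_mk/webapi.py"
AUTH = {"_username": "automation", "_secret": "YOUR-AUTOMATION-SECRET"}

def add_container_host(container_id, node_name):
    request = {
        "hostname": container_id,          # short container ID as host name
        "folder": "docker",                # assumed target WATO folder
        "attributes": {
            "tag_address_family": "no-ip", # "IP address family" set to "No IP"
            "parents": [node_name],        # docker node as parent
        },
    }
    params = dict(AUTH, action="add_host", output_format="json",
                  request=json.dumps(request))
    print(requests.get(SITE_URL, params=params).json())

add_container_host("f5f12cd69266", "my-docker-node")
</pre>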
There are other use cases. For example, if you do not have access to the node,
you can also install the agent (including optional config and plugins)
into the image and have the container open a dedicated network port for
agent communication.
We'll add a dedicated docker monitoring page to the documentation in the
near future to describe this in detail.
The following changes have been made for now:
<h3>New check plugins</h3>
<ul>
<li>docker_node_info: Check the status of the docker daemon<br>
Checks whether the docker daemon is running and functional on the docker
node.
</li>
<li>docker_node_info.containers: Count the number of containers<br>
Counts the number of containers in the different states and creates metrics
out of this information. Thresholds can be configured on the number of
containers in the different states.
</li>
<li>docker_node_disk_usage: Disk usage of docker files<br>
This check summarizes the disk usage of docker files (images, ...) on
the disks. It tells you whether or not you can save disk space by
cleaning things up.
</li>
<li>docker_container_cpu: Check the CPU utilization of a docker container<br>
This check reports the percentage CPU utilization of a docker container.
Unlike the Linux CPU utilization check (kernel.util) it only reports
user and system time. More detailed values, like iowait, are not available.
</li>
<li>docker_container_mem: Docker container specific memory checking<br>
Instead of using the default linux memory check (mem), Check_MK now
uses a container-specific memory check.
The main reason is that the memory information in the container is not
available through <tt>/proc/meminfo</tt> as usual. The memory data is available
through the kernel's cgroup interface, which is available in the container's
context below <tt>/sys/fs/cgroup/memory/memory.stat</tt>.
The features of both checks are exactly the same.
</li>
<li>docker_container_status: Checks the running state of a container<br>
The check docker_container_status checks whether a container is running or not.
</li>
<li>docker_container_status.health: Check healthcheck API of containers<br>
Check the status of containers as reported by Docker's healthcheck API.
</li>
</ul>
<h3>New HW / SW inventory plugins</h3>
<ul>
<li>docker_node_images: Inventorize docker node image information<br>
Inventorizes information about repository, tag, ID, creation time, size,
labels and the number of docker images. It also collects information about
how many containers currently use each image.
</li>
<li>docker_node_info: Inventory plugin displaying docker version<br>
Adds the docker version and node labels to the inventory tree.
</li>
<li>docker_container_labels: Inventorize the labels of containers</li>
<li>docker_container_node_name: Inventorize the node name of containers</li>
</ul>
<h3>Preparing linux agent for docker monitoring</h3>
<ul>
<li>The agent now detects whether or not it is being executed
in a docker container context.
</li>
<li>Find docker containers and execute agent in context<br>
In case the agent is running on a docker node, it iterates over
all running containers and executes the Check_MK agent in
the context of the container to gather container-specific
information.
In case a check_mk_agent is already installed in the
container, this agent is executed.
In case no check_mk_agent is installed, the agent
of the docker node is executed in the container.
</li>
</ul>
<h3>Changed checks</h3>
<ul>
<li>lnx_if: Exclude veth* network interfaces on docker nodes<br>
The veth* network interfaces created for docker containers are now
excluded by the linux agent in all cases. The interface names have no
direct relation to the docker container name or ID and appear to be
randomly generated.
These container-specific interfaces are not relevant for monitoring
on the node; the docker network interfaces are monitored in the
container context instead.
</li>
<li>df: Exclude docker local storage mounts on docker nodes<br>
The df check now excludes all filesystems found below
<tt>/var/lib/docker</tt>, which is the default location for
the docker container local storage.
Depending on the storage engine in use, docker creates overlay
filesystems and mounts below this hierarchy for the started
containers.
These filesystems are not relevant for monitoring the node; they
are monitored from the container context.
</li>
<li>df mounts: Skip docker mounts for name resolution in container<br>
When docker containers are configured to perform name resolution, there are
mounts at <tt>/etc/resolv.conf</tt>, <tt>/etc/hostname</tt> and
<tt>/etc/hosts</tt> which are not relevant for monitoring. These mounts are
now always skipped.
</li>
<li>uptime: Is now reported correctly for docker containers<br>
In previous versions of the linux agent, the uptime of the
docker node was reported by the agent when it was executed
in a docker container context.
</li>
<li>Checks disabled in docker container contexts<br>
These checks do not make sense in the context of a docker container.
The agent now skips the corresponding sections when executed in a container.
For some of these checks, docker-specific counterparts have been added (see above).
<ul>
<li>kernel</li>
<li>cpu.threads</li>
<li>cpu.load</li>
<li>drbd</li>
<li>lnx_thermal</li>
</ul>
</li>
</ul>
ID: 5830
Title: Fixed DST shift correction for downtimes, causing CMC to use 100% CPU
Component: Core & setup
Level: 2
Class: Bug fix
Version: 1.5.0i4
When a downtime was configured to recur every hour and a 1h DST shift
happened, the CMC would go into an infinite loop with 100% CPU load and no
monitoring at all. If the downtime in question was set in an ad hoc fashion
via the GUI (not via rules), the only way to work around this issue was to
remove the state file, losing all downtimes, acknowledgements and comments.
ID: 5707
Title: Windows: allow eventlog monitoring from multiple hosts
Component: Checks & agents
Level: 2
Class: New feature
Version: 1.5.0i4
Until now, Windows eventlogs could be monitored only from one host (Check_MK
site). Attempting to contact one single Windows agent from multiple hosts led
to lost eventlog entries, as all hosts shared one common state file for storing
the offsets of the eventlog entries read so far.
Now the offsets are stored in host-IP-specific state files, allowing hosts with
different IP addresses to monitor one single Windows system without losing
eventlog entries. Note: multiple Check_MK sites running under one and the same
IP address will still suffer from lost eventlog entries, as the offsets are
stored per IP address.
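The idea behind the host-IP-specific state files can be sketched as follows. The state
directory and the file name scheme are assumptions for illustration, not the exact ones
used by the Windows agent.
<pre>
# Sketch of the idea: each polling host (identified by its IP address)
# gets its own offset record, so two monitoring hosts no longer
# overwrite each other's read position. Directory and file name scheme
# are assumptions for illustration only.
import os

STATE_DIR = r"C:\ProgramData\checkmk\agent"  # assumed state directory

def state_file_for(remote_ip):
    # e.g. eventstate_192_168_1_10.txt - one file per polling host
    return os.path.join(STATE_DIR,
                        "eventstate_%s.txt" % remote_ip.replace(".", "_"))

def load_offsets(remote_ip):
    """Return {logname: last_read_record_number} for one polling host."""
    offsets = {}
    path = state_file_for(remote_ip)
    if os.path.exists(path):
        with open(path) as handle:
            for line in handle:
                logname, offset = line.rstrip("\n").rsplit("|", 1)
                offsets[logname] = int(offset)
    return offsets

def save_offsets(remote_ip, offsets):
    with open(state_file_for(remote_ip), "w") as handle:
        for logname, offset in offsets.items():
            handle.write("%s|%d\n" % (logname, offset))
</pre>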
ID: 5783
Title: Introduced background jobs for longer running tasks
Component: WATO
Level: 2
Class: New feature
Version: 1.5.0i4
A new mechanism has been introduced that allows certain tasks to be run as background processes.
Currently the following areas are included:
<ul>
<li>Host renaming</li>
<li>Agent baking (Enterprise Edition)</li>
<li>Report generation (Enterprise Edition)</li>
</ul>
Once a background job is triggered, an overview page provides additional information
regarding the progress. You can also stop and delete background jobs, if applicable.
Keep in mind that some jobs - namely host renaming and agent baking - lock certain areas
in WATO to prevent further configuration as long as the background process is running.
Each background job has a working directory located in <tt>~/var/check_mk/background_jobs</tt> with
a status file named <tt>jobstatus.mk</tt>. If the background job generates stdout data, it will
be shown on the job details page. Job exceptions are also shown on the details page and, in addition,
written to <tt>~/var/log/web.log</tt>.
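Since the status file is an ordinary .mk file, its content can also be inspected from
the command line. A small sketch, assuming that <tt>jobstatus.mk</tt> contains a single
Python literal (a dictionary) and that it has a "state" key - both details may differ
between versions:
<pre>
#!/usr/bin/env python
# Small sketch: list background jobs and their state by reading the
# jobstatus.mk files. It assumes each status file contains one Python
# literal (a dict) with a "state" key - treat this as illustrative only.
import ast
import os

BASE = os.path.expanduser("~/var/check_mk/background_jobs")

for root, _dirs, files in os.walk(BASE):
    if "jobstatus.mk" not in files:
        continue
    with open(os.path.join(root, "jobstatus.mk")) as handle:
        status = ast.literal_eval(handle.read())
    print("%-60s %s" % (os.path.relpath(root, BASE),
                        status.get("state", "unknown")))
</pre>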
Keep in mind that old background jobs are automatically removed after 30 days or when the maximum number
of jobs for a job type has been reached. These limits are currently hardcoded and not configurable:
<ul>
<li>Reporting has a limit of 100</li>
<li>Host renaming has a limit of 50</li>
</ul>
This cleanup routine is regularly called through a multisite cronjob.
ID: 5744
Title: Export of rule packs in MKP packages
Component: Event Console
Level: 2
Class: New feature
Version: 1.5.0i4
Rule packs of the Event Console can now be exported in MKP packages, i.e. they
can be downloaded, versioned, and shared with other Check_MK installations.
This is e.g. useful in setups with independent instances where rule packs can
now be defined centrally in one instance and distributed to other instances as
predefined packages. For distributed instances the primary mechanism to
synchronize rule packs remains the synchronization via WATO.
To export a rule pack it has to be made exportable by clicking on the
corresponding action in the rule pack overview of the WATO module Event
Console. After that the rule pack is available for MKP export in the WATO
module Extension Packages, i.e. it can be seen in the packaged files when a
package is created or edited. If a rule pack is exported in an MKP, it cannot be
deleted directly in the WATO module Event Console anymore and its ID and title
cannot be changed. The rules of the rule pack can still be modified via the
edit rules menu. If a rule of an MKP rule pack is modified or deleted, or a new
rule is created, the MKP rule pack becomes a modified MKP rule pack. This means
that the modified version of the rule pack becomes valid, but the exported MKP
rule pack remains unchanged. To synchronize the MKP rule pack with the modified
version there are two options: the modified rule pack can be reset to
the MKP version, or the MKP can be updated with the modified version. Both options
are available via the corresponding action of the rule pack in the WATO module
Event Console.
If an MKP including Event Console rule packs is uploaded to a site, the included
rule packs will be added to the end of the existing rule packs. They can be
moved freely between the existing rule packs without changing the MKP.
Furthermore, rule packs provided in an MKP can be enabled and disabled without
restrictions. Note however that the information whether a rule pack is enabled or
disabled is persisted in the MKP. This makes it possible to ship e.g. three
rule packs for different versions of a software product which are disabled by default.
After uploading the MKP, the rule pack for a specific version can be activated.
If an MKP with exported rule packs is deleted, the rule packs provided by that
MKP are deleted as well. To keep the rule packs and remove the MKP, the MKP
has to be dissolved. As a result the rule packs that were provided by that MKP
are still available. After dissolving an MKP the rule packs will be exportable.
In a distributed monitoring setup there are two existing options, "Replicate
Event Console configuration to this site" and "Replicate extensions (MKPs and
files in ~/local/)", which have an impact on exportable and exported rule packs.
By enabling the replication of the Event Console configuration, all Event
Console rule packs are synchronized with slave sites, but the MKP information of
rule packs is only synchronized if the replication of MKPs is enabled as well. This has
an implication on the representation and behaviour of rule packs on a slave
site if the WATO configuration of the slave site is enabled. If, for example,
only the replication of the Event Console configuration is enabled, the slave
site will show a synchronized MKP rule pack as exportable but not as a rule
pack provided by an MKP. Therefore, rule packs can be provided by the master and
bundled in MKPs by the slave site. If the rule pack export is used, it is not
advisable to use the MKP replication without the replication of the Event
Console configuration because the rule packs are provided to the slave in .mk
files, but they WILL NOT be recognized by the Event Console. The preferred
method to share only specific rule packs is to disable both replication options
and to upload MKPs containing these rule packs to the corresponding slave
sites.
To avoid errors in a distributed monitoring setup, the rule pack export SHOULD
only be used when ALL slave sites support the MKP rule pack export. If a slave
site does not support the rule pack export, either the replication of the Event
Console configuration should be disabled for that site or the export of rule
packs should not be used at all.