ID: 8358
Title: Automatic Check_MK Agent updates for Linux
Component: agents
Level: 3
Class: New feature
Version: 1.2.7i4
The Check_MK Agent Bakery now allows for automatically updating Check_MK
Agents via HTTP/HTTPS. This is realized by a new agent plugin called
<tt>cmk-update-agent</tt>, which is currently only available for Linux;
a Windows version will follow soon. The steps for setting up the
automatic updates are:
1. Go to the new page for the Agent Deployment. You will find a new button
for this on the Agent Bakery page.
2. Fulfill all of the prerequisites that are shown on that new page.
This involves creating agents that contain the new plugin with a
valid configuration and enabling the deployment.
3. Install the newly prepared agents on the target hosts of your choice.
4. Call <tt>cmk-update-agent register</tt> on these systems in order
to register the agents for updates.
5. Don't forget to enable the agent deployment's master switch!
ID: 8355
Title: Fixed exception about "TypeError: a float is required" in reporting
Component: Reporting & Availability
Level: 2
Class: Bug fix
Version: 1.2.7i4
This occurred when selecting time ranges of the form <i>The last ...</i>.
ID: 8354
Title: Notification spooler: handle incoming connections even in situation with large spool directories
Component: Notifications
Level: 2
Class: Bug fix
Version: 1.2.7i4
In situations where a large bulk of notifications came in from a slave, mknotify
would happily handle all those notifications while at the same time forgetting
to handle incoming connections, which would then run into timeouts. Now the
processing of notifications is limited to 3 seconds for each turn.
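The idea of a per-turn time budget can be sketched as follows. This is only an illustration in Python, not the spooler's actual code: the function name <tt>process_turn</tt>, the queue type and the injectable clock are assumptions made for this example.

```python
import time
from collections import deque


def process_turn(queue, handle, budget_seconds=3.0, clock=time.monotonic):
    """Handle queued items until the queue is empty or the time budget is used up.

    Returns the number of items handled in this turn. Items that did not
    fit into the budget stay in the queue for the next turn, so the caller
    gets a chance to serve incoming connections in between.
    """
    deadline = clock() + budget_seconds
    handled = 0
    while queue and clock() < deadline:
        handle(queue.popleft())
        handled += 1
    return handled


def main_loop_once(queue, handle, serve_connections):
    """One iteration of a hypothetical main loop: spool files first, then sockets."""
    process_turn(queue, handle)  # bounded by the 3 second budget
    serve_connections()          # always reached, even with a huge backlog
```

Because the budget is checked before every item, a large spool directory can delay connection handling by at most roughly one budget plus the time needed for a single item.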
ID: 8352
Title: Fix missing service level macro in Check_MK Micro Core
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i4
This was due to the service level being stored as an integer, whereas
macro expansion needs strings for the string replacement.
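The class of problem can be illustrated with a small Python sketch. The Micro Core itself is not written in Python, and the macro name <tt>SERVICELEVEL</tt> as well as the function <tt>expand_macros</tt> are placeholders chosen for this example only.

```python
def expand_macros(text, macros):
    """Replace every $NAME$ in text with the corresponding macro value.

    Values are converted with str() first, because string replacement
    only works with strings. An integer service level such as 20 would
    otherwise be rejected or silently skipped.
    """
    for name, value in macros.items():
        text = text.replace("$" + name + "$", str(value))
    return text


# Hypothetical macro table with an integer-valued service level
example = expand_macros("Service level: $SERVICELEVEL$", {"SERVICELEVEL": 20})
```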
ID: 8353
Title: Support for new RRD format with more efficient Disk IO
Component: metrics
Level: 2
Class: New feature
Version: 1.2.7i4
The Check_MK Micro Core now supports a new format when creating
RRD files. In the new format:
LI:RRD files are kept below the new directory <tt>~/var/check_mk/rrd</tt>.
LI:All metrics of one service are kept in one RRD.
Because all metrics of a service are now stored together in one RRD, fewer
disk blocks need to be updated when a new value is entered. This saves
disk IO. This is possible because RRDTool version 1.5 now supports
changing existing RRDs. That way, if a check outputs new metrics, these can
be added to an existing RRD.
H2:Switch to the new format
The new CMC format is not used automatically - neither for existing nor
for new sites. The default for new sites might be changed in the future, though.
Switching your site to the new CMC format is done in the following steps:
1. Make a backup of your current RRDs (<tt>~/var/pnp4nagios/perfdata</tt>). You might
be tempted to skip this step as the amount of data can be large. But you have
been warned.
2. Go to the ruleset <i>Configuration of RRD databases of services</i> (and
the corresponding ruleset for <i>hosts</i>).
3. Create a rule for some (or all) hosts with the setting
<i>RRD storage format: One RRD per host/service (saves disk IO, only with CMC)</i>.
4. Activate Changes. Your system is now in a state where <b>new</b>
RRDs will automatically be created
in the new format in <tt>~/var/check_mk/rrd</tt>.
5. In order to migrate the existing RRDs use the following command:
C+:
OM:cmk --convert-rrds -v
C-:
It is also possible to specify a list of hosts and limit the conversion
to these:
C+:
OM:cmk --convert-rrds -v server01 server02
C-:
Note: This keeps the existing RRDs present in <tt>~/var/pnp4nagios/perfdata</tt>
and needs lots of disk space. If you are low on disk space and are bold, or if you
have a backup (even better), you can directly delete the PNP format RRDs:
C+:
OM:cmk --convert-rrds --delete-rrds -v
C-:
If you decided not to delete the old RRDs in the first place you can run
the command with the option <tt>--delete-rrds</tt> at any time later.
Since all of your many gigabytes of RRDs need to be converted, this
can take a long time. The good news: you do not need a downtime on your
monitoring during this period. New RRDs are used for storing current
data as soon as they exist, so with a bit of luck and good disk performance you
will not lose data points.
H2:Old PNP SINGLE format
Note: The new RRD format is almost identical to the format <tt>SINGLE</tt> that
was the default in PNP4Nagios a long time ago. However, PNP was not able to alter the
list of metrics that a service outputs. So whenever the list of
performance data of a check changed, the graphs ended at that point in time.
There is currently no direct migration from PNP's <tt>SINGLE</tt> format
to the new CMC format. But you can take the intermediate step with
<tt>cmk --convert-rrds --split</tt>.
ID: 8350
Title: Real-time checks: Introducing checking in one second resolution
Component: cmc
Level: 2
Class: New feature
Version: 1.2.7i4
This release introduces a new feature named Real-time checks. With the new
Real-time checks it is possible to monitor specific things in much shorter
intervals than the normal interval of 60 seconds.
This feature has mainly been developed to get detailed graphs for values
that change often, for example memory usage or CPU utilization. But
not only the performance data is updated in this interval. The complete service
state is updated, which may also result in faster notifications.
The Real-time checks work like this: The core listens on the network
for incoming Real-time check results, which are basically UDP packets sent by
the agents in an interval of one second. This needs to be enabled using the
configuration option <i>Global Settings > Enable handling of Real-Time Checks</i>.
You need to configure the UDP port to listen on (6559 by default) and the secret
which is used to decrypt the Real-time checks. This secret needs to be the same for
the Check_MK server and all agents which are sending Real-time check results.
The agents need to be configured to send Real-time check results. This can currently
be done for the Linux and Windows agents. On Linux you need to create a file
<tt>/etc/check_mk/real_time_checks.cfg</tt> with the following contents:
F+:/etc/check_mk/real_time_checks.cfg
RTC_TIMEOUT=90
RTC_PORT=6559
RTC_SECRET='hallo123'
RTC_SECTIONS=""
RTC_SECTIONS+="mem "
RTC_SECTIONS+="cpu "
F-:
It is a good idea to restrict the permissions on this file because it contains the
real-time check secret, which is shared between the Check_MK agent and the server
to encrypt the transferred data. You can do this, for example, with
<tt>chmod 640 /etc/check_mk/real_time_checks.cfg</tt>.
On Windows you need to add the following to the <tt>[global]</tt> section
of your check_mk.ini and restart the Check_MK service:
F+:check_mk.ini
realtime_port = 6559
realtime_sections = mem winperf
realtime_timeout = 90
passphrase = hallo123
F-:
The agent works as usual, waiting for connections from the Check_MK server.
Once a Check_MK server contacts the agent, the agent responds with
its regular response. Now, when Real-time checks are enabled, the agent
sends one UDP packet for each enabled section per second to the host address
which had queried the Check_MK agent, which is normally the Check_MK server's
address.
The data which can be processed as Real-time checks is limited, so we limit the
sections which can be sent as Real-time checks. Currently you can enable only
the <tt>mem</tt> and <tt>cpu</tt> sections on Linux and <tt>mem</tt> and
<tt>winperf</tt> on Windows systems. This might be extended in the future.
To get detailed graphs, you now need to configure your RRD databases to be able
to store this detailed information. This can be done via the ruleset
<i>Host & Service Parameters > Monitoring Configuration > Configuration of RRD databases of services</i>.
You need to create a new rule and first need to ensure that you only apply the
rule to checks which get real-time check information as the RRDs of these
services need more disk space. So you should only select the CPU/Memory services
of hosts which are sending Real-time check results.
Then you need to configure this rule to have a 1 second precision for a duration
of your choice.
As an example, to get the following resolutions:
<ul>
<li>1 second resolution for 4 hours</li>
<li>1 minute resolution for 2 days</li>
<li>5 minute resolution for 10 days</li>
<li>30 minute resolution for 90 days</li>
<li>6 hour resolution for 4 years</li>
</ul>
You need to configure these numbers:
<ul>
<li>Step (precision): 1 sec.</li>
<li>RRA configuration:</li>
<li>50.0%, 1, 14400</li>
<li>50.0%, 60, 2880</li>
<li>50.0%, 300, 2880</li>
<li>50.0%, 1800, 4320</li>
<li>50.0%, 21600, 5840</li>
</ul>
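The numbers above can be derived from the wanted resolutions and durations. The following Python sketch shows the arithmetic, assuming that the three values of each RRA line are the xfiles factor, the number of steps per row and the number of rows (as in RRDtool's RRA definition) and that a year is counted as 365 days. The helper name <tt>rra_line</tt> is made up for this example.

```python
HOUR = 60 * 60
DAY = 24 * HOUR
YEAR = 365 * DAY  # assumption: leap days are ignored


def rra_line(step, resolution, duration, xff=50.0):
    """Return (xff, steps per row, rows) for one archive.

    step       -- base precision of the RRD in seconds
    resolution -- wanted resolution of this archive in seconds
    duration   -- how long data is kept in this archive, in seconds
    """
    steps_per_row = resolution // step
    rows = duration // resolution
    return (xff, steps_per_row, rows)


# (resolution, duration) pairs from the example configuration
WANTED = [
    (1, 4 * HOUR),         # 1 second for 4 hours
    (60, 2 * DAY),         # 1 minute for 2 days
    (5 * 60, 10 * DAY),    # 5 minutes for 10 days
    (30 * 60, 90 * DAY),   # 30 minutes for 90 days
    (6 * HOUR, 4 * YEAR),  # 6 hours for 4 years
]

for resolution, duration in WANTED:
    print("%.1f%%, %d, %d" % rra_line(1, resolution, duration))
```

With a step of 1 second the second value equals the resolution in seconds. With a different step you would have to divide accordingly.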
After you configured this, you need to run <tt>cmk --convert-rrds -v</tt> to convert
the existing RRDs.
After the conversion has finished and processing of the Real-time checks works
correctly, you should see the service state, output and graphs e.g. of the "CPU utilisation"
service updating in an interval of one second.
ID: 8344
Title: Users can now add a pin to graphs to get specific values
Component: metrics
Level: 2
Class: New feature
Version: 1.2.7i4
By clicking on a graph, users can mark a specific position (time) in
their graphs.
This makes the graphs show the values at the marked time in the
legend below the graph. The values of the graphs at the currently
marked time are shown in the column "pin".
The time is saved per user. When the saved time of the current pin
is in the range of a graph, the pin is shown and the values are
displayed in the table.
ID: 8310
Title: Fix exception in report schedule in case of reports with special contexts
Component: Reporting & Availability
Level: 2
Class: Bug fix
Version: 1.2.7i4
If you have a report of context type "single Event Console Event" and then, in the
scheduler, add an event ID to the context of the report, you would run into an
exception saying <tt>object of type 'int' has no len()</tt>. This has been fixed.
ID: 8309
Title: Fix authorization settings for seeing service in the user interface
Component: cmc
Level: 2
Class: Bug fix
Version: 1.2.7i4
The behaviour of the authorization setting <i>loose</i> for hosts has been
changed to be compatible with the behaviour of Nagios. If host authorization
is set to <i>loose</i> in the global settings of the Check_MK Micro Core, then
it is now sufficient to be the contact of a host for seeing all of its services
- even if those services do have explicit <i>contacts</i> assigned. Formerly
you needed to be an explicit contact if one had been set. In the case that
you just have contacts assigned to hosts, nothing has changed.
The behaviour of the authorization setting <i>strict</i> has also changed.
If you are the explicit contact of a service, but not of the host, then you
are allowed to see the service.
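The two rules can be summarized in a small decision function. This is a simplified Python sketch and not the Micro Core's implementation. It assumes that services without explicit contacts inherit the contacts of their host, and that in <i>strict</i> mode being a host contact is not sufficient once a service has explicit contacts. The function name and parameters are made up for this example.

```python
def may_see_service(user, host_contacts, service_contacts, mode):
    """Decide whether user may see a service in the user interface.

    host_contacts    -- set of contacts of the host
    service_contacts -- set of explicit contacts of the service (may be empty)
    mode             -- "loose" or "strict"
    """
    if mode == "loose":
        # A host contact sees all services of the host, even those
        # that have explicit contacts assigned.
        return user in host_contacts or user in service_contacts
    if mode == "strict":
        # Assumption: explicit service contacts take precedence and
        # services without explicit contacts fall back to the host's.
        effective = service_contacts if service_contacts else host_contacts
        return user in effective
    raise ValueError("unknown authorization mode: %r" % (mode,))
```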
<b>Note</b>: This all has <b>no</b> impact on notifications. Nothing has
changed there.
ID: 8304
Title: New graphical frontend for metrics time graphs
Component: metrics
Level: 3
Class: New feature
Version: 1.2.7i3
The Check_MK Enterprise Edition now has a new graphical user frontend for
displaying metrics time graphs. Instead of using pixel-based PNG images it makes
use of interactive vector graphics using HTML 5. These graphs do not only
look better. They allow you to:
LI:Zoom horizontally using your mouse wheel
LI:Zoom vertically by dragging points up and down
LI:Scroll horizontally (into past and future) by dragging points left and right
LI:Resize a graph using the bottom right corner
If more than one graph is being displayed on the same HTML page, then any zoom
or shift in time will be synchronized to <b>all</b> graphs on the same page.
The reporting also uses the new graphing engine and replaces the embedded
pixel images with real vector graphics.
A further advantage of the new system is that in distributed setups no
reverse HTTP proxy is needed any longer. The data is fetched via
Livestatus (an up-to-date core is necessary).