Improved Unicode robustness. - Checkmk git commits

11 May 2016

Module: check_mk
Branch: master
Commit: e4db79755a9feeb90e203fabfc5e8a0ab7aa4e30
URL:    http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=e4db79755a9fee…

Author: Sven Panne <sp(a)mathias-kettner.de>
Date:   Wed May 11 10:26:58 2016 +0200

Improved Unicode robustness.

---

 bin/mkeventd |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/bin/mkeventd b/bin/mkeventd
index 134e33e..f060feb 100755
--- a/bin/mkeventd
+++ b/bin/mkeventd
@@ -97,6 +97,19 @@ history_columns = [
 # Alas, we often have no clue about the actual encoding, so we have to guess:
 # Initially we assume UTF-8, but fall back to latin-1 if it didn't work.
 def decode_from_bytes(string_as_bytes):
+    # This is just a safeguard if we are inadvertently called with a Unicode
+    # string. In theory this should never happen, but given the typing chaos in
+    # this script, one never knows. In the Unicode case, Python tries to be
+    # "helpful", but this fails miserably: Calling 'decode' on a Unicode string
+    # implicitly converts it via 'encode("ascii")' to a byte string first, but
+    # this can of course fail and doesn't make sense at all when we immediately
+    # call 'decode' on this byte string again. In a nutshell: The implicit
+    # conversions between str and unicode are a fundamentally broken idea, just
+    # like all implicit things and "helpful" ideas in general. :-P For further
+    # info see e.g. http://nedbatchelder.com/text/unipain.html
+    if type(string_as_bytes) == unicode:
+        return string_as_bytes
+
     try:
         return string_as_bytes.decode("utf-8")
     except:
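The hunk shows only the start of the function. Its context comment says the function tries UTF-8 first and falls back to latin-1. For comparison, here is a minimal Python 3 sketch of the same guarded decode. It is not the committed code. `isinstance(..., str)` stands in for the Python 2 check `type(...) == unicode`, and the bare `except:` is narrowed to `UnicodeDecodeError`. Both are illustrative choices, not part of the commit.

```python
from typing import Union


def decode_from_bytes(string_as_bytes: Union[bytes, str]) -> str:
    """Python 3 sketch of the guarded UTF-8/latin-1 decode (not the committed code).

    Assumption: the input is either already-decoded text (str) or raw bytes
    whose encoding is unknown.
    """
    # Safeguard analogous to the commit: text that is already decoded is
    # returned unchanged instead of being decoded a second time.
    if isinstance(string_as_bytes, str):
        return string_as_bytes
    try:
        # First guess: UTF-8.
        return string_as_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: latin-1 maps every byte 0x00-0xFF to a code point,
        # so this decode cannot raise.
        return string_as_bytes.decode("latin-1")


if __name__ == "__main__":
    print(decode_from_bytes(b"caf\xc3\xa9"))  # valid UTF-8 bytes
    print(decode_from_bytes(b"caf\xe9"))      # not valid UTF-8, decoded as latin-1
    print(decode_from_bytes("caf\u00e9"))     # already text, returned as-is
```

Python 3 never converts implicitly between `str` and `bytes`. `str` has no `decode` method at all, so the ASCII round-trip the comment complains about cannot happen. The type check remains useful only when callers may pass either type.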