Bug 17028 - SBS 7.5.4 r31939 - Malformed UTF-8 character (fatal) error
Summary: SBS 7.5.4 r31939 - Malformed UTF-8 character (fatal) error
Status: NEW
Product: Logitech Media Server
Classification: Unclassified
Component: Display
Version: 7.5.4
Hardware: Other Linux (other)
Importance: P3 normal
Target Milestone: 7.7.x
Assigned To: Andy Grundman
Depends on:
Blocks:
Reported: 2011-03-05 08:02 UTC by JED
Modified: 2011-09-19 11:23 UTC (History)

See Also:
Category: Bug


Attachments
Print a date in March using strftime and locale language (418 bytes, text/plain)
2011-09-19 08:17 UTC, Andy Grundman
A better version of the script (455 bytes, text/plain)
2011-09-19 08:22 UTC, Andy Grundman

Description JED 2011-03-05 08:02:32 UTC
I updated my Squeezebox Server to v7.5.4 r31939 (Perl 5.12.0) in February and it worked fine for a couple of days. On 02.03.2011 I realised that the German month name was no longer displayed properly and that the following errors have been logged continuously since 01.03.2011:

Slim::Utils::Misc::msg (1165) Warning: [16:35:11.5475] Malformed UTF-8 character (unexpected non-continuation byte 0x72, immediately after start byte 0xe4) in substitution (s///) at /usr/local/slimserver/Slim/Display/Text.pm line 933.
Slim::Utils::Timers::__ANON__ (258) Error: Timer failed: Malformed UTF-8 character (fatal) at /usr/local/slimserver/Slim/Display/Text.pm line 933.
Slim::Utils::Misc::msg (1165) Warning: [16:35:12.4159] Malformed UTF-8 character (unexpected non-continuation byte 0x72, immediately after start byte 0xe4) in substitution (s///) at /usr/local/slimserver/Slim/Display/Text.pm line 933.
Slim::Networking::IO::Select::__ANON__ (146) Error: Select task failed calling Slim::Networking::Slimproto::client_readable: Malformed UTF-8 character (fatal) at /usr/local/slimserver/Slim/Display/Text.pm line 933.

Because the problem started on March 1st (that's what I can see in the log file), I assume that it's caused by the German month name "März", which has an umlaut as its second character.

Any hints on how I could solve the problem?
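
For illustration, the byte pair named in the warning can be checked outside the server. This is a minimal Python sketch, not the server's Perl code. The helper name `explain_bytes` is hypothetical, and it assumes the month abbreviation arrives as single-byte ISO-8859-15 data, which is only a guess based on the 0xe4 0x72 sequence in the log:

```python
def explain_bytes(raw: bytes) -> dict:
    """Decode raw bytes as ISO-8859-15 and attempt a strict UTF-8 decode."""
    result = {"latin9": raw.decode("iso8859-15")}
    try:
        result["utf8"] = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Record the offset of the offending byte and the decoder's reason.
        result["utf8_error"] = (err.start, err.reason)
    return result


# 0xe4 followed by 0x72 is the byte pair named in the warning.
info = explain_bytes(b"M\xe4r")
print(info)
```

In ISO-8859-15, 0xe4 is "ä". In UTF-8, 0xe4 is the start byte of a three-byte sequence, so the following "r" (0x72) is rejected as a non-continuation byte, which matches the wording of the warning.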
Comment 1 JED 2011-03-06 09:12:34 UTC
(In reply to comment #0)

> Slim::Utils::Timers::__ANON__ (258) Error: Timer failed: Malformed UTF-8
> character (fatal) at /usr/local/slimserver/Slim/Display/Text.pm line 933.
> ...
> Due to the fact that the problem started on March 1st (that's what I can see
> in the log file), I assume that it's caused by the German month name "März"
> which includes a German umlaut as second character at the moment.

To narrow down the problem I've added debug output to Slim/Display/Text.pm:

--- Text.pm.ORG 2011-03-06 17:24:37.000000000 +0100
+++ Text.pm     2011-03-06 17:22:14.000000000 +0100
@@ -930,6 +930,7 @@
        }

        if (defined($length)) {
+logBacktrace("MYDEBUG:$string:$length:");
                if ($string =~ s/^(((?:(\x1e[^\x1e]+\x1e)|)([^\x1e\x1f]|\x1f[^\x1f]+\x1f)){0,$length})//) {
                        $newstring = $1;
                }

As far as I can see, the log output confirms my previous assumption that the error is caused by the short German month name:

[11-03-06 17:22:28.4219] Slim::Display::Text::subString (933) Error: MYDEBUG:              So 6. Mär 2011              :40:
                           ^^^^^
[11-03-06 17:22:28.4222] Slim::Display::Text::subString (933) Backtrace:
...
[11-03-06 17:22:28.4224] Slim::Utils::Misc::msg (1165) Warning: [17:22:28.4222] Malformed UTF-8 character (unexpected non-continuation byte 0x72, immediately
 after start byte 0xe4) in substitution (s///) at /usr/local/slimserver/Slim/Display/Text.pm line 934.

As a second test I changed the default language from German to English, which
makes the error go away:

[11-03-06 17:49:59.1374] Slim::Display::Text::subString (933) Error: MYDEBUG:            Sun 6. Mar 2011             :40:
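
For readers unfamiliar with the substitution at Text.pm line 933, the following is a rough Python sketch of what that pattern appears to do: take up to `$length` display units from the front of the string, where a unit is either one ordinary character or a `\x1f...\x1f` delimited sequence, optionally preceded by a `\x1e...\x1e` delimited sequence. The function name `take_display_chars` is hypothetical, and the reading of the two delimiters is an assumption drawn only from the regex in the diff above:

```python
import re
from typing import Tuple


def take_display_chars(text: str, length: int) -> Tuple[str, str]:
    """Split off up to `length` display units from the front of `text`."""
    # One unit: an optional \x1e...\x1e sequence, then either a single
    # ordinary character or a \x1f...\x1f delimited sequence.
    unit = r"(?:\x1e[^\x1e]+\x1e)?(?:[^\x1e\x1f]|\x1f[^\x1f]+\x1f)"
    pattern = re.compile("^(?:" + unit + "){0," + str(length) + "}")
    head = pattern.match(text).group(0)  # zero repetitions always match
    return head, text[len(head):]


head, rest = take_display_chars("So 6. Mär 2011", 5)
print(repr(head), repr(rest))
```

On a correctly decoded string, "ä" counts as one unit like any other character. The failure in the log occurs before any counting happens, when the regex engine meets bytes that are flagged as UTF-8 but are not valid UTF-8.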
Comment 2 Andy Grundman 2011-04-11 09:05:27 UTC
Do you have newer players, or does this only affect slimp3/SB1?
Comment 3 Andy Grundman 2011-09-19 07:05:09 UTC
What Linux distro/version are you running? The month names come from OS localization, not our own strings.txt file so it could be a bug there.
Comment 4 JED 2011-09-19 07:49:48 UTC
Hello Andy,

> Do you have newer players, or does this only affect slimp3/SB1?

I've also got a Squeezebox Boom which showed the same error.

Regards
Juergen
Comment 5 JED 2011-09-19 08:03:23 UTC
Hello Andy,

> What Linux distro/version are you running? The month names come from OS
> localization, not our own strings.txt file so it could be a bug there.

I'm using a Linux system with a self-compiled environment, including apache2.
The returned date string "So 6. Mär 2011" is not the problem in itself, because it's perfectly fine for a German environment (LANG=C, LC_CTYPE=de_DE@euro); the problem is the way the program tries to format it.

Unfortunately there's only one month per year which contains a German umlaut, and
I cannot change the date of my server. Is there any other way to help you with
your investigation, e.g. by providing a small PHP code snippet which I can run on my box for debugging purposes?

Regards
Juergen
Comment 6 Andy Grundman 2011-09-19 08:17:13 UTC
Created attachment 7470 [details]
Print a date in March using strftime and locale language

Try running this attached script.
Comment 7 Andy Grundman 2011-09-19 08:22:16 UTC
Created attachment 7471 [details]
A better version of the script

Try this one instead. You can uncomment the setlocale line to force the locale to German. On my Mac, the output in German is correct:

Locale: LC_TIME: de_DE.UTF-8
Samstag, März 19, 2011
"Samstag, M\xC3\xA4rz 19, 2011"
Comment 8 JED 2011-09-19 09:08:31 UTC
Hello Andy,

> Try this one instead. You can uncomment the setlocale line to force the locale
> to German. On my Mac, the output in German is correct:
> 
> Locale: LC_TIME: de_DE.UTF-8
> Samstag, März 19, 2011
> "Samstag, M\xC3\xA4rz 19, 2011"

that's the result on my server if I run it in a console window:

The environment: LANG=C
                 LC_CTYPE=de_DE@euro

# ./slims-locale-test.pl
Locale: LC_TIME: de_DE.UTF-8
Samstag, März 19, 2011
"Samstag, M\xC3\xA4rz 19, 2011"

The underlying system locale is not set to UTF-8 on my server which might cause the problem, although I'm running other PHP web applications with UTF-8 character sets without any problem.

I've tested the script on a server with UTF-8 system environment and the output looks a little bit different:

The environment: LANG=C
                 LC_CTYPE=de_DE.UTF-8

# ./slims-locale-test.pl
Locale: LC_TIME: de_DE.UTF-8
Samstag, März 19, 2011
"Samstag, M\xC3\xA4rz 19, 2011"
Comment 9 Andy Grundman 2011-09-19 09:50:01 UTC
"The underlying system locale is not set to UTF-8"

I'm pretty sure this is the problem, and you can see that the output doesn't look right. Why is your locale not set to UTF-8?
Comment 10 JED 2011-09-19 10:31:31 UTC
Hello Andy,

> "The underlying system locale is not set to UTF-8"
> 
> I'm pretty sure this is the problem, and you can see that the output doesn't
> look right. Why is your locale not set to UTF-8?

The system works fine without it, because I usually don't use German locales on the console but the English defaults. Only for some web applications, like the Squeezebox Server, have I changed the language setting to German. My family prefers German language settings ;-)

Wouldn't it be better to add translations for the month names, to stay independent of the underlying system?
Comment 11 Andy Grundman 2011-09-19 10:38:44 UTC
I suppose we could add our own translations but that would be a lot of work, and using strftime seems to work fine for everyone else.

So, you are using de_DE@euro which for some legacy reason uses ISO8859-15 encoding. You should really consider if this is what you actually want. Probably causes all kinds of other issues, I particularly liked this bug report: https://bugs.launchpad.net/ubuntu/+bug/751138
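
To make the encoding difference concrete, here is a small Python sketch (again, not the server's Perl code). The table `ASSUMED_CHARSETS` and the helper `decode_locale_bytes` are hypothetical names. The mapping of de_DE@euro to ISO8859-15 is taken from the comment above, and the byte strings are constructed by hand rather than captured from a real strftime call:

```python
# Locale name -> charset, hard-coded for illustration only.
ASSUMED_CHARSETS = {
    "de_DE@euro": "iso8859-15",
    "de_DE.UTF-8": "utf-8",
}


def decode_locale_bytes(raw: bytes, locale_name: str) -> str:
    """Decode date-string bytes using the charset assumed for the locale."""
    return raw.decode(ASSUMED_CHARSETS[locale_name])


# The same date text as it might be produced under each locale.
legacy = decode_locale_bytes(b"So 6. M\xe4r 2011", "de_DE@euro")
modern = decode_locale_bytes(b"So 6. M\xc3\xa4r 2011", "de_DE.UTF-8")

# Once decoded with the right charset, both yield the same text,
# which can then be re-encoded as UTF-8 for display.
utf8_bytes = legacy.encode("utf-8")
print(legacy == modern, utf8_bytes)
```

The point of the sketch is that the bytes are only meaningful together with the charset of the locale that produced them. Treating ISO8859-15 output as if it were UTF-8 would lead to the kind of error reported here.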
Comment 12 JED 2011-09-19 11:23:17 UTC
> I suppose we could add our own translations but that would be a lot of work,
> and using strftime seems to work fine for everyone else.
> 
> So, you are using de_DE@euro which for some legacy reason uses ISO8859-15
> encoding. You should really consider if this is what you actually want.
> Probably causes all kinds of other issues, I particularly liked this bug
> report: https://bugs.launchpad.net/ubuntu/+bug/751138

That report covers a problem which only occurs if you downgrade an
existing UTF-8 system to de_DE@euro. If the default system is a de_DE@euro system with optional UTF-8 support, such a problem should never appear.