Bugzilla – Bug 15739
UTF-8 characters in M3U playlists don't scan correctly
Last modified: 2011-09-18 08:01:06 UTC
+++ This bug was initially created as a clone of Bug #4578 +++
If a playlist filename contains UTF-8 characters (beyond the US-ASCII subset), or the playlist entries within that file contain such characters, the playlist will not scan correctly. The current behavior is that the playlist will appear in Squeezebox Server as an empty playlist with a mangled name. In earlier versions of SqueezeCenter, the server would manage to show playlist entries that did not contain characters beyond the US-ASCII subset. It has never worked entirely correctly as far as I know.
OS: OpenSUSE 11.1 (verified SC is correctly detecting UTF-8 in the Web GUI). All files are stored locally, so no network protocol charset issues can be involved. The character in question is U+2019 (0x2019, the right single quotation mark used as an apostrophe). This problem seems specific to the playlist scanner--the song scanner seems to correctly scan songs with the same character in their filenames and tags.
Sample playlist: https://bugs-archive.lyrion.org/attachment.cgi?id=3779
The file was originally created using Amarok, and modified using Kate.
Verified still a problem with squeezeboxserver-7.5.0-0.1.30158.noarch.rpm
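One plausible mechanism for the mangled name (an assumption, not something confirmed in this report) is that UTF-8 bytes are interpreted under a single-byte charset somewhere along the way. The Python sketch below is not the server's Perl code; it only shows what that kind of misreading does to U+2019. The playlist name is a made-up example.

```python
# Illustration only: how U+2019 looks when its UTF-8 bytes are misread
# as Windows-1252. The playlist name is hypothetical.
name = "Don\u2019t Stop.m3u"

utf8_bytes = name.encode("utf-8")      # U+2019 becomes the three bytes E2 80 99
mangled = utf8_bytes.decode("cp1252")  # wrong charset: each byte turns into its own character

print(utf8_bytes)
print(mangled)

# Decoding with the correct charset round-trips cleanly.
assert utf8_bytes.decode("utf-8") == name
```

A single apostrophe turning into three unrelated characters is the typical signature of this kind of mismatch.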
*** Bug 15799 has been marked as a duplicate of this bug. ***
The following patch fixes it for me in 7.4.2, but as in https://bugs-archive.lyrion.org/show_bug.cgi?id=15739 I am not sure, whether this is a fix that would work on all platforms and in all environments. I know it works on Ubuntu Karmic with unicode and non-unicode playlists.
--- M3U.pm 2010-02-27 18:50:42.000000000 +0000
+++ M3U.pm 2010-03-06 11:00:45.000000000 +0000
@@ -74,7 +74,8 @@
 $foundBOM = 1;
 }
- $entry = Slim::Utils::Unicode::utf8decode_guess($entry, $enc);
+# $entry = Slim::Utils::Unicode::utf8decode_guess($entry, $enc);
+ $entry = Slim::Utils::Unicode::utf8on($entry);
 main::DEBUGLOG && $log->debug(" entry from file: $entry");
@@ -111,7 +112,7 @@
 $entry = Win32::GetANSIPathName($entry);
 } else {
- $entry = Slim::Utils::Unicode::utf8encode_locale($entry);
+# $entry = Slim::Utils::Unicode::utf8encode_locale($entry);
 }
 $entry = Slim::Utils::Misc::fixPath($entry, $baseDir);
(In reply to comment #2) sorry, I meant https://bugs-archive.lyrion.org/show_bug.cgi?id=15799 above.
This seems quite similar to a bug I've seen while testing SB Server on Mac OS X 10.4.11. I have a playlist file named "playlist - ã.m3u". In the versions of Windows I've tested, this displayed as "playlist - ã" in the WebUI and on attached SB Devices. In Mac OS X 10.4.11, it displays as "playlist - ã" in the WebUI (Safari) and on the connected devices I'm testing with (SB Touch and SB Radio).
(In reply to comment #4) I'm not sure this is the same bug. Windows filesystems use UTF-16, and Linux/MacOS use UTF-8. There can be a lot of sources of filename mangling when sharing files between these two worlds--Samba issues, etc. For simplicity's sake, this bug refers to a 100% UTF-8 environment, using no network filesharing protocols or non-native filesystems.
*** This bug has been confirmed by popular vote. ***
We have decided to support only UTF-8 encoding within playlists and throw a warning if a non-UTF8 character is detected in a playlist. It is difficult if not impossible to correctly guess encoding of a given playlist.
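A minimal sketch of the "UTF-8 only, warn otherwise" policy described above, written in Python rather than the server's Perl. The function name `decode_playlist_utf8` is hypothetical, and continuing with replacement characters after the warning is an assumption; the comment only says a warning is thrown.

```python
import warnings

def decode_playlist_utf8(data: bytes) -> str:
    """Decode playlist bytes as UTF-8 only; warn instead of guessing a codepage."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        warnings.warn(
            f"playlist is not valid UTF-8 at byte offset {err.start}: {err.reason}"
        )
        # Assumption: carry on with U+FFFD in place of the bad bytes
        # rather than trying to guess another encoding.
        return data.decode("utf-8", errors="replace")
```

The point of the design is that a file either decodes cleanly or is reported, so a wrong guess can never silently produce a mangled name.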
Andy asked me to place a reminder in this bug that he should pull out the codepage 'guessing' logic.
For what it's worth the patch mentioned in comment#2 has no effect on this playlist: https://bugs-archive.lyrion.org/attachment.cgi?id=3779
gentlemen, i have been following bug 4578 as well, since i am impacted. is it possible that many, if not most of the problems arise because people are using windows clients to edit/create their m3u playlists? (like me) if that were the case, would it not be somehow more functional to simply add an advanced commandline startup switch for SBS, which will force it to use some given coding for playlists? btw. - i sometimes use windows notepad to edit my playlists and it will allow me to save an m3u in UTF-8. doesnt seem to make any difference tho...
I think there's something more fundamentally wrong here--it doesn't seem to be failing to guess the encoding, it just doesn't seem to be handling UTF-8 correctly even when going through all the correct code branches. I doubt this is Windows-related. My playlists are all created/edited on Linux, and have the same problems. Windows does add some odd nonstandard BOM stuff to the beginnings of Unicode files, but presumably the server can already cope with that or there would be bigger issues elsewhere.
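For reference, the BOM that Windows editors such as Notepad prepend to UTF-8 files is the three bytes EF BB BF. The Python sketch below shows one way to strip it before parsing; it is an illustration only, not the server's Perl implementation (the patch in comment #2 shows the server tracks this with `$foundBOM`), and `strip_utf8_bom` is a hypothetical helper name.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A playlist header as a Windows editor might save it.
with_bom = codecs.BOM_UTF8 + "#EXTM3U\n".encode("utf-8")
print(strip_utf8_bom(with_bom).decode("utf-8"))
```

If the BOM were left in place, the first line would no longer start with `#` and could be mistaken for a track path.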
Just to confirm exactly the same here in both 7.5.0 (final) and 7.5.1 r30739 - running on Debian Squeeze, locale = UTF-8, CIFS mount (iocharset=utf8), playlists created via SBS web. I've tried the patch in comment 2, changing ID tag versions -- all to no avail. Playlist characters display properly in TextWrangler (Mac) but playlist scans choke -- e.g. a song called L'amitié produces L%27amiti in scanner logs and is skipped.
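For comparison, here is what percent-encoding of that title looks like when the é survives as UTF-8. This is a Python sketch using the standard library, not the scanner's own Perl code; it only illustrates that the logged `L%27amiti` is missing the two bytes for é, which suggests (as an assumption) that they were dropped before or during encoding.

```python
from urllib.parse import quote, unquote

title = "L'amiti\u00e9"

# quote() percent-encodes the UTF-8 bytes of every character outside its safe set.
encoded = quote(title)
print(encoded)  # L%27amiti%C3%A9

# A correct encoding is reversible; the truncated form from the log is not.
assert unquote(encoded) == title
assert unquote("L%27amiti") != title
```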
*** Bug 16302 has been marked as a duplicate of this bug. ***
I'm seeing this problem with 7.6.0 - r30575
(In reply to comment #14) > I'm seeing this problem with 7.6.0 - r30575 I should add that I have an all Linux, all utf8 setup. Files are served over NFS (utf8 preserving) to the squeezecenter server which is also running all utf8, ubuntu. Both are ubuntu 10.04LTS. locale shows all LC_* are en_US.UTF-8.
I presume that we are talking about M3U playlists here.
Leif, are you still seeing this with 7.6 r31864? If so, please attach a sample playlist. Keith, would you have a chance to try this with 7.6?
No, still doesn't work with 7.6 r31864 (Linux RPM, same sample playlist as comment#1). An M3U playlist with UTF-8 characters in the playlist name is found (the scanner reports 1 playlist found), but the Playlists menu is empty. Not sure what that means.
Created attachment 7147: playlist is found, but comes up empty
Created attachment 7148: same playlist, but scans fine
i am able to replicate the same effect with a different cause (or am i?)... the first attachment is a playlist "mood groovy_broken.m3u" that was viciously truncated by a playlist editing program at 128KB. i only discovered that by chance after this playlist kept coming up empty in SC. note the last track listing is incomplete. the other attachment "mood groovy_ok.m3u" is complete and scans fine. before everyone gets all confused, "mood groovy_ok.m3u" DOES contain all sorts of non-ascii characters, such as greek, german, chinese etc.... the only reason i can successfully scan lists like that at all is because i use the patch proposed in the original bug: https://bugs-archive.lyrion.org/show_bug.cgi?id=4578#c34 in short: ANY playlist scan problem seems to result in the entire playlist being empty. seems rather intolerant to me... is what i am describing part of the symptoms or part of the problem? ps. oh, and can we get this bug fixed once and for all? four years open is a long time... ;-)
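A more tolerant scan could keep the entries that parse and report only the ones that do not. The Python sketch below illustrates that idea under stated assumptions: it is not how the server's Perl scanner works, `parse_m3u_tolerant` is a hypothetical name, and it only detects entries whose bytes are not valid UTF-8 (for example a line cut off in the middle of a multi-byte character).

```python
def parse_m3u_tolerant(data: bytes):
    """Return (entries, skipped): every entry that decodes as UTF-8,
    plus the 1-based line numbers of the entries that do not."""
    entries, skipped = [], []
    for line_no, raw in enumerate(data.splitlines(), start=1):
        raw = raw.strip()
        if not raw or raw.startswith(b"#"):
            continue  # blank lines and #EXTM3U / #EXTINF directives
        try:
            entries.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            skipped.append(line_no)  # drop this entry only, keep scanning
    return entries, skipped
```

With this shape, one damaged line costs one track instead of the whole playlist, and the skipped line numbers can go into the scanner log.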
oops, sorry - forgot to post my system's details:
Version: 7.3.3 - 27044 @ Mon Jun 15 15:03:29 PDT 2009
Operating system: Linux - DE - utf8
Platform architecture: armv5tejl-linux
Perl version: 5.8.8 - armv5tejl-linux-thread-multi
MySQL version: 5.0.27
*** Bug 17285 has been marked as a duplicate of this bug. ***