Bug 15739 - UTF-8 characters in M3U playlists don't scan correctly
Summary: UTF-8 characters in M3U playlists don't scan correctly
Status: NEW
Product: Logitech Media Server
Classification: Unclassified
Component: Playlists
Version: 7.5.0
Platform: PC SuSE Linux
Importance: P2 normal with 7 votes
Target Milestone: 7.6.0
Assigned To: jaswant
Keywords: charset_issues
Depends on:
Blocks:
Reported: 2010-02-19 11:39 UTC by Keith Briscoe
Modified: 2011-09-18 08:01 UTC
CC List: 15 users

See Also:
Category: ---


Attachments
playlist is found, but comes up empty (128.01 KB, audio/mpegurl)
2011-02-13 15:57 UTC, Dominique Cote
same playlist, but scans fine (137.92 KB, audio/mpegurl)
2011-02-13 15:59 UTC, Dominique Cote

Description Keith Briscoe 2010-02-19 11:39:11 UTC
+++ This bug was initially created as a clone of Bug #4578 +++

If a playlist filename contains UTF-8 characters (beyond the US-ASCII subset), or the playlist entries within that file contain such characters, the playlist will not scan correctly.  The current behavior is that the playlist appears in Squeezebox Server as an empty playlist with a mangled name.  In earlier versions of SqueezeCenter, the server would manage to show playlist entries that did not contain characters beyond the US-ASCII subset.  It has never worked entirely correctly as far as I know.

OS: OpenSUSE 11.1 (verified SC is correctly detecting UTF-8 in the Web GUI)

All files are stored locally, so no network protocol charset issues can be involved.  The character in question is U+2019 (right single quotation mark, used as an apostrophe).  This problem seems specific to the playlist scanner--the song scanner seems to correctly scan songs with the same character in their filenames and tags.

Sample playlist: https://bugs-archive.lyrion.org/attachment.cgi?id=3779

The file was originally created using Amarok, and modified using Kate.

Verified still a problem with squeezeboxserver-7.5.0-0.1.30158.noarch.rpm
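
To find which entries of a playlist are affected, it can help to list the non-ASCII code points per line. The following is a minimal Python sketch, not part of the server (which is written in Perl); the function name `non_ascii_report` and the sample paths are made up for illustration, and the playlist is assumed to be UTF-8.

```python
from typing import List, Tuple


def non_ascii_report(playlist_bytes: bytes) -> List[Tuple[int, str]]:
    """Return (line_number, code points) for every line with non-ASCII characters.

    Assumes the playlist is UTF-8; undecodable bytes show up as U+FFFD.
    """
    text = playlist_bytes.decode("utf-8", errors="replace")
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        points = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 0x7F]
        if points:
            report.append((number, " ".join(points)))
    return report


# Hypothetical three-line playlist; line 2 contains U+2019.
sample = "#EXTM3U\nMusic/Don\u2019t Stop.mp3\nMusic/Plain.mp3\n".encode("utf-8")
print(non_ascii_report(sample))  # [(2, 'U+2019')]
```

Lines that report U+FFFD are not valid UTF-8 at all, which would point to a different problem than the one described here.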
Comment 1 Michael Herger 2010-02-27 22:25:22 UTC
*** Bug 15799 has been marked as a duplicate of this bug. ***
Comment 2 stridger 2010-03-06 03:05:17 UTC
The following patch fixes it for me in 7.4.2, but as in https://bugs-archive.lyrion.org/show_bug.cgi?id=15739 I am not sure whether this is a fix that would work on all platforms and in all environments. I know it works on Ubuntu Karmic with Unicode and non-Unicode playlists.

--- M3U.pm      2010-02-27 18:50:42.000000000 +0000
+++ M3U.pm      2010-03-06 11:00:45.000000000 +0000
@@ -74,7 +74,8 @@
                        $foundBOM = 1;
                }

-               $entry = Slim::Utils::Unicode::utf8decode_guess($entry, $enc);
+#              $entry = Slim::Utils::Unicode::utf8decode_guess($entry, $enc);
+               $entry = Slim::Utils::Unicode::utf8on($entry);

                main::DEBUGLOG && $log->debug("  entry from file: $entry");

@@ -111,7 +112,7 @@
                        $entry = Win32::GetANSIPathName($entry);
                }
                else {
-                       $entry = Slim::Utils::Unicode::utf8encode_locale($entry);
+#                      $entry = Slim::Utils::Unicode::utf8encode_locale($entry);
                }

                $entry = Slim::Utils::Misc::fixPath($entry, $baseDir);
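
The patch above stops guessing the encoding and stops re-encoding the entry to the locale. As an illustration of why a wrong guess can break path lookup, here is a small Python sketch (not the server's Perl code). The sample path and the choice of Windows-1252 as the wrongly guessed encoding are assumptions; this thread does not confirm what the guessing logic actually picks.

```python
# Bytes as they would be stored in a UTF-8 playlist and on a UTF-8 filesystem.
on_disk = "Music/L\u2019amiti\u00e9.mp3".encode("utf-8")

# Correct assumption: the bytes are UTF-8, so decode + encode is a lossless round trip.
as_utf8 = on_disk.decode("utf-8").encode("utf-8")

# Wrong guess: reading the same bytes as Windows-1252 and re-encoding them for a
# UTF-8 locale "double encodes" every non-ASCII character.
double_encoded = on_disk.decode("cp1252").encode("utf-8")

print(as_utf8 == on_disk)         # True
print(double_encoded == on_disk)  # False
print(len(on_disk), len(double_encoded))
```

A path built from `double_encoded` no longer matches the file name on disk, so the entry cannot be found even though the playlist itself is valid UTF-8.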
Comment 3 stridger 2010-03-06 03:08:53 UTC
(In reply to comment #2)
sorry, I meant https://bugs-archive.lyrion.org/show_bug.cgi?id=15799 above.
Comment 4 Cory Nielsen 2010-03-11 15:28:31 UTC
This seems quite similar to a bug I've seen while testing SB Server on Mac OS X 10.4.11.

I have a playlist file named "playlist - ã.m3u". In the versions of Windows I've tested, this displayed as "playlist - ã" in the WebUI and on attached SB Devices. In Mac OS X 10.4.11, it displays as "playlist - ã" in the WebUI (Safari) and on the connected devices I'm testing with (SB Touch and SB Radio).
Comment 5 Keith Briscoe 2010-03-16 08:58:12 UTC
(In reply to comment #4)

I'm not sure this is the same bug.  Windows filesystems use UTF-16, and Linux/MacOS use UTF-8.  There can be a lot of sources of filename mangling when sharing files between these two worlds--Samba issues, etc.  For simplicity's sake, this bug refers to a 100% UTF-8 environment, using no network filesharing protocols or non-native filesystems.
Comment 6 stridger 2010-03-29 15:44:29 UTC
*** This bug has been confirmed by popular vote. ***
Comment 7 Andy Grundman 2010-04-06 09:14:45 UTC
We have decided to support only UTF-8 encoding within playlists and throw a warning if a non-UTF8 character is detected in a playlist.  It is difficult if not impossible to correctly guess encoding of a given playlist.
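
A UTF-8-only policy with a warning could look roughly like the following Python sketch. It only illustrates the idea and is not the server's implementation; the logger name and the function `decode_entry` are hypothetical.

```python
import logging
from typing import Optional

log = logging.getLogger("playlist")  # hypothetical logger name


def decode_entry(raw: bytes, line_number: int) -> Optional[str]:
    """Decode one playlist line strictly as UTF-8.

    Returns the decoded text, or None (after logging a warning) if the
    bytes are not valid UTF-8. No encoding guessing is attempted.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        log.warning("line %d is not valid UTF-8 (%s), skipping", line_number, err.reason)
        return None


print(decode_entry("caf\u00e9.mp3".encode("utf-8"), 1))  # valid UTF-8
print(decode_entry(b"caf\xe9.mp3", 2))                   # Latin-1 bytes, rejected
```

Strict decoding is deterministic: the same bytes always produce the same result, which is not the case when the encoding is guessed.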
Comment 8 Chris Owens 2010-04-12 16:44:44 UTC
Andy asked me to place a reminder in this bug that he should pull out the codepage 'guessing' logic.
Comment 9 Keith Briscoe 2010-04-28 20:57:05 UTC
For what it's worth, the patch mentioned in comment #2 has no effect on this playlist: https://bugs-archive.lyrion.org/attachment.cgi?id=3779
Comment 10 Dominique Cote 2010-04-29 00:10:08 UTC
gentlemen, i have been following bug 4578 as well, since i am impacted.

is it possible that many, if not most of the problems arise because people are using windows clients to edit/create their m3u playlists? (like me)
if that were the case, would it not be somehow more functional to simply add an advanced commandline startup switch for SBS, which will force it to use some given coding for playlists?

btw. - i sometimes use windows notepad to edit my playlists and it will allow me to save an m3u in UTF-8. doesnt seem to make any difference tho...
Comment 11 Keith Briscoe 2010-04-29 11:29:13 UTC
I think there's something more fundamentally wrong here--it doesn't seem to be failing to guess the encoding, it just doesn't seem to be handling UTF-8 correctly even when going through all the correct code branches.

I doubt this is Windows-related.  My playlists are all created/edited on Linux, and have the same problems.  Windows does add some odd nonstandard BOM stuff to the beginnings of Unicode files, but presumably the server can already cope with that or there would be bigger issues elsewhere.
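
For reference, the BOM that some Windows editors write at the start of a UTF-8 file is the three bytes EF BB BF. A minimal Python sketch of removing it before parsing follows; the function name `strip_utf8_bom` is made up here, and this is not how the server's M3U.pm handles it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"#EXTM3U\n"
print(strip_utf8_bom(with_bom))       # b'#EXTM3U\n'
print(with_bom.decode("utf-8-sig"))   # the "utf-8-sig" codec also skips the BOM
```

If the BOM is left in place, the first line no longer starts with `#`, so a header or comment check on that line would fail.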
Comment 12 Simon Finch 2010-05-13 19:46:21 UTC
Just to confirm exactly the same here in both 7.5.0 (final) and 7.5.1 r30739 - running on Debian Squeeze, locale = UTF-8, CIFS mount (iocharset=utf8), playlists created via SBS web. I've tried the patch in comment 2, changing ID tag versions -- all to no avail. Playlist characters display properly in TextWrangler (Mac) but playlist scans choke --

e.g. a song called L'amitié produces L%27amiti in scanner logs and is skipped.
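
For comparison, this Python sketch shows what a percent-encoded form of that title looks like when the UTF-8 bytes are kept, and one way the truncated form could arise. The second half is only a guess at the mechanism (dropping non-ASCII characters before encoding); it is not confirmed to be what the scanner does.

```python
from urllib.parse import quote

title = "L'amiti\u00e9"

# Percent-encoding the UTF-8 bytes keeps the accented character as %C3%A9.
print(quote(title))  # L%27amiti%C3%A9

# Assumption for illustration: if non-ASCII characters are discarded first,
# the result matches the string seen in the scanner log.
lossy = title.encode("ascii", errors="ignore")
print(quote(lossy))  # L%27amiti
```

The missing `%C3%A9` in the log suggests the character is lost before or during URL encoding, rather than being encoded with the wrong charset.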
Comment 13 Michael Herger 2010-06-17 22:25:38 UTC
*** Bug 16302 has been marked as a duplicate of this bug. ***
Comment 14 Leif Johansson 2010-12-30 12:29:57 UTC
I'm seeing this problem with 7.6.0 - r30575
Comment 15 Leif Johansson 2010-12-30 15:31:27 UTC
(In reply to comment #14)
> I'm seeing this problem with 7.6.0 - r30575

I should add that I have an all-Linux, all-UTF-8 setup. Files are served over NFS (UTF-8 preserving) to the SqueezeCenter server, which is also running all UTF-8. Both machines run Ubuntu 10.04 LTS. locale shows all LC_* are en_US.UTF-8.
Comment 16 Alan Young 2011-01-28 05:09:46 UTC
I presume that we are talking about M3U playlists here.
Comment 17 Alan Young 2011-01-28 05:36:26 UTC
Leif, are you still seeing this with 7.6 r31864? If so, please attach a sample playlist.

Keith, would you have a chance to try this with 7.6?
Comment 18 Keith Briscoe 2011-01-29 08:28:56 UTC
No, still doesn't work with 7.6 r31864 (Linux RPM, same sample playlist as comment#1).  An M3U playlist with UTF-8 characters in the playlist name is found (the scanner reports 1 playlist found), but the Playlists menu is empty.  Not sure what that means.
Comment 19 Dominique Cote 2011-02-13 15:57:30 UTC
Created attachment 7147 [details]
playlist is found, but comes up empty
Comment 20 Dominique Cote 2011-02-13 15:59:24 UTC
Created attachment 7148 [details]
same playlist, but scans fine
Comment 21 Dominique Cote 2011-02-13 16:15:05 UTC
i am able to replicate the same effect with a different cause (or am i?)...

the first attachment is a playlist "mood groovy_broken.m3u" that was viciously truncated by a playlist editing program at 128KB. i only discovered that by chance after this playlist kept coming up empty in SC. note the last track listing is incomplete.

the other attachment "mood groovy_ok.m3u" is complete and scans fine.

before everyone gets all confused, "mood groovy_ok.m3u" DOES contain all sorts of non-ascii characters, such as greek, german, chinese etc....
the only reason i can successfully scan lists like that at all is because i use the patch proposed in the original bug: https://bugs-archive.lyrion.org/show_bug.cgi?id=4578#c34

in short: ANY playlist scan problem seems to result in the entire playlist being empty. seems rather intolerant to me...

is what i am describing part of the symptoms or part of the problem?

ps. oh, and can we get this bug fixed once and for all? four years open is a long time... ;-)
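
A more tolerant scan, where one bad entry does not empty the whole playlist, could be sketched in Python as follows. This is only an illustration of the idea raised above, not the server's behavior; the function name `parse_m3u_tolerant` and the sample data are hypothetical, and the playlist is assumed to be UTF-8.

```python
from typing import List, Tuple


def parse_m3u_tolerant(data: bytes) -> Tuple[List[str], List[int]]:
    """Return (entries, skipped_line_numbers) for a UTF-8 M3U playlist.

    Lines that are not valid UTF-8 (for example a multi-byte character cut
    off by truncation) are skipped; all other entries are kept.
    """
    entries, skipped = [], []
    for number, raw in enumerate(data.splitlines(), start=1):
        raw = raw.strip()
        if not raw or raw.startswith(b"#"):
            continue  # blank line or #EXTM3U / #EXTINF comment
        try:
            entries.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            skipped.append(number)
    return entries, skipped


data = (b"#EXTM3U\n"
        + "a/caf\u00e9.mp3\n".encode("utf-8")
        + b"b/bad\xff.mp3\n"
        + b"c/ok.mp3\n")
print(parse_m3u_tolerant(data))
```

Returning the skipped line numbers alongside the entries lets the caller log what was dropped instead of silently producing an empty playlist.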
Comment 22 Dominique Cote 2011-02-13 16:16:49 UTC
ups, sorry - forgot to post my system's details:

Version: 7.3.3 - 27044 @ Mon Jun 15 15:03:29 PDT 2009
Operating system: Linux - DE - utf8
Platform architecture: armv5tejl-linux
Perl version: 5.8.8 - armv5tejl-linux-thread-multi
MySQL version: 5.0.27
Comment 23 Michael Herger 2011-06-29 00:43:47 UTC
*** Bug 17285 has been marked as a duplicate of this bug. ***