Bugzilla – Bug 7073
Filenames with accented characters truncated when browsing
Last modified: 2011-03-16 04:19:53 UTC
(This may be related to Bug 6754, but the behavior and my explanation are different.)

When browsing Music Folder, filenames are truncated at the first accented character. "Café del Mar", for example, is truncated to "Caf". This happens in both the web interface and on my Squeezebox.

I tried to follow the SC logic. When browsing folders on the web interface, Slim::Web::Pages::BrowseTree gets the text to display using Slim::Music::Info::fileName, which uses Slim::Utils::Misc::pathFromFileURL to convert the file's URL to a path. Here's a log snippet:

[08-02-10 14:24:22.5603] Slim::Utils::Misc::stripRel (728) Original: /home/netshares/mp3/Albums/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3
[08-02-10 14:24:22.5611] Slim::Utils::Misc::stripRel (734) Stripped: /home/netshares/mp3/Albums/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3
[08-02-10 14:24:22.5623] Slim::Utils::Misc::fixPath (711) Fixed: 15 - Björk - It's in Our Hands.mp3 to file:///home/netshares/mp3/Albums/Bj%F6rk%20-%20Greatest%20Hits/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-02-10 14:24:22.5631] Slim::Utils::Misc::fixPath (713) Base: /home/netshares/mp3/Albums/Björk - Greatest Hits
[08-02-10 14:24:22.5692] Slim::Utils::Misc::pathFromFileURL (415) Got /home/netshares/mp3/Albums/Bj%F6rk%20-%20Greatest%20Hits/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3 from file url file:///home/netshares/mp3/Albums/Bj%F6rk%20-%20Greatest%20Hits/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-02-10 14:24:22.5702] Slim::Utils::Misc::pathFromFileURL (431) Extracted: /home/netshares/mp3/Albums/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3 from file:///home/netshares/mp3/Albums/Bj%F6rk%20-%20Greatest%20Hits/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3

That looks like a tortuous path to get back to the original string, but I'm sure there's good reason. Anyway, in this case, the result from pathFromFileURL is UTF-8 encoded.
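The path-to-URL-to-path round trip in the log can be sketched outside SqueezeCenter. The following is a minimal Python illustration, not the Perl code in Slim::Utils::Misc; the variable names are hypothetical. It assumes the directory name is stored on disk as ISO-8859-1 bytes, which is what the %F6 escape in the log suggests (a UTF-8 name would produce %C3%B6 instead).

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical directory name as raw ISO-8859-1 bytes (0xF6 is "ö" in that charset).
raw_latin1 = b"Bj\xf6rk - Greatest Hits"

# Percent-encoding works on bytes, so it preserves whatever encoding the name had.
encoded = quote(raw_latin1)            # 'Bj%F6rk%20-%20Greatest%20Hits'
restored = unquote_to_bytes(encoded)   # the original bytes, unchanged

# For comparison: the same name stored as UTF-8 escapes to two bytes per "ö".
encoded_utf8 = quote("Björk - Greatest Hits".encode("utf-8"))

print(encoded)
print(encoded_utf8)
```

The round trip itself is lossless at the byte level; the encoding of the bytes is neither checked nor changed, so any charset problem only surfaces in a later decode step.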
Slim::Music::Info::fileName finally uses Slim::Utils::Unicode::utf8decode_locale to convert the string to UTF-8, if it is not already so. The code is:

    sub utf8decode_locale {
        my $string = shift;
        if ($string && $] > 5.007 && !Encode::is_utf8($string)) {
            $string = Encode::decode($lc_ctype, $string, $FB_QUIET);
        }
        return $string;
    }

I'm not sure what is happening here. I think maybe my string is UTF-8 encoded, but not UTF-8 flagged, so Encode::is_utf8 returns false, then Encode::decode is used to try to convert the string from the current locale to UTF-8, even though it is already UTF-8. (I quite likely am misunderstanding something here.)

Regardless of the explanation, the string returned by utf8decode_locale is truncated at the position of the first accented character. In my example, "15 - Björk - It's in Our Hands.mp3" gets converted to "15 - Bj". That is what gets displayed in the UI.
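The truncation can be reproduced in isolation. With Perl's Encode::FB_QUIET check, decoding stops at the first malformed sequence and returns only the portion processed so far. The Python sketch below emulates that behavior; decode_quiet is a hypothetical helper written for this illustration, and it assumes the filename bytes are ISO-8859-1 while the decoder is told to expect UTF-8.

```python
def decode_quiet(data: bytes, encoding: str) -> str:
    """Decode data, silently returning only the prefix before the first bad byte."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return data[:err.start].decode(encoding)


# Filename as ISO-8859-1 bytes: 0xF6 is "ö" there, but is not valid UTF-8.
latin1_name = b"15 - Bj\xf6rk - It's in Our Hands.mp3"

print(decode_quiet(latin1_name, "utf-8"))       # truncated at the 0xF6 byte
print(decode_quiet(latin1_name, "iso-8859-1"))  # full name
```

Decoded as UTF-8 the name stops at "15 - Bj", matching the report; decoded as ISO-8859-1 it comes through whole.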
I forgot the following:
SqueezeCenter Version: 7.0 - 17379 - Red Hat - EN - utf8
Perl Version: 5.8.8 x86_64-linux
MySQL Version: 5.0.45
Platform Architecture: x86_64-linux
Mandriva Free 2007, kernel 2.6.17-10mdv
*** Bug 7353 has been marked as a duplicate of this bug. ***
That's another odd one... While I now can reproduce it on a Windows system, it seems to be working fine on OSX 10.5 and Linux for me. Are these filenames correctly displayed on Linux? Or have they been created by Windows over samba or something?
My server: Locale is en_US.UTF-8. Filenames are encoded iso-8859-1; e.g., 'ls -b' shows "Björk" to be "Bj\366rk". I am indeed using Samba, with the option 'unix charset = ISO8859-1', which is the only way I could make everything happy. The files are generally created from a Windows environment. The filenames do show correctly in all environments that I use, including Linux (both console and x apps), and both Windows and OSX via Samba shares. Please note that SC is able to find, index, and play these files correctly. As shown in my first comment, above, when browsing by folder, it seems to manipulate the filename just fine until it hits utf8decode_locale, whereupon it gets truncated.
I've checked in a few more file path and encoding related changes. Can you still reproduce this issue with revision 17910 or later?
The problem persists for me with 17914.
I'm having a hard time reproducing your issue. Whatever I do, my filenames are correctly stored and displayed on the Linux box, as well as on Mac and Windows, when connecting to this Linux box using Samba. I'm not sure we can do a lot to fix issues caused by the filesystem's or share's configuration. What's the reason you try to iso8859 encode filenames on a utf8 filesystem?
Ross - as you reported this for local files, too, could you please give the latest revision another try?
"I am indeed using Samba, with the option 'unix charset = ISO8859-1', which is the only way I could make everything happy." - what wouldn't be happy with that value set to utf-8? "Everything happy" seems a bit odd when reporting a bug ;-)
Latest build works locally, no longer seeing 7353, everything is happy for me. :)
(In reply to comment #9)
> "I am indeed using Samba, with the option 'unix charset = ISO8859-1', which is
> the only way I could make everything happy." - what wouldn't be happy with that
> value set to utf-8? "Everything happy" seems a bit odd when reporting a bug ;-)

To clarify, by "everything happy", I obviously did not mean SC7! I run a bunch of legacy software (the old free twonky musicserver, yarrs, an audiotron toc generator, a customized phatbox index builder, etc.) for my toys, and my recollection is that some of it doesn't play nice with UTF-8. Perhaps I should change my system locale, but honestly, until now (i.e., with SC7), I hadn't run into any problems. In fact, I've probably been running a UTF-8 system for almost a year (the last time I rebuilt it) without even noticing.

Despite my 8859-1 filenames, I still consider this filename truncation issue an SC7 bug because it manages to locate, index, and play the files just fine. Also, the "location" field on the "song info" page is correct, meaning that it is perfectly capable of manipulating my "non-standard" filenames. For some reason, the "Music Folder" logic is different, and it mangles the filenames.

But having said that, your comments make it clear that this behavior should affect very few people (maybe only me), so I probably shouldn't expect a fix. At least the thing is written in Perl, so I can fix (break?) it myself for my installation.

One final note for this post: I renamed some test files into UTF-8 format (I think), so I could see if the problem persisted. By the way, I found this to be a royal pain in the neck in my bash shell; I would only wish it upon my worst enemies. (One more reason I think I shall stick with 8859-1 for now.) It indeed did not mangle the filename when encoded in UTF-8. See the following log snippets (the utf8decode_locale log entry is something I added to the code to facilitate my debugging):

With "Björk" in 8859-1 ("Bj\366rk")...
[08-03-19 07:01:49.4910] Slim::Utils::Misc::stripRel (776) Original: /home/netshares/mp3/Björk - Greatest HitsXXXX/15 - Björk - It's in Our Hands.mp3
[08-03-19 07:01:49.4917] Slim::Utils::Misc::stripRel (782) Stripped: /home/netshares/mp3/Björk - Greatest HitsXXXX/15 - Björk - It's in Our Hands.mp3
[08-03-19 07:01:49.4930] Slim::Utils::Misc::fixPath (759) Fixed: 15 - Björk - It's in Our Hands.mp3 to file:///home/netshares/mp3/Bj%F6rk%20-%20Greatest%20HitsXXXX/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 07:01:49.4937] Slim::Utils::Misc::fixPath (761) Base: /home/netshares/mp3/Björk - Greatest HitsXXXX
[08-03-19 07:01:49.4999] Slim::Utils::Misc::pathFromFileURL (463) Got /home/netshares/mp3/Bj%F6rk%20-%20Greatest%20HitsXXXX/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3 from file url file:///home/netshares/mp3/Bj%F6rk%20-%20Greatest%20HitsXXXX/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 07:01:49.5009] Slim::Utils::Misc::pathFromFileURL (479) Extracted: /home/netshares/mp3/Björk - Greatest HitsXXXX/15 - Björk - It's in Our Hands.mp3 from file:///home/netshares/mp3/Bj%F6rk%20-%20Greatest%20HitsXXXX/15%20-%20Bj%F6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 07:01:49.5017] Slim::Utils::Unicode::utf8decode_locale (502) LAK - before decode: 15 - Björk - It's in Our Hands.mp3
[08-03-19 07:01:49.5025] Slim::Utils::Unicode::utf8decode_locale (511) LAK - after decode: 15 - Bj

With "Björk" in UTF-8 ("Björk")...
[08-03-19 13:28:41.1957] Slim::Utils::Misc::stripRel (776) Original: /home/netshares/mp3/BjorkTest/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3
[08-03-19 13:28:41.1964] Slim::Utils::Misc::stripRel (782) Stripped: /home/netshares/mp3/BjorkTest/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3
[08-03-19 13:28:41.1976] Slim::Utils::Misc::fixPath (759) Fixed: 15 - Björk - It's in Our Hands.mp3 to file:///home/netshares/mp3/BjorkTest/Bj%C3%B6rk%20-%20Greatest%20Hits/15%20-%20Bj%C3%B6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 13:28:41.1984] Slim::Utils::Misc::fixPath (761) Base: /home/netshares/mp3/BjorkTest/Björk - Greatest Hits
[08-03-19 13:28:41.2046] Slim::Utils::Misc::pathFromFileURL (463) Got /home/netshares/mp3/BjorkTest/Bj%C3%B6rk%20-%20Greatest%20Hits/15%20-%20Bj%C3%B6rk%20-%20It%27s%20in%20Our%20Hands.mp3 from file url file:///home/netshares/mp3/BjorkTest/Bj%C3%B6rk%20-%20Greatest%20Hits/15%20-%20Bj%C3%B6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 13:28:41.2056] Slim::Utils::Misc::pathFromFileURL (479) Extracted: /home/netshares/mp3/BjorkTest/Björk - Greatest Hits/15 - Björk - It's in Our Hands.mp3 from file:///home/netshares/mp3/BjorkTest/Bj%C3%B6rk%20-%20Greatest%20Hits/15%20-%20Bj%C3%B6rk%20-%20It%27s%20in%20Our%20Hands.mp3
[08-03-19 13:28:41.2065] Slim::Utils::Unicode::utf8decode_locale (502) LAK - before decode: 15 - Björk - It's in Our Hands.mp3
[08-03-19 13:28:41.2072] Slim::Utils::Unicode::utf8decode_locale (511) LAK - after decode: 15 - Björk - It's in Our Hands.mp3

Can anyone enlighten me on the significance of the  (0xC2) that shows up between the bytes for the UTF-8 encoded ö (0xC3, 0xB6) in these log entries?
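One plausible reading of the stray 0xC2, offered as an assumption rather than something confirmed in this thread, is that the log text was encoded to UTF-8 twice: the two UTF-8 bytes of "ö" were treated as two ISO-8859-1 characters and then each was encoded to UTF-8 again. A short Python sketch of that double encoding (Python is used only for illustration; the variable names are hypothetical):

```python
# "ö" encoded once as UTF-8: the two bytes 0xC3 0xB6.
once = "ö".encode("utf-8")

# Misreading those bytes as ISO-8859-1 yields two characters, U+00C3 and U+00B6.
misread = once.decode("iso-8859-1")

# Encoding that misread text as UTF-8 again gives four bytes.
# U+00C3 becomes 0xC3 0x83, and U+00B6 becomes 0xC2 0xB6.
twice = misread.encode("utf-8")

print(once.hex(" "))   # c3 b6
print(twice.hex(" "))  # c3 83 c2 b6
```

Under that assumption, the 0xC2 is the lead byte of the re-encoded 0xB6, and the 0x83 after the first 0xC3 is a control character that most viewers do not display.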
As you guessed correctly in the initial comment, the problem lies in utf8decode_locale: this routine uses your system's locale to decode the data read from the disk. As your locale is utf8, this will break file names stored as iso-8859-1. We can't fix this. The routine is doing what it's expected to do. I see two workarounds for you:

- run SqueezeCenter with its own locale: LC_ALL=iso-8859-1 slimserver.pl (you might add the LC_ALL definition to the startup script) - this will lead to some warnings about the locale being badly set, but eventually Perl will fall back to using iso-8859-1 as its locale
- patch Slim/Music/Info.pm fileName() to use utf8decode_guess instead of utf8decode_locale. This will ignore the locale, but guess from the actual value how to decode the string. I won't check this change in, as guessing is no more than that: a guess. It breaks things on OSX installations (and very likely others, too) where _some_ letter combinations would falsely be interpreted as utf8 double byte values.
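The trade-off behind the second workaround can be illustrated with a generic "try UTF-8 first, fall back to ISO-8859-1" decoder. This Python sketch is not the implementation of utf8decode_guess, whose actual heuristics are not shown in this thread; decode_guess is a hypothetical stand-in that only demonstrates why guessing can misfire.

```python
def decode_guess(data: bytes) -> str:
    """Guess the encoding: strict UTF-8 if the bytes allow it, else ISO-8859-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")


# An ISO-8859-1 name that is not valid UTF-8 is recovered correctly.
print(decode_guess(b"Bj\xf6rk"))

# But an ISO-8859-1 name containing the two letters U+00C3 U+00B6 happens to be
# valid UTF-8 as well, so the guess silently turns those two letters into "ö".
ambiguous = "\u00c3\u00b6".encode("iso-8859-1")  # bytes 0xC3 0xB6
print(decode_guess(ambiguous))
```

The second case shows the kind of letter combination that a guessing decoder would misinterpret as a double-byte UTF-8 value.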
Closing as "wontfix" as this is not a bug in SC, but a system configuration issue. Thanks for your understanding.
Reduce number of active targets for SC