Bugzilla – Bug 17714
UTF-8 being "Normalized" to ?Unicode
Last modified: 2011-11-03 22:05:40 UTC
My iTunes library is on my QNAP NAS, which I am using as my server with the QPKG plugin created by FlipFlip. The Samba configuration on the NAS is set to UTF-8 for the Unix and Display Charsets, and the native charset for the NAS is UTF-8. I use iTunes on my Windows 7 laptop to write to the library.

Files written there with accents display correctly in Windows Explorer and in the NAS Web File Manager. When I look at the files in PuTTY, each accented character usually displays as two characters, which are the same characters seen in the iTunes Library.xml file (although there they are formatted as "%XX%XX" hexadecimal escapes).

However, when I scan the files, those files with accents in the path (either in the folders or the filename) fail with File Not Found. Each two-byte character sequence is being converted to a four-byte sequence, which appears to be Unicode. E.g.

Char  Original Bytes  New Byte Sequence  Unicode
°     C2 B0           C3 82 C2 B0        U+00B0
Ô     C3 94           C3 83 C2 94        U+00D4
ä     C3 A4           C3 83 C2 A4        U+00E4
è     C3 A8           C3 83 C2 A8        U+00E8
é     C3 A9           C3 83 C2 A9        U+00E9
ó     C3 B3           C3 83 C2 B3        U+00F3
ü     C3 BC           C3 83 C2 BC        U+00FC
İ     C4 B0           C3 84 C2 B0        U+0130
œ     C5 93           C3 85 C2 93        U+0153

The end result is that I have lost about 25% of the 14,000 files in my scanned library, as well as about 30 playlists. One playlist of a Mozart opera has been reduced from 49 to 7 tracks!

Version: 7.6.1 - r33110 @ Wed Aug 17 19:53:42 MDT 2011
Hostname: QNAPNAS
Server IP Address: 192.168.10.90
Server HTTP Port Number: 9001
Operating system: Linux - EN - utf8
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i686-linux-thread-multi
Database Version: DBD::SQLite 1.34_01 (sqlite 3.7.7.1)
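For what it's worth, the byte pattern in the table is what you would get if bytes that are already UTF-8 were treated as Latin-1 characters and then encoded to UTF-8 a second time. The following is only a minimal Python sketch of that assumption, not the server's actual (Perl) code; the function name double_encode is made up for illustration.

```python
def double_encode(raw: bytes) -> bytes:
    """Hypothetical helper: read each byte of already-encoded UTF-8 as one
    Latin-1 character, then encode the resulting string as UTF-8 again."""
    return raw.decode("latin-1").encode("utf-8")


# Characters taken from the table in the report.
samples = ["°", "Ô", "ä", "è", "é", "ó", "ü", "İ", "œ"]

for ch in samples:
    original = ch.encode("utf-8")       # the two bytes stored on disk
    mangled = double_encode(original)   # the four bytes the scanner looks for
    print(ch, original.hex(" ").upper(), "->", mangled.hex(" ").upper(),
          f"U+{ord(ch):04X}")
```

If this assumption holds, every row of the table is explained by a single extra encode step somewhere in the path handling, rather than by the NAS or Samba charset settings.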
The same problem is occurring when generating artwork file names based on the music metadata. For example, on my Fedora 15 system, using UTF-8 for all metadata and file names, the Slim::Music::Artwork::findStandaloneArtwork function's call to the Slim::Utils::Unicode::encode_locale function modifies a sequence containing a character with a diacritic:

néa    6e c3 a9 61

into:

néa  6e c3 83 c2 a9 61

OS: Fedora 15
Architecture: x86_64
SqueezeboxServer: 7.6.2-0.1.33593
Perl: 5.12.4-162
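To check whether a given name has been through this extra encode step, the transformation can be reversed and the result tested for valid UTF-8. This is a rough Python sketch under the assumption that the damage is a Latin-1-to-UTF-8 re-encode; undo_double_encoding is a hypothetical name, not anything in Slim::Utils::Unicode, and the check is a heuristic (a name that legitimately contains text like "Ã©" would be misdetected).

```python
def undo_double_encoding(name: bytes) -> bytes:
    """Hypothetical helper: reverse one extra Latin-1 -> UTF-8 encode step.

    Returns the input unchanged if it does not look double-encoded.
    """
    try:
        # Undo the second encode: UTF-8 text back to one byte per character.
        candidate = name.decode("utf-8").encode("latin-1")
        # Only accept the result if it is itself valid UTF-8.
        candidate.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return name
    return candidate


mangled = bytes.fromhex("6ec383c2a961")   # bytes reported after encode_locale
original = bytes.fromhex("6ec3a961")      # "néa" as stored on disk

print(undo_double_encoding(mangled).hex(" "))
print(undo_double_encoding(original).hex(" "))
```

A correctly encoded name passes through untouched because stripping it down to Latin-1 bytes yields a sequence that is no longer valid UTF-8, so the helper falls back to the input.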
This is probably the same bug that was reported as bug 17530 and fixed in version 7.6.2. I have not yet tested whether that upgrade fixes the problem.
I reproduced this with the Nov 2 build of 7.6.2, so it's probably still there.