Bug 17714 - UTF-8 being "Normalized" to ?Unicode
Status: UNCONFIRMED
Product: Logitech Media Server
Classification: Unclassified
Component: Scanner
Version: 7.6.1
Hardware: Other Linux (other)
Importance: -- major with 1 vote
Target Milestone: ---
Assigned To: Unassigned bug - please assign me!
QA Contact:
Depends on:
Blocks:
Reported: 2011-11-01 01:49 UTC by Ian
Modified: 2011-11-03 22:05 UTC
CC List: 0 users

See Also:
Category: ---


Description Ian 2011-11-01 01:49:51 UTC
My iTunes library is on my QNAP NAS, and I am using the NAS as my server with the QPKG plugin created by FlipFlip. The Samba configuration on the NAS uses UTF-8 for both the Unix and Display charsets, and the NAS's native charset is UTF-8. I use Windows 7 on my laptop, from which iTunes writes to my library. Files written there with accents display correctly in Windows Explorer and in the NAS Web File Manager. When I look at the files in PuTTY, each accented character usually displays as two characters, which are the same characters that appear in the iTunes Library.xml file (although there they are percent-encoded as "%XX%XX" hexadecimal escapes). However, when I scan the files, every file with an accent anywhere in its path (in a folder name or the filename) fails with File Not Found. Each two-byte character sequence is being converted into a four-byte sequence, apparently as part of some Unicode conversion. For example:

Char   Original bytes   New byte sequence   Unicode code point
°      C2 B0            C3 82 C2 B0         U+00B0
Ô      C3 94            C3 83 C2 94         U+00D4
ä      C3 A4            C3 83 C2 A4         U+00E4
è      C3 A8            C3 83 C2 A8         U+00E8
é      C3 A9            C3 83 C2 A9         U+00E9
ó      C3 B3            C3 83 C2 B3         U+00F3
ü      C3 BC            C3 83 C2 BC         U+00FC
İ      C4 B0            C3 84 C2 B0         U+0130
œ      C5 93            C3 85 C2 93         U+0153
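The byte sequences in the table are consistent with classic double encoding: the correct UTF-8 bytes are read as if they were ISO-8859-1 (Latin-1) characters and are then encoded to UTF-8 a second time. That mechanism is an assumption here, not something confirmed from the Logitech Media Server source (which is Perl). The Python sketch below only shows that this round trip reproduces every row of the table, and that percent-decoding an iTunes-style "%XX%XX" escape yields the original, correct bytes. The names `double_encode` and `hex_bytes` are illustrative only.

```python
from urllib.parse import unquote_to_bytes


def double_encode(text: str) -> bytes:
    # Correct step: text -> UTF-8 bytes (e.g. "é" -> C3 A9).
    utf8 = text.encode("utf-8")
    # Suspected faulty step (assumption): treat each UTF-8 byte as a Latin-1
    # character, then encode the resulting string to UTF-8 a second time.
    return utf8.decode("latin-1").encode("utf-8")


def hex_bytes(data: bytes) -> str:
    # Format bytes as space-separated uppercase hex, as in the table.
    return data.hex(" ").upper()


# Characters from the table above.
for ch in "°Ôäèéóüİœ":
    print(f"{ch}  {hex_bytes(ch.encode('utf-8')):<6}  "
          f"{hex_bytes(double_encode(ch)):<12}  U+{ord(ch):04X}")

# The %XX escapes in iTunes Library.xml decode to the original UTF-8 bytes.
print(unquote_to_bytes("%C3%A9") == "é".encode("utf-8"))
```

Because Latin-1 maps every byte value to a code point, the middle decode step never fails, which would explain why the corruption happens silently rather than raising an error.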

The end result of this is that I have lost about 25% of my 14,000 files in my scanned library as well as about 30 playlists.  One playlist of a Mozart opera has reduced from 49 to 7 tracks!

Version: 7.6.1 - r33110 @ Wed Aug 17 19:53:42 MDT 2011
Hostname: QNAPNAS
Server IP Address: 192.168.10.90
Server HTTP Port Number: 9001
Operating system: Linux - EN - utf8
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i686-linux-thread-multi
Database Version: DBD::SQLite 1.34_01 (sqlite 3.7.7.1)
Comment 1 Sean Sheedy 2011-11-03 12:51:40 UTC
The same problem occurs when generating artwork file names from the music metadata. For example, on my Fedora 15 system, which uses UTF-8 for all metadata and file names, the call from the Slim::Music::Artwork::findStandaloneArtwork function to the Slim::Utils::Unicode::encode_locale function modifies a sequence containing a character with a diacritic:

néa
6e c3 a9 61

into:

nÃ©a
6e c3 83 c2 a9 61


OS:  Fedora 15
Architecture:  x86_64
SqueezeboxServer:  7.6.2-0.1.33593
Perl:  5.12.4-162
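The néa example in Comment 1 fits the same pattern as the table in the description (C3 A9 becomes C3 83 C2 A9). As a diagnostic aid only, here is a minimal Python sketch, assuming exactly one extra round of Latin-1-to-UTF-8 encoding, that detects and reverses it. `undo_double_encoding` is a hypothetical helper, not part of Logitech Media Server, and the check is a heuristic: a name that legitimately contains a literal "Ã©" would also be "repaired".

```python
def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one assumed round of UTF-8 -> Latin-1 -> UTF-8 double encoding.

    Returns the input unchanged if it does not look double-encoded.
    """
    try:
        # Strip the outer UTF-8 layer, then map each code point back to the
        # single byte it came from (fails if any code point is above U+00FF).
        inner = data.decode("utf-8").encode("latin-1")
        # The recovered bytes must themselves be valid UTF-8.
        inner.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return inner


broken = bytes.fromhex("6e c3 83 c2 a9 61")
fixed = undo_double_encoding(broken)
print(broken.decode("utf-8"))  # nÃ©a
print(fixed.decode("utf-8"))   # néa
print(fixed.hex(" "))          # 6e c3 a9 61
```

Correctly encoded input such as 6e c3 a9 61 is returned unchanged, because mapping "néa" back to Latin-1 gives 6e e9 61, which is not valid UTF-8.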
Comment 2 Ian 2011-11-03 17:15:02 UTC
This is probably the same bug that was reported as 17530 and fixed in version 7.6.2. I have not yet tested whether upgrading fixes the problem.
Comment 3 Sean Sheedy 2011-11-03 22:05:40 UTC
I reproduced this with the Nov 2 build of 7.6.2, so it's probably still there.