Bug 503 - Problem with UTF-8 encoded ID tags
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Localization
Version: unspecified
Hardware: All All
Importance: P2 enhancement
Assigned To: Dan Sully
Reported: 2004-08-18 13:16 UTC by Harald Alvestrand
Modified: 2008-08-18 10:53 UTC


Attachments
Music file with Norwegian characters in tags (2.86 MB, application/octet-stream)
2004-08-18 13:20 UTC, Harald Alvestrand
It works! (The proof) (49.11 KB, image/png)
2004-11-07 07:38 UTC, Zsolt Horváth

Description Harald Alvestrand 2004-08-18 13:16:53 UTC
The attached file does not show properly on the SlimServer web page or on the 
Squeezebox display. The strange-looking characters should have formed an AE 
digraph (lower case).
I think this tag is UTF-8, and the display is thinking it's Latin-1; I do not 
know if the tag format lets you know which it's supposed to be.
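
The symptom described above is what one would expect when UTF-8 bytes are decoded as Latin-1. A minimal Python sketch, assuming the tag holds a lower-case ae letter (U+00E6); the actual bytes of the attached file are not reproduced here:

```python
# Assumption: the tag text contains a lower-case "ae" letter (U+00E6).
original = "\u00e6"

# UTF-8 stores this character as two bytes.
utf8_bytes = original.encode("utf-8")

# A reader that assumes Latin-1 turns each byte into its own character,
# which produces the two strange-looking characters.
misread = utf8_bytes.decode("latin-1")

# Decoding with the right charset recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(repr(misread), repr(recovered))
```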
Comment 1 Harald Alvestrand 2004-08-18 13:20:08 UTC
Created attachment 106 [details]
Music file with Norwegian characters in tags
Comment 2 Zsolt Horváth 2004-11-07 07:36:55 UTC
OK, I had the same problem with Hungarian letters (non-Latin-1). I worked on it for a 
few hours, then realized the obvious solution. It only solves the web interface 
problem, but that's better than nothing. 
 
What you have to do is either set the page encoding in the browser (how to do it 
depends on the browser, but it's never more than three mouse clicks), OR (the better 
way) edit the HTML files on your server. The second solution is far superior, as it's 
permanent. What you have to do is insert: 
 
                <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
 
into each HTML file of your favourite skin. Be careful to insert this only into complete 
HTML pages, not fragments, and always between the <head> and </head> tags. In 
some files you'll find: 
 
                <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
 
Feel free to replace these. 
 
You also have to be careful: from now on, all characters will be interpreted as 
UTF-8, so some ISO-8859-X (non-8859-1) characters will be misinterpreted. So you can go 
either with a full UTF-8 collection or a full ISO-8859-X one. 
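
The manual edit described above could be automated along these lines. This is only a sketch in Python, not part of SlimServer: `force_utf8_meta` is a hypothetical helper, and it assumes the meta tag is written with the attribute order shown above.

```python
import re

# Matches the charset value inside a Content-Type <meta> tag.
# Assumption: attributes appear in the order used in this comment.
META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)[^"]*(")',
    re.IGNORECASE,
)
UTF8_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def force_utf8_meta(html: str) -> str:
    """Return html with a UTF-8 Content-Type meta tag declared.

    Fragments without a <head> element are returned unchanged.
    """
    head = re.search(r"<head(\s[^>]*)?>", html, re.IGNORECASE)
    if head is None:
        return html  # template fragment: leave it alone
    if META_CHARSET.search(html):
        # Replace an existing declaration such as charset=iso-8859-1.
        return META_CHARSET.sub(r"\g<1>utf-8\g<2>", html)
    # No declaration yet: insert one right after the opening <head> tag.
    return html[: head.end()] + UTF8_META + html[head.end():]
```

Applying it to each skin file would still require reading and writing the files with care, since the file contents themselves must be valid in the declared encoding.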
 
This solution definitely works: I tried it with your attached file, and it displays nicely, 
even among my Hungarian files ;-) (See attachment for proof) 
 
Some notes/questions for developers: 
- Wouldn't it be possible to make this a bit more automated? A config option for the HTML 
encoding, maybe? 
- Am I right in saying it's possible to fix this on the player (Squeezebox) by supplying 
another font file? 
 
 
thx: Zsolt 
Comment 3 Zsolt Horváth 2004-11-07 07:38:17 UTC
Created attachment 197 [details]
It works! (The proof)
Comment 4 Harald Alvestrand 2004-11-07 12:09:48 UTC
Mucking with the webserver doesn't solve the problem on the Squeezebox.... UTF-8 
is a slightly bigger problem than a "new font", since the number of bytes in a 
character varies from character to character....
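
To illustrate the variable width: a short Python sketch with sample characters chosen for illustration (they are not taken from the attached file).

```python
# Sample characters: "a", "ae" letter, euro sign, musical G clef.
samples = ["a", "\u00e6", "\u20ac", "\U0001d11e"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Latin-1 always uses exactly one byte per character it can represent
# (only the first two samples exist in Latin-1).
latin1_lengths = [len(ch.encode("latin-1")) for ch in samples[:2]]

print(byte_lengths, latin1_lengths)
```

A display routine that maps one byte to one glyph therefore cannot be fixed by swapping the font alone; it has to decode the byte sequence first.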
Comment 5 Harald Alvestrand 2004-11-08 03:45:14 UTC
some more info, digging around on the net....

According to the current ID3 standard, http://www.id3.org/id3v2.3.0.html, 
strings are either ISO 8859-1 or Unicode UCS-2 (starting with the byte order mark 
FEFF), in either byte order.
The Ogg Vorbis specification, http://www.xiph.org/ogg/vorbis/doc/v-comment.html, 
says that strings are in UTF-8, which is an ASCII-compatible encoding of 
Unicode. Thus, it's a mistake (but probably a common one..... arrgh) to display 
strings from Ogg Vorbis files as if they were ISO 8859-1.
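
A rough Python sketch of what choosing the decoder per format could look like. The function names are hypothetical, and it assumes an ID3v2.3 text frame body begins with a one-byte encoding marker (0 for ISO 8859-1, 1 for Unicode with a byte order mark); it is not SlimServer's code.

```python
def decode_id3v23_text(payload: bytes) -> str:
    """Decode an ID3v2.3 text frame body: one encoding byte, then the text."""
    encoding, data = payload[0], payload[1:]
    if encoding == 0x00:
        text = data.decode("latin-1")
    elif encoding == 0x01:
        # Python's "utf-16" codec reads the byte order mark to pick the
        # byte order and strips it from the result.
        text = data.decode("utf-16")
    else:
        raise ValueError(f"unknown ID3v2.3 text encoding byte: {encoding:#x}")
    # Drop any trailing NUL terminator.
    return text.rstrip("\x00")


def decode_vorbis_comment(raw: bytes) -> str:
    """Vorbis comment strings are UTF-8, never ISO 8859-1."""
    return raw.decode("utf-8")
```

For example, `decode_vorbis_comment(b"\xc3\xa6")` yields the single ae letter, whereas decoding the same bytes as ISO 8859-1 would yield two characters.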

Life ain't easy....
Comment 6 Zsolt Horváth 2004-11-08 13:13:12 UTC
Well, I think you've managed to find an obsolete version of the id3v2 spec. In reality 
(http://www.id3.org/develop.html), the possibilities for encodings are: 
 
       0    No restrictions 
       1    Strings are only encoded with ISO-8859-1 [ISO-8859-1] or 
            UTF-8 [UTF-8]. 
 
(extended header information /   q - Text encoding restrictions) 
 
This means that if you want to follow the strict rules, you don't use any UCS-2, 
ISO-8859-2/3/5/... or similar encodings! The problem here is that almost NONE of the 
APIs use the specification correctly (they do, but don't use restrictions, and this makes 
our life a bit more difficult). MS Media Player (even v10!!!) or Winamp, for instance, do not 
use encodings correctly!! The situation is so bad that they cannot even display 
UTF-8 encodings correctly (just like the Squeezebox). 
 
What makes things better here is that the Perl library SlimServer uses handles 
encodings well; it's just the output that's not interpreted correctly by default. This is 
where my "mucking" comes into play, as it helps with one of the "frontends", the 
web-based one. "All we have to do" is convince the boxes to work right. 
 
So tags are read the right way from MP3s and stored in the database correctly. Trust me, 
this is WAY more important, and would be much harder to resolve! As a software 
developer in a "non latin-1" country I've been playing around with encoding-problems 
quite a lot.  English speaking people usually can't even perceive what an important 
thing this is!  But IMHO slimdevices is going the right way, maybe the developers just 
need more help from people like you or me, to test these features... 
 
Dean, please write something encouraging to us.... 
 
Zsolt 
 
Comment 7 KDF 2004-11-09 10:49:35 UTC
Dan Sully, I believe, is already working on this under one or all of the
following similar bugs:
31, 519, 534
Comment 8 Dan Sully 2004-12-03 10:08:08 UTC
I've sent an initial patch to the developers list on 12/02/2004.

http://lists.slimdevices.com/archives/developers/2004-December/010855.html
Comment 9 Dan Sully 2005-01-13 15:54:32 UTC
The 6.0/trunk tree fully supports UTF-8