Bugzilla – Bug 503
Problem with UTF-8 encoded ID tags
Last modified: 2008-08-18 10:53:01 UTC
The attached file does not show properly on the SlimServer web page or on the Squeezebox display. The strange-looking characters should have formed an AE digraph (lower case). I think this tag is UTF-8, and the display is thinking it's Latin-1; I do not know if the tag format lets you know which it's supposed to be.
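For anyone trying to reproduce the symptom, here is a minimal Python sketch of what is probably happening, assuming the tag really is stored as UTF-8 and the display decodes it as Latin-1. The variable names are only illustrative.

```python
# The lowercase "ae" ligature (U+00E6), as it would be stored in a UTF-8 tag.
original = "\u00e6"
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC3 0xA6

# Wrong: treat every byte as one Latin-1 character.
misread = utf8_bytes.decode("latin-1")   # two characters: U+00C3 and U+00A6

# Right: decode the same bytes as UTF-8.
correct = utf8_bytes.decode("utf-8")     # back to the single character U+00E6

print(len(misread), len(correct))
```

The two strange-looking characters in place of one letter are the typical sign of this mix-up.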
Created attachment 106 [details] Music file with Norwegian characters in tags
OK, I had the same problem with Hungarian letters (non-Latin-1). I worked on it for a few hours, then realized the obvious solution. It only fixes the web interface, but that's better than nothing...
You can either set the page encoding in the browser (how depends on the browser, but it's never more than three clicks of your mouse), OR (the better way) edit the HTML files on your server. The second solution is far superior, as it's permanent. What you have to do is insert:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
into each HTML file of your favourite skin. Be careful to insert it only into real HTML pages, not fragments, and always between the <head> and </head> tags. In some files you'll find:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Feel free to replace these. Also be aware that from now on all characters will be interpreted as UTF-8, so some ISO-8859-X (non-8859-1) characters will be misinterpreted. So go with either a fully UTF-8 collection or a fully ISO-8859-X one...
This solution definitely works: I tried it with your attached file and it displays nicely, even among my Hungarian files ;-) (See attachment for proof.)
Some notes/questions for developers:
- Wouldn't it be possible to make this a bit more automated? A config option for the HTML encoding, maybe?
- Am I right in saying it's possible to fix this on the player (Squeezebox) by supplying another font file?
thx: Zsolt
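The manual edit described above could be scripted roughly as follows. This is only a sketch: `force_utf8_meta` is a made-up helper name, it assumes any existing Content-Type meta tag is written in the same form as the one quoted above, and it treats any file without a <head> tag as a fragment to be left alone.

```python
import re

META_UTF8 = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# An existing Content-Type meta tag, whatever charset it declares.
_META_RE = re.compile(
    r'<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=[^"]*"\s*/?>',
    re.IGNORECASE,
)
# An opening <head> tag (but not <header>).
_HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def force_utf8_meta(html: str) -> str:
    """Return html with a UTF-8 Content-Type meta tag in its <head>.

    Fragments (no <head> tag) are returned unchanged.
    """
    head = _HEAD_RE.search(html)
    if head is None:
        return html
    if _META_RE.search(html):
        # Replace the first existing declaration, e.g. charset=iso-8859-1.
        return _META_RE.sub(lambda m: META_UTF8, html, count=1)
    # Otherwise insert the tag right after the opening <head>.
    return html[:head.end()] + "\n" + META_UTF8 + html[head.end():]
```

To apply it you would read each HTML file of the skin, pass the text through the function, and write it back. Keep a backup of the skin directory first.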
Created attachment 197 [details] It works! (The proof)
Mucking with the webserver doesn't solve the problem on the Squeezebox.... UTF-8 is a slightly bigger problem than a "new font", since the number of bytes in a character varies from character to character....
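To illustrate the variable-length point, a small Python sketch; the sample characters are my own choice, not taken from the attached file.

```python
# a, ae ligature, o with double acute (Hungarian), euro sign
samples = ["a", "\u00e6", "\u0151", "\u20ac"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s), hex {ch.encode('utf-8').hex()}")
```

So anything that assumes one byte per character (a display routine indexing into a font table, say) has to decode the byte stream first; a larger font alone would not be enough.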
Some more info from digging around on the net... According to the current ID3 standard, http://www.id3.org/id3v2.3.0.html, strings are either ISO 8859-1 or 16-bit Unicode (UCS-2, starting with the byte-order mark FEFF), in either byte order. The Ogg Vorbis specification, http://www.xiph.org/ogg/vorbis/doc/v-comment.html, says that strings are in UTF-8, which is an ASCII-compatible encoding of Unicode. Thus it's a mistake (but probably a common one... arrgh) to display strings from Ogg Vorbis files as if they were ISO 8859-1. Life ain't easy...
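In code, the distinction might look like the sketch below. The function names are hypothetical, and it assumes the usual ID3v2.3 text-frame layout where the first byte of the frame body names the encoding (0 for ISO 8859-1, 1 for Unicode with a byte-order mark). Python's "utf-16" codec stands in for the 16-bit Unicode decoding here.

```python
def decode_id3v23_text(body: bytes) -> str:
    """Decode an ID3v2.3 text frame body: one encoding byte, then the string."""
    if not body:
        return ""
    encoding, data = body[0], body[1:]
    if encoding == 0x00:
        text = data.decode("latin-1")
    elif encoding == 0x01:
        # The "utf-16" codec reads the byte-order mark and picks the byte order.
        text = data.decode("utf-16")
    else:
        raise ValueError(f"unknown ID3v2.3 text encoding byte: {encoding:#04x}")
    # Drop any terminating NUL characters.
    return text.rstrip("\x00")


def decode_vorbis_comment_value(raw: bytes) -> str:
    """Vorbis comment values are UTF-8, never ISO 8859-1."""
    return raw.decode("utf-8")
```

This is not how SlimServer does it; it only shows that the decoder has to be chosen per tag format rather than assumed to be ISO 8859-1 everywhere.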
Well, I think you've managed to find an obsolete version of the id3v2 spec. In reality (http://www.id3.org/develop.html), the possibilities for encodings are:
0 No restrictions
1 Strings are only encoded with ISO-8859-1 [ISO-8859-1] or UTF-8 [UTF-8].
(extended header information / q - Text encoding restrictions)
This means that if you want to follow the strict rules, you don't use any UCS-2/UTF-16, ISO-8859-2/3/5/.. or similar encodings! The problem here is that almost NONE of the APIs use the specification strictly (they follow it, but don't apply the restrictions, and this makes our life a bit more difficult). MS Media Player (even v10!!!) or Winamp, for instance, do not use encodings correctly!! The situation is soooo bad that they cannot even display UTF-8 encodings correctly (just like the Squeezebox).
What makes things better here is that the Perl library SlimServer uses handles encodings well; it's just the output that's not interpreted correctly. By default! This is where my "mucking" comes into play, as it helps on one of the "frontends", the web-based one. "All we have to do" is convince the boxes to work right. So tags are read the right way from MP3s and stored in the database correctly. Trust me, this is WAY more important, and would be much harder to resolve!
As a software developer in a "non-Latin-1" country I've been playing around with encoding problems quite a lot. English-speaking people usually can't even perceive how important this is! But IMHO Slim Devices is going the right way; maybe the developers just need more help from people like you or me to test these features...
Dean, please write something encouraging to us....
Zsolt
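For what it's worth, checking that restriction flag might look like the sketch below. It assumes the restrictions byte of the ID3v2.4 extended header is laid out as %ppqrrstt, with q (the text encoding restriction) as a single bit; please verify the layout against the spec before relying on it. The names are made up.

```python
# Assumed layout of the restrictions byte: %ppqrrstt, so q is bit 5.
TEXT_ENCODING_RESTRICTION_MASK = 0b0010_0000


def text_encoding_restricted(restrictions: int) -> bool:
    """True if the tag promises only ISO-8859-1 or UTF-8 strings."""
    return bool(restrictions & TEXT_ENCODING_RESTRICTION_MASK)
```

A tag reader could use this as a hint, but since few taggers set the flag it cannot replace looking at each frame's own encoding byte.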
Dan Sully, I believe, is already working on this... under one or all of the following similar bugs: 31, 519, 534
I've sent an initial patch to the developers list on 12/02/2004. http://lists.slimdevices.com/archives/developers/2004-December/010855.html
The 6.0/trunk tree fully supports UTF-8