Bugzilla – Bug 5686
Two sequential umlauted characters are not displayed correctly
Last modified: 2009-09-08 09:30:31 UTC
I'm using today's SVN snapshot of SqueezeCenter. If I have three tracks with the following names:
Jonain päivänä.mp3 (Jonain pa"iva"na")
Syytön.mp3 (Syyto"n)
Venäjää.mp3 (Vena"ja"a")
The first two tracks are displayed correctly in the web interface. The third one is not. This always happens when there are two sequential umlauted chars in the file name. If there is only one, it's OK. See http://b.bbbs.net/umlaut.png for a screen capture; "Venäjää" is the last track. This worked perfectly with version 6.5. File format does not matter: the same thing happens with OGGs and FLACs.
QA to repro
can't reproduce this. Neither with some random name like äöéà.mp3 nor with the ones provided in the report. They show up correctly here (OSX).
Created attachment 2525: test mp3 files
Test files with ID3 tags: one with Latin-1 charset and one with UTF-8.
Created attachment 2526: This is how SqueezeCenter sees testmp3 files
Did you tag the files? All of my files have ID3 tags. Please test with the attached test mp3 files. One is tagged with Latin-1 charset, one with UTF-8. Both are displayed incorrectly. As ID3 does not specify a charset (does it?), it's OK for one of them to display incorrectly. What charset should I use?
If I look directly at MySQL's tracks table I see things like:
mysql> select url, title from tracks where url like '%Parhaat%/2/%';
+---------------------------------------------------------------------------------+----------------------+
| url                                                                             | title                |
+---------------------------------------------------------------------------------+----------------------+
...
| file:///home/music/data/Dingo/1999-Parhaat/2/12-Pyh%C3%A4_klaani.mp3            | Pyhä klaani          |
...
| file:///home/music/data/Dingo/1999-Parhaat/2/17-N%C3%A4hd%C3%A4%C3%A4n_taas.mp3 | Nהhdההn taas         |
+---------------------------------------------------------------------------------+----------------------+
The URL is encoded correctly, the title is not.
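The claim that the URL column is fine can be checked by decoding the percent-escapes as UTF-8. A minimal Python sketch using only the standard library; the file names are copied from the query output, and the variable names are just for illustration:

```python
from urllib.parse import unquote

# File names copied from the `url` column of the query output.
names = [
    "12-Pyh%C3%A4_klaani.mp3",
    "17-N%C3%A4hd%C3%A4%C3%A4n_taas.mp3",
]

# unquote() interprets percent-escapes as UTF-8 by default.
decoded = [unquote(n) for n in names]
for d in decoded:
    print(d)
```

Both names come back with proper umlauts, which suggests the path handling is sound and only the tag-derived title goes wrong.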
QA: please try to reproduce on Windows and Linux.
Steven: Can you take a look at this?
Kim - what application did you use to tag these files? The "latin" version shows fine in mp3tag and iTunes. But none of the utf versions would be displayed correctly in any of the tested applications.
> Kim - what application did you use to tag these files?
It's my own application using id3lib.
> The "latin" version shows fine in mp3tag and iTunes.
Yes, they are correct in every program except SqueezeCenter. That's the problem.
> But none of the utf versions would be
You can ignore the utf files. ...except that in SqueezeCenter they show a funny thing:
utf: double umlaut (e.g. "väärin") is OK, single (e.g. "äiti") is not (or was it vice versa?)
latin: single umlaut is OK, double is not
I assume SqueezeCenter does the charset conversion twice (id3-latin to utf to utf), or something... It's using utf internally, isn't it?
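The double-conversion guess is only a hypothesis at this point in the thread, but what such a conversion would look like is easy to sketch in Python. This illustrates the general failure mode only; it is not SqueezeCenter's actual code:

```python
title = "väärin"

# Correct: encode to UTF-8 once.
utf8_once = title.encode("utf-8")

# Double conversion: the UTF-8 bytes are mistaken for Latin-1
# and converted to UTF-8 a second time.
utf8_twice = utf8_once.decode("latin-1").encode("utf-8")

print(utf8_once.decode("utf-8"))   # the original title
print(utf8_twice.decode("utf-8"))  # mojibake: each umlaut becomes two characters
```

A double conversion would damage every umlaut the same way, whether single or sequential, so it would not by itself explain why only double umlauts break.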
I have been able to reproduce this issue with both Windows and Linux. Mac OS X does not appear to have this issue. I also see that it worked correctly with 6.5 and not 7. Linux is able to read both UTF and ISO encoded ID3 tags even though I believe ISO is more compatible. Michael did you want to take this one?
Ping michael....
Kim - do you by chance have Encode::Detect::Detector installed on your system? Would it work if you removed it (temporarily)? And did you tag on Windows or Linux? The tags seem to have incomplete encoding information. SC then tries to "guess" what encoding it is. Detector recognizes it as cp1255 (Windows), whereas Encode::Guess (used if the above module is not installed) correctly recognizes it as iso-8859-1. I'm still not sure whether this is a bug or just unlucky guessing (compared to other applications).
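To see why a wrong guess produces exactly the Hebrew letters seen in the database output, the same bytes can be decoded both ways. A small Python sketch; it uses Python's built-in codecs rather than the Perl modules named above, so it only illustrates the effect of the guess, not how either detector arrives at it:

```python
# Latin-1 bytes of the title "Nähdään taas", as a Latin-1 tagger would store them.
raw = "Nähdään taas".encode("latin-1")

as_latin1 = raw.decode("latin-1")  # what the tagger intended
as_cp1255 = raw.decode("cp1255")   # what a cp1255 guess yields

print(as_latin1)
print(as_cp1255)  # byte 0xE4 is HEBREW LETTER HE (U+05D4) in cp1255
```

The bytes themselves are fine; only the interpretation differs, which is consistent with other applications showing the Latin-1 files correctly.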
> Kim - do you by chance have Encode::Detect::Detector installed on your system?
Systemwide, no. Of course a copy of it comes with SC, which is installed... Do you want me to delete SC's copy of it?
> And did you tag on Windows or Linux?
Linux. My filesystem is UTF-8, my own program is using iconv("ISO-8859-1","UTF-8") and id3lib to tag the files.
> The tags seem to have incomplete encoding information.
True... id3lib seems to have:
virtual bool SetEncoding(ID3_TextEnc enc) = 0;
...with ID3TE_UTF8 (3) and ID3TE_ISO8859_1 (0) defines. I'll see if that helps.
Created attachment 2774: hard-code specs compliant latin1 encoding
Please try this patch. It removes the encoding detection wizardry and hard-codes the latin1 decoding (which should be expected in this case according to the specs).
Works great with that patch. I don't see any errors anymore.
Michael will be putting this in trunk for 7.0.1, with Dan's feedback.
Kim - can you still reproduce this issue with the latest 7.0.1 build? I've checked in a few encoding related changes which might influence your case too.
I removed the patch, updated to SVN 17943 and did full rescan: the problem is still there.
change 17945 - thanks Kim
QA - please do some thorough testing on whatever oddly tagged files you can find (mp3 only). I tested with my ~8000 titles on OSX/WinXP - but then my files are correctly tagged ;-)
I guarantee that change 17945 will cause more pain for everyone than fixing this one person's problem.
Dan - thanks for your feedback. I feared something like that (that's why I included you on the CC list ;-)). We'll reconsider the fix after the Easter weekend.
change 17979 - revert change 17945
Kim, I'm sorry for this. You know the "fix" for your case. Thanks for your understanding.
Will take another look for the next major release.
*** Bug 10089 has been marked as a duplicate of this bug. ***
*** Bug 5898 has been marked as a duplicate of this bug. ***
Please see also dupe bug 5898 (helpful attachments)
I'm repeating myself a bit, but perhaps this could give you some kind of hint. I downgraded SqueezeCenter from version 7.2 back to version 7.1 and the same issue still remains. Then I followed the instructions at the following URL: http://kimmo.suominen.com/archives/2008/06/squeezecenter-7/ Now everything works fine again with 7.1. I did not get this working with version 7.2.
> Now everything works fine again with 7.1. I did not get this working with
> version 7.2.
Sami, I've been using Michael's patch (https://bugs-archive.lyrion.org/attachment.cgi?id=2774) with 7.1, 7.2 and 7.3. It works fine for me on all versions.
Ok, Thanks Kim!
Adding Andy - good test case for C scanner code.
At first glance the title string in your utf test file is not encoded correctly. id3v2.3 requires UTF-16, and the string is not encoded that way. What tool was this file encoded with?
Andy, as mentioned in comment #10, you can ignore utf test files. Same comment says that they were tagged with my own program using id3lib. Currently I'm using Latin1-tagged files. 7.3-trunk with Michael's patch works fine for me. Without the patch it doesn't because I don't add encoding info.
I'm confused. Your screenshot shows only the utf version has display issues. This is because your program is not tagging it properly.
I'm closing this as won't fix. SC cannot be expected to support ID3 tags from buggy implementations. You should read the ID3v2.3 spec, which says:
> If nothing else is said a string is represented as ISO-8859-1 [ISO-8859-1] characters in the range $20 - $FF. Such strings are represented as <text string>, or <full text string> if newlines are allowed, in the frame descriptions. All Unicode strings [UNICODE] use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.
Try comparing the tags generated by your program to the tags generated by MP3Tag.
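The rule quoted from the spec can be sketched as a small decoder. In ID3v2.3 a text frame body starts with one encoding byte ($00 for ISO-8859-1, $01 for Unicode with BOM). The function name `decode_text_frame` and the sample bytes are made up for illustration; this is a sketch of the spec's rule, not SqueezeCenter's scanner code:

```python
def decode_text_frame(body: bytes) -> str:
    """Decode the body of an ID3v2.3 text frame (encoding byte + text)."""
    encoding, data = body[0], body[1:]
    if encoding == 0x00:
        # ISO-8859-1, the default "if nothing else is said".
        text = data.decode("latin-1")
    elif encoding == 0x01:
        # Unicode: must begin with a BOM ($FF FE or $FE FF).
        if data[:2] not in (b"\xff\xfe", b"\xfe\xff"):
            raise ValueError("Unicode text without BOM")
        # Python's "utf-16" codec reads the BOM to pick the byte order;
        # UTF-16 is a superset of the UCS-2 the spec asks for.
        text = data.decode("utf-16")
    else:
        raise ValueError(f"encoding byte {encoding:#04x} not defined in ID3v2.3")
    return text.rstrip("\x00")  # drop an optional terminator


# Hypothetical frame bodies for the title from the original report.
latin = b"\x00" + "Venäjää".encode("latin-1")
unicode_le = b"\x01" + b"\xff\xfe" + "Venäjää".encode("utf-16-le")
print(decode_text_frame(latin), decode_text_frame(unicode_le))
```

A decoder written this way never has to guess: a frame marked $00 is always read as ISO-8859-1, and a UTF-8 string stored under an encoding byte that ID3v2.3 does not define is rejected rather than misread.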
> Your screenshot shows only the utf version has display issues.
My screenshot shows that both utf and latin (ISO-8859-1) have display issues.