Bug 5686 - Two sequential umlauted characters are not displayed correctly
Status: RESOLVED WONTFIX
Product: Logitech Media Server
Classification: Unclassified
Component: Web Interface
Version: 7.4.0
Hardware: PC RedHat Linux
Importance: P2 normal
Target Milestone: 7.4.0
Assigned To: Michael Herger
Keywords: charset_issues
Reported: 2007-10-06 04:53 UTC by Kim B. Heino
Modified: 2009-09-08 09:30 UTC (History)

Attachments
test mp3 files (7.86 KB, application/octet-stream)
2007-12-13 11:39 UTC, Kim B. Heino
This is how SqueezeCenter sees testmp3 files (8.21 KB, image/png)
2007-12-13 11:40 UTC, Kim B. Heino
hard-code specs compliant latin1 encoding (1.18 KB, patch)
2008-01-30 03:44 UTC, Michael Herger

Description Kim B. Heino 2007-10-06 04:53:11 UTC
I'm using today's SVN snapshot of SqueezeCenter. I have three tracks with the following names:

Jonain päivänä.mp3       (Jonain pa"iva"na")
Syytön.mp3               (Syyto"n)
Venäjää.mp3              (Vena"ja"a")

The first two tracks are displayed correctly in the web interface. The third one is not. This always happens when there are two sequential umlauted characters in the file name; with only one, it's OK. See http://b.bbbs.net/umlaut.png for a screen capture; "Venäjää" is the last track.

This worked perfectly with version 6.5. The file format does not matter; the same thing happens with OGGs and FLACs.
Comment 1 Chris Owens 2007-11-05 09:38:09 UTC
QA to repro
Comment 2 Michael Herger 2007-12-10 07:20:29 UTC
I can't reproduce this, neither with a random name like äöéà.mp3 nor with the ones provided in the report. They show up correctly here (OSX).
Comment 3 Kim B. Heino 2007-12-13 11:39:17 UTC
Created attachment 2525
test mp3 files

Test files with ID3 tags: one with Latin-1 charset and one with UTF-8.
Comment 4 Kim B. Heino 2007-12-13 11:40:04 UTC
Created attachment 2526
This is how SqueezeCenter sees testmp3 files
Comment 5 Kim B. Heino 2007-12-13 11:44:12 UTC
Did you tag the files? All of my files have ID3 tags. Please test with the attached test mp3 files. One is tagged with the Latin-1 charset, one with UTF-8. Both are displayed incorrectly. As ID3 does not specify a charset (does it?), it's OK for one of them to display incorrectly. What charset should I use?
Comment 6 Kim B. Heino 2008-01-09 10:25:38 UTC
If I look directly at MySQL's tracks table, I see things like:

mysql> select url, title from tracks where url like '%Parhaat%/2/%';
+---------------------------------------------------------------------------------+----------------------+
| url                                                                             | title                |
+---------------------------------------------------------------------------------+----------------------+
...
| file:///home/music/data/Dingo/1999-Parhaat/2/12-Pyh%C3%A4_klaani.mp3            | Pyhä klaani         | 
...
| file:///home/music/data/Dingo/1999-Parhaat/2/17-N%C3%A4hd%C3%A4%C3%A4n_taas.mp3 | Nהhdההn taas      | 
+---------------------------------------------------------------------------------+----------------------+

URL is encoded correctly, title is not.
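
The URL column holds the path percent-encoded as UTF-8, which can be checked independently of the database. A minimal sketch, using Python purely for illustration (it is not what SqueezeCenter runs) and only the file name portion of the second row above:

```python
from urllib.parse import unquote

# File name portion of the URL shown in the tracks table above.
encoded_name = "17-N%C3%A4hd%C3%A4%C3%A4n_taas.mp3"

# unquote() interprets the percent-escapes as UTF-8 by default.
decoded_name = unquote(encoded_name)
print(decoded_name)
```

If the URL decodes to the expected umlauts while the title column does not, the damage most likely happens when the tag is read, not when the path is stored.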
Comment 7 Blackketter Dean 2008-01-21 07:17:03 UTC
QA:  please try to reproduce on Windows and Linux.
Comment 8 Blackketter Dean 2008-01-28 10:24:25 UTC
Steven: Can you take a look at this?
Comment 9 Michael Herger 2008-01-29 06:23:23 UTC
Kim - what application did you use to tag these files? The "latin" version shows fine in mp3tag and iTunes. But none of the utf versions would be displayed correctly in any of the tested applications.
Comment 10 Kim B. Heino 2008-01-29 09:38:58 UTC
> Kim - what application did you use to tag these files?

It's my own application using id3lib.

> The "latin" version shows fine in mp3tag and iTunes.

Yes, they are correct in every program except SqueezeCenter. That's the problem.

> But none of the utf versions would be

You can ignore the utf files, except that in SqueezeCenter they show a funny thing:

utf: double-umlaut (e.g. "väärin") is OK, single (e.g. "äiti") is not (or was it vice versa?)
latin: single-umlaut is OK, double is not

I assume SqueezeCenter does charset conversion twice (id3-latin to utf to utf), or something... It's using utf internally, isn't it?
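
For reference, this is what a double conversion of the kind guessed at above would look like. It is only an illustration of the hypothesis in Python, not a confirmed diagnosis of what SqueezeCenter does; the variable names are made up for the example.

```python
title = "Venäjää"

# Step 1: the text is stored as UTF-8 bytes.
utf8_bytes = title.encode("utf-8")

# Step 2 (the suspected mistake): the UTF-8 bytes are read as Latin-1,
# so every two-byte umlaut turns into two separate characters.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Step 3: encoding the misread text to UTF-8 again grows the byte count.
twice_encoded = misread.encode("utf-8")
print(len(utf8_bytes), len(twice_encoded))
```

A double conversion would garble every umlaut, single or sequential, so it does not by itself explain why only some titles are affected.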
Comment 11 Spies Steven 2008-01-29 13:42:59 UTC
I have been able to reproduce this issue with both Windows and Linux.  Mac OS X does not appear to have this issue.  I also see that it worked correctly with 6.5 and not 7.  Linux is able to read both UTF and ISO encoded ID3 tags even though I believe ISO is more compatible.

Michael did you want to take this one?
Comment 12 Blackketter Dean 2008-01-29 22:23:51 UTC
Ping michael....
Comment 13 Michael Herger 2008-01-30 02:26:59 UTC
Kim - do you by chance have Encode::Detect::Detector installed on your system? Would it work if you removed it (temporarily)? And did you tag on Windows or Linux?

The tags seem to have incomplete encoding information. SC then tries to "guess" what encoding it is. Detector recognizes it as cp1255 (Windows), whereas Encode::Guess (used if the above module is not installed) correctly recognizes it as iso-8859-1. 

I'm still not sure whether this is a bug or just unlucky guessing (compared to other applications).
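
The effect of such a wrong guess can be reproduced outside SqueezeCenter. A minimal Python sketch, assuming the tag holds plain ISO-8859-1 bytes with no usable encoding information; it only shows the two interpretations and does not reproduce the detection logic of Encode::Detect::Detector or Encode::Guess:

```python
title = "Nähdään taas"

# Bytes as an ISO-8859-1 (Latin-1) tagged file would store the title.
raw = title.encode("iso-8859-1")

# Interpretation when the bytes are taken as ISO-8859-1.
as_latin1 = raw.decode("iso-8859-1")

# Interpretation when the same bytes are taken as cp1255 (Windows Hebrew):
# byte 0xE4 is "ä" in ISO-8859-1 but a Hebrew letter in cp1255.
as_cp1255 = raw.decode("cp1255")

print(as_latin1)
print(as_cp1255)
```

The cp1255 result matches the garbled title seen in the tracks table in comment 6.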
Comment 14 Kim B. Heino 2008-01-30 03:30:33 UTC
> Kim - do you by chance have Encode::Detect::Detector installed on your system?

Systemwide no. Of course a copy of it comes with SC, which is installed... Do you want me to delete SC's copy of it?

> And did you tag on Windows or Linux?

Linux. My filesystem is UTF-8, my own program is using iconv("ISO-8859-1","UTF-8") and id3lib to tag the files.

> The tags seem to have incomplete encoding information.

True... id3lib seems to have:

  virtual bool          SetEncoding(ID3_TextEnc enc) = 0;

...with ID3TE_UTF8 (3) and ID3TE_ISO8859_1 (0) defines. 

I'll see whether that helps.
Comment 15 Michael Herger 2008-01-30 03:44:05 UTC
Created attachment 2774
hard-code specs compliant latin1 encoding

Please try this patch. It removes the encoding detection wizardry and hard codes the latin1 decoding (which should be expected in this case according to the specs).
Comment 16 Kim B. Heino 2008-01-30 08:01:21 UTC
Works great with that patch. I don't see any errors anymore.
Comment 17 Blackketter Dean 2008-01-30 08:56:21 UTC
Michael will be putting this in trunk for 7.0.1, with Dan's feedback.
Comment 18 Michael Herger 2008-03-18 03:42:45 UTC
Kim - can you still reproduce this issue with the latest 7.0.1 build? I've checked in a few encoding related changes which might influence your case too.
Comment 19 Kim B. Heino 2008-03-20 01:09:45 UTC
I removed the patch, updated to SVN 17943 and did a full rescan: the problem is still there.
Comment 20 Michael Herger 2008-03-20 03:57:40 UTC
change 17945 - thanks Kim

QA - please do some thorough testing on whatever oddly tagged files you can find (mp3 only). I tested with my ~8000 titles on OSX/WinXP - but then my files are correctly tagged ;-)
Comment 21 Dan Sully 2008-03-20 07:03:15 UTC
I guarantee that change 17945 will cause more pain for everyone than fixing this one person's problem.
Comment 22 Michael Herger 2008-03-23 15:26:04 UTC
Dan - thanks for your feedback. I feared something like that (that's why I included you on the CC list ;-)). We'll reconsider the fix after the Easter weekend.
Comment 23 Michael Herger 2008-03-25 04:18:44 UTC
change 17979 - revert change 17945

Kim, I'm sorry for this. You know the "fix" for your case. Thanks for your understanding.
Comment 24 Michael Herger 2008-11-06 08:03:56 UTC
Will take another look for the next major release.
Comment 25 Michael Herger 2008-11-20 23:55:41 UTC
*** Bug 10089 has been marked as a duplicate of this bug. ***
Comment 26 Michael Herger 2008-11-21 05:40:05 UTC
*** Bug 5898 has been marked as a duplicate of this bug. ***
Comment 27 Michael Herger 2008-11-21 05:40:45 UTC
Please see also dupe bug 5898 (helpful attachments)
Comment 28 sami moisio 2008-11-22 02:12:23 UTC
I'm repeating myself a bit, but perhaps this could give you some kind of hint. I downgraded SqueezeCenter from version 7.2 back to version 7.1 and the same issue still remains. Then I followed the instructions at the following URL: http://kimmo.suominen.com/archives/2008/06/squeezecenter-7/
Now everything works fine again with 7.1. I did not get this working with version 7.2.
Comment 29 Kim B. Heino 2008-11-22 03:32:22 UTC
> Now everything works fine again with 7.1. I did not get this working with
> version 7.2.

Sami, I've been using Michael's patch (https://bugs-archive.lyrion.org/attachment.cgi?id=2774) with 7.1, 7.2 and 7.3. It works fine for me on all versions.
Comment 30 sami moisio 2008-11-22 04:06:14 UTC
Ok, Thanks Kim!
Comment 31 Michael Herger 2009-03-17 07:53:19 UTC
Adding Andy - good test case for C scanner code.
Comment 32 Andy Grundman 2009-03-17 09:19:24 UTC
At first glance the title string in your utf test file is not encoded correctly.  id3v2.3 requires UTF-16, and the string is not encoded that way.  What tool was this file encoded with?
Comment 33 Kim B. Heino 2009-03-19 13:30:25 UTC
Andy, as mentioned in comment #10, you can ignore utf test files. Same comment says that they were tagged with my own program using id3lib.

Currently I'm using Latin1-tagged files. 7.3-trunk with Michael's patch works fine for me. Without the patch it doesn't because I don't add encoding info.
Comment 34 Andy Grundman 2009-03-19 13:43:54 UTC
I'm confused.  Your screenshot shows only the utf version has display issues.  This is because your program is not tagging it properly.
Comment 35 Andy Grundman 2009-03-19 13:50:49 UTC
I'm closing this as won't fix.  SC cannot be expected to support ID3 tags from buggy implementations.  You should read the ID3v2.3 spec, which says:

   If nothing else is said a string is represented as ISO-8859-1
   [ISO-8859-1] characters in the range $20 - $FF. Such strings are
   represented as <text string>, or <full text string> if newlines are
   allowed, in the frame descriptions. All Unicode strings [UNICODE] use
   16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings
   must begin with the Unicode BOM ($FF FE or $FE FF) to identify the
   byte order.

Try comparing the tags generated by your program to the tags generated by MP3Tag.
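
The rule quoted above can be sketched as a small decoder. This is a hedged Python illustration, not SqueezeCenter's scanner code: the function name is hypothetical, and it assumes the ID3v2.3 text frame layout of one leading encoding byte ($00 for ISO-8859-1, $01 for Unicode) followed by the text, which comes from the wider spec and not from the excerpt quoted here.

```python
def decode_id3v23_text(payload: bytes) -> str:
    """Decode an ID3v2.3 text frame body: one encoding byte, then the text."""
    if not payload:
        return ""
    encoding_byte, text = payload[0], payload[1:]
    if encoding_byte == 0x00:
        # ISO-8859-1: the default "if nothing else is said".
        return text.decode("iso-8859-1").rstrip("\x00")
    if encoding_byte == 0x01:
        # Unicode strings must begin with a BOM ($FF FE or $FE FF).
        if text[:2] not in (b"\xff\xfe", b"\xfe\xff"):
            raise ValueError("Unicode string does not start with a BOM")
        # Python's "utf-16" codec uses the BOM to pick the byte order.
        return text.decode("utf-16").rstrip("\x00")
    raise ValueError(f"encoding byte {encoding_byte:#04x} is not defined in ID3v2.3")


# Latin-1 frame body: encoding byte 0x00, then the raw ISO-8859-1 bytes.
latin1_title = decode_id3v23_text(b"\x00Ven\xe4j\xe4\xe4")

# Unicode frame body: encoding byte 0x01, BOM, then little-endian UTF-16.
unicode_title = decode_id3v23_text(b"\x01\xff\xfe" + "Venäjää".encode("utf-16-le"))

print(latin1_title, unicode_title)
```

Under this reading, UTF-8 bytes have no valid place in a v2.3 text frame, which is why a strict reader shows them garbled.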
Comment 36 Kim B. Heino 2009-03-19 13:55:32 UTC
> Your screenshot shows only the utf version has display issues. 

My screenshot shows that both utf and latin (ISO-8859-1) have display issues.