Bug 17863 - No display of "Comment" Tag when special characters are used
: No display of "Comment" Tag when special characters are used
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Player UI
: 7.6.1
: All All
: -- normal with 12 votes (vote)
: 7.9.x
Assigned To: Michael Herger
http://forums.slimdevices.com/showthr...
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-14 11:54 UTC by a277_913za
Modified: 2014-06-10 09:15 UTC (History)
2 users (show)

See Also:
Category: Bug


Attachments
Proposed patch to Slim/Schema.pm to handle valid utf-8 comment tags (1.25 KB, patch)
2012-01-15 09:53 UTC, Martin Williams
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description a277_913za 2012-01-14 11:54:10 UTC
As described in http://forums.slimdevices.com/showthread.php?t=89021:
If the content of the Tag Field "Comment" contains some (which?) special characters, it will not be displayed in the contextmenu of a selected song. This is valid for the Web-Interface, the Controller-Interface and SqueezePlay!

Unknown which characters lead to this behaviour, but at least german "Umlaute" like "äöüß" or the character "—" (not "-") in the comment field are sufficient. 

The bug has been detected with FLAC files, but seems to bet valid for mp3 as well.
Comment 1 gary 2012-01-15 07:34:35 UTC
*** This bug has been confirmed by popular vote. ***
Comment 2 Martin Williams 2012-01-15 09:53:22 UTC
Created attachment 7607 [details]
Proposed patch to Slim/Schema.pm to handle valid utf-8 comment tags
Comment 3 Martin Williams 2012-01-15 10:25:50 UTC
(In reply to comment #2)
> Created an attachment (id=7607) [details]
> Proposed patch to Slim/Schema.pm to handle valid utf-8 comment tags

My comments got lost from this.

I replicate the problem with mp3 files containing valid latin1 comments, but
including non-ascii characters.

I trace my issue to change #30085 attached to bug #15630.
 
That change introduces code to 'skip comment strings with invalid utf-8'.
Unfortunately it also skips comment strings with valid utf-8 as well.

My proposed patch deals with the problem by 'round tripping' a potentially
invalid utf-8 string, replacing any invalid characters with valid substitution
characters.

It also looks at any non utf-8 string, and applies a conversion should such
string contain any non-ASCII characters.

The patch appears to be effective in solving my mp3 comment tag problems, and it
deals appropriately with the test case attached to bug #15630. I do not have a
library of 'errant mp3 comment tags found in the wild' against which it could be
tested. It is not restricted to mp3 comment tags in its application, though,
as it is applied after the underlying files have been scanned.  

A quick analysis of why I believe #30085 failed in its objective:

It introduces in Slim/Schema.pm, circa line 2540:

# Bug 15630, ignore strings which have the utf8 flag on but are in fact invalid utf8
next if utf8::is_utf8($c) && !Slim::Utils::Unicode::looks_like_utf8($c);

A) It appears to me that Slim::Utils::Unicode::looks_like_utf8 expects to
receive an 'octet string', but is being passed a utf-8 encoded internal perl
string.
B) In any case, I believe that Slim::Utils::Unicode::looks_like_utf8 may not
catch the invalid unicode character \xFFFF. That character features in the test
case attached to bug #15630.

It also begs the question of why invalid utf-8 strings are generated in the
first place.

I have not exhaustively exercised this analysis.

The patch looks as if it will apply to server version 7.7, although I haven't
checked that it remains necessary.
Comment 4 Martin Williams 2012-01-15 10:37:30 UTC
(In reply to comment #3)
> (In reply to comment #2)

I should also remark that I am running under Perl 5.8.8 on OSX 10.5 (ppc).
Comment 5 Gordon Harris 2012-03-25 22:27:51 UTC
*** Bug 17936 has been marked as a duplicate of this bug. ***
Comment 6 Gordon Harris 2012-03-25 23:32:16 UTC
Martin's patch works for me with the 7.7.2 svn code.  My cuesheet comments (most of which contain diacritics) scan just fine with his fix.  Also: scanning speed doesn't suffer at all and, in fact, might be a tad faster.  I suspect that:

   $c = decode("UTF-8",encode("utf8", $c),Encode::FB_DEFAULT);

..might be faster than the regex in Slim::Utils::Unicode::looks_like_utf8().
Comment 7 Martin Williams 2012-03-26 10:19:01 UTC
(In reply to comment #4)
> I should also remark that I am running under Perl 5.8.8 on OSX 10.5 (ppc).

My proposed 'round tripping through UTF-8' approach will not work on Perl
versions below 5.8.7 as it appears to depend on the existence of the
'utf-8-strict' character encoding. Refer documentation for Encode.

On such systems the effect of my proposed patch may be to reintroduce bug
#15630. Mac OSX 10.4 (Tiger) is one such system.

That can be avoided by modifying the patch to check for the presence of the
'utf-8-strict' encoding and only 'round trip' if it's there. Pre Perl 5.8.7
systems would simply have to suffer the existing behaviour !


A bigger question may be "How to handle character encoding issues generally ?"
given the somewhat awkward Unicode issues that seem to exist in Perl, and the
variation between the different Perl versions.
Comment 8 Gordon Harris 2012-03-26 11:15:49 UTC
In the blog exchange at http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex which gave rise to looks_like_utf8(), one commentator mentions using iconv() rather than the regex, for speed's sake.  I don't know if that's helpful as an alternative to the limitations of encode/decode that Martin mentions.

If it is, at the very least, I'd think that Text::Iconv would have to be pulled into Slim's CPAN to make it available for the windows version.
Comment 9 bforbord 2012-04-14 13:30:30 UTC
Tested with abberant hyphens (only - is allowed) and quotation marks (only " and ' are accepted).  These exist in many other variants when I copy info from web-sites, and it is a difficult task to detect and remove them.  Not all of us have english as our mother tongue.  Umlaute and scandinavian characters should be recognized by the squeezebox.  I think this is a MAJOR BUG!
Comment 10 Michael Herger 2014-06-04 14:44:45 UTC
I just reverted the change supposed to fix bug 15630, which was causing this issue you're seeing. It seems the "new" scanner introduced in 7.6 would handle those comments just fine. I wasn't able to crash it using the sample files provided in bug 15630.

Please let me know if this fixes your problem (or if it causes new issues...)
Comment 11 a277_913za 2014-06-06 06:25:46 UTC
Michael, you're last comment confuses me. My posts at the forum in 2012 (!) clearly Städte, that i discovered the problems with 7.61!!!!
Comment 12 a277_913za 2014-06-06 06:28:31 UTC
...clearly state, that the problem exist with SC 7.61...
Comment 13 Michael Herger 2014-06-06 09:23:23 UTC
I'm sorry for the confusion. I only expressed a guess without verifying it.

Would 7.9 now crash due to this change?
Comment 14 Martin Williams 2014-06-06 19:41:15 UTC
(In reply to comment #10)
> I just reverted the change supposed to fix bug 15630, which was causing this
> issue you're seeing. It seems the "new" scanner introduced in 7.6 would
> handle those comments just fine. I wasn't able to crash it using the sample
> files provided in bug 15630.
> 
> Please let me know if this fixes your problem (or if it causes new issues...)

FWIW, in LMS 7.7.3:

This change appears to fix the problems I had encountered with utf-8 encoded MP3 tags.
(I don't have so many, as my collection is somewhat anglocentric).

I also verified that the umlauted characters and em-dash quoted in the first post were also scanned correctly in an MP3 tag.

I can't confirm cue sheet or flac handling.
Comment 15 Martin Williams 2014-06-06 20:18:03 UTC
To clarify:

By 'change' I mean the 'change reversion' that Michael has just applied.
Comment 16 a277_913za 2014-06-08 08:45:11 UTC
Checked the behaviour with SC  7.8.0 - 1387542508 and FLAC (!) Files:
A single "german umlaut" like äöüß makes the comment field disappear.

So, no improvmeent since the year 2012 then :-(
Comment 17 Mikael Nyberg 2014-06-08 19:35:26 UTC
Try 7.9 that's where the change is done .
Comment 18 Gordon Harris 2014-06-08 19:48:50 UTC
Running from updated git code on public/7.9, I'm finding that the new scanner IS pulling in comments-with-diacritics from whole-album-flacs-with-embedded cuesheets.

Example:  from the file:

"/mnt/Media/Music/l_Modern_Central_European/Bartók, B/Chamber Works - György Pauk, Jenö Jandó, Kodály Quartet.flac"

..the scanner successfully pulls in the comment:

"Comment: Béla Bartók (1881-1945); Chamber Music; György Pauk, violin; Jenő Jandó, piano; Kodály Quartet; Attila Falvay, violin; Tamás Szabó, violin; Gäbor Fias, viola; János Devich, cello;"

..and displays it in the webUI.
Comment 19 a277_913za 2014-06-09 10:21:55 UTC
With 7.9.0 - 1402065712 at least comments with "german umlauts" are displayed.

Looks good4me!
Comment 20 a277_913za 2014-06-09 10:38:47 UTC
Putting the following string into a comment: äöü!"§$%&/()=?´`*~'#;:_-.,><|°^

Results in a doubble comment :-)
---
Comment: äVier Jahre nach ihrem gefeierten Hit-Album The Reminder meldet sich Feist mit ihrem neuen Werk zurueck Metals , das vierte Studioalbum der Kanadierin, erscheint am 30. September. Bei den Aufnahmen an der kalifornischen Kueste wurde die 35-Jaehrige sowohl von langjaehrigen Weggefaehrten Chilly Gonzales und Mocky als auch von neuen Verbuendeten wie Valgeir Sigurdsson (Bonne, Prince, Billy, Bjoerk) unterstuetzt. Die Stuecke, die auf Metals versammelt sind, rangieren von sachte polternden, stimmungsvollen Klangteppichen bis hin zu extrem intensiven Tracks, die wie ein musikalisches Pendant den aufziehenden Nebel und das darauf folgende Gewitter widerspiegeln. Feist ist zurueck - und besser denn je. Deluxe Edition kommt in einer schoenen Digipack Ausstattung.
/ äöü!"§$%&/()=?´`*~'#;:_-.,><|°^Vier Jahre nach ihrem gefeierten Hit-Album The Reminder meldet sich Feist mit ihrem neuen Werk zurueck Metals , das vierte Studioalbum der Kanadierin, erscheint am 30. September. Bei den Aufnahmen an der kalifornischen Kueste wurde die 35-Jaehrige sowohl von langjaehrigen Weggefaehrten Chilly Gonzales und Mocky als auch von neuen Verbuendeten wie Valgeir Sigurdsson (Bonne, Prince, Billy, Bjoerk) unterstuetzt. Die Stuecke, die auf Metals versammelt sind, rangieren von sachte polternden, stimmungsvollen Klangteppichen bis hin zu extrem intensiven Tracks, die wie ein musikalisches Pendant den aufziehenden Nebel und das darauf folgende Gewitter widerspiegeln. Feist ist zurueck - und besser denn je. Deluxe Edition kommt in einer schoenen Digipack Ausstattung.
---

Funny. Now we have the comment (good), but sometime a little bit too much of it ;-)
Comment 21 a277_913za 2014-06-09 13:37:02 UTC
And more funny: The following string a the beginning of a comment field, does not double the comment:
äöü§$%&/()=?[]}\+#-.,;:_'*^°´`~|²³ß

Any ideas?
Comment 22 a277_913za 2014-06-09 13:42:20 UTC
Copying the string from https://bugs-archive.lyrion.org/show_bug.cgi?id=17863#c20 again in the comment, has no effect (do double comment).

So, I can't reconstruct the behaviour...
Comment 23 Michael Herger 2014-06-10 09:15:46 UTC
Let's consider this fixed.

dev04 - please feel free to open a new bug report about your specific issue if you can reproduce it. Thanks!