Bugzilla – Bug 17863
No display of "Comment" Tag when special characters are used
Last modified: 2014-06-10 09:15:46 UTC
As described in http://forums.slimdevices.com/showthread.php?t=89021: If the content of the tag field "Comment" contains certain special characters, it is not displayed in the context menu of a selected song. This affects the web interface, the Controller interface and SqueezePlay. It is not yet known exactly which characters trigger this behaviour, but German umlauts like "äöüß" or the character "—" (not "-") in the comment field are sufficient. The bug was detected with FLAC files, but seems to be valid for mp3 as well.
*** This bug has been confirmed by popular vote. ***
Created attachment 7607 [details] Proposed patch to Slim/Schema.pm to handle valid utf-8 comment tags
(In reply to comment #2)
> Created an attachment (id=7607) [details]
> Proposed patch to Slim/Schema.pm to handle valid utf-8 comment tags

My comments got lost from this.

I can reproduce the problem with mp3 files containing valid latin1 comments that include non-ASCII characters. I trace my issue to change #30085 attached to bug #15630. That change introduces code to 'skip comment strings with invalid utf-8'. Unfortunately it also skips comment strings with valid utf-8.

My proposed patch deals with the problem by 'round tripping' a potentially invalid utf-8 string, replacing any invalid characters with valid substitution characters. It also looks at any non utf-8 string, and applies a conversion should that string contain any non-ASCII characters.

The patch appears to be effective in solving my mp3 comment tag problems, and it deals appropriately with the test case attached to bug #15630. I do not have a library of 'errant mp3 comment tags found in the wild' against which it could be tested. It is not restricted to mp3 comment tags in its application, though, as it is applied after the underlying files have been scanned.

A quick analysis of why I believe #30085 failed in its objective. It introduces in Slim/Schema.pm, circa line 2540:

    # Bug 15630, ignore strings which have the utf8 flag on but are in fact invalid utf8
    next if utf8::is_utf8($c) && !Slim::Utils::Unicode::looks_like_utf8($c);

A) It appears to me that Slim::Utils::Unicode::looks_like_utf8 expects to receive an 'octet string', but is being passed a utf-8 encoded internal perl string.

B) In any case, I believe that Slim::Utils::Unicode::looks_like_utf8 may not catch the invalid unicode character \xFFFF. That character features in the test case attached to bug #15630.

It also raises the question of why invalid utf-8 strings are generated in the first place.

I have not exhaustively exercised this analysis.
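To make points A and B above concrete, here is a minimal sketch in Python (not Perl, and not the actual Slim code). The function below is a hypothetical stand-in for a byte-level "looks like UTF-8" check based on the RFC 3629 byte ranges; whether Slim::Utils::Unicode::looks_like_utf8 behaves the same way is an assumption. Encoding the text as Latin-1 is only an analogy for handing a decoded character string to a check that expects octets.

```python
import re

# Hypothetical stand-in for a byte-level well-formedness check.
# The alternatives follow the byte ranges of RFC 3629.
WELL_FORMED_UTF8 = re.compile(
    rb"(?:[\x00-\x7F]"
    rb"|[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})*"
)


def looks_like_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    return WELL_FORMED_UTF8.fullmatch(octets) is not None


comment = "äöüß"

# Intended use: the check sees the encoded octets and accepts them.
as_octets = comment.encode("utf-8")
print(looks_like_utf8(as_octets))

# Point A (analogy): the check sees one value per *character* instead.
# 0xE4 ("ä") looks like a lead byte with no continuation, so a
# perfectly valid comment is rejected.
as_characters = comment.encode("latin-1")
print(looks_like_utf8(as_characters))

# Point B: the noncharacter U+FFFF has a well-formed byte shape
# (EF BF BF), so a shape-only check lets it through.
print(looks_like_utf8("\uffff".encode("utf-8")))
```

If the real check works along these lines, every comment containing a character in the U+0080 to U+00FF range would be skipped, which matches the symptoms reported here.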
The patch looks as if it will apply to server version 7.7, although I haven't checked that it remains necessary.
(In reply to comment #3) > (In reply to comment #2) I should also remark that I am running under Perl 5.8.8 on OSX 10.5 (ppc).
*** Bug 17936 has been marked as a duplicate of this bug. ***
Martin's patch works for me with the 7.7.2 svn code. My cuesheet comments (most of which contain diacritics) scan just fine with his fix. Also: scanning speed doesn't suffer at all and, in fact, might be a tad faster. I suspect that: $c = decode("UTF-8",encode("utf8", $c),Encode::FB_DEFAULT); ..might be faster than the regex in Slim::Utils::Unicode::looks_like_utf8().
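As a rough illustration of the round trip quoted above, the following Python sketch (an analogy, not the Perl patch) encodes a possibly damaged string to octets and decodes it again, substituting U+FFFD for anything that does not decode cleanly. The function name `round_trip` is hypothetical, and a lone surrogate stands in for an 'invalid character'; it makes no claim about relative speed.

```python
def round_trip(text: str) -> str:
    """Re-encode and decode text, replacing undecodable parts with U+FFFD."""
    # "surrogatepass" lets a lone surrogate reach the byte level so that
    # the decode step, not the encode step, decides what is invalid.
    octets = text.encode("utf-8", errors="surrogatepass")
    return octets.decode("utf-8", errors="replace")


# Valid text with diacritics survives unchanged.
print(round_trip("Béla Bartók; György Pauk, violin"))

# A damaged string is kept, with substitution characters in place of
# the invalid part, instead of being dropped altogether.
print(repr(round_trip("bad\udc80tag")))
```

The design point is that the comment is repaired rather than skipped, so valid comments are never lost.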
(In reply to comment #4)
> I should also remark that I am running under Perl 5.8.8 on OSX 10.5 (ppc).

My proposed 'round tripping through UTF-8' approach will not work on Perl versions below 5.8.7, as it appears to depend on the existence of the 'utf-8-strict' character encoding. Refer to the documentation for Encode. On such systems the effect of my proposed patch may be to reintroduce bug #15630. Mac OSX 10.4 (Tiger) is one such system. That can be avoided by modifying the patch to check for the presence of the 'utf-8-strict' encoding and only 'round trip' if it's there. Pre Perl 5.8.7 systems would simply have to suffer the existing behaviour!

A bigger question may be "How to handle character encoding issues generally?" given the somewhat awkward Unicode issues that seem to exist in Perl, and the variation between the different Perl versions.
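The guard suggested above could look roughly like the following Python sketch, which only round trips when the requested encoding is actually available and otherwise leaves the string alone. The helper names `has_codec` and `maybe_round_trip` are hypothetical. In Perl the presence check would presumably go through Encode's `find_encoding`, but that part is an assumption about how the patch would be modified.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if this runtime knows an encoding called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def maybe_round_trip(text: str, codec: str = "utf-8") -> str:
    """Round trip through `codec` if it exists, else return text as-is."""
    if not has_codec(codec):
        # Older runtime: keep the existing behaviour rather than risk
        # reintroducing the original crash.
        return text
    octets = text.encode(codec, errors="replace")
    return octets.decode(codec, errors="replace")


# The encoding is available, so the string is cleaned up.
print(maybe_round_trip("äöü"))

# Python has no codec named "utf-8-strict", which makes it a handy
# stand-in for "encoding missing on this system".
print(has_codec("utf-8-strict"))
```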
In the blog exchange at http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex which gave rise to looks_like_utf8(), one commentator mentions using iconv() rather than the regex, for speed's sake. I don't know if that's helpful as an alternative to the limitations of encode/decode that Martin mentions. If it is, at the very least, I'd think that Text::Iconv would have to be pulled into Slim's CPAN to make it available for the Windows version.
Tested with aberrant hyphens (only - is allowed) and quotation marks (only " and ' are accepted). These exist in many other variants when I copy info from web sites, and it is a difficult task to detect and remove them. Not all of us have English as our mother tongue. Umlauts and Scandinavian characters should be recognized by the Squeezebox. I think this is a MAJOR BUG!
I just reverted the change supposed to fix bug 15630, which was causing this issue you're seeing. It seems the "new" scanner introduced in 7.6 would handle those comments just fine. I wasn't able to crash it using the sample files provided in bug 15630. Please let me know if this fixes your problem (or if it causes new issues...)
Michael, your last comment confuses me. My posts at the forum in 2012 (!) clearly state that I discovered the problems with 7.61!
...clearly state that the problem exists with SC 7.61...
I'm sorry for the confusion. I only expressed a guess without verifying it. Would 7.9 now crash due to this change?
(In reply to comment #10) > I just reverted the change supposed to fix bug 15630, which was causing this > issue you're seeing. It seems the "new" scanner introduced in 7.6 would > handle those comments just fine. I wasn't able to crash it using the sample > files provided in bug 15630. > > Please let me know if this fixes your problem (or if it causes new issues...) FWIW, in LMS 7.7.3: This change appears to fix the problems I had encountered with utf-8 encoded MP3 tags. (I don't have so many, as my collection is somewhat anglocentric). I also verified that the umlauted characters and em-dash quoted in the first post were also scanned correctly in an MP3 tag. I can't confirm cue sheet or flac handling.
To clarify: By 'change' I mean the 'change reversion' that Michael has just applied.
Checked the behaviour with SC 7.8.0 - 1387542508 and FLAC (!) files: A single German umlaut like äöüß makes the comment field disappear. So, no improvement since the year 2012 then :-(
Try 7.9; that's where the change was made.
Running from updated git code on public/7.9, I'm finding that the new scanner IS pulling in comments-with-diacritics from whole-album-flacs-with-embedded cuesheets. Example: from the file: "/mnt/Media/Music/l_Modern_Central_European/Bartók, B/Chamber Works - György Pauk, Jenö Jandó, Kodály Quartet.flac" ..the scanner successfully pulls in the comment: "Comment: Béla Bartók (1881-1945); Chamber Music; György Pauk, violin; Jenő Jandó, piano; Kodály Quartet; Attila Falvay, violin; Tamás Szabó, violin; Gäbor Fias, viola; János Devich, cello;" ..and displays it in the webUI.
With 7.9.0 - 1402065712, at least comments with German umlauts are displayed. Looks good4me!
Putting the following string into a comment:

äöü!"§$%&/()=?´`*~'#;:_-.,><|°^

results in a double comment :-)

---
Comment: äVier Jahre nach ihrem gefeierten Hit-Album The Reminder meldet sich Feist mit ihrem neuen Werk zurueck Metals , das vierte Studioalbum der Kanadierin, erscheint am 30. September. Bei den Aufnahmen an der kalifornischen Kueste wurde die 35-Jaehrige sowohl von langjaehrigen Weggefaehrten Chilly Gonzales und Mocky als auch von neuen Verbuendeten wie Valgeir Sigurdsson (Bonne, Prince, Billy, Bjoerk) unterstuetzt. Die Stuecke, die auf Metals versammelt sind, rangieren von sachte polternden, stimmungsvollen Klangteppichen bis hin zu extrem intensiven Tracks, die wie ein musikalisches Pendant den aufziehenden Nebel und das darauf folgende Gewitter widerspiegeln. Feist ist zurueck - und besser denn je. Deluxe Edition kommt in einer schoenen Digipack Ausstattung. / äöü!"§$%&/()=?´`*~'#;:_-.,><|°^Vier Jahre nach ihrem gefeierten Hit-Album The Reminder meldet sich Feist mit ihrem neuen Werk zurueck Metals , das vierte Studioalbum der Kanadierin, erscheint am 30. September. Bei den Aufnahmen an der kalifornischen Kueste wurde die 35-Jaehrige sowohl von langjaehrigen Weggefaehrten Chilly Gonzales und Mocky als auch von neuen Verbuendeten wie Valgeir Sigurdsson (Bonne, Prince, Billy, Bjoerk) unterstuetzt. Die Stuecke, die auf Metals versammelt sind, rangieren von sachte polternden, stimmungsvollen Klangteppichen bis hin zu extrem intensiven Tracks, die wie ein musikalisches Pendant den aufziehenden Nebel und das darauf folgende Gewitter widerspiegeln. Feist ist zurueck - und besser denn je. Deluxe Edition kommt in einer schoenen Digipack Ausstattung.
---

Funny. Now we have the comment (good), but sometimes a little bit too much of it ;-)
Even funnier: the following string at the beginning of a comment field does not double the comment: äöü§$%&/()=?[]}\+#-.,;:_'*^°´`~|²³ß Any ideas?
Copying the string from https://bugs-archive.lyrion.org/show_bug.cgi?id=17863#c20 into the comment again has no effect (no double comment). So I can't reproduce the behaviour...
Let's consider this fixed. dev04 - please feel free to open a new bug report about your specific issue if you can reproduce it. Thanks!