Bug 8195 - SqueezeCenter not always treating FLAC tags as UTF-8
: SqueezeCenter not always treating FLAC tags as UTF-8
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Scanner
: 7.0
: PC Other
: -- normal (vote)
: Future
Assigned To: Unassigned bug - please assign me!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-05-19 10:38 UTC by Christopher Key
Modified: 2011-03-16 04:35 UTC (History)
3 users (show)

See Also:
Category: ---


Attachments
Test case 1, detect correctly as UTF-8 (8.09 KB, application/octet-stream)
2008-05-19 10:47 UTC, Christopher Key
Details
Test case 2, not detected as UTF-8 (8.09 KB, application/octet-stream)
2008-05-19 10:48 UTC, Christopher Key
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Christopher Key 2008-05-19 10:38:40 UTC
SqueezeCenter
Comment 1 Christopher Key 2008-05-19 10:46:44 UTC
Oops, committed by mistake while trying to add an attachment!

SqueezeCenter appears to be attempting to autodetect the encoding used within FLAC tags.  Occasionally, it is not detecting it as UTF-8, causing the tag data to become corrupted.  The two attached files demonstrate this.  01.flac is tagged with ALBUM="Test žƒÅ", which is detected as UTF-8 and is hence correctly entered into the database.  02.flac is tagged with ALBUM="Test ž", which is apparently not detected as UTF-8, and hence end up corrupted on insertion"

This behaviour is incorrect.  FLAC tags are embedded Vorbis Comments, which mandate the use of UTF-8 text, http://www.xiph.org/vorbis/doc/v-comment.html.  No autodetection should be taking place.

Comment 2 Christopher Key 2008-05-19 10:47:42 UTC
Created attachment 3358 [details]
Test case 1, detect correctly as UTF-8
Comment 3 Christopher Key 2008-05-19 10:48:06 UTC
Created attachment 3359 [details]
Test case 2, not detected as UTF-8
Comment 4 Michael Herger 2008-05-20 00:53:16 UTC
Probably related to bug 5686.

An attempt to be more standard compliant instead of doing our own guessing was reverted: "I guarantee that change 17945 will cause more pain for everyone than fixing this one person's problem." Now you'd be two...

(https://bugs-archive.lyrion.org/show_bug.cgi?id=5686#c21)
Comment 5 Christopher Key 2008-05-20 01:09:40 UTC
(In reply to comment #4)
> Probably related to bug 5686.
> 
> An attempt to be more standard compliant instead of doing our own guessing was
> reverted: "I guarantee that change 17945 will cause more pain for everyone than
> fixing this one person's problem." Now you'd be two...
> 
> (https://bugs-archive.lyrion.org/show_bug.cgi?id=5686#c21)
> 

Thanks Michael,

Certainly the same underlying issue as 5686, although I'm not sure whether change 17945 is directly responsible.  I did have a bit more of a did through the code, SqueezeCenter does seem to be trying to do the right thing, Slim::Formats::FLAC::_getTag doesn't attempt any detection, it just assumes UTF8, which is correct.  (although wouldn't it make more sense to be dealing with the encoding in Audio::FLAC::Header, as soon as the data is read / parsed, and leave Slim::Formats::FLAC blissfully unaware that text encodings even exist?)

I guess that something odd must be happening downstream of here, but I'm not familiar enough with the code to follow the path through.
Comment 6 Chris Owens 2008-06-09 09:49:31 UTC
Michael to take a look for 7.1, but if it looks like it's going to break more
than it fixes, feel free to push it off.
Comment 7 Michael Herger 2008-06-11 07:55:59 UTC
Christopher - did a quick test with your files and can only confirm that it's the same issue I've mentioned before (though in a different place): as people's files seem to be in any encoding possible, we have to apply some guessing. And sometimes it just fails.

If you applied the following patch, it very likely would work:

Index: /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm
===================================================================
--- /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm	(revision 20638)
+++ /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm	(working copy)
@@ -916,12 +916,6 @@
 
 		for my $value (@$values) {
 
-			if ($] > 5.007) {
-				$value = Slim::Utils::Unicode::utf8decode($value, 'utf8');
-			} else {
-				$value = Slim::Utils::Unicode::utf8toLatin1($value);
-			}
-
 			if ($count == 1) {
 
 				$tags->{$tag} = $value;


All this patch does is remove the decoding (which involves the guessing) and trust on the tag being utf8.

I'm not sure we're going to fix this. It very likely breaks things for more users than it helps. I'm sorry.
Comment 8 Christopher Key 2008-06-12 05:07:34 UTC
(In reply to comment #7)
> All this patch does is remove the decoding (which involves the guessing) and
> trust on the tag being utf8.
> 
> I'm not sure we're going to fix this. It very likely breaks things for more
> users than it helps. I'm sorry.
> 

I've patched it slightly differently, it now just called utf8::decode($value) instead of the Slim::Utils..., and that works for me.  I'm guessing that this is the correct thing to do, given that $value initially contains an octet sequence which needs converting to perl's internal string representation.

I can understand your not wanting to fix this, and it's not really a problem to patch things here whenever I upgrade.  Nevertheless, the FLAC format does explicitly state UTF8 only, and I'd be surprised if many people had files violating this.  Looking at the code, J-River appears to be base64 encoding data in the tags, but this is still compatible with utf8, (infact, it is a utf8 encoding of the base64 encoding of the image data).

In my opinion, the cleanest solution would be to deal with the encoding entirely in FLAC::Given this, the cleanest solution would be to deal with encoding wholly in AUDIO::FLAC::Header, having it return strings in perl's internal format.

Nevertheless, if you feel that fixing this is just going to cause more tech support problems, it's no problem, as long as it remains a simple patch to get the correct behaviour.
Comment 9 Chris Owens 2008-06-13 09:50:47 UTC
Alan Young suggests that the only way to make this work for everybody would be to add a preference for the behavior.

We have been warned that removing the 'guessing' will cause lots of tech support calls.
Comment 10 Andy Grundman 2008-06-13 09:53:42 UTC
Dan: Anything that could be done in Audio::FLAC::Header related to this?  We also probably should upgrade to A::F::H 2.2.
Comment 11 Christopher Key 2009-11-02 02:14:03 UTC
This appears fixed in 7.4.1.  Audio::Scan does a blanket sv_utf8_decode(value); in _split_vorbis_comment.  I did have charset issues after upgrading, but can't reproduce them after a full rescan.