Bugzilla – Bug 8195
SqueezeCenter not always treating FLAC tags as UTF-8
Last modified: 2011-03-16 04:35:07 UTC
SqueezeCenter
Oops, committed by mistake while trying to add an attachment! SqueezeCenter appears to be attempting to autodetect the encoding used within FLAC tags. Occasionally, it is not detecting it as UTF-8, causing the tag data to become corrupted. The two attached files demonstrate this. 01.flac is tagged with ALBUM="Test žƒÅ", which is detected as UTF-8 and is hence correctly entered into the database. 02.flac is tagged with ALBUM="Test ž", which is apparently not detected as UTF-8, and hence end up corrupted on insertion" This behaviour is incorrect. FLAC tags are embedded Vorbis Comments, which mandate the use of UTF-8 text, http://www.xiph.org/vorbis/doc/v-comment.html. No autodetection should be taking place.
Created attachment 3358 [details] Test case 1, detect correctly as UTF-8
Created attachment 3359 [details] Test case 2, not detected as UTF-8
Probably related to bug 5686. An attempt to be more standard compliant instead of doing our own guessing was reverted: "I guarantee that change 17945 will cause more pain for everyone than fixing this one person's problem." Now you'd be two... (https://bugs-archive.lyrion.org/show_bug.cgi?id=5686#c21)
(In reply to comment #4) > Probably related to bug 5686. > > An attempt to be more standard compliant instead of doing our own guessing was > reverted: "I guarantee that change 17945 will cause more pain for everyone than > fixing this one person's problem." Now you'd be two... > > (https://bugs-archive.lyrion.org/show_bug.cgi?id=5686#c21) > Thanks Michael, Certainly the same underlying issue as 5686, although I'm not sure whether change 17945 is directly responsible. I did have a bit more of a did through the code, SqueezeCenter does seem to be trying to do the right thing, Slim::Formats::FLAC::_getTag doesn't attempt any detection, it just assumes UTF8, which is correct. (although wouldn't it make more sense to be dealing with the encoding in Audio::FLAC::Header, as soon as the data is read / parsed, and leave Slim::Formats::FLAC blissfully unaware that text encodings even exist?) I guess that something odd must be happening downstream of here, but I'm not familiar enough with the code to follow the path through.
Michael to take a look for 7.1, but if it looks like it's going to break more than it fixes, feel free to push it off.
Christopher - did a quick test with your files and can only confirm that it's the same issue I've mentioned before (though in a different place): as people's files seem to be in any encoding possible, we have to apply some guessing. And sometimes it just fails. If you applied the following patch, it very likely would work: Index: /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm =================================================================== --- /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm (revision 20638) +++ /Users/mh/Documents/workspace/unstable/server/Slim/Formats/FLAC.pm (working copy) @@ -916,12 +916,6 @@ for my $value (@$values) { - if ($] > 5.007) { - $value = Slim::Utils::Unicode::utf8decode($value, 'utf8'); - } else { - $value = Slim::Utils::Unicode::utf8toLatin1($value); - } - if ($count == 1) { $tags->{$tag} = $value; All this patch does is remove the decoding (which involves the guessing) and trust on the tag being utf8. I'm not sure we're going to fix this. It very likely breaks things for more users than it helps. I'm sorry.
(In reply to comment #7) > All this patch does is remove the decoding (which involves the guessing) and > trust on the tag being utf8. > > I'm not sure we're going to fix this. It very likely breaks things for more > users than it helps. I'm sorry. > I've patched it slightly differently, it now just called utf8::decode($value) instead of the Slim::Utils..., and that works for me. I'm guessing that this is the correct thing to do, given that $value initially contains an octet sequence which needs converting to perl's internal string representation. I can understand your not wanting to fix this, and it's not really a problem to patch things here whenever I upgrade. Nevertheless, the FLAC format does explicitly state UTF8 only, and I'd be surprised if many people had files violating this. Looking at the code, J-River appears to be base64 encoding data in the tags, but this is still compatible with utf8, (infact, it is a utf8 encoding of the base64 encoding of the image data). In my opinion, the cleanest solution would be to deal with the encoding entirely in FLAC::Given this, the cleanest solution would be to deal with encoding wholly in AUDIO::FLAC::Header, having it return strings in perl's internal format. Nevertheless, if you feel that fixing this is just going to cause more tech support problems, it's no problem, as long as it remains a simple patch to get the correct behaviour.
Alan Young suggests that the only way to make this work for everybody would be to add a preference for the behavior. We have been warned that removing the 'guessing' will cause lots of tech support calls.
Dan: Anything that could be done in Audio::FLAC::Header related to this? We also probably should upgrade to A::F::H 2.2.
This appears fixed in 7.4.1. Audio::Scan does a blanket sv_utf8_decode(value); in _split_vorbis_comment. I did have charset issues after upgrading, but can't reproduce them after a full rescan.