Bugzilla – Bug 9868
MP3::Info should use all tag fields to determine charset
Last modified: 2008-12-22 15:54:08 UTC
Created attachment 4193: patch for lib/MP3/Info.pm

Some artist, album and track names in my library that contain Japanese characters are displayed incorrectly by SqueezeCenter and SqueezeBox Controller. For example, an artist tag that says "Ozaki Yutaka" (see http://en.wikipedia.org/wiki/Yutaka_Ozaki for the actual kanji characters) in Shift_JIS encoding is displayed in SqueezeCenter as garbled characters.

It appears that the MP3::Info module used to extract tags from MP3 files is not guessing the correct character set for tag values. MP3::Info uses Encode::Detect::Detector (if available) to detect the charset of each individual tag string, but a single artist or album tag is usually too short for the charset to be detected reliably. A typical Japanese artist name consists of only four characters, and I suppose the situation is similar for Chinese names.

I would like to suggest that MP3::Info could identify tag charsets more accurately if it used all of the tags in a track together to detect which charset those tags use. This won't solve the problem entirely, because some tracks use CJK characters in only one tag (the artist tag, for example) and Latin characters in the others, but for most files it would be a great improvement.

I've attached a patch for Info.pm that works for me. However, I'm not sure which tag fields ought to be used to detect the charset. (The patch currently uses all tags whose IDs begin with 'T' or "COM".) Also, it only works when Encode::Detect::Detector is being used, not Encode::Guess.
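To illustrate the idea (this is not the attached patch, and it is written in Python rather than Perl), here is a minimal sketch of pooling every text tag and making one charset decision per track. The names `detect_charset`, `is_text_frame`, `decode_tags` and the candidate list are all hypothetical, and the trial-decoding detector is only a simple stand-in for a statistical detector such as Encode::Detect::Detector.

```python
# Candidate charsets, tried in order. This list is an assumption made
# for the illustration; it is not taken from MP3::Info.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "latin-1")


def detect_charset(data, candidates=CANDIDATES):
    """Stand-in detector: return the first candidate under which the
    bytes decode without error, or None if none of them fits."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def is_text_frame(frame_id):
    """Select frames the way the report describes the patch doing it:
    IDs that begin with 'T' or with "COM"."""
    return frame_id.startswith("T") or frame_id.startswith("COM")


def decode_tags(frames, detector=detect_charset):
    """Pool the raw bytes of all text frames, detect the charset once,
    then decode every text frame with that single result."""
    text = {fid: raw for fid, raw in frames.items() if is_text_frame(fid)}
    # An ASCII newline is a safe separator in all candidate charsets.
    pooled = b"\n".join(text.values())
    charset = detector(pooled)
    if charset is None:
        return {}, None
    decoded = {fid: raw.decode(charset, errors="replace")
               for fid, raw in text.items()}
    return decoded, charset


# Illustrative input: two Shift_JIS tags, one ASCII tag, one non-text frame.
frames = {
    "TPE1": "尾崎豊".encode("shift_jis"),
    "TALB": "十七歳の地図".encode("shift_jis"),
    "TIT2": b"I LOVE YOU",
    "APIC": b"\xff\xd8",  # binary frame, ignored by is_text_frame
}
decoded, charset = decode_tags(frames)
```

Because the detector is passed in as a parameter, a real statistical detector could be swapped in without changing the pooling logic. As noted above, a single decision per track will still be wrong for files whose tags genuinely use different encodings.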
cc'ing Dan per Andy
Dan, did you ever see this patch? Do you have an opinion?
Some feedback came in that this patch would add a lot of processing for the common case and might cause unforeseen bugs. I'll continue to keep an eye on this to see if it grows in popularity.