Bug 9868 - MP3::Info should use all tag fields to determine charset
Status: NEW
Product: Logitech Media Server
Classification: Unclassified
Component: Tagging
Version: unspecified
Hardware: PC All
Importance: -- normal with 1 vote
Target Milestone: Future
Assigned To: Unassigned bug - please assign me!
Keywords: charset_issues
Depends on:
Blocks:
Reported: 2008-11-01 06:17 UTC by Kensaku Yamaguchi
Modified: 2008-12-22 15:54 UTC (History)
CC List: 3 users

See Also:
Category: ---


Attachments
patch for lib/MP3/Info.pm (1.89 KB, patch)
2008-11-01 06:17 UTC, Kensaku Yamaguchi

Description Kensaku Yamaguchi 2008-11-01 06:17:53 UTC
Created attachment 4193
patch for lib/MP3/Info.pm

Some artist, album and track names in my library with Japanese characters get displayed incorrectly by SqueezeCenter and SqueezeBox Controller.
For example, an artist tag that says "Ozaki Yutaka" (see http://en.wikipedia.org/wiki/Yutaka_Ozaki for actual kanji characters) in Shift_JIS encoding will get displayed in SqueezeCenter with garbled characters.

It appears that the MP3::Info module used to extract tags from MP3 files is not guessing the correct character set for tag values.
MP3::Info uses Encode::Detect::Detector (if available) to detect the charset for each individual tag string, but a single artist or album tag is usually too short for it to reliably detect the charset.
A typical Japanese artist name consists of only four characters, and I suppose the situation is similar for Chinese names, too.
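
To make the size of the problem concrete, here is a minimal sketch in Python (used for illustration only; it is not MP3::Info code). It uses the kanji form of the artist name from the linked Wikipedia article, and it assumes Latin-1 as the wrong guess, which is only one possible failure.

```python
# Illustration only. The kanji are the artist name from the linked article;
# picking latin-1 as the "wrong guess" is an assumption made for this sketch.
artist = "尾崎豊"
raw = artist.encode("shift_jis")  # the bytes as they would sit in the tag

# A three-kanji name gives a detector only six bytes of evidence.
evidence_bytes = len(raw)

# Decoding with the wrong charset does not raise an error;
# it silently produces garbled text.
garbled = raw.decode("latin-1")

# Decoding with the right charset restores the name.
restored = raw.decode("shift_jis")

print(evidence_bytes, restored == artist, garbled == artist)
```

Because the wrong decoding succeeds without any error, nothing downstream can tell that the guess was bad, which is why the detection step itself has to be right.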

I suggest that MP3::Info could identify tag charsets more accurately if it pooled all of the tags in a track and ran charset detection on them together, instead of on each tag string separately.
This fix won't solve the problem entirely, because some tracks use CJK characters only in the artist tag (for example) and Latin characters in the other tags, but for most files it would be a great improvement.

I've attached a patch for Info.pm that works for me.
However, I'm not sure which tag fields ought to be used to detect the charset.
(The patch currently uses all tags whose IDs begin with "T" or "COM".)
Also, it only works when Encode::Detect::Detector is being used, not when Encode::Guess is.
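
The idea described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the attached patch and not MP3::Info's Perl code. The function names, the candidate list, the sample album value, and the simplification that each frame is just raw text bytes are all hypothetical. The stand-in detector simply returns the first candidate that decodes cleanly; Encode::Detect::Detector is a real statistical detector and works differently.

```python
from typing import Callable, Dict, Iterable

# Hypothetical candidate list for the stand-in detector.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")


def guess_charset(data: bytes, candidates: Iterable[str] = CANDIDATES) -> str:
    """Stand-in detector: return the first candidate that decodes cleanly.

    This only gives the sketch something to call; it is not how
    Encode::Detect::Detector decides.
    """
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 can decode any byte sequence


def is_text_frame(frame_id: str) -> bool:
    """Keep frames whose IDs begin with 'T' or 'COM'."""
    return frame_id.startswith("T") or frame_id.startswith("COM")


def decode_tags(
    frames: Dict[str, bytes],
    detect: Callable[[bytes], str] = guess_charset,
) -> Dict[str, str]:
    """Detect one charset from all text frames pooled, then decode each."""
    text = {fid: body for fid, body in frames.items() if is_text_frame(fid)}
    pooled = b"\n".join(text.values())  # one detection pass over everything
    charset = detect(pooled)
    return {fid: body.decode(charset, errors="replace")
            for fid, body in text.items()}


# Made-up sample track: two Shift_JIS fields, one ASCII field, one binary frame.
frames = {
    "TPE1": "尾崎豊".encode("shift_jis"),
    "TALB": "アルバム".encode("shift_jis"),  # placeholder album name
    "TIT2": b"I Love You",
    "APIC": b"\xff\xd8\xff\xe0",  # binary data, excluded from detection
}
decoded = decode_tags(frames)
print(decoded)
```

Detection runs once per track rather than once per string, and every text frame is decoded with the same result. That is also the limitation noted above: a track that genuinely mixes charsets across frames would still be decoded with a single one.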
Comment 1 Chris Owens 2008-11-10 09:16:27 UTC
cc'ing Dan per Andy
Comment 2 Chris Owens 2008-12-22 09:38:10 UTC
Dan, did you ever see this patch? Do you have an opinion?
Comment 3 Chris Owens 2008-12-22 15:54:08 UTC
Some feedback came in that this patch would add a lot of processing for the common case, and might cause unforeseen bugs.

I'll continue to keep an eye on this to see if it grows in popularity.