Bugzilla – Bug 14732
id3v2.3 with UTF-8 characters scanned incorrectly
Last modified: 2009-10-13 11:20:30 UTC
Created attachment 6090
mp3 file with id3v2.3 tags from id3v2 command line program

I recently upgraded from 7.3.3 to 7.4. Something is going wrong between tag reading and insertion into the database that is messing up accented characters (anything outside standard ASCII, from what I can tell). All of my music is tagged with id3v2.3 tags as generated by the id3v2 command-line tool. I've attached a shortened mp3 as an example:

$ id3v2 -l testfile.mp3
id3v2 tag info for testfile.mp3:
TPE1 (Lead performer(s)/Soloist(s)): Björk
TALB (Album/Movie/Show title): Debut
TIT2 (Title/songname/content description): Human Behavior
COMM (Comments): (ENCODING)[]: EAC 0.99pb5 / LAME v3.98.2 -V2 --noreplaygain / mp3gain
TYER (Year): 1993
TRCK (Track number/Position in set): 1
TPUB (Publisher): Elektra
COMM (Comments): (LABELNO)[]: 61468-2
COMM (Comments): (MP3GAIN)[]: 5

When this is scanned into the database, the artist is saved as "Björk." Other artists have similar problems:

Fauré, Gabriel => Fauré, Gabriel
µ-ziq => µ-ziq

I know id3v2.3 does not use utf8 for tag data, but it was working fine in 7.3.3. Any suggestions on things to try here? The linux server and database are both running utf8:

mysql variables:
collation connection utf8_general_ci
collation database utf8_general_ci
collation server utf8_general_ci

server information:
Version: 7.4.0 - r28672 @ Mon Sep 28 17:52:57 PDT 2009
Hostname: htpc
Server IP Address: 192.168.1.7
Server HTTP Port Number: 9000
Operating system: Debian - EN - utf8
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i686-linux
MySQL Version: 5.0.75-0ubuntu10.2

I can provide a detailed scan log if necessary.
Possibly related to bug 14728.
As you said, ID3v2.3 does not support UTF-8. You'll need to re-tag with ID3v2.4 or switch to UTF-16, which is supported by v2.3. From the v2.3 spec:

    If nothing else is said a string is represented as ISO-8859-1 [ISO-8859-1] characters in the range $20 - $FF. Such strings are represented as <text string>, or <full text string> if newlines are allowed, in the frame descriptions. All Unicode strings [UNICODE] use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.
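To illustrate the spec text quoted above, here is a minimal Python sketch of how a v2.3 text frame body might be decoded. The function name `decode_v23_text` is hypothetical, and the example assumes (this thread does not confirm it) that the tagger wrote raw UTF-8 bytes under the ISO-8859-1 encoding byte. It is not Squeezebox Server's actual scanner code.

```python
def decode_v23_text(payload: bytes) -> str:
    """Decode an ID3v2.3 text frame body: one encoding byte, then the text."""
    encoding, data = payload[0], payload[1:]
    if encoding == 0x00:
        # $00: ISO-8859-1, one byte per character
        text = data.decode("latin-1")
    elif encoding == 0x01:
        # $01: 16-bit Unicode; the "utf-16" codec reads the BOM to pick byte order
        text = data.decode("utf-16")
    else:
        raise ValueError(f"encoding byte {encoding:#04x} is not defined in ID3v2.3")
    return text.rstrip("\x00")


# Assumed failure case: UTF-8 bytes stored under the ISO-8859-1 encoding byte.
mislabeled = b"\x00" + "Björk".encode("utf-8")
print(decode_v23_text(mislabeled))  # BjÃ¶rk

# Spec-conformant case: UTF-16 with a little-endian BOM ($FF FE).
conformant = b"\x01\xff\xfe" + "Björk".encode("utf-16-le")
print(decode_v23_text(conformant))  # Björk
```

A reader that follows the spec strictly treats each UTF-8 byte as its own Latin-1 character, which is one plausible way accented names end up garbled.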
This is strange, as these characters in this type of tag have worked with Slimserver for many years. I will just have to substitute regular ASCII characters for them.
What app did you use to tag these files? You don't need to substitute ASCII characters. ID3v2.3 is perfectly capable of storing any Unicode character you want; they just need to be stored using UTF-16 encoding.
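As a rough sketch of what "stored using UTF-16 encoding" means at the byte level, the Python snippet below builds a v2.3 text frame body with the Unicode encoding byte and a BOM. The helper name `encode_v23_utf16` is hypothetical, and the frame header (ID, size, flags) is deliberately left out. In practice a tagging tool does this for you.

```python
import codecs


def encode_v23_utf16(text: str) -> bytes:
    """Build an ID3v2.3 text frame body: $01, BOM, then UTF-16 little-endian text."""
    # $01 marks the string as 16-bit Unicode; the BOM ($FF FE) declares byte order.
    return b"\x01" + codecs.BOM_UTF16_LE + text.encode("utf-16-le")


body = encode_v23_utf16("Björk")
print(body.hex(" "))  # 01 ff fe 42 00 6a 00 f6 00 72 00 6b 00
```

Each character here takes two bytes, so `ö` is stored as `f6 00` rather than as the two-byte UTF-8 sequence `c3 b6`.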
For the benefit of anyone looking at the bug report in the future ... looks like the id3v2 command line tool is obsolete. The eyeD3 tool is a good replacement. I was able to redo my problem files with eyeD3 and these options:

--to-v2.3 --set-encoding=utf16-LE --no-tagging-time-frame

and it looks like these tags are compatible with squeezebox server and all my various portable players. I have some portables that don't read id3v2.4, so that is why I am stuck with v2.3.
Great, eyeD3 is a good tool and I use it a lot for testing. :)