Bugzilla – Bug 2005
Database problem with Various Artists and accented characters
Last modified: 2009-09-08 09:30:09 UTC
System information: Slimserver svn revision 4019 running on WinXP. Whole-album FLAC files with embedded cuesheets and meta information.

I am not at all an expert on this, but I will do my best to describe an incorrect behavior of the Various Artists functionality when the tags contain accented characters.

I have a Various Artists album with tags containing accented characters. The tag contents are provided as attachment 1. The attached file was generated with the following command:

    metaflac --list --no-utf8-convert --block-number=3 Diverse_Artister-Med_Blanke_Ark-1994.flac > e:\audio\temp\tags.txt

In 'Server Settings' -> 'Behavior' I have 'Group compilation albums together' enabled.

Attachment 2 contains a database dump after a wipe and rescan. The attached file was generated with the following command:

    echo .dump | sqlite3 slimserversql.db > slimserversql.txt

The problem I observe is as follows:

During the "basic" scan, the accented characters are saved to the database with an encoding that uses two bytes per character. During the post-processing step that identifies the Various Artists album, the strings are retrieved from the database in a way that encodes each accented character with a single byte.

For the tracks with no accented characters, the encoding doesn't matter and the retrieved strings match the original strings. The role integer is correctly changed from 1 to 6 (ARTIST to TRACKARTIST) in the database.

However, for the tracks containing accented characters, the retrieved string with single-byte encoding does not match the original string, which uses two-byte encoding. As a result, instead of changing the role of the existing database entry from 1 to 6, a new entry is made in the database with a string that uses the single-byte encoding of the accented characters.

As seen in the attached database dump, the tracks containing accented characters have two entries in the database: one entry with role 1 and the characters encoded with two bytes, and a second entry with role 6 and the characters encoded with one byte. The correct behavior would be a single entry with role 6 and the characters encoded with two bytes.
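The mismatch described above can be reproduced in isolation. The following is only a minimal Python sketch, not Slimserver code: the artist name, the row layout, and the helper `set_track_artist` are hypothetical, and it assumes the two encodings involved are UTF-8 (two bytes for the accented character) and Latin-1 (one byte).

```python
# Hypothetical artist name with a Norwegian character (not taken from the attachments).
name = "Bjørn"

utf8_bytes = name.encode("utf-8")      # 'ø' is stored as two bytes: 0xC3 0xB8
latin1_bytes = name.encode("latin-1")  # 'ø' is stored as one byte: 0xF8

# Role values as given in the report.
ARTIST, TRACKARTIST = 1, 6


def set_track_artist(rows, key):
    """Change the role of the row whose name equals key, or add a new row.

    rows is a list of (name_bytes, role) tuples standing in for the
    contributor entries in the database (hypothetical layout).
    """
    for i, (stored, _role) in enumerate(rows):
        if stored == key:
            rows[i] = (stored, TRACKARTIST)
            return rows
    # No byte-for-byte match: a second entry is created.
    rows.append((key, TRACKARTIST))
    return rows


# Lookup key in a different encoding than the stored name: duplicate entry.
rows_bad = set_track_artist([(utf8_bytes, ARTIST)], latin1_bytes)

# Lookup key in the same encoding as the stored name: role updated in place.
rows_good = set_track_artist([(utf8_bytes, ARTIST)], utf8_bytes)
```

With the mismatched key, `rows_bad` ends up with two rows (role 1 with the two-byte form and role 6 with the one-byte form), which mirrors the duplicate entries seen in the dump. With a consistent encoding, `rows_good` keeps a single row with role 6.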
Created attachment 751: Tag contents
Created attachment 752: Database dump
Could you attach the first 100k or so of that FLAC file? I don't need the whole thing - just the part with the tags. Thanks.
Created attachment 772: Flac file

The attached file is a 5-second FLAC file that emulates a whole-album file. It contains an internal cuesheet and tags in the numbered Vorbis comment format. It is a Various Artists album whose tags contain Norwegian characters.
Fixed in subversion change 4071.
Thanks, works now. Based on the --no-utf8-convert flag available with metaflac, it looks like the tags are UTF-8 encoded unless it is explicitly stated that the locale encoding should be used. Assuming that embedded FLAC tags are UTF-8 encoded therefore seems to be the correct thing to do.
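As an illustration of that assumption, here is a small Python sketch that treats the value of a raw `NAME=value` comment as UTF-8. It is not the change made in revision 4071; the function name `parse_comment` and the sample bytes are hypothetical.

```python
def parse_comment(raw: bytes):
    """Split a raw NAME=value comment and decode the value as UTF-8.

    Field names are compared case-insensitively, so the name is
    upper-cased here. Decoding is strict, so bytes that are not valid
    UTF-8 raise UnicodeDecodeError instead of being silently
    misinterpreted.
    """
    name, sep, value = raw.partition(b"=")
    if not sep:
        raise ValueError("not a NAME=value comment")
    return name.decode("ascii").upper(), value.decode("utf-8")


# UTF-8 encoded value: decodes to the expected text.
field, text = parse_comment(b"artist=Bj\xc3\xb8rn")

# Latin-1 encoded value: rejected rather than producing a second spelling.
try:
    parse_comment(b"artist=Bj\xf8rn")
    latin1_rejected = False
except UnicodeDecodeError:
    latin1_rejected = True
```

Decoding strictly at the point where tags are read means every later comparison works on the same text, whichever byte encoding the database layer uses.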