Bug 2005 - Database problem with Various Artists and accented characters
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Database
Version: 6.2.0
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assigned To: Dan Sully
Depends on:
Blocks:
Reported: 2005-08-21 14:38 UTC by sbjaerum
Modified: 2009-09-08 09:30 UTC
CC List: 0 users

See Also:
Category: ---


Attachments
Tag contents (2.11 KB, text/plain)
2005-08-21 14:39 UTC, sbjaerum
Database dump (16.93 KB, text/plain)
2005-08-21 14:40 UTC, sbjaerum
Flac file (461.99 KB, application/octet-stream)
2005-08-25 11:49 UTC, sbjaerum

Description sbjaerum 2005-08-21 14:38:02 UTC
System information:
Slimserver svn revision 4019 running on WinXP.
Whole album flac files with embedded cuesheets and meta information.

I am not at all an expert on this, but I will do my best to describe an
incorrect behavior of the Various Artists functionality when the tags contain
accented characters.

I have a Various Artists album with tags containing accented characters. The tag
contents are provided as attachment 1. The attached file was generated with the
following command:
metaflac --list --no-utf8-convert --block-number=3 Diverse_Artister-Med_Blanke_Ark-1994.flac > e:\audio\temp\tags.txt

In 'Server Settings' -> 'Behavior' I have 'Group compilation albums together'
enabled.

Attachment 2 contains a database dump taken after a wipe and rescan.
The attached file was generated with the following command:
echo .dump | sqlite3 slimserversql.db > slimserversql.txt

The problem I observe is as follows:

During the "basic" scan, the accented characters are saved to the database in a
two-byte encoding (UTF-8).

During the post-processing step that identifies the Various Artists album, the
strings are retrieved from the database in a form where each accented character
is encoded as a single byte. For tracks without accented characters the encoding
makes no difference: the retrieved and original strings match, and the role
integer is correctly changed from 1 to 6 (ARTIST to TRACKARTIST) in the
database. For tracks containing accented characters, however, the retrieved
single-byte string does not match the original double-byte string. As a result,
instead of changing the role of the existing database entry from 1 to 6, a new
entry is created containing the single-byte encoding of the accented characters.

As seen in the attached database dump, each track containing accented characters
has two entries in the database: one with role 1 and the characters encoded with
two bytes, and a second with role 6 and the characters encoded with one byte.
The correct behavior would be a single entry with role 6 and the characters
encoded with two bytes.
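The mismatch described above can be reproduced with a small Python simulation. This is a hypothetical model, not Slimserver's actual code (which is Perl) or its database schema: the contributor table is reduced to (name bytes, role) rows, the role values 1 and 6 are the ones quoted in the report, the single-byte encoding is assumed to be Latin-1, and the artist names are made up.

```python
ARTIST, TRACKARTIST = 1, 6  # role integers quoted in the bug report

# Made-up example names: one plain ASCII, one with Norwegian letters.
NAMES = ["Example Artist", "Bjørn Ås"]


def scan(names):
    """Basic scan: store every name as UTF-8 bytes with role ARTIST."""
    return [[name.encode("utf-8"), ARTIST] for name in names]


def mark_compilation(rows, names, encoding):
    """Post-processing: re-encode each name with `encoding` and look it up.

    On an exact byte match the existing row's role becomes TRACKARTIST;
    otherwise a new TRACKARTIST row is appended (the duplicate in the bug).
    """
    for name in names:
        key = name.encode(encoding)
        for row in rows:
            if row[0] == key:
                row[1] = TRACKARTIST
                break
        else:  # no existing row matched this byte string
            rows.append([key, TRACKARTIST])
    return rows


if __name__ == "__main__":
    print("ø".encode("utf-8"), "ø".encode("latin-1"))  # b'\xc3\xb8' b'\xf8'
    buggy = mark_compilation(scan(NAMES), NAMES, "latin-1")
    fixed = mark_compilation(scan(NAMES), NAMES, "utf-8")
    print(len(buggy), [role for _, role in buggy])  # 3 [6, 1, 6]
    print(len(fixed), [role for _, role in fixed])  # 2 [6, 6]
```

With the Latin-1 lookup the ASCII name is updated in place while the accented name gets a second row, matching the two-entry pattern in the dump; using the same encoding on both sides leaves one row per artist, all with role 6.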
Comment 1 sbjaerum 2005-08-21 14:39:16 UTC
Created attachment 751
Tag contents
Comment 2 sbjaerum 2005-08-21 14:40:10 UTC
Created attachment 752
Database dump
Comment 3 Dan Sully 2005-08-25 10:59:28 UTC
Could you attach the first 100k or so of that FLAC file?

I don't need the whole thing - just the part with the tags.

Thanks.
Comment 4 sbjaerum 2005-08-25 11:49:40 UTC
Created attachment 772
Flac file

The attached file is a 5-second FLAC file that emulates a whole-album file.
It contains an embedded cuesheet and tags in the numbered Vorbis comment
format. It is a Various Artists album whose tags contain Norwegian characters.
Comment 5 Dan Sully 2005-08-25 12:16:56 UTC
Fixed in subversion change 4071.
Comment 6 sbjaerum 2005-08-25 13:24:26 UTC
Thanks, works now.

Judging from metaflac's --no-utf8-convert flag, the tags are UTF-8 encoded
UNLESS the user explicitly asks for the locale encoding instead. Assuming that
embedded FLAC tags are UTF-8 encoded therefore seems to be the correct approach.
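The UTF-8 assumption can be illustrated with a short Python sketch of reading one embedded tag. It is not Slimserver's code; the function name `parse_vorbis_comment` is hypothetical, the example name is made up, and the sketch relies only on the Vorbis comment convention of `NAME=value` pairs encoded as UTF-8 with case-insensitive field names. The numbered tag format mentioned earlier is not handled here.

```python
def parse_vorbis_comment(raw: bytes) -> tuple[str, str]:
    """Split one raw Vorbis comment ("NAME=value") into (NAME, value).

    The bytes are decoded as UTF-8, the encoding Vorbis comments use.
    Strict decoding raises UnicodeDecodeError on bytes written in a
    single-byte locale encoding instead of silently producing other text.
    """
    text = raw.decode("utf-8")  # strict: invalid UTF-8 raises
    name, sep, value = text.partition("=")  # split on the first '=' only
    if not sep:
        raise ValueError("comment has no '=' separator")
    # Field names are case-insensitive; normalise them to upper case.
    return name.upper(), value


if __name__ == "__main__":
    print(parse_vorbis_comment("artist=Bjørn Ås".encode("utf-8")))
```

Decoding strictly makes a Latin-1 tag fail loudly (byte 0xF8 for "ø" is never valid UTF-8) rather than producing a second, differently encoded spelling of the same artist.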