Bug 2005 – Database problem with Various Artists and accented characters

Summary:

Database problem with Various Artists and accented characters

Status:	RESOLVED FIXED

Product:	Logitech Media Server
Classification:	Unclassified
Component:	Database
Version:	6.2.0
Platform:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assigned To:	Dan Sully

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-08-21 14:38 UTC by sbjaerum
Modified:	2009-09-08 09:30 UTC
CC List:	0 users

See Also:
Category:	---

Attachments
Tag contents (2.11 KB, text/plain) 2005-08-21 14:39 UTC, sbjaerum
Database dump (16.93 KB, text/plain) 2005-08-21 14:40 UTC, sbjaerum
Flac file (461.99 KB, application/octet-stream) 2005-08-25 11:49 UTC, sbjaerum

Description sbjaerum 2005-08-21 14:38:02 UTC

System information:
Slimserver svn revision 4019 running on WinXP.
Whole-album FLAC files with embedded cuesheets and metadata.

I am not at all an expert on this, but I will do my best to describe an
incorrect behavior of the Various Artists functionality when the tags contain
accented characters.

I have a Various Artists album with tags containing accented characters. The tag
contents are provided as the first attachment (attachment 751, "Tag contents").
The attached file was generated with the following command:
metaflac --list --no-utf8-convert --block-number=3 Diverse_Artister-Med_Blanke_Ark-1994.flac > e:\audio\temp\tags.txt
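For readers without metaflac, the raw tag bytes can also be extracted with a short Python sketch. It follows the published FLAC metadata layout: a "fLaC" marker, then blocks with a 4-byte header, where type 4 is VORBIS_COMMENT. It returns the comment entries undecoded, which is roughly what --no-utf8-convert shows. The function names are hypothetical. The sketch only looks at the first VORBIS_COMMENT block and is not how Slimserver itself reads tags.

```python
import struct

VORBIS_COMMENT = 4  # FLAC metadata block type for Vorbis comments


def read_vorbis_comments(data: bytes) -> list[bytes]:
    """Return the raw (undecoded) entries of the first VORBIS_COMMENT block."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while pos + 4 <= len(data):
        header = data[pos]
        is_last = bool(header & 0x80)      # top bit: last metadata block
        block_type = header & 0x7F         # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit big-endian
        body = data[pos + 4:pos + 4 + length]
        pos += 4 + length
        if block_type == VORBIS_COMMENT:
            return _parse_comment_block(body)
        if is_last:
            break
    return []


def _parse_comment_block(body: bytes) -> list[bytes]:
    # Inside the Vorbis comment block, lengths are 32-bit little-endian.
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    off = 4 + vendor_len                   # skip the vendor string
    (count,) = struct.unpack_from("<I", body, off)
    off += 4
    entries = []
    for _ in range(count):
        (n,) = struct.unpack_from("<I", body, off)
        off += 4
        entries.append(body[off:off + n])  # e.g. b"ARTIST=..." as stored on disk
        off += n
    return entries
```

Only the metadata blocks at the start of the file are parsed. For that reason, the first 100 KB or so of a file (as requested in comment 3) is usually enough, provided the comment block lies entirely within it. If the block is cut off, struct.error is raised.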

In 'Server Settings' -> 'Behavior' I have 'Group compilation albums together'
enabled.

The second attachment (attachment 752, "Database dump") contains a database dump taken after a wipe and rescan.
The attached file was generated with the following command:
echo .dump | sqlite3 slimserversql.db > slimserversql.txt

The problem I observe is as follows:

During the "basic" scan, the accented characters are saved to the database
encoded with two bytes each. During the post-processing step that identifies the
Various Artists album, the strings are retrieved from the database in a way where
the accented characters are encoded with a single byte.

For tracks without accented characters, the encoding doesn't matter: the
retrieved string matches the original, and the role integer is correctly changed
from 1 to 6 (ARTIST to TRACKARTIST) in the database. For tracks containing
accented characters, however, the retrieved single-byte string does not match
the original two-byte string. As a result, instead of changing the role of the
existing database entry from 1 to 6, a new entry is created with the accented
characters encoded with a single byte.

As seen in the attached database dump, the tracks containing accented characters
therefore have two entries in the database: one with role 1 and the characters
encoded with two bytes, and a second with role 6 and the characters encoded with
one byte. The correct behavior would be a single entry with role 6 and the
characters encoded with two bytes.
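The mismatch can be reproduced outside Slimserver with a small Python sketch. It assumes that the two-byte form is UTF-8 and the single-byte form is Latin-1. This assumption comes from the reporter's later comment, not from the Slimserver source. The table "track_artist" is hypothetical and much simpler than the real schema. The role numbers are the ones given in this report.

```python
import sqlite3

# Role values as described in this report: 1 = ARTIST, 6 = TRACKARTIST.
ARTIST, TRACKARTIST = 1, 6


def scan_and_postprocess(names, post_encoding):
    """Simulate the two scan phases against a hypothetical, simplified table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE track_artist (name BLOB, role INTEGER)")

    # "Basic" scan: names are stored as UTF-8 bytes (two bytes per accented char).
    for name in names:
        db.execute("INSERT INTO track_artist VALUES (?, ?)",
                   (name.encode("utf-8"), ARTIST))

    # Post-processing: the same names are looked up again, encoded as post_encoding.
    for name in names:
        key = name.encode(post_encoding)
        cur = db.execute(
            "UPDATE track_artist SET role = ? WHERE name = ? AND role = ?",
            (TRACKARTIST, key, ARTIST),
        )
        if cur.rowcount == 0:
            # No row matched these bytes, so a second entry is created instead.
            db.execute("INSERT INTO track_artist VALUES (?, ?)",
                       (key, TRACKARTIST))

    rows = db.execute(
        "SELECT name, role FROM track_artist ORDER BY rowid").fetchall()
    db.close()
    return rows


# Made-up artist names: one plain ASCII, one with Norwegian characters.
names = ["Kari Nordmann", "Pål Bjørnæs"]
buggy = scan_and_postprocess(names, "latin-1")  # single-byte lookup
fixed = scan_and_postprocess(names, "utf-8")    # consistent encoding
```

With "latin-1", the accented name ends up in two rows: role 1 holding the UTF-8 bytes and role 6 holding the Latin-1 bytes. This is the same pattern as in the attached dump, while the ASCII name is updated in place. With "utf-8", each name has exactly one row, with role 6.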

Comment 1 sbjaerum 2005-08-21 14:39:16 UTC

Created attachment 751
Tag contents

Comment 2 sbjaerum 2005-08-21 14:40:10 UTC

Created attachment 752
Database dump

Comment 3 Dan Sully 2005-08-25 10:59:28 UTC

Could you attach the first 100k or so of that FLAC file?

I don't need the whole thing - just the part with the tags.

Thanks.

Comment 4 sbjaerum 2005-08-25 11:49:40 UTC

Created attachment 772
Flac file

The attached file is a 5-second FLAC file that emulates a whole-album file.
It contains an internal cuesheet and tags in the numbered Vorbis comment
format.
The file is a Various Artists album whose tags contain Norwegian characters.

Comment 5 Dan Sully 2005-08-25 12:16:56 UTC

Fixed in subversion change 4071.

Comment 6 sbjaerum 2005-08-25 13:24:26 UTC

Thanks, works now.

Judging from metaflac's --no-utf8-convert flag, it looks like the tags are UTF-8
encoded UNLESS it is explicitly stated that the locale encoding should be used.
Assuming that embedded FLAC tags are UTF-8 encoded therefore seems to be the
correct thing to do.
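The approach suggested above can be sketched in a few lines of Python: decode embedded FLAC tag bytes strictly as UTF-8 rather than guessing a single-byte encoding. The function name is hypothetical, and this is not the code from change 4071. Strict decoding turns a wrong guess into a visible error instead of a silently different string. Decoding UTF-8 bytes as Latin-1, by contrast, produces mojibake that would never match the stored name.

```python
def decode_flac_tag(raw: bytes) -> str:
    """Decode a raw Vorbis comment entry, assuming UTF-8 as argued above."""
    # errors="strict" (the default) raises UnicodeDecodeError on non-UTF-8 input.
    return raw.decode("utf-8")


raw = "ARTIST=Pål Bjørnæs".encode("utf-8")   # made-up tag, as stored in the file

print(decode_flac_tag(raw))   # ARTIST=Pål Bjørnæs
print(raw.decode("latin-1"))  # ARTIST=PÃ¥l BjÃ¸rnÃ¦s  (wrong single-byte guess)

try:
    decode_flac_tag("Pål".encode("latin-1"))  # b"P\xe5l" is not valid UTF-8
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```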