Bug 18153 - Duplicated album/contributor names appearing in DB - Unicode NFC vs NFD
Status: UNCONFIRMED
Product: Logitech Media Server
Classification: Unclassified
Component: Database
Version: 7.8.0
Hardware: All Debian Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Unassigned bug - please assign me!
Depends on:
Blocks:
Reported: 2017-05-12 14:15 UTC by Martin Williams
Modified: 2017-05-12 14:15 UTC (History)

See Also:
Category: ---


Attachments
Proposed patch (2.02 KB, patch)
2017-05-12 14:15 UTC, Martin Williams

Description Martin Williams 2017-05-12 14:15:05 UTC
Created attachment 7760
Proposed patch

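Some Unicode characters can be represented in more than one way.
Typically these are accented characters that also appear in the
ISO 8859-1 set: each can be written either as a single precomposed
code point or as a base letter followed by a combining mark.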

When this happens, we can end up with duplicated artist/album
entries in the DB, depending on the particular sequence of
Unicode code points used in a given tag. This is irritating.
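To make the failure mode concrete, here is a minimal Python sketch
(LMS itself is written in Perl; this is only an illustration, not
the attached patch). The artist name is hypothetical example data,
and the set merely stands in for a name-keyed table in the DB.

```python
# Two spellings of the same visible name (hypothetical example data).
composed = "B\u00e9la Bart\u00f3k"      # precomposed e-acute and o-acute
decomposed = "Be\u0301la Barto\u0301k"  # base letter + U+0301 combining acute

# The strings render identically but are different code point sequences,
# so a plain comparison treats them as different names.
same_text = composed == decomposed
lengths = (len(composed), len(decomposed))

# A name-keyed collection, standing in for a DB table keyed on the name,
# therefore ends up holding two entries for one artist.
contributors = {composed, decomposed}
```

Here `same_text` is false and `contributors` holds two entries,
which is the duplication described above.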

Normalizing all relevant tags to a single 'canonical' form
avoids this. NFC appears to be the most commonly used form,
so normalizing to it should change the fewest tags and is
likely to be the most efficient choice.
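As a rough sketch of the idea (in Python rather than the Perl used
by LMS, and not a rendering of the attached patch), normalizing
each tag value before it is used as a key collapses the variants
into one. The helper name canonical_tag and the sample tags are
assumptions made for illustration.

```python
import unicodedata


def canonical_tag(value: str) -> str:
    """Return the tag text in NFC (hypothetical helper name)."""
    return unicodedata.normalize("NFC", value)


# The same name as read from two files, once precomposed and once
# decomposed (hypothetical example data).
tags = ["B\u00e9la Bart\u00f3k", "Be\u0301la Barto\u0301k"]

# After normalization both values map to the same key.
unique_names = {canonical_tag(t) for t in tags}
```

With both tags passed through canonical_tag, unique_names holds
a single entry.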

This is a fairly rare occurrence, but it did happen to me. The
attached proof-of-concept patch has, in my case, resolved the
problem completely. I've been using it for about two years
without any obvious problems.

The patch applies to LMS 7.8, but may not cleanly apply to 7.9. I
can probably amend it so that it does, if there's any interest.


Another way of dealing with the problem is, of course, simply to
retag the offending files and hope for the best. But that is
tedious, and not guaranteed to work.


One possible source, in my case, is that tags may have been derived
from file names. I use both Mac OS X and Linux. The Mac stores file
names using NFD (I believe), while file names on Linux are typically
NFC. Of course, it may simply be that tags have been created by
different tag editors, with differing ideas on how to do it.
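To track down which tags are affected, it can help to check which
form a given string is already in. The following Python sketch is
one possible way to do that; the function name normalization_form
and its return labels are made up for illustration.

```python
import unicodedata


def normalization_form(text: str) -> str:
    """Classify text as 'NFC', 'NFD', 'both' or 'mixed' (hypothetical helper)."""
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"   # e.g. plain ASCII, unchanged by either form
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"      # contains both precomposed and decomposed characters
```

Tags or file names reported as 'NFD' or 'mixed' would be the ones
that can produce duplicates alongside NFC-tagged files.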