Bugzilla – Bug 18153
Duplicated album/contributor names appearing in DB - Unicode NFC vs NFD
Last modified: 2017-05-12 14:15:59 UTC
Created attachment 7760
Proposed patch

Some Unicode characters can be represented in more than one way (typically accented characters that are also in the ISO 8859-1 set). When this happens, we can end up with duplicated artist/album references in the DB, depending on the particular sequence of Unicode code points used in a given tag. This is irritating.

Normalizing all relevant tags to a single 'canonical' form avoids this. NFC appears to be the most commonly used form, so normalizing to NFC is likely to be the most efficient choice.

This is a somewhat rare occurrence, but it does/did happen to me. The attached proof-of-concept patch has, in my case, resolved matters rather well. Completely, actually. I've been using it for about two years without any obvious problem.

The patch applies to LMS 7.8, but may not apply cleanly to 7.9. I can probably amend it so that it does, if there's any interest.

Another way of dealing with the problem is, of course, to simply retag the offending files and hope for the best! But that is tedious and not guaranteed to work.

One possible source, in my case, is that tags may have been derived from file names. I use both Mac OS X and Linux. Mac OS X stores file names using NFD (I believe), while Linux typically uses NFC. Of course, it may simply be that the tags were created by different tag editors with differing ideas on how to do it.
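To illustrate the problem, here is a minimal Python sketch. It is not the attached patch, which targets the LMS Perl code; in Perl the Unicode::Normalize module offers an equivalent NFC() function. The sketch shows how the same accented name can arrive as two different code point sequences, and how normalizing to NFC before comparing collapses them into one entry. `normalize_tag` and `unique_names` are hypothetical helpers written for illustration only.

```python
import unicodedata

def normalize_tag(value: str) -> str:
    """Hypothetical helper: return the NFC (composed) form of a tag value."""
    return unicodedata.normalize("NFC", value)

def unique_names(names):
    """Hypothetical helper: collapse names that differ only in normalization
    form, returning the NFC spellings in first-seen order."""
    seen = {}
    for name in names:
        seen.setdefault(normalize_tag(name), None)
    return list(seen)

# "Café" written two ways:
composed = "Caf\u00e9"     # precomposed U+00E9 (NFC, common on Linux)
decomposed = "Cafe\u0301"  # "e" + combining acute U+0301 (NFD, as in Mac file names)

print(composed == decomposed)                                # False: different code points
print(len(composed), len(decomposed))                        # 4 5
print(normalize_tag(composed) == normalize_tag(decomposed))  # True once both are NFC

# Without normalization a naive lookup would create two album/artist rows;
# with it, only one remains.
print(unique_names([composed, decomposed, composed]))        # ['Café']
```

Normalizing at scan time, before names are compared or inserted, means every later lookup sees a single spelling, which is the approach the patch description suggests.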