Bug 15776 - 7.4.x scanner is confused by multiple versions of same album
Status: REOPENED
Product: Logitech Media Server
Classification: Unclassified
Component: Scanner
Version: 7.4.2
Hardware: All, OS: Debian Linux
Importance: -- enhancement
Target Milestone: Future
Assigned To: Unassigned bug - please assign me!
Depends on:
Blocks:
Reported: 2010-02-24 18:40 UTC by gfawkes115
Modified: 2010-11-11 00:44 UTC
CC List: 5 users

See Also:
Category: Bug


Attachments
server config/logs (5.15 KB, application/x-gzip)
2010-02-24 18:42 UTC, gfawkes115

Description gfawkes115 2010-02-24 18:40:30 UTC
The scanner in the 7.4 series is confused by multiple versions of the same album. I had this problem with 7.4.0 when it was released and reverted to 7.3.2, as it's a show-stopper for me. I tried today's 7.4.2 release and I'm still encountering the same issue. This problem does not exist in 7.3.2.

Here is my test case: create a top-level directory with two subdirectories. One subdirectory contains 13 FLAC-encoded tracks from the stereo version of The Beatles' 2009 re-release of Sgt. Pepper's, and the other contains 13 FLAC-encoded tracks from the mono version of the same 2009 re-release. There are no other files or subdirectories of any kind under the top-level directory (no CUE files, no M3U files, etc.).

The directory names are identical except for the word "stereo" or "mono":

  The Beatles - Sgt. Pepper's Lonely Hearts Club Band (2009 mono remaster)/
  The Beatles - Sgt. Pepper's Lonely Hearts Club Band (2009 stereo remaster)/

All of the FLAC files have been tagged by MusicBrainz Picard version 0.11. Obviously, both of these albums have the exact same track titles, track order, artist, and album name, since they're the same album with the only difference being how they were mastered. Here's an example of the tags: the first section shows the tags on the stereo version of track 1, and the second section shows the tags on the mono version of track 1:

-- The Beatles - Sgt. Pepper's Lonely Hearts Club Band (2009 stereo remaster)/01 - Sgt. Pepper's Lonely Hearts Club Band.flac
- FLAC, 122.89 seconds, 44100 Hz (audio/x-flac)
producer=George Martin
format=CD
releasecountry=XE
label=Apple Records
totaltracks=13
musicbrainz_albumartistid=b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
composer=Paul McCartney
composer=John Lennon
date=2009-09-09
engineer=Geoff Emerick
comment=2009 EMI/Apple stereo remaster
asin=B000002UAU
albumartistsort=Beatles, The
language=eng
script=Latn
title=Sgt. Pepper's Lonely Hearts Club Band
musicbrainz_albumid=44b7cab1-0ce1-404e-9089-b458eb3fa530
releasestatus=official
albumartist=The Beatles
catalognumber=5099969945922
album=Sgt. Pepper's Lonely Hearts Club Band
musicbrainz_artistid=b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
releasetype=album
performer=Paul McCartney (lead vocal)
artist=The Beatles
musicbrainz_trackid=237acd31-db9b-40e0-9263-9659a10ec98f
artistsort=Beatles, The
genre=Classic Rock
tracknumber=1

-- The Beatles - Sgt. Pepper's Lonely Hearts Club Band (2009 mono remaster)/01 - Sgt. Pepper's Lonely Hearts Club Band.flac
- FLAC, 122.51 seconds, 44100 Hz (audio/x-flac)
producer=George Martin
format=CD
releasecountry=XW
label=EMI Records
totaltracks=13
musicbrainz_albumartistid=b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
composer=Paul McCartney
composer=John Lennon
date=2009-09-09
engineer=Geoff Emerick
comment=2009 EMI/Apple mono remaster
asin=B000002UAU
albumartistsort=Beatles, The
language=eng
script=Latn
title=Sgt. Pepper's Lonely Hearts Club Band
musicbrainz_albumid=44b7cab1-0ce1-404e-9089-b458eb3fa530
releasestatus=official
albumartist=The Beatles
catalognumber=PMC 7027
album=Sgt. Pepper's Lonely Hearts Club Band
musicbrainz_artistid=b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
releasetype=album
performer=Paul McCartney (lead vocal)
artist=The Beatles
musicbrainz_trackid=237acd31-db9b-40e0-9263-9659a10ec98f
artistsort=Beatles, The
genre=Classic Rock
tracknumber=1

As you can see, the tags are identical except for the label, release country, catalog number, and a comment that I added manually in Picard so that I could tell the tracks apart in the SqueezeCenter/SqueezeBox Server interface. Again, this is as it should be: both of these albums have the same name, tracks, artist, etc., and the fact that they're mastered differently is of no consequence.

Now, if you set up SqueezeBox Server 7.4.2 on a Debian system using the Debian package from the slimdevices.com repo, and set the music folder to this top-level directory, after the initial scan is finished, you will have 14 albums in your music library: the complete mono version of the album Sgt. Pepper's Lonely Hearts Club Band with 13 tracks; and 13 additional albums titled Sgt. Pepper's Lonely Hearts Club Band, each with exactly 1 unique track from the stereo version of the album. Subsequent rescans (either "look for new" or "start over") make no difference in the outcome.

If you do the same with SqueezeCenter 7.3.2, you will get 2 albums, each titled Sgt. Pepper's Lonely Hearts Club Band, each with the proper tracks. You can't tell the difference between the two in the Album browsing interface, but if you drill down and click on any of the tracks, the COMMENT tag will tell you which one it is.

When I run a scan on my full library with 7.4.2, any album for which I have multiple versions -- exact same album name, number of tracks, artist, etc., and the only difference being how the album was encoded, digitized or mastered -- will exhibit this bug. But with 7.3.2 and the same full library, each version shows up as a separate, complete album in the library. (I haven't tried later versions of 7.3.x because of other bugs: 7.3.2 is the last version of Squeeze* that has worked perfectly for me.) Note that my 7.4.2 install is brand new: I started from scratch and did not attempt to upgrade my existing 7.3.x install. The only non-Slimdevices plugin I've configured on my 7.4.2 server is the BBCiPlayer plugin: there are no third-party multi-library or custom scanning plugins whatsoever.

I don't think it's a particularly unusual or bizarre situation to have multiple versions of the same album: chances are any Beatles fan will have several copies of at least some of their albums, for example.  People with high-res vinyl rips of certain albums are another example. I could work around this bug by adding the COMMENT field to the album name, but that's a hack: the albums should have an ALBUM tag that matches what you would find in the databases of MusicBrainz, LastFM, Pandora, MusicIP, etc.

I've attached the following files to assist with the debug: my server.prefs, my plugin/state.prefs, my server.log and my scanner.log. The system is an amd64 Debian sid machine that's up-to-date as of 2010-02-24.
Comment 1 gfawkes115 2010-02-24 18:42:03 UTC
Created attachment 6560
server config/logs
Comment 2 Philip Meyer 2010-02-24 23:55:22 UTC
I have several repeated albums, but do not see this issue.  I found my "same album name by same artist" cases using the following query:

select contributor, titlesort, count(*)
from albums
group by contributor, titlesort
having count(*) > 1
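To make this check concrete, here is a small Python/sqlite3 sketch. It runs Philip's query and a second query for the symptom in this report (albums holding exactly one track) against a toy in-memory database. The two-table schema is a simplified assumption, not the server's real schema. The rows are invented to mirror the 14-album result described in the report.

```python
import sqlite3

# Simplified schema (assumption): only the columns these queries touch.
SCHEMA = """
CREATE TABLE albums (id INTEGER PRIMARY KEY, contributor INTEGER, titlesort TEXT);
CREATE TABLE tracks (id INTEGER PRIMARY KEY, album INTEGER, title TEXT);
"""

# Philip's query: the same album title by the same contributor more than once.
DUPLICATE_ALBUMS = """
SELECT contributor, titlesort, COUNT(*)
FROM albums
GROUP BY contributor, titlesort
HAVING COUNT(*) > 1
"""

# The symptom in this report: albums that contain exactly one track.
SINGLE_TRACK_ALBUMS = """
SELECT albums.id
FROM albums JOIN tracks ON tracks.album = albums.id
GROUP BY albums.id
HAVING COUNT(tracks.id) = 1
"""


def build_broken_library(n_tracks=13):
    """Mimic the reported result: one complete album plus one album per track."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    title = "SGT PEPPERS LONELY HEARTS CLUB BAND"
    db.execute("INSERT INTO albums (id, contributor, titlesort) VALUES (1, 1, ?)", (title,))
    for n in range(1, n_tracks + 1):
        # Mono copy: every track lands on album 1.
        db.execute("INSERT INTO tracks (album, title) VALUES (1, ?)", (f"mono {n}",))
        # Stereo copy: each track gets an album of its own.
        album_id = n + 1
        db.execute("INSERT INTO albums (id, contributor, titlesort) VALUES (?, 1, ?)",
                   (album_id, title))
        db.execute("INSERT INTO tracks (album, title) VALUES (?, ?)",
                   (album_id, f"stereo {n}"))
    return db


if __name__ == "__main__":
    db = build_broken_library()
    print(db.execute(DUPLICATE_ALBUMS).fetchall())
    print(len(db.execute(SINGLE_TRACK_ALBUMS).fetchall()))
```

On a library showing this bug, the first query reports one title with 14 albums, and the second finds the 13 single-track albums.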

I do not see an album per track.  I don't think I was seeing this when I had 7.4.2 installed.  I am running on Windows.

Have you tried this in 7.5?
Comment 3 gfawkes115 2010-02-25 00:06:31 UTC
(In reply to comment #2)
> I have several repeated albums, but do not see this issue.  I found my "same
> album name by same artist" cases using the following query:
> 
> select contributor, titlesort, count(*)
> from albums
> group by contributor, titlesort
> having count(*) > 1
> 
> I do not see an album per track.  I don't think I was seeing this when I had
> 7.4.2 installed.  I am running on Windows.
> 
> Have you tried this in 7.5?

No, I haven't tried it in 7.5. Has scanning or the schema changed significantly in 7.5?

In any case, I'm trying to stick with release versions, especially since 7.3.2 "just works."
Comment 4 Michael Herger 2010-02-25 00:23:02 UTC
I'm no expert in this field, but I think the musicbrainz_albumid (or any musicbrainz_*id) is taking a very strong role: if this is identical, then it's the same album. Remove it in one of the copies, and you might get the two albums you want. Not that it would help navigating by album title though...
Comment 5 gfawkes115 2010-02-25 03:02:19 UTC
(In reply to comment #4)
> I'm no expert in this field, but I think the musicbrainz_albumid (or any
> musicbrainz_*id) is taking a very strong role: if this is identical, then it's
> the same album. Remove it in one of the copies, and you might get the two
> albums you want. Not that it would help navigating by album title though...

I appreciate the reply, but I don't want to do that. I use the musicbrainz_albumid values to refresh the tags from time to time (the MB database is constantly changing and adding new metadata). I'm also a stickler for proper tags, which is why I insist on not adding version information to the album title in the first place, so removing the IDs would be counter-productive. And if, as you suggest, it won't help browsing by album title, it's not going to be a very useful hack anyway.

If the scanner is matching tracks to albums using the mb albumid, wouldn't I get one album with multiple copies of each track? That's not what's happening.

The scanner log shows messages like this for each duplicate track:

[10-02-24 18:17:25.1444] Carp::Clan::__ANON__ (216) Warning: DBIx::Class::ResultSet::single(): Query returned more than one row. SQL that returns multiple rows is DEPRECATED for ->find and ->single at /usr/share/perl5/Slim/Schema.pm line 2538

Line 2538 in Schema.pm is tagged with the ominous comment:

# XXX: can return multiple objects
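The warning comes from DBIx::Class. As I understand its behavior, `find` and `single` expect at most one matching row. When the SQL matches more than one, they emit this warning and hand back the first row the database returned. The sketch below is a rough Python/sqlite3 analogue of that pattern. The `single` helper and the toy `albums` table are hypothetical and are not Squeezebox Server code.

```python
import sqlite3
import warnings


def single(db, sql, params=()):
    """Return the first matching row, warning if the query matched more than one.

    Hypothetical helper loosely modelled on the DBIx::Class warning quoted above.
    """
    cur = db.execute(sql, params)
    row = cur.fetchone()
    # Peek at a second row only to detect ambiguity; it is then discarded.
    if row is not None and cur.fetchone() is not None:
        warnings.warn("Query returned more than one row.")
    return row


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT, mb_id TEXT)")
    mb_id = "44b7cab1-0ce1-404e-9089-b458eb3fa530"
    # Two albums (mono and stereo) sharing a title and MusicBrainz album ID.
    for _ in range(2):
        db.execute("INSERT INTO albums (title, mb_id) VALUES (?, ?)",
                   ("Sgt. Pepper's Lonely Hearts Club Band", mb_id))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        row = single(db, "SELECT id FROM albums WHERE mb_id = ?", (mb_id,))
    return row, len(caught)
```

The query has no ORDER BY, so the database decides which of the matching albums comes back.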

I'm tempted to hack around in there to see if I can fix it, but I'm hoping one of you devs can do that better than I :)

Could it be that this has something to do with a particular version of a Perl module in Debian sid? I was under the impression that new-ish versions of Squeeze* came with all of the packages they need in order to avoid versioning problems, but I don't know for certain. Is that the case?
Comment 6 Michael Herger 2010-02-25 03:52:00 UTC
> I appreciate the reply, but I don't want to do that.
 
Then you're probably out of luck. AFAIK it has a key role, and support for  
it was added to allow having an album spread across multiple folders.
 
But as these indeed are distinct and different albums they should have different IDs anyway. Maybe MB needs an update on these rather rare recordings?
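Below is a simplified illustration of what Michael describes. It assumes album identity comes from the MusicBrainz album ID whenever a track has one, and from album artist plus album title otherwise. The `album_key` function, its fallback rule and the `make_track` helper are hypothetical; the real logic lives in Slim/Schema.pm and is more involved.

```python
from collections import defaultdict

MB_ALBUM_ID = "44b7cab1-0ce1-404e-9089-b458eb3fa530"


def album_key(tags):
    """Hypothetical identity rule: prefer the MusicBrainz album ID if present."""
    if tags.get("musicbrainz_albumid"):
        return ("mbid", tags["musicbrainz_albumid"])
    return ("name", tags.get("albumartist"), tags.get("album"))


def group_tracks(tracks):
    """Group (path, tags) pairs into albums using album_key."""
    albums = defaultdict(list)
    for path, tags in tracks:
        albums[album_key(tags)].append(path)
    return dict(albums)


def make_track(folder, number):
    """Build a (path, tags) pair with the tags shared by both Sgt. Pepper's copies."""
    tags = {
        "albumartist": "The Beatles",
        "album": "Sgt. Pepper's Lonely Hearts Club Band",
        "musicbrainz_albumid": MB_ALBUM_ID,
        "tracknumber": str(number),
    }
    return (f"{folder}/{number:02d}.flac", tags)
```

Under this rule, tracks split across `disc1/` and `disc2/` folders stay one album. The mono and stereo copies would also merge into a single 26-track album, which is not the N+1 result reported for 7.4.2. The sketch only illustrates the design trade-off.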
Comment 7 gfawkes115 2010-02-25 04:26:04 UTC
(In reply to comment #6)
> > I appreciate the reply, but I don't want to do that.
> 
> Then you're probably out of luck. AFAIK it has a key role, and support for  
> it was added to allow having an album spread across multiple folders.
> 
> But as these indeed are distinct and different albums they should have
> different IDs anyway. Maybe MB needs an update on these rather rare recordings?

Rare? Bands and publishers have been releasing remastered versions of albums for years. I don't mean "special editions" and the like, just remasters. The example I used in my bug report was from last year's Beatles mono and stereo remasters, which sold about 2.25 million copies in their first four days of release. It's not a stretch to imagine that many people have at least a couple of albums that are identical other than their release dates, and anyone who's tagging those tracks with MusicBrainz is going to run into this problem. MusicBrainz isn't going to change it's entire schema just to accomodate SqueezeBox Server, in any case.

I'm not convinced that the musicbrainz_albumid tag is the problem, anyway. If it were, why wouldn't all the tracks from each version appear in a single album in the library, instead of what's actually happening: one complete album followed by individual albums each containing exactly one track from the other versions? Furthermore, Philip in comment #2 says that he's seeing the desired behavior in Windows.
Comment 8 gfawkes115 2010-02-25 04:27:56 UTC
(In reply to comment #7)

> MusicBrainz isn't going to change it's entire schema just to accomodate
> SqueezeBox Server, in any case.

s/it's/its/
Comment 9 Andy Grundman 2010-02-25 04:43:01 UTC
If you tag files with the same MB Album ID, they are the same album.  This is not a bug.
Comment 10 Andy Grundman 2010-02-25 05:11:24 UTC
OK actually I will leave this open as an enhancement.  The 'right' solution is to check multiple various tags when determining if an album is the same.  Not likely to happen unless someone provides a good patch.  I suggest you talk to Moonbase about this one.
Comment 11 gfawkes115 2010-02-25 05:13:04 UTC
Hi Andy,

(In reply to comment #9)
> If you tag files with the same MB Album ID, they are the same album.  

The MusicBrainz album ID can't be the whole story. The server must be keying on
the album name tag as well, or else people who are adding "(extended version)"
or "(2009 remaster)" to their titles would be complaining, too, right? If I'm
correct, then the code is already using multiple criteria to decide album
identity. It would be helpful to users like me if we had the option of adding a
track's pathname to the mix so that it could be used to distinguish multiple
versions. I'd happily trade off support for albums spread across multiple
folders, if that's what caused this change.

> This is not a bug.

It is if you relied on the way this scenario behaved in versions from 6.x to
7.3.2, at least. And in any case, your statement that "same MB album ID = same
album" is *not* the behavior I'm seeing. I'm seeing N+1 albums, where N is the
number of tracks on the album. So that's a bug, whether or not you consider
the behavior change itself to be one.

How about an option to enable the old behavior for users who use MusicBrainz
and have multiple versions of the same album(s)? Otherwise, you're putting
those users in a Catch 22 situation: the server uses MusicBrainz album IDs to
group albums, but MusicBrainz gives multiple releases of the same album the
same album ID. I can't satisfy both constraints with the 7.4.x behavior.
Comment 12 gfawkes115 2010-02-25 05:14:54 UTC
(In reply to comment #10)
> OK actually I will leave this open as an enhancement.  The 'right' solution is
> to check multiple various tags when determining if an album is the same.  Not
> likely to happen unless someone provides a good patch.  I suggest you talk to
> Moonbase about this one.

I'd be happy to. Who's Moonbase?
Comment 13 gfawkes115 2010-02-25 05:23:53 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > OK actually I will leave this open as an enhancement.  The 'right' solution is
> > to check multiple various tags when determining if an album is the same.  Not
> > likely to happen unless someone provides a good patch.  I suggest you talk to
> > Moonbase about this one.
> 
> I'd be happy to. Who's Moonbase?

Never mind, I see he's been cc'ed on this.

Moonbase, feel free to get in touch via email. If you think this is in fact the expected behavior and not a bug (cf. Phil's message in comment #2, where he says he's seeing the behavior I want), could you point me to the places in the code where I should start looking? I'm happy to make the change myself. Using the MusicBrainz releasecountry, catalognumber and/or date tags is an obvious way to discriminate between versions. However, it's also fairly common for people to make high-res vinyl rips in addition to their CD rips or digital versions of the exact same album, so I think the pathname discriminator is the most straightforward one, though it might break the folder-spanning feature that Michael mentioned.
Comment 14 Jim McAtee 2010-02-25 08:24:47 UTC
(In reply to comment #10)
> The 'right' solution is to check multiple various tags when determining if an
> album is the same.

It's not going to do any good checking tags in this case, as they will all be identical.  You'd have to use the directory path.  I'm not sure where SbS currently stands with permitting tracks from an album to reside in different directories, but it would be impossible with a change like this.
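Here is a sketch of the folder-based discriminator discussed above. It assumes a hypothetical `split_by_folder` option that adds each track's parent directory to the album key. With the option on, the mono and stereo folders become separate albums even though their tags match. The cost is the one Jim describes: an album deliberately spread over several folders is split as well.

```python
from pathlib import PurePosixPath


def album_key(path, tags, split_by_folder=False):
    """Hypothetical album key; split_by_folder appends the track's directory."""
    key = (tags.get("musicbrainz_albumid") or "",
           tags.get("albumartist"), tags.get("album"))
    if split_by_folder:
        key += (str(PurePosixPath(path).parent),)
    return key


def count_albums(tracks, split_by_folder=False):
    """Count distinct albums for a list of (path, tags) pairs."""
    return len({album_key(p, t, split_by_folder) for p, t in tracks})
```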
Comment 15 Andy Grundman 2010-02-25 08:31:23 UTC
I meant tags such as release date, release country, catalog number, etc, that MusicBrainz provides.
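Andy's suggestion could look roughly like the sketch below, which extends the album key with MusicBrainz release tags that differ between versions. The tag names match the Picard output in the description (`releasecountry`, `catalognumber`, `date`). The `album_key` function is hypothetical. With the tags from the two track 1 dumps, the stereo copy (XE, 5099969945922) and the mono copy (XW, PMC 7027) get different keys. Tracks of one copy still share a key regardless of folder.

```python
# Tags written by MusicBrainz Picard that can differ between releases sharing
# one MusicBrainz album ID (assumption: these are the useful discriminators).
RELEASE_TAGS = ("releasecountry", "catalognumber", "date")


def album_key(tags):
    """Hypothetical identity rule: MB album ID plus release-level tags."""
    base = (tags.get("musicbrainz_albumid"), tags.get("albumartist"), tags.get("album"))
    return base + tuple(tags.get(name, "") for name in RELEASE_TAGS)


COMMON = {
    "musicbrainz_albumid": "44b7cab1-0ce1-404e-9089-b458eb3fa530",
    "albumartist": "The Beatles",
    "album": "Sgt. Pepper's Lonely Hearts Club Band",
    "date": "2009-09-09",
}
STEREO = {**COMMON, "releasecountry": "XE", "catalognumber": "5099969945922"}
MONO = {**COMMON, "releasecountry": "XW", "catalognumber": "PMC 7027"}
```

As Jim points out, this approach cannot separate two rips whose tags are fully identical, such as a vinyl rip and a CD rip tagged from the same release; only the path can.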
Comment 16 gfawkes115 2010-02-26 00:33:07 UTC
While diffing Schema.pm in 7.3.2 vs. 7.4.2, I ran across this addition in 7.4.2:

        # Bug 10583 - Also check for MusicBrainz Album Id
        my $albumid  = $attributes->{'MUSICBRAINZ_ALBUM_ID'};

That looked suspicious. Bug 10583, filed by moonbase, looks like it was the genesis for supporting tracks from the same album in multiple folders. It also says this:

>  I guess SC wants to compare Album Artist, Album Name, and the complete release
>  date, constructed from both TYER+TDAT. If these dont’ agree or TDAT is
>  empty/non-existent, the match will fail and as many »albums« created as there
>  are tracks on the (real) album!
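A toy matcher reproduces the failure mode Moonbase describes. Each track is compared with existing albums on album artist, album name, and a full date built from the ID3v2.3 TYER (YYYY) and TDAT (DDMM) frames. A track without a usable date never matches, so it starts a new album. The `full_date` and `assign_albums` functions are hypothetical and only mirror the quoted description, not the scanner's code.

```python
def full_date(tyer, tdat):
    """Build YYYY-MM-DD from TYER (YYYY) and TDAT (DDMM); None if incomplete."""
    if not tyer or not tdat or len(tdat) != 4:
        return None
    return f"{tyer}-{tdat[2:]}-{tdat[:2]}"


def assign_albums(tracks):
    """Return how many albums a strict matcher creates for the given tracks.

    Each track is a dict with albumartist, album, TYER and TDAT keys.
    """
    albums = []  # match keys of the albums created so far
    for t in tracks:
        date = full_date(t.get("TYER"), t.get("TDAT"))
        key = (t["albumartist"], t["album"], date)
        # Join an existing album only if every field, including a complete
        # date, agrees; a missing date can never match anything.
        if date is not None and key in albums:
            continue
        albums.append(key)
    return len(albums)
```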

Getting as many albums as there are tracks is *exactly* the behavior I'm seeing now post-7.3.2. Also, the fix for #10583 was merged for the 7.3.3 release. As I noted in my original report here, I've been sticking with 7.3.2 because of issues I've had with subsequent versions. My memory isn't clear anymore on exactly what broke for me in 7.3.3 -- I only know that I've also tried 7.4.0 and 7.4.2 and experienced the bug I've filed here -- but it makes sense that I would have skipped 7.3.3 after encountering this issue as well.

Obviously, from my perspective, the changeset for #10583 fixed one bug and introduced another. I'll keep digging and see if I can come up with a compromise fix.
Comment 17 Mike Walsh 2010-11-11 00:44:30 UTC
I'm just curious; I don't use MusicBrainz, but why does it say that the stereo Beatles and the mono Beatles are the same? To me, that's ludicrous. What about the 1987 versions? Does it think they are the same too?

I tend to agree with the earlier comment that said MB's DB is the problem.