Bug 17456 - Persistent database adds new entries instead of updating existing
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Scanner
Version: 7.6.0
Hardware: PC Other
Importance: P2 major with 5 votes
Target Milestone: 7.6.1
Assigned To: Michael Herger
Depends on:
Blocks:
 
Reported: 2011-08-16 22:54 UTC by Erland Isaksson
Modified: 2011-08-17 21:51 UTC
CC List: 2 users

See Also:
Category: ---


Attachments
Patch that fixes the problem in the code (419 bytes, patch)
2011-08-17 04:16 UTC, Erland Isaksson

Description Erland Isaksson 2011-08-16 22:54:21 UTC
This has been tested with the release candidate for 7.6.1, where it doesn't work; it works properly in 7.6.0, so some change between 7.6.0 and 7.6.1 is causing the problem.

In the 7.6.1 release candidate, persist.db only seems to get new entries added on every full rescan, instead of the existing ones being updated. The consequences are at least:
- Extra entries are created in persist.db on every rescan, which will eventually fill the disk.
- Users of TrackStat won't be able to use this version of Squeezebox Server, as the plugin relies on persist.db being correct.
- It will fill server.log with messages like the one below every time someone accesses something that shows information from persist.db, for example the Song Info page:

Warning: [07:41:04.0171] DBIx::Class::ResultSet::single(): Query returned more than one row.  SQL that returns multiple rows is DEPRECATED for ->find and ->single at /opt/squeezecenter76_empty/Slim/Schema/Track.pm line 547


It looks like the duplicated rows are exactly the same except for the id and added columns, which differ.

The duplication only happens for scanned tracks, so a rescan for new/changed files that doesn't find any changes won't create any duplicates.
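
For illustration, a query roughly like the following (a sketch only, using the tracks_persistent table and the urlmd5 and added columns mentioned elsewhere in this report) should show the duplicated rows after a full rescan:

-- list tracks (by urlmd5) that have more than one row in the persistent data
SELECT urlmd5, COUNT(*) AS copies, MIN(added) AS first_added, MAX(added) AS last_added
FROM tracks_persistent
GROUP BY urlmd5
HAVING COUNT(*) > 1;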
Comment 1 Erland Isaksson 2011-08-16 23:01:31 UTC
When correcting this, remember that there are a lot of existing users who might have a lot of extra entries in their persist.db, because they have been forced to try the 7.6.1 beta versions since the official 7.6.0 didn't work.

So it would probably be appropriate to also do some kind of database clean-up.

Also, remember that the data in persist.db is supposed to survive upgrades, so don't simply delete it in the installation program unless it's possible to restore the data.
Comment 2 Erland Isaksson 2011-08-16 23:09:53 UTC
Oh, and just to make sure it isn't missed: the added column isn't supposed to be updated when you do a full rescan; the whole purpose of having the added column in the persist.db file is that we want it to survive a full rescan.
Comment 3 Michael Herger 2011-08-17 03:15:52 UTC
Andy - I was easily able to reproduce this, but can't figure out what might be causing it.
Comment 4 Erland Isaksson 2011-08-17 03:45:43 UTC
This is related to change r33019, because it works in r33018 but it doesn't work in r33019.
Comment 5 Erland Isaksson 2011-08-17 04:16:17 UTC
Created attachment 7401 [details]
Patch that fixes the problem in the code

The attached patch will fix the issue in the code.

It does not handle the fact that a lot of beta testers currently have a persist.db with a lot of duplicated rows, but at least it stops even more users from being affected by this.
Comment 6 Michael Herger 2011-08-17 04:27:21 UTC
Thanks for figuring this out. Could you please try this patch?

Index: TrackPersistent.pm
=====================================================
--- TrackPersistent.pm	(revision 33102)
+++ TrackPersistent.pm	(working copy)
@@ -90,7 +90,7 @@
 			WHERE (	urlmd5 = ? OR musicbrainz_id = ? )
 		} );
 		
-		$sth->execute( $mbid, $urlmd5 );
+		$sth->execute( $urlmd5, $mbid );
 	}
 	else {

This should take care of the wrong search, leading to the duplicates. But it doesn't fix the existing duplicates yet.
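
To spell out what went wrong (a sketch only, not the exact code from TrackPersistent.pm): the placeholders in the prepared statement are positional, so the first bound value has to be the urlmd5 and the second the musicbrainz_id, matching their order in the WHERE clause:

-- lookup used to find an existing row before inserting; with the two values
-- bound in the wrong order it presumably never matched, so a fresh row was
-- inserted on every rescan instead
SELECT *
FROM tracks_persistent
WHERE ( urlmd5 = ?               -- must be bound with $urlmd5
        OR musicbrainz_id = ? ); -- must be bound with $mbid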
Comment 7 SVN Bot 2011-08-17 04:29:00 UTC
 == Auto-comment from SVN commit #33103 to the slim repo by mherger ==
 == http://svn.slimdevices.com/slim?view=revision&revision=33103 ==

Bug: 17456
Description: fix the order in which we pass the parameters to the query
Comment 8 Michael Herger 2011-08-17 05:33:10 UTC
What about the following to clean up the duplicates?

-- create temporary table to de-duplicate persistent data
DROP TABLE IF EXISTS temp;
CREATE TEMPORARY TABLE temp 
	AS SELECT * FROM persistentdb.tracks_persistent;

DELETE FROM persistentdb.tracks_persistent;

-- try to consolidate duplicates
INSERT INTO persistentdb.tracks_persistent (
	url,
	musicbrainz_id,
	added,
	rating,
	playCount,
	lastPlayed,
	urlmd5
) SELECT 
	MAX(url), 
	MAX(musicbrainz_id), 
	MIN(added), 
	MAX(rating), 
	SUM(playCount), 
	MAX(lastPlayed), 
	MAX(urlmd5)
FROM temp GROUP BY urlmd5, musicbrainz_id;

DROP TABLE temp;
Comment 9 Michael Herger 2011-08-17 05:36:24 UTC
The SQL script would potentially result in wrong ratings (the max. is being used, not the most recent), but that's probably something we can live with. Only ratings changes applied within the last week with the latest nightly build would be ignored.
Comment 10 Erland Isaksson 2011-08-17 05:55:02 UTC
(In reply to comment #9)
> The SQL script would potentially result in wrong ratings (the max. is being
> used, not the most recent), but that's probably something we can live with.
> Only ratings changes applied within the last week with the latest nightly build
> would be ignored.

Is there a reason you used SUM instead of MAX for play count?
I didn't look at what the duplicates look like - do the newly created duplicates always have playcount=0?
In that case SUM should work, otherwise MAX is probably more correct.

Have you tried the SQL on a larger database to make sure it doesn't take too long to execute? A user with a 50,000 track library who has performed 5 full rescans will have about 250,000 records in the tracks_persistent table, and the temporary table you create doesn't have any indexes, so we should probably try it on a larger library before committing it.
Comment 11 Michael Herger 2011-08-17 06:00:22 UTC
> Is there a reason you used SUM instead of MAX for play count?

Yes: if there are multiple entries, the old one would potentially have the old count, the new one just the new counts.

> I didn't look at what the duplicates look like - do the newly created
> duplicates always have playcount=0?

Don't know.

> In that case SUM should work, otherwise MAX is probably more correct.

The duplicates are created as new rows with empty values. They're not duplicated from an existing record. Thus they would start at 0.

> Have you tried the SQL on a larger database to make sure it doesn't take too
> long to execute? A user with a 50,000 track library who has performed 5 full
> rescans will have about 250,000 records in the tracks_persistent table, and the
> temporary table you create doesn't have any indexes, so we should probably try
> it on a larger library before committing it.

No, I didn't. TANSTAAFL. 3k tracks are handled in a fraction of a second on my MacBook. And this should only be run once.

I wonder about the trade-off of creating the indexes for this one-off action vs. just running it without any index. It might not be worth it.
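
For reference, if the index did turn out to be worthwhile, it would only take something like the following before the INSERT ... SELECT in the clean-up script (untested sketch; the index name is made up):

-- optional: index the temporary table on the GROUP BY columns so the
-- consolidation query doesn't have to group an unindexed 250k-row table
CREATE INDEX idx_temp_dedup ON temp (urlmd5, musicbrainz_id);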
Comment 12 Erland Isaksson 2011-08-17 06:09:09 UTC
(In reply to comment #11)
> 
> > Have you tried the SQL on a larger database to make sure it doesn't take too
> > long to execute? A user with a 50,000 track library who has performed 5 full
> > rescans will have about 250,000 records in the tracks_persistent table, and
> > the temporary table you create doesn't have any indexes, so we should
> > probably try it on a larger library before committing it.
> 
> No, I didn't. TANSTAAFL. 3k tracks are handled in a fraction of a second on my
> MacBook. And this should only be run once.
> 
> I wonder about the trade-off of creating the indexes for this one-off action
> vs. just running it without any index. It might not be worth it.

As long as we are talking about a few seconds, or even a minute, we should be fine; it just needs to be fast enough that the user doesn't start rebooting the computer or killing the process because SBS takes too long to start up. I was just asking because I know from my own experience that some queries over a few hundred thousand rows can take minutes or even longer without indexes, but I think everything I've had this problem with involved joins, and that isn't the case here.
Comment 13 Michael Herger 2011-08-17 06:10:56 UTC
If you have a large persist.db, you could give it a test ride: copy the file, and run the SQL in e.g. the SQLiteManager add-on in Firefox.
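
For a manual test like that, note that the script from comment 8 refers to persistentdb.tracks_persistent, so the copied file would need to be attached under that schema name first; the file path below is just an example:

-- attach the copied persistent database under the name the clean-up script expects
ATTACH DATABASE '/path/to/copy-of-persist.db' AS persistentdb;
-- ... run the de-duplication script from comment 8 ...
DETACH DATABASE persistentdb;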
Comment 14 Erland Isaksson 2011-08-17 06:48:37 UTC
(In reply to comment #13)
> If you have a large persist.db, you could give it a test ride: copy the file,
> and run the SQL in e.g. the SQLiteManager add-on in Firefox.

I've tried it with SQLiteManager on a 50,000 track database on my 2GHz MacBook, and the script takes about 7-8 seconds to execute. That feels acceptable to me since it should only execute once.
Comment 15 Michael Herger 2011-08-17 08:36:00 UTC
I checked in the SQL script which will hopefully fix this issue. Please let me know how it works for you. Thanks!
Comment 16 Erland Isaksson 2011-08-17 10:34:08 UTC
(In reply to comment #15)
> I checked in the SQL script which will hopefully fix this issue. Please let me
> know how it works for you. Thanks!

Seems to work as far as I can see.
Comment 17 Mike Walsh 2011-08-17 12:12:52 UTC
So this will be part of the official 7.6.1 release?
Comment 18 Andy Grundman 2011-08-17 12:13:45 UTC
Yes.
Comment 19 Michael Herger 2011-08-17 21:51:55 UTC
Fixed in 33105.