Bugzilla – Bug 17456
Persistent database adds new entries instead of updating existing
Last modified: 2011-08-17 21:51:55 UTC
This has been tested with the release candidate for 7.6.1, where it doesn't work; it works properly in 7.6.0, so some change between 7.6.0 and 7.6.1 is causing the problem. In the 7.6.1 release candidate, persist.db seems to only add new entries every time a full rescan occurs. The consequences are at least:

- Extra entries are created in persist.db on every rescan, which will eventually fill the disk.
- Users of TrackStat won't be able to use this version of Squeezebox Server, as this plugin relies on persist.db being correct.
- server.log fills with messages like this one every time someone accesses something that shows information from persist.db, for example the Song Info page:

Warning: [07:41:04.0171] DBIx::Class::ResultSet::single(): Query returned more than one row. SQL that returns multiple rows is DEPRECATED for ->find and ->single at /opt/squeezecenter76_empty/Slim/Schema/Track.pm line 547

The duplicated rows look exactly the same except for the id and added columns, which differ. The duplication only happens for scanned tracks, so a rescan for new/changed files which doesn't find any changes won't create any duplicates.
When correcting this, remember that a lot of existing users might have a lot of extra entries in their persist.db, because they were forced to try the 7.6.1 beta versions after the official 7.6.0 didn't work for them. So some kind of database clean-up should probably also be done. Also, remember that the data in persist.db is supposed to survive upgrades, so don't simply delete it in the installation program unless it's possible to restore the data.
Oh, and just to make sure it isn't missed: the added column isn't supposed to be updated when you do a full rescan. The whole purpose of having the added column in the persist.db file is that we want it to survive a full rescan.
Andy - I was easily able to reproduce this, but can't figure out what might be causing it.
This is related to change r33019: it works in r33018 but not in r33019.
Created attachment 7401 [details]
Patch that fixes the problem in the code

The attached patch fixes the issue in the code. It doesn't handle the fact that a lot of beta testers currently have a persist.db with a lot of duplicated rows, but at least it stops even more users from being affected by this.
Thanks for figuring this out. Could you please try this patch?

Index: TrackPersistent.pm
=====================================================
--- TrackPersistent.pm	(revision 33102)
+++ TrackPersistent.pm	(working copy)
@@ -90,7 +90,7 @@
 		WHERE ( urlmd5 = ? OR musicbrainz_id = ? )
 	} );
 
-	$sth->execute( $mbid, $urlmd5 );
+	$sth->execute( $urlmd5, $mbid );
 }
 else {

This should take care of the wrong search leading to the duplicates. But it doesn't fix the existing duplicates yet.
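To illustrate why the swapped execute() arguments silently create duplicates instead of raising an error: with positional placeholders the first value binds to the first ?, so the buggy call compares the urlmd5 column against a MusicBrainz ID and vice versa. A minimal sketch that can be run in an empty scratch SQLite database (the table is reduced to the two lookup columns, and both values are made up):

-- Stand-in for tracks_persistent, reduced to the lookup columns
CREATE TABLE tracks_persistent (
	id             INTEGER PRIMARY KEY,
	urlmd5         CHAR(32),
	musicbrainz_id VARCHAR(40)
);

INSERT INTO tracks_persistent ( urlmd5, musicbrainz_id )
VALUES ( 'd41d8cd98f00b204e9800998ecf8427e',
         'b10a8db1-64e1-4f5a-9b52-3f1f3f1f3f1f' );

-- Correct binding order (urlmd5 first): finds the existing row
SELECT id FROM tracks_persistent
WHERE ( urlmd5 = 'd41d8cd98f00b204e9800998ecf8427e'
     OR musicbrainz_id = 'b10a8db1-64e1-4f5a-9b52-3f1f3f1f3f1f' );

-- Swapped binding order, as in the buggy call: matches nothing, so the
-- scanner concludes the track is new and inserts a duplicate row
SELECT id FROM tracks_persistent
WHERE ( urlmd5 = 'b10a8db1-64e1-4f5a-9b52-3f1f3f1f3f1f'
     OR musicbrainz_id = 'd41d8cd98f00b204e9800998ecf8427e' );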
== Auto-comment from SVN commit #33103 to the slim repo by mherger ==
== http://svn.slimdevices.com/slim?view=revision&revision=33103 ==

Bug: 17456

Description:
fix the order in which we pass the parameters to the query
What about the following to clean up the duplicates?

-- create temporary table to de-duplicate persistent data
DROP TABLE IF EXISTS temp;
CREATE TEMPORARY TABLE temp AS SELECT * FROM persistentdb.tracks_persistent;

DELETE FROM persistentdb.tracks_persistent;

-- try to consolidate duplicates
INSERT INTO persistentdb.tracks_persistent
	( url, musicbrainz_id, added, rating, playCount, lastPlayed, urlmd5 )
SELECT MAX(url), MAX(musicbrainz_id), MIN(added), MAX(rating),
	SUM(playCount), MAX(lastPlayed), MAX(urlmd5)
FROM temp
GROUP BY urlmd5, musicbrainz_id;

DROP TABLE temp;
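A quick way to verify the result (a sketch; it assumes the persistent database is attached as persistentdb, as in the script above): this query lists every (urlmd5, musicbrainz_id) pair that still has more than one row, so it should return nothing after the consolidation has run.

-- should return no rows once the duplicates are consolidated
SELECT urlmd5, musicbrainz_id, COUNT(*) AS copies
FROM persistentdb.tracks_persistent
GROUP BY urlmd5, musicbrainz_id
HAVING COUNT(*) > 1;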
The SQL script could potentially result in wrong ratings (the maximum is used, not the most recent value), but that's probably something we can live with. Only rating changes applied within the last week, using the latest nightly build, would be ignored.
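A tiny worked example of that caveat (all values invented): if a user recently lowered a rating, the newer value ends up on a different row than the old one, and MAX() resurrects the old, higher rating.

-- Hypothetical demo table; run in a scratch database
CREATE TABLE rating_demo ( urlmd5 CHAR(32), rating INTEGER );
INSERT INTO rating_demo VALUES ( 'aaaa', 80 );  -- old row, old rating
INSERT INTO rating_demo VALUES ( 'aaaa', 40 );  -- new row, recent re-rating
SELECT MAX(rating) FROM rating_demo GROUP BY urlmd5;  -- returns 80, not 40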
(In reply to comment #9)
> The SQL script could potentially result in wrong ratings (the maximum is
> used, not the most recent value), but that's probably something we can live
> with. Only rating changes applied within the last week, using the latest
> nightly build, would be ignored.

Is there a reason you used SUM instead of MAX for playCount? I didn't look at what the duplicates look like; do the newly created duplicates always have playCount=0? In that case SUM should work, else MAX is probably more correct.

Have you tried the SQL in a larger database to make sure it doesn't take too long to execute? A user with a 50 000 track library who has performed 5 full rescans will have about 250 000 records in the tracks_persistent table, and the temporary table you create doesn't have any indexes, so we should probably try it in a larger library before committing it.
> Is there a reason you used SUM instead of MAX for playCount?

Yes: if there are multiple entries, the old one would potentially have the old count, the new one just the new counts.

> I didn't look at what the duplicates look like; do the newly created
> duplicates always have playCount=0?

Don't know.

> In that case SUM should work, else MAX is probably more correct.

The duplicates are created as new rows with empty values. They're not duplicated from an existing record. Thus they would start at 0.

> Have you tried the SQL in a larger database to make sure it doesn't take
> too long to execute?

No, I didn't. TANSTAAFL. 3k tracks are handled in a fraction of a second on my MacBook, and this should only be run once. I wonder about the trade-off of creating the indexes for this one-off action vs. just running it without any index. It might not be worth it.
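To make the SUM vs. MAX point concrete (values invented): plays recorded before the bug sit on the old row, plays recorded since sit on a duplicate, so only SUM keeps the full count.

-- Hypothetical demo table; run in a scratch database
CREATE TABLE playcount_demo ( urlmd5 CHAR(32), playCount INTEGER );
INSERT INTO playcount_demo VALUES ( 'aaaa', 42 );  -- old row, plays before the bug
INSERT INTO playcount_demo VALUES ( 'aaaa', 3 );   -- duplicate, plays since
SELECT SUM(playCount), MAX(playCount)
FROM playcount_demo GROUP BY urlmd5;               -- SUM = 45, MAX = 42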
(In reply to comment #11)
> No, I didn't. TANSTAAFL. 3k tracks are handled in a fraction of a second on
> my MacBook, and this should only be run once.
>
> I wonder about the trade-off of creating the indexes for this one-off
> action vs. just running it without any index. It might not be worth it.

As long as we are talking about a few seconds, or even a minute, we should be fine; it just needs to be fast enough that the user doesn't start to reboot the computer or kill the process because SBS takes too long to start up. I was just asking because I know from my own experience that some queries over a few hundred thousand rows can take minutes or even longer without indexes, but I think everywhere I've had that problem a join was involved, and that isn't the case here.
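For reference, if the one-off clean-up ever did turn out to be slow, the obvious experiment would be adding an index on the temporary table's GROUP BY columns before the consolidation step. A hedged sketch (the index name is made up, and the index is dropped automatically with the temp table):

-- Optional: speed up the GROUP BY in the consolidation step
CREATE INDEX temp_dedup_idx ON temp ( urlmd5, musicbrainz_id );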
If you have a large persist.db, you could give it a test ride: copy the file, and run the SQL in e.g. the SQLiteManager add-on for Firefox.
(In reply to comment #13)
> If you have a large persist.db, you could give it a test ride: copy the
> file, and run the SQL in e.g. the SQLiteManager add-on for Firefox.

I've tried it with SQLiteManager on a 50 000 track database on my 2 GHz MacBook, and the script takes about 7-8 seconds to execute. That feels acceptable to me since it should only execute once.
I checked in the SQL script which will hopefully fix this issue. Please let me know how it works for you. Thanks!
(In reply to comment #15)
> I checked in the SQL script which will hopefully fix this issue. Please let
> me know how it works for you. Thanks!

Seems to work as far as I can see.
So this will be part of the official 7.6.1 release?
Yes.
Fixed in 33105.