Bug 8029 – Database indexing and SQL optimisation

Bug 8029 - Database indexing and SQL optimisation

Summary:

Database indexing and SQL optimisation

Status:	RESOLVED INVALID

Product:	Logitech Media Server
Classification:	Unclassified
Component:	Database
Version:	7.4.0
Platform:	Infrant ReadyNAS Other

Importance:	-- enhancement (vote)
Target Milestone:	New Schema
Assigned To:	Brandon Black

URL:
Keywords:	new_schema

Depends on:	8303
Blocks:
	Show dependency tree / graph

Reported:	2008-05-01 08:09 UTC by Pete Larrett
Modified:	2014-08-08 15:38 UTC (History)
CC List:	3 users (show)

See Also:
Category:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Pete Larrett 2008-05-01 08:09:18 UTC

Currently running SqueezeCenter Version: 7.0.1 - 19294 @ Thu May 1 00:21:56 PDT 2008 - Netgear RAIDiator - EN - utf8

On the Slim Services forum, I was given a pointer to look at the database and some of the commonly used SQL. This was because I metioned that my day job is a mainframe DB2 Database Administrator and somebody said it might be an idea to have a look.

I couldn't see which indexes are currently set up but I would strongly recommend all of the 'Number 1' and 'Number 2' indexes listed below and as many of the others as you can afford. Don't forget that there is a performance overhead when a new item is added to the database but, at the moment, the scanning process takes so long as to make this negligible. The real benefit comes when retrieving data and these indexes will definitely help most, if not all of the SELECT statements in the document I saw. Also, look at the note following the indexes about 'many-to-many' tables. This confirms, to me, the behaviour I have been seeing since I first got the software - Basically, the bigger the database, the slower it gets.

Good luck

Table name with proposed indexes underneath
-------------------------------------------
contributors
IX1 id,namesort
IX2 namesort

contributor_album
IX1 contributor,role
IX2 role

albums
IX1 id,contributor,titlesort,compilation,year,disc,des c
IX2 compilation
IX3 titlesort,disc
IX4 Year,desc,titlesort,disc

tracks
IX1 album,audio,titlesort,disc,tracknum
IX2 audio

genre_track
IX1 genre,track

genres
IX1 id,name
IX2 name

I've also spotted in the documentation that several 'many-to-many' tables are created and maintained. These are an enormous I/O overhead and, if they are not indexed, will give exactly the performance log-jam we are experiencing.

Basically, the bigger the library, the longer each operation will take as the program has to do a table-scan every time.

I would strongly suggest removing the 'many-to-many' tables and all the associated creation and maintenance and applying indexes to the main tables. Then MySQL can take care of I/O in a far more efficient and much faster way.

Try running an EXPLAIN on some of the SQL statements before and after Indexing. You should see a HUGE difference.

Comment 1 KDF 2008-05-01 09:36:02 UTC

new reports should leave target and priority undefined as this will flag for review.

Can you add the link to the forum thread in case any takers want quick access to any prior discussion. thanks.

Comment 2 Pete Larrett 2008-05-01 11:44:21 UTC

(In reply to comment #0)
> Currently running SqueezeCenter Version: 7.0.1 - 19294 @ Thu May 1 00:21:56 PDT
> 2008 - Netgear RAIDiator - EN - utf8
> 
> On the Slim Services forum, I was given a pointer to look at the database and
> some of the commonly used SQL. This was because I metioned that my day job is a
> mainframe DB2 Database Administrator and somebody said it might be an idea to
> have a look.
> 
> I couldn't see which indexes are currently set up but I would strongly
> recommend all of the 'Number 1' and 'Number 2' indexes listed below and as many
> of the others as you can afford. Don't forget that there is a performance
> overhead when a new item is added to the database but, at the moment, the
> scanning process takes so long as to make this negligible. The real benefit
> comes when retrieving data and these indexes will definitely help most, if not
> all of the SELECT statements in the document I saw. Also, look at the note
> following the indexes about 'many-to-many' tables. This confirms, to me, the
> behaviour I have been seeing since I first got the software - Basically, the
> bigger the database, the slower it gets.
> 
> Good luck
> 
> Table name with proposed indexes underneath
> -------------------------------------------
> contributors
> IX1 id,namesort
> IX2 namesort
> 
> contributor_album
> IX1 contributor,role
> IX2 role
> 
> albums
> IX1 id,contributor,titlesort,compilation,year,disc,des c
> IX2 compilation
> IX3 titlesort,disc
> IX4 Year,desc,titlesort,disc
> 
> tracks
> IX1 album,audio,titlesort,disc,tracknum
> IX2 audio
> 
> genre_track
> IX1 genre,track
> 
> genres
> IX1 id,name
> IX2 name
> 
> I've also spotted in the documentation that several 'many-to-many' tables are
> created and maintained. These are an enormous I/O overhead and, if they are not
> indexed, will give exactly the performance log-jam we are experiencing.
> 
> Basically, the bigger the library, the longer each operation will take as the
> program has to do a table-scan every time.
> 
> I would strongly suggest removing the 'many-to-many' tables and all the
> associated creation and maintenance and applying indexes to the main tables.
> Then MySQL can take care of I/O in a far more efficient and much faster way.
> 
> Try running an EXPLAIN on some of the SQL statements before and after Indexing.
> You should see a HUGE difference.
> 

The forums I am referring to are:

http://www.readynas.com/forum/viewtopic.php?f=22&t=16965&st=0&sk=t&sd=a&start=150
http://forums.slimdevices.com/showthread.php?t=45261&page=4
http://www.audiocircle.com/circles/index.php?topic=51863.new#new

Comment 3 Pete Larrett 2008-05-07 01:20:23 UTC

I was pointed to a different Wiki for the database structure - http://wiki.slimdevices.com/index.ph...abaseStructure - but I saw nothing there that changed my opinion.

I still see the 'Many-to-Many' tables in there. The problem with these is that they have to be created and maintained, which takes time and slows down the scanning process. Also, when they are being read, they are very inefficient as they will be doing table scans and the larger the database, the longer it takes - exponentially!

By removing these and replacing them with indexes on the main tables, the creation I-O will, almost certainly, be reduced during scanning and the access paths will be vastly improved and oblivious to the size of the database.

Again, I would strongly suggest creating a clone of the database, remove the 'Many-to-Many' tables. Create the Indexes and run an EXPLAIN of the SQL against the old and new databases. The difference should be obvious.
Edit/Delete Message

Comment 4 Jim McAtee 2008-05-07 02:52:40 UTC

What would be the purpose of an index such as id,namesort on contributors?

Comment 5 Pete Larrett 2008-05-07 03:22:13 UTC

To be honest, the ID columns MAY not need to be indexed (or even required - I haven't had that much time to analyze it) but statements like this:

select contributors.* from contributors,contributor_album,albums
    where
        contributors.id=contributor_album.contributor and
        contributor_album.role in (1,5,6) and
        contributor_album.album=albums.id and
        albums.compilation is null
    group by contributors.id
    order by contributors.namesort

could be re-written as:

select contributors.* 
  from contributors,
       albums
    where
        contributors.namesort=albums.contributor and
        album.role in (1,5,6) and
        albums.compilation is null
    group by contributors.id
    order by contributors.namesort

I'm assuming I've got the correct column names here as the Wiki doesn't go that deep but the second statement above would only require indexes on contributors.namesor and albums.contributor.

If there is any more info on columns and datatypes, please let me know.

Comment 6 Pete Larrett 2008-05-07 23:12:43 UTC

It seems to me that there are 2 distinct functions to the SqueezeCenter application. One is to gather the information about all the tracks in the library and the other is to access that information to play the track (i.e. access and run the file).

In fact, the application seems to be trying to do everything at the same time EVERY time it runs. That is where the processing wastage is arising. Let's treat these seperately:

SCAN
----

In this part of the exercise the application should scan every track, playlist, favourite etc. in the library and build the database. Obviously, this is going to be quite a long process but once the tables have been built they should be easily accessible if they are well designed and indexed.

PLAYLIST
--------

When we want to play tracks, albums, playlists etc., the information is all now available and simply needs to be copied to the 'Playlist' area of memory. It doesn't need to be checked again until each track is due to be played. If that track doesn't exist, skip it until you find one that does. When the playlist needs to be saved, the only information needed is ID, Sequence number, Title, Album, Artist and file location.

ERROR REPORTING
---------------

During the scan process, Playlists could be checked for any tracks which no longer exist. These lines could easily be removed from the list and an error report could be written to the log.

In all of the above, I am assuming the recommendations I have made regarding Many-to-Many tables and Indexes have been applied as I can't see any of these methods working efficiently without them.

Seriously, if you want to discuss this further offline and you are a Slim Devices employee, please get in touch through my e-mail address in the forum database. I'd be delighted to help.

Comment 7 Pete Larrett 2008-05-13 22:45:17 UTC

Is anything happening on this?

I've been thinking about it and I realise that any change to the structure would result in a major re-write to the system but, given the performance at run-time, I think I may have a suggestion which would alleviate things.

Firstly, during a scan, the system builds the catalog. This is extensive for many combinations of usage. It takes many hours and, if that is the way it has to be, OK.

When we build a playlist, the catalog is accessed and this, also, takes a very long time. The main thing to notice is that, the longer the playlist, the time taken to add track or albums increases exponentially. This indicates, to me, that there are table scans taking place which get longer and longer. This means that the system is not scalable.

Lastly, at run-time, the playlist takes a VERY long time to load (again, size dependant) and, I believe, is STILL accessing and processing against the Catalog.

My suggestion is:

De-couple the Playlist table from the rest of the database. Make it a 'Note Pad' type of table which contains the following columns = Playlist name, Sequence number, Artist, Album, Track, Artwork id, File name. Put an Index on the first 4 columns and, after that, no other table should be required for listening to music.

The second recommendation MIGHT help the building of Playlists, given that a full re-design is probably out of the question (at this time). If an Index is added to both columns in all of the Many-to-Many tables and then an Index is added to the corresponding columns in the Parent tables, the SQL used to build the Playlists will not perform Table Scans and performance should be improved greatly with no code changes. The overhead to this will be the maintenance of the Indexes during the Scan but, since that already takes so long, it won't make a lot of difference.
Edit/Delete Message

Comment 8 Pete Larrett 2008-05-14 02:17:27 UTC

I know there was some criticism of my use of AAC (*.m4a) file types in one of the forums. I only used this because they all originate from iTunes and I couldn't find a sensible way to convert them while maintaining the file structure. anyway, I think I have found a piece of software which will do this for me and I wanted to know what the opinion is for the best quality file type I can convert to that can be handled easily by Squeezecenter?

Any suggestions? FLAC seems to be preferable to me but I could be wrong.

Comment 9 Pete Larrett 2008-05-16 02:28:09 UTC

I downloaded the 14th May nightly update and it seemed to be extremely well behaved. I noticed today that a 'final' release has been posted on Slimdevices for the ReadyNas and I've just installed that, along with the plugin pack and the Multi-player plugin pack (both of which have been impressive so far but I would advise turning off the ones you don't need).

For the first time in ages (based on the 14th May version) I'm quite optimistic here. It feels like a lot has been done and I'll report my findings back.

The only downer is that I now have to wait several hours for the initial library scan to complete before I can really try it out but the radio stations are working fine and the synchronization seems good.

More later....

Comment 10 Pete Larrett 2008-05-18 15:05:10 UTC

The latest version on SqueezeCenter for ReadyNAS has now been running all weekend and is a significant improvement on all previous verions in many respects. There are still problems but I'll summarize them here:

[list=]Stuttering - I've not had any stuttering/rebuffering problems with this at all. A BIG bonus and very welcome.[/list]
[list=]Network disconnection - One of my Squeezeboxes DID suddenly disconnect and I had to connect it via wireless instead of ethernet. This really needs to be addressed.[/list]
[list]Playlist build - this really hasn't changed at all. There is no sign of any improvement. Just make sure you have some wet paint nearby to watch dry.[/list]
[list]Sound quality - no problems with that. It doesn't seem to have any problems with my AAC-based library, either[/list]

Overall, well done but still not perfect. PLEASE get this performance issue sorted for Release 7.1 (or even 8).

Comment 11 Pete Larrett 2008-05-23 01:06:42 UTC

OK, I've had some REALLY good results from the database changes I've made to the SqueezeCenter database on my Mac. so good, in fact, that I can't wait to apply them to the ReadyNAS. The only problem is, I can't connect to it. Does anybody know the host name, port number, socket name, user-id and password I should use to connect MySQL administrator to it, please? Once I've tested it there, I'll publish the results. They are very good indeed.

Comment 12 Pete Larrett 2008-05-25 15:09:31 UTC

Optimisation of the SqueezeCenter database
------------------------------------------

From the outset, I was not happy with the overall performance of the Squeezebox. Mine is used in conjunction with a ReadyNAS and I was constantly told that this had limitations, which meant it was always going to be ‘slower than on other platforms’. I downloaded the SqueezeCenter software to my Mac PowerBook G4 and found that I still wasn’t overwhelmed – the same problems were there.

My main issue was that, the larger the Playlist, the longer it took to add items to it. This meant that functionality was constrained by size, which should never be an issue in a well-designed database. I should state here that I am a professional Mainframe and mid-range Database Administrator (DB2, IDMS, SQL Server and a bit of Oracle).

I downloaded MySQL Administrator and Query Browser and also switched on SQL Logging in SqueezeCenter.

Firstly, with Administrator, I looked at the tables and their Indexes. This immediately rang alarm bells. Among the issues I found were:

• Several Many-to-Many tables – These are always going to cause problems in a database, as they are an I-O overhead to maintain and, without careful indexing, to reference.
• The Many-to-Many tables all had Foreign Keys (RI) to their parent tables. This is pointless in a database that is not heavily transactional. Worse still, they all had ‘Cascade on Delete’ parameters. The fact that RI exists plus the Cascade parameter creates an enormous I-O overhead on the system.
• Nearly all of the tables had inadequate Indexing for the Queries being run. This creates even more I-O and adds to the performance problems.
• Within the code itself, there is huge repetition of effort and checking of the Catalog tables at times when no such checking is required for normal functioning.

After this, I ran a full Scan. I saved the scan log from this and trimmed it down to retrieve all of the SQL used during the different parts of the process. I then created, saved, loaded and cleared a Playlist. I saved the server log and trimmed it in the same way as the scan log.

I saved a representative sample of 40 SQL statements that were run regularly and used the MySQL Query Browser to EXPLAIN them. This showed the access paths used when each was run.

My actions were limited in that I didn’t want to change any of the code (I’m a DBA, not a programmer) and, therefore, I could only provide fixes that made the database fit the programs as they are now. If any programming changes recommended later are applied, the database can be changed accordingly.

The first thing to remove was the RI (Foreign Keys). These served no practical purpose and were a performance overhead. I did this and ran various tests with no adverse effects.

Secondly, I applied Indexes to the Many-to-Many tables (these only have 2-3 columns) in EVERY possible permutation. For instance, where there are columns A, B and C, I created indexes for ABC, ACB, BAC, BCA, CAB and CBA. Obviously, this is well overdone but, without access to the code, the overhead of creating these at the time of the scan was far less than that of creating the RI and the potential benefits when the database is READ were huge. I also removed the existing, single column, Indexes.

Next, I looked at the 40 SQL statements I had saved and ensured each table had an appropriate index for optimum retrieval for each statement. The only other Index I added which had a big most impact was:

• CONTRIBUTOR on the Albums table

Having carried out all of the work on the tables, indexes and RI, I used MySQL Administrator to Optimise the tables. As so many new indexes had been created, this seemed like a good idea to make sure they were properly organised.

I then ran all of the sample statements through MySQL Query Browser to compare the results. All of the statements returned the same result sets except they were visibly faster and the access paths were greatly improved.

Following this, I ran a full library scan again and created, saved, cleared, loaded and played a Playlist. The difference was remarkable. The whole system was MUCH less clunky and had no loss of functionality or any errors.

Recommendations
---------------

1. I Strongly recommend the removal of all Foreign Keys in the database. These are an overhead providing a database facility that is inappropriate for this application.
2. I also recommend a complete overhaul of the table indexes – especially on the Many-to-Many tables. This can, obviously, be carried out more easily with full vision of the code (which I still don’t seem to have) but, in many cases, making the Index fit the SQL in the log is not a bad place to start.
3. Is it possible to add a database function to the Settings panels that would instigate a MySQL Optimisation? Alternatively, can this be done at the end of the scan process? Thinking about this, the scan process DOES talk of Optimisation as it’s last step but I’m not sure what it means by this.
4. Is it possible to switch off artwork? A lot of time is spent in the programs chasing artwork that may or may not be there and the only place you normally see it is in the web interface. Personally, I’m not bothered and if the system can work more efficiently without it, I’d rather have this option.

The application
---------------

As I said earlier, all of the above database changes were made without changing any of the code. I did notice, though, that the strategies of the programs could be modified to produce the same results much more efficiently.

1. Library scan – when a scan takes place, it creates a Catalog of tables that are used later when creating and running Playlists. The reason for scanning is to take into account and new or deleted music, Playlists etc. This takes a considerable time to run but, if it supports the system later, it has to be done. It does mean, however, that the music currently being played can be disrupted – almost to the point of being unlistenable. Is there any way that the scan can be run in it’s own memory address space to keep it separate from the part of the application which is playing the music (or vice versa)?
2. Creating Playlists - When creating a Playlist, the programs effectively carry out ‘mini-scans’ for EVERY track to make sure they still exist!! This is really where the performance goes downhill once the database access issues have been resolved. Why is this done again? We already have the Catalog tables from the last scan and all we need to create the Playlist is the URL and the artist/album/track etc. details. If the physical files have been deleted, we shouldn’t care at this stage. We are just building a ‘shopping list’ of tracks to play in the future based on our latest Catalog. Adopting this strategy will cut out a whole load of I-O activity.
3. Loading Playlists - When loading a Playlist that has already been created, the ‘mini-scan’ process described in ‘2’ is run again!!! Really, how often do you want to scan this database before you actually PLAY SOME MUSIC!!!!
4. Saving Playlists – Guess what? When we save a Playlist, we perform yet another ‘mini-scan’. Stop messing about. Just save what you have got.
5. Playing Playlists – at the time of playing, the application checks to see if the file exists (as if it hasn’t checked enough times before). If it exists, it plays it. If it doesn’t, it moves on to the next – although sometimes it hangs.

Let’s get this into perspective – taking the shopping list analogy. You need to go to the supermarket to get some food. You know the stuff that they sell so you make a list of what you want before you go. You go to the shelves for the items you want and, if they haven’t got one of them, you move on to the next item on the list. That is, in my opinion, exactly the strategy that SqueezeCenter should be adopting. With the current setup, you make your list, phone the store to see if they have everything in stock, go to the store, check with customer services to see if they still have it all and then go to the shelves to get it (or not, if you’ve spent too much time checking and, in the meantime, someone else has taken it!).

Conclusion
----------

This has taken a lot of work in the face of many on these forums taking the attitude that ‘If you’re running a ReadyNAS, what do you expect’! Maybe it’s not the fastest piece of kit on the market but I knew that when I used it for backup and retrieval purposes it showed no problems with response times. Therefore. The software had to be – at least – partly to blame. What I have presented here is only a workaround solution but it gives a lot of pointers to the kind of changes the developers should be focussing on in the future. Remember, the whole idea of this kit is to PLAY MUSIC – not to endlessly wait for an unresponsive machine to maintain an over-engineered database.

Caveat
------

If you feel confident to make these changes yourself, good luck. I have had nothing but good results from them, so far. However, I take no responsibility for any loss of data or service from your setup. Let’s hope the developers take notice of this and incorporate these, or similar, improvements into the released versions of the software.

Comment 13 Andy Grundman 2008-06-12 12:10:02 UTC

This should be looked at when designing the new schema.

Comment 14 Walker LaRon 2009-06-30 13:00:36 UTC

Ticket #090629-000097


***From the customer****

Have the developers looked at the following bug report?
It seems that a qualified DBA has looked into improving the sqqueezeserver pereformance on the SPARC CPU class of Readynas products.
He claims to have made significant performance improvements by modifying the databases.
Please look into this for a 7.4 release as the current speed of the product is very disappointing.

Thanks,
Davide

*************

Comment 15 Andy Grundman 2009-06-30 13:13:42 UTC

The Sparc ReadyNas is slow because of the CPU, our database design isn't the problem.

Comment 16 Michael Herger 2014-08-08 15:38:49 UTC

This report is no longer valid. The schema has changed several times, DB backend has been replaced, and a lot of optimization applied to the DB work.