Bugzilla – Bug 1446
MusicMagic plugin eats memory and cpu
Last modified: 2008-08-18 10:54:16 UTC
using a recent trunk checkout, if I select "use musicmagic" in the server settings, within a day the server will start exibiting drop outs and very slow brower response. Looking at the server I see that slimserver.pl is using something like 98% of the cpu and 75% or more of the memory (and it's not doing a rescan or anything). These numbers don't drop if I leave the server alone for awhile, in fact the memory usage will keep going up. Restarting the server brings everything back to normal levels, but the numbers will start clibming again. If I switch to "don't use musicmagic", then slimserver hovers below 7% cpu while playing (and not rescanning) and memory stays below 5%. I tried this on another machine, and though the numbers differed slightly with the machine specs, the general effect was the same. It also doesn't seem to matter if MMM itself is on the same box or a remote one. I can attach the output from a --d_musicmagic, but that seems to be more about the data being echanged between the two than anything obviously relevent here. I'll try to add more useful info when I've had a chance to poke at it a bit more.
Michael - can you get an strace of the process?
some additional observations and ponderings... On further observation, I have seen the cpu usage (of slimserver.pl) drop to normal levels for a short while and then pop back up pegging it's cpu. This didn't seem to be a problem when I first started playing with the plugin, but the dropouts are what made be notice it, and the SB2 seems immune to the dropouts (yay for buffer), so I can't really say whether it's been happening all along or not. However... when I first launched the mmm it quickly went through and identified everything it could (and marked them green) and I could start using those songs in slim mixes right away. and then it goes back to verify it's assessment of those songs, which is a long process (it's been running for a couple weeks now). There seems to be two parts to this process, fingerprinting and analysis, and the analysis part takes much longer of the two. for about a week, the fingerprint part seemed essentially paused while analysis would grind away at it's cue. a few days ago analysis caught up and fingerprinting started grinding through it's queue again. (I only half grok what mmm is doing under the covers, so forgive me if I'm butchering this discription. This is all based on my understanding of the x/y/z analysis line at the bottom of their window.) Now, I'm not totally sure about how slimserver gets updates from mmm, but if mmm decided that each time it finished a verification it constituted an update (perhaps regardless of whether it's opinion of the track changed) and if slimservers processing that update was more cpu intensive than one would hope for... then perhaps the mmm switching over to fingerprinting mode and churning through tracks faster is why the slim is chewing up cpu more noticibly than it was before? And when it occationally goes back to analysis for things it's just fingerprinted might explain why slim's cpu usage occationally drops to "normal". Just a guess, and it doesn't address the memory leak, but I figured it was worth a mention. anyway... on to the strace...
Created attachment 465 [details] strace of slim using mmm and eating memory/cpu anyway.... I attached strace to the running slim process (let me know if you'd prefer a start to stop version for some reason). The trace was growing fairly quickly, so I didn't let it run too long. Let me know if you need a bigger chunk. This was on a secondary server that only talks to software players, but it's the server that mmm is actually running on, so I can say it's running unmodified code (had to tweak the code on the "real" server to talk to a remote mmm, but the cpu/mem behaviour is similar to what's seen here, even though the cpu is beefier). This is running with revision 3062. No players are actually playing at the time this was taken. There are multiple cpus on this box, so mmm's analysis eating up it's own cpu isn't interfering with silm and what it's doing with it's cpu. This process has been running for a couple days I think, there's enough memory on this box that It takes longer to eat through it, but the effect is still pretty obvious. top - 12:15:55 up 52 days, 19:57, 8 users, load average: 2.20, 2.30, 3.54 Tasks: 101 total, 3 running, 98 sleeping, 0 stopped, 0 zombie Cpu(s): 49.3% us, 0.9% sy, 0.0% ni, 49.0% id, 0.6% wa, 0.0% hi, 0.2% si Mem: 4057572k total, 4039080k used, 18492k free, 103920k buffers Swap: 2097136k total, 117492k used, 1979644k free, 1416724k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28074 michael 25 0 61732 50m 4524 R 99.9 1.3 0:32.22 MusicMagicServi 5844 root 17 0 1926m 1.9g 5948 R 99.0 48.5 1862:33 slimserver.pl 359 root 5 -10 0 0 0 S 0.3 0.0 3:26.41 reiserfs/0 28172 root 15 0 0 0 0 S 0.3 0.0 1:14.03 pdflush 21392 root 16 0 2064 1084 1848 R 0.3 0.0 0:05.58 top 1 root 16 0 1500 480 1344 S 0.0 0.0 0:05.53 init
A thread on the mailing list seems to confirm my suspicion above about what's going on with the cpu. Just to verify, I set the musicmagic reload interval to 3600. Sure enough when the cpu did settle down, it stayed normal for about an hour. Then it shot back up for 20 mins, and then back down again. That doesn't solve the problem, but at least we know what's going on there. Let me know if you want to split this out from the memory leak as sepperate issues.
clearly something is convincing the server that the musicmagic db is changing. howver, you can now set the musicmagic reload interval to 0 to disable the automatic reload completely if you like.
do we have any d_musicmagic output available? It might be good to know where its looping or just to be certain that this is just the result of the rescannign process. I haven't seen this myself, running with 6.1, mmm 1.1.5 on win2k while at work.
some new debug options checked into trunk at change 3107 i'd be interested in seeing d_musicmagic logs from this from startup of both MMM and slimserver. plus, a second log after shutting both down and starting again.
sorry, I've been busy lately and haven't had a chance to poke at this. I do have an output capture of --d_musicmagic that was taken (coincidently) using revision 3108, but I let it run a bit too long, so I'll need to trim it down in size before attaching it. in the meantime, mmm has finally finished analysing my files, which saves me from the cpu problem (at least until I add more music). I'll have to watch it for awhile to see what happens to the memory usage in this state.
np, every bit helps. I'm definitely seeing the rescan on start, and know the cause (not sure how to fix without more info from MusicMagic). What I'm not seeing is the eating of cpu. Both on windows (activeperl) and linux with 1.4G (ish) pc's the scan seems to run without especially overloading the server.
wendell and I have tracked down potential bottlenecks and some changes are included in the new musicmagic 1.1.5.1 beta. Slimserver can check on the MTime for the contents only so that things like ratings and playback count will no longer trigger a rescan. There is also a new getStatus API command that slimserver can now use to only trigger a rescan when musicmagic is idle, avoiding a usage during analysis. hoping to get a patch in for this and bug 1180 tonight for 6.1 users to test (with latest beta of musicmagic)
committed a possible fix to 6.1 at change 3272. we now only scan if musicmagic returns idle from getStatus use cacheid?contents for fileMTime to avoid a rescan from simple internal status changes. please test and confirm if this helps relieve cpu issues, as I was never able to reproduce that problem. You will need the very latest MusicMagic 1.1.5.1 from here: http://music.predixis.com/beta.html will close if there is confirmation
Michael: Is this fixed for you?
Michael?
sorry, I'm not near the server running mmm at the moment. The server in question has been generating some strange messages from slimserver lately, but I haven't had the time or net connection to verify if it's related to mmm or not. I'll be near that machine again after the 17th, and I'll be able to add more useful comments then. sorry for the delay.
Pinging Michael again.
testing it now
This appears to be fixed now, Thanks! Sorry for taking so long to verify.
This bug was marked resolved in Slimserver 6.1, which is several versions ago. If you're still seeing this bug, please re-open it. Thanks!