Bugzilla – Bug 16658
alarm can sometimes not be silenced due to OOM error when checking if headphones are connected
Last modified: 2011-05-23 09:49:57 UTC
Created attachment 7030 [details] remove grep/sed from popen command and process amixer output natively in Lua this bug was submitted by a forum user and the user supplied a patch for evaluation. recommend this patch is given a thorough testing and checked in before 7.6.0 release. testing notes: if possible, reproduce OOM error before applying patch. include testing for correct behavior with the headphones in, and with no headphones attached, to confirm that the correct state is determined in each case. if OOM error was reproduced before patch, confirm that after patch the removal of grep/sed from the popopen solves the OOM error.
The patch doesn't solve the problem, I guess it may have only made it occur slightly later than usual. Context, Radio was up for about a day, and had been playing Pandora for a couple of hours. # uptime 21:50:54 up 1 day, 2:04, load average: 0.49, 0.36, 0.29 Installed applets: Custom Clock Installed apps: Pandora, Slacker, Sounds & Effects When the OOM occurred the alarm had just interrupted playback, and I selected 'turn off alarm' about 9 seconds after it had started. The alarm popup disappeared but it kept playing, pressing pause silenced it. The following is the memory usage about 1 1/2 hours after the alarm had gone off, the radio has been idle since then. It is low on memory but not extreme, clearly there is 2MB free + a bunch of recoverable buffer memory. According to /proc/meminfo there is 15MB of pagecache. There is no dmesg output indicating that the kernel OOM killer did anything so it may be a lua/jive thing. # free total used free shared buffers Mem: 62104 60204 1900 0 5648 Swap: 0 0 0 Total: 62104 60204 1900 I'll attach relevant parts from /var/log/messages and the current process list. The only thing I noticed there is that the jive process went from about 25MB after bootup to 40MB. The RSS is over 34MB so I don't think this is a memory leak.
Created attachment 7031 [details] relevant part of /var/log/messages from the SB Radio
Created attachment 7032 [details] ps auxwww output, about 1 1/2 hours after OOM event.
Created attachment 7033 [details] alarmsnoozeapplet fragment Line numbers in the /var/log/messages trace are off because of my attempt to fix AlarmSnoozeApplet.lua.
I've been digging further into this. It turns out that the OOM happens fairly frequently and causes things like, - when the alarm is 'turned off' through the popup, the sound keeps playing until either pause is pressed or the device is powered off. - when the alarm turns off because of the automatic timeout, the fallback alarm activates about 10 seconds later. Then even when the fallback alarm turns off some thread keeps logging alarm state updates every 10 seconds or so. After the system is rebooted it works fine for about a day or two. After that it consistently fails with an ENOMEM error in io.popen. Turns out that after a reboot the jive process size is about 26-30MB, after a day or two the process size has grown to about 40-42MB (currently 34796MB RSS). My current theory is that since the SB Radio has no swap configured and only 64MB of ram and it looks like /proc/sys/vm/overcommit_memory is disabled, the kernel does not allow the 40MB jive process to fork. The alarm failed this morning and I have not rebooted, I just enabled overcommit_memory and I'll report if the ENOMEM still happens when the alarm goes off tonight.
After your next alarm test, I'd also be interested if you can reproduce the OOM issue without Custom Clock applet installed. (the apps don't matter, as they are not really apps at all but pointers to server-side resources)
Sure, no problem. I'm not using very fancy settings, but looking at the Custom Clock applet code it is unclear if it is trying to cache anything in non-weak tables. From the looks of it it only tries to cache superdatetime related data which I am not using. But after tonight I'll run for a few days without it just to make sure.
> The alarm failed this morning and I have not rebooted, I just enabled > overcommit_memory and I'll report if the ENOMEM still happens when the alarm > goes off tonight. Alarm went off as normal, turned it off through the popup menu with no problems. No fallback alarm, no OOM errors in /var/log/messages, everything worked as it was supposed to. jive process size is definitely still over 40MB, # ps auxwww | grep jive root 742 36.8 56.1 40700 34864 ? Sl Nov15 2086:24 /usr/bin/jive I'm running a reinstall of the firmware (7.5.1 r9218) and will run a couple of days with an unmodified setup (besides activating ssh) to see if the OOM errors can be reproduced.
(In reply to comment #8) > I'm running a reinstall of the firmware (7.5.1 r9218) and will run a couple of > days with an unmodified setup (besides activating ssh) to see if the OOM errors > can be reproduced. Oddly enough the reinstall didn't actually seem to have done anything. The custom clock applet was still enabled. Disabled it by hand and the radio rebooted. After boot the jive process was a little under 26MB and has slowly been growing since then. At some point the OOM errors started happening again, trying to turn off the alarm with the on-screen display would leave it running until pause was used to stop the alarm playback. If I let the alarm run and wait for it to automatically dismiss and the fallback alarm kicks in a little while later. So overall, yes it uses about 7MB less compared to with Custom Clock installed, but it is just big enough to cause memory allocation errors on fork because the jive process size is >50% of the available memory and overcommit is disabled. Shortly after boot, root 741 31.0 22.9 25952 14224 ? Sl 20:49 1:53 /usr/bin/jive After starting Pandora, root 741 32.1 37.2 28268 23144 ? Sl 20:49 2:29 /usr/bin/jive The current process size (about 3 days later), root 741 35.5 66.3 47180 41176 ? Sl Nov19 1387:23 /usr/bin/jive I'm a little surprised about the current size, I thought it had topped out around 33MB as the artwork cache filled up and didn't check for the past day or so, not sure why it is so much more now.
Jan, what's the status with this? Are you still seeing OOM problems?
On 01/14/2011 12:59 PM, bugs@bugs.slimdevices.com wrote: > > Jan, what's the status with this? Are you still seeing OOM problems? Yes, the 'visible' effect is that most of the time when the alarm is turned off the music will keep playing. If the alarm stops because the timer expires (I have it set for one minute) and it does stop playing the backup alarm often kicks in about 10 seconds later. By default Linux uses some heuristic based on available memory and swap space to figure out if there is sufficient memory to fork a process. This can be disabled by changing the value of /proc/sys/vm/overcommit_memory in which case fork (and popen) do succeed. However I don't consider this an appropriate long-term fix. I've looked at the code and there are actually only very few places where squeezeplay forks and it should be possible to avoid these forks. The headphone switch already generates a jive event which I don't think is tracked at the moment but if it were there is no need to fork off a helper process to check the state. Additionally exposing the necessary mixer functions to enable/disable the headphone output as native Jive operations should, as far as I can tell, eliminate the remaining places where squeezeplay forks. That is about as far as I got into looking at this. Jan
== Auto-comment from SVN commit #9296 to the jive repo by ayoung == == http://svn.slimdevices.com/jive?view=revision&revision=9296 == Fixed bug 16658: alarm can sometimes not be silenced due to OOM error when checking if headphones are connected Avoid use of external amixer command for manipulating audio endpoint by extending facilities already available in SqueezeboxBabyApplet. Generalize code and comments so that mechanism is not specific to Baby.
I did not use a pcall() or appletManager:hasService() on the call to the overrideAudioEndpoint service because, looking at other instances of product-specific services, they also are not protected. I did not test the preceding change on Fab4 - will do so when it is built with the next nightly.
Been verified by plugging in headphones. Please reopen if seen again.