Bugzilla – Bug 15915
Out of memory error and crash after running ~15 hours with 4+ players
Last modified: 2010-03-30 14:12:49 UTC
Have Fab4/TinySC running with 4 players streaming MP3 files from attached WD 1.5TB USB self-powered drive. (2 Boom, 2 SB3, Fab4 is PB3) Started all 4 players streaming MP3 (different Harry Potter Audio CDs), with Fab4 itself continuously playing another Harry Potter MP3 story. Started 6:30 pm, and appeared to crash with "System Error" screen at around 11:00 pm. Have crashlog file, which says it's out of memory. File is attached. This is second time in 2 days this has happened. For some reason, I ignored the first failure. Guess I shouldn't have! Was able to play this configuration continuously last weekend, but not this week. Test was stopped when building experienced power outage last Monday March 15 and I then updated firmware. Now on r8656. Not sure what firmware version was running last weekend. Maybe 863x? Now I can't get it to stay up for 24 hours. Please review crashlog and suggest additional debug switches to set.
Created attachment 6665 [details] Crash log
Created attachment 6666 [details] Messages log file -- doesn't appear to have info, but just in case ....
Two observations: not sure if either is significant. 1. ntfs-3g was being used. 2. Both TinySC (slimserver.pl) and SP (jive) were selected to be killed.
Good point. One thing different in this configuration versus last weekend is the use of the 1.5TB drive in NTFS format. Last weekend it was using a different drive (IDE 120GB 4200 RPM with PATA-USB converter) but it was also an NTFS drive. Not sure why a different drive would make a difference, but I'll try it and see.
Swapped out with different drive -- Hitachi 250GB USB Simple Drive Mini. Still running same config after 16.5 hours. No obvious memory issues from free or cat /proc/meminfo commands.
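For anyone repeating this test, the free and cat /proc/meminfo checks can be logged instead of eyeballed. A minimal sketch in Python, assuming the /proc/meminfo text is captured on the device and parsed wherever Python is available. The names parse_meminfo and reclaimable_kb and the SAMPLE values are hypothetical, not taken from the firmware or from a real Fab4:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def reclaimable_kb(info):
    """Rough estimate of memory the kernel could hand out: free + buffers + page cache."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)

# Hypothetical sample, used only when /proc/meminfo is not present.
SAMPLE = """MemTotal:       125000 kB
MemFree:          4000 kB
Buffers:          1000 kB
Cached:          20000 kB
"""

if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    info = parse_meminfo(text)
    print("MemFree=%d kB, roughly reclaimable=%d kB"
          % (info.get("MemFree", 0), reclaimable_kb(info)))
```

Running this at a fixed interval and keeping the output gives a record to compare against the time of the next crash.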
Still running after 72 hours with Hitachi 250GB drive. Running firmware r8660.
This Hitachi drive is also preformatted in FAT32, not NTFS. Ran the test for days on the PATA drive in NTFS v3.0 format (formatted using Windows 2000), so possibly an issue with NTFS on 1.5TB WD drive? Maybe WD drive in NTFS v3.1 format?
Mickey - have you been playing music on the fab4 itself too? If not: what screensaver are you using? I noticed SBS on my fab4 quitting too, but only on one of my two devices. Both are running SBS accessing a 2.5" disk with a few thousand tracks. The biggest difference between the two is that one uses the ImageViewer applet to show local image files as a screensaver. Checking memory this morning, jive on that device was using 45-50MB, while the other one (running the clock saver) was at around 30MB. I wonder whether ImageViewer is leaking memory. You aren't using it as your screensaver by chance?
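To tell a leak from normal fluctuation, the resident size of jive can be sampled over time and fitted with a trend line. A minimal sketch in Python, assuming the samples come from the VmRSS line of /proc/<pid>/status. The function names and the numbers in the usage lines are hypothetical:

```python
import re

def vmrss_kb(status_text):
    """Extract the VmRSS value (kB) from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def growth_kb_per_hour(samples):
    """Least-squares slope of RSS over time.

    samples is a list of (hours_since_start, rss_kb) pairs.
    Returns 0.0 when there are too few points to fit a line.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0

if __name__ == "__main__":
    # Hypothetical samples: (hours, kB)
    samples = [(0, 30000), (1, 31000), (2, 32000)]
    print("growth: %.1f kB/hour" % growth_kb_per_hour(samples))
```

A slope that stays clearly positive over many hours points to a leak; a slope near zero with large swings fits the up-and-down pattern expected from loading images of different sizes.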
Mickey, what means are you using to build the playlists and how big are they? I ran a 48hr test with 4 players: fab4 + 3 x baby. Each had a single-album MP3 playlist. TinySC was using an SD-card for the library. No leak. Will now run test with ip3k-display players attached.
In the bug meeting today, speculation is that it's some combination of NTFS and a large drive. If we push this to 7.5.x, it should be P1 there.
The following link describes how the kernel decides which application to kill: http://linux-mm.org/OOM_Killer
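The kernel also exposes its current per-process score as /proc/<pid>/oom_score; a higher score makes a process a more likely victim. A minimal sketch in Python that ranks processes by that score, assuming Python is available on the system being inspected. rank_by_oom_score is a hypothetical helper name, and the proc_root parameter exists only so the function can be pointed at a copy of /proc:

```python
import os

def rank_by_oom_score(proc_root="/proc"):
    """Return a list of (oom_score, pid, command line), highest score first."""
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "cmdline"), "rb") as f:
                raw = f.read()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or file unreadable
        # cmdline arguments are NUL-separated
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        ranked.append((score, int(entry), cmd))
    ranked.sort(reverse=True)
    return ranked

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for score, pid, cmd in rank_by_oom_score()[:5]:
            print("%8d  pid %-6d %s" % (score, pid, cmd))
```

Checking this list while the system is under load shows which of slimserver.pl and jive the kernel currently rates as the first candidate.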
Created attachment 6681 [details] crashlog - after about 10h idling with the ImageViewer
I didn't see steady growth in memory usage, but rather ups and downs (with larger or smaller images). For about 7-8h I thought it was quite stable. But when SBS got killed, jive had grown to about 50MB. There obviously is something wrong with it. It eventually died when it was about to resize a 6MP image. Famous last words (more to be found in the attached log file):
Mar 22 19:11:35 squeezeplay: DEBUG applet.ImageViewer - ImageViewerApplet.lua:550 image rendering
Mar 22 19:11:35 squeezeplay: INFO applet.ImageViewer - ImageSourceLocalStorage.lua:141 Next image in queue: /media/sda1/images/gegenlicht.jpg
Mar 22 19:11:39 kernel: jive invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Mar 22 19:11:39 kernel: [<c02f6bfc>]
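The oom-killer lines can be pulled out of a long messages log automatically. A minimal sketch in Python, assuming the log uses the same line layout as the excerpt above. OOM_RE and find_oom_events are hypothetical names:

```python
import re

# Matches lines such as:
#   Mar 22 19:11:39 kernel: jive invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
OOM_RE = re.compile(
    r"kernel: (?P<proc>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-fA-F]+), order=(?P<order>\d+), "
    r"oomkilladj=(?P<adj>-?\d+)"
)

def find_oom_events(lines):
    """Return one dict per oom-killer invocation found in the given log lines."""
    events = []
    for line in lines:
        match = OOM_RE.search(line)
        if not match:
            continue
        events.append({
            # everything before "kernel:" is treated as the timestamp
            "timestamp": line[:match.start()].strip(),
            "process": match.group("proc"),
            "gfp_mask": int(match.group("gfp"), 16),
            "order": int(match.group("order")),
            "oomkilladj": int(match.group("adj")),
        })
    return events

if __name__ == "__main__":
    log = [
        "Mar 22 19:11:35 squeezeplay: DEBUG applet.ImageViewer - "
        "ImageViewerApplet.lua:550 image rendering",
        "Mar 22 19:11:39 kernel: jive invoked oom-killer: "
        "gfp_mask=0x1201d2, order=0, oomkilladj=0",
    ]
    for event in find_oom_events(log):
        print(event)
```

The "process" field is the task whose allocation triggered the killer, which is not necessarily the task that ends up being killed.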
Here is the path to the watchdog source code:
7.5/trunk/squeezeos/poky/meta-squeezeos/packages/watchdog
which gets installed to:
7.5/trunk/squeezeos/poky/build/tmp-fab4/work/armv6-none-linux-gnueabi/watchdog-5.6-r5/watchdog-5.6
There is a watchdog configuration file under /etc that controls the behavior of the watchdog at run-time. Here is the path to the watchdog config file in the source tree:
7.5/trunk/squeezeos/poky/meta-squeezeos/packages/base-files/files/watchdog.conf
What are these errors in the logs:
Mar 18 10:11:54 squeezeplay: audio_thread_execute:908 xrun (snd_pcm_mmap_commit) err=-32
Mar 18 10:11:54 squeezeplay: audio_thread_execute:798 xrun (snd_pcm_wait)
Mar 18 10:11:54 squeezeplay: audio_thread_execute:800 PCM wait failed: Text file busy
Mar 18 10:11:54 squeezeplay: audio_thread_execute:752 underrun!!! (at least 10.478 ms long)
Mar 18 10:11:54 squeezeplay: audio_thread_execute:798 xrun (snd_pcm_wait)
Mar 18 10:11:54 squeezeplay: audio_thread_execute:752 underrun!!! (at least 6812718.075 ms long)
They are to do with use of ALSA. I do not think that they are significant when they occur at the start of playback after a period of not playing.
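For reference, err=-32 in the snd_pcm_mmap_commit line is a negative errno value. On Linux, 32 is EPIPE, which ALSA uses to report an xrun (buffer underrun or overrun). A minimal sketch in Python for decoding such codes; describe_errno is a hypothetical helper, and the symbolic names assume Linux errno numbering:

```python
import errno
import os

def describe_errno(err):
    """Return (symbolic name, message) for a negative-errno style return code."""
    code = abs(err)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

if __name__ == "__main__":
    # -32 is the value reported in the log line above.
    print(describe_errno(-32))
```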
Have 3 Fab4/TinySC combinations running with r8661. Each has 4 IP3K players.
- Hitachi Simpledrive Mini 250GB drive FAT32
- Seagate Expansion 2TB NTFS
- Hitachi XL10000 1TB NTFS
All are running fine after 43 hours. Original bug happened with WD Elements 1.5 TB drive. Will stop test on 250GB and attach WD Elements drive for retesting.
Mickey can no longer repro. Please reopen if anyone still is seeing an issue!
We should look into our policy for dealing with out-of-memory situations. I am not sure that having the kernel kill an application is the best approach.