Bugzilla – Bug 12602
Random crashes (was In Store demo crashes)
Last modified: 2009-09-08 09:29:24 UTC
I did a little burn-in test of the demo applet last night, trying to run it overnight without issue. I turned the volume down on the audio clip (so I wouldn't go crazy). About 2 or 3 hours in, the volume jumped up from full mute to about 50%, which also happens to be the initial volume of the audio loop. Not sure what might cause this, or whether it's even something we need to worry about. Assigning to Richard for comment.
maybe the system rebooted?
that's very possible, I was in the other room when it jumped up so I wouldn't have seen it reboot.
Shouldn't the volume be preserved across reboots?
yes, just seen this here. it crashed and rebooted.
I can look into saving the current volume of the demo as a pref and have it reset to that on reboot, but obviously the big problem is the crash, which we'll likely not want to ever happen in a store :)
I've had three units running all weekend, and can confirm the volume changes after a crash/reboot. I'll continue to investigate.
I now have the three units running again:
1. no changes
2. no watchdog
3. no wireless driver
I'll be keeping a tally over the next couple of days to see how many times they reboot.
Richard says: still investigating. It only crashes 2 or 3 times a day, and I'm trying to get some useful debug out of it. I have an angle of attack, but due to the infrequent crashes it's slow progress.
I've estimated a week, but it may only be hours. Just depends on that lucky break :)
== Auto-comment from SVN commit #7013 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7013 == Bug #12602 Added a hack that might work around the random crashes/lockups. If this works it gives a clue to the error, if not it should be harmless.
Tried r7016 on 12 PB2 babies and no change to the random crashes.
11:22:20 up 9:29, load average: 0.32, 0.41, 0.53
11:22:28 up 8:01, load average: 0.31, 0.32, 0.30
11:22:33 up 4:15, load average: 0.48, 0.35, 0.26
11:22:34 up 17:14, load average: 0.36, 0.28, 0.26
11:22:36 up 15:27, load average: 0.57, 0.35, 0.28
07:42:06 up 20:36, load average: 0.20, 0.35, 0.32
11:22:43 up 10:23, load average: 0.19, 0.26, 0.26
11:22:47 up 15:27, load average: 0.48, 0.33, 0.28
11:22:53 up 2:22, load average: 0.20, 0.28, 0.26
11:22:56 up 7:43, load average: 0.61, 0.79, 0.86
16:16:28 up 20:36, load average: 0.82, 0.89, 1.01
11:23:00 up 20:36, load average: 1.16, 0.95, 0.80
11:23:04 up 20:36, load average: 0.29, 0.37, 0.36
So far only 4 of the 12 babies (uptime 20:36) have not rebooted since loading r7016. This is similar to previous tests so far. Not sure why two of them have the wrong time.
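For anyone keeping a similar tally, the reboot count can be pulled out of the uptime output automatically. The following is only a sketch, not a script used in this test: the helper names are made up, and it assumes every line uses the "H:MM" uptime form seen above (no "days" or "min" variants).

```python
import re

# Matches uptime output such as
# "11:22:20 up 9:29, load average: 0.32, 0.41, 0.53".
# Assumption: uptime is always printed as H:MM, as in the lines above.
UPTIME_RE = re.compile(r"^\s*\d{1,2}:\d{2}:\d{2} up\s+(\d+):(\d{2}),")


def uptime_minutes(line):
    """Return the uptime in minutes parsed from one uptime output line."""
    match = UPTIME_RE.match(line)
    if match is None:
        raise ValueError("unrecognised uptime line: %r" % line)
    return int(match.group(1)) * 60 + int(match.group(2))


def count_rebooted(lines, expected_minutes, tolerance=2):
    """Split units into (rebooted, survived) counts.

    A unit counts as rebooted when its uptime is shorter than the time
    since the firmware was loaded, allowing a few minutes of slack.
    """
    rebooted = 0
    survived = 0
    for line in lines:
        if uptime_minutes(line) + tolerance < expected_minutes:
            rebooted += 1
        else:
            survived += 1
    return rebooted, survived


if __name__ == "__main__":
    sample = [
        "11:22:20 up 9:29, load average: 0.32, 0.41, 0.53",
        "11:23:00 up 20:36, load average: 1.16, 0.95, 0.80",
        "11:22:53 up 2:22, load average: 0.20, 0.28, 0.26",
    ]
    # Firmware was loaded 20 hours 36 minutes before the readings.
    print(count_rebooted(sample, 20 * 60 + 36))
```

The comparison is made on uptime rather than on the wall-clock column, since some units report the wrong time of day.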
== Auto-comment from SVN commit #7131 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7131 == Bug #12602 Updates to non RT kernel patches (just for testing).
== Auto-comment from SVN commit #7237 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7237 == Bug #12602 Possible workaround for random crashes.
== Auto-comment from SVN commit #7260 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7260 == Bug #12602 Crasher workaround from trunk.
Significant progress has now been made towards understanding the cause of the crashes. Updating the hours left to reflect this.
== Auto-comment from SVN commit #7273 to the jive repo by ccrome == == https://svn.slimdevices.com/jive?view=revision&revision=7273 == Bug #12602. multi-read workaround for random crash problem. Wastes a bunch of cycles, but should work.
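To illustrate the general idea behind a multi-read workaround, here is a minimal sketch in Python. It is not the code from r7273 (which lives in the kernel patches); the function name, the `max_step` threshold and the retry limit are all assumptions made for illustration, and counter wraparound is ignored. The idea shown is to keep re-reading a free-running counter until two consecutive reads are consistent with each other, so that a single bad read gets discarded at the cost of extra cycles.

```python
def read_counter_stable(read_raw, max_step=4, max_attempts=16):
    """Re-read a counter until two consecutive reads look consistent.

    read_raw is a hypothetical callable returning the raw counter value.
    Two reads are treated as consistent when the second is no more than
    max_step ticks ahead of the first. Wraparound is not handled here.
    """
    previous = read_raw()
    for _ in range(max_attempts):
        current = read_raw()
        if 0 <= current - previous <= max_step:
            return current
        # The pair disagreed, so one of the two reads was bad: try again.
        previous = current
    raise RuntimeError("counter never produced two consistent reads")


if __name__ == "__main__":
    # Scripted reads: the second value simulates one corrupted read.
    scripted = iter([100, 9999, 101, 102])
    print(read_counter_stable(lambda: next(scripted)))  # prints 102
```

The extra reads are where the wasted cycles come from: every call costs at least two reads of the counter, and more whenever a read is rejected.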
== Auto-comment from SVN commit #7275 to the jive repo by ccrome == == https://svn.slimdevices.com/jive?view=revision&revision=7275 == Bug #12602
So far so good! I've been pinging 8 babies for nearly 13 hours. Not a single hang or reboot.
12:45:51 up 12:45, load average: 0.32, 0.35, 0.38
12:46:48 up 12:46, load average: 0.33, 0.42, 0.39
12:46:43 up 12:46, load average: 0.21, 0.35, 0.36
12:46:42 up 12:46, load average: 1.17, 1.73, 1.85
12:46:40 up 12:46, load average: 1.05, 0.70, 0.61
12:46:40 up 12:46, load average: 0.72, 0.77, 0.92
12:46:42 up 12:46, load average: 1.28, 1.03, 1.04
12:46:38 up 12:46, load average: 0.98, 0.69, 0.74
== Auto-comment from SVN commit #7291 to the jive repo by ccrome == == https://svn.slimdevices.com/jive?view=revision&revision=7291 == Bug #12602. Richard's 'real' fix for the crasher. We still need to get verification from Freescale that the EPIT is in fact synchronous, and will never read incorrect values.
== Auto-comment from SVN commit #7296 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7296 == Bug #12602 The GPT driver used as the kernel time source suffers from unreliable reads from the counter. This adds an EPIT driver that provides an alternative clock. Testing so far shows that the EPIT does not have the unreliable counter reads seen with the GPT.
== Auto-comment from SVN commit #7297 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7297 == Bug #12602 Improved GPT driver (now unused) Updated canary patch to work with previous changes.
== Auto-comment from SVN commit #7298 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7298 == Bug #12602 Added comment based on Remy's suggestions.
== Auto-comment from SVN commit #7299 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7299 == Bug #12602 Turn off debug.
== Auto-comment from SVN commit #7352 to the jive repo by richard == == https://svn.slimdevices.com/jive?view=revision&revision=7352 == Bug #12602 Added epit clocksource patch to MP2 firmware, this fixes the random crashes.
I'm happy to say that this now seems to be FIXED!