Bug 12602 - Random crashes (was In Store demo crashes)
Status: RESOLVED FIXED
Product: SB Radio
Classification: Unclassified
Component: Setup
Version: Include FW version in comment
Hardware: PC Other
Importance: P1 normal
Target Milestone: MP
Assigned To: Richard Titmuss
Depends on:
Blocks: 12978
Reported: 2009-06-30 06:45 UTC by Ben Klaas
Modified: 2009-09-08 09:29 UTC (History)

See Also:
Category: ---


Description Ben Klaas 2009-06-30 06:45:45 UTC
I did a little burn-in test of the demo applet last night, trying to run it overnight without issue. I turned the volume down on the audio clip (so I wouldn't go crazy). About 2 or 3 hours in, the volume jumped up from full mute to about 50%, which also happens to be the initial volume of the audio loop.

Not sure what might cause this, or whether it's even something we need to worry about. Assigning to Richard for comment.
Comment 1 Richard Titmuss 2009-07-01 08:43:55 UTC
maybe the system rebooted?
Comment 2 Ben Klaas 2009-07-01 08:52:46 UTC
that's very possible, I was in the other room when it jumped up so I wouldn't have seen it reboot.
Comment 3 Blackketter Dean 2009-07-01 08:57:26 UTC
Shouldn't the volume be preserved across reboots?
Comment 4 Richard Titmuss 2009-07-01 13:44:27 UTC
yes, just seen this here. it crashed and rebooted.
Comment 5 Ben Klaas 2009-07-01 13:47:08 UTC
I can look into saving the current volume of the demo as a pref and restoring it on reboot, but obviously the big problem is the crash, which we'll never want to happen in a store :)
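
The save-and-restore idea above could look roughly like the sketch below. It is illustrative only: the demo applet's real prefs mechanism is not shown in this bug, so the JSON file, the function names `save_volume` and `load_volume`, and the 0-100 volume range are all assumptions. The default of 50 follows the "about 50%" initial volume mentioned in the description.

```python
import json
import os
import tempfile

DEFAULT_VOLUME = 50  # assumed initial volume of the audio loop, in percent


def save_volume(path, volume):
    """Persist the volume via a temp file and rename, so a crash in the
    middle of a write cannot leave a half-written pref behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"volume": volume}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems


def load_volume(path, default=DEFAULT_VOLUME):
    """Return the saved volume, or the default if no valid pref exists."""
    try:
        with open(path) as handle:
            volume = json.load(handle)["volume"]
    except (OSError, ValueError, KeyError, TypeError):
        return default
    if isinstance(volume, int) and 0 <= volume <= 100:
        return volume
    return default
```

The applet would call `save_volume` whenever the volume changes and `load_volume` once at startup, so a muted demo stays muted after an unexpected reboot.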
Comment 6 Richard Titmuss 2009-07-05 15:20:28 UTC
I've had three units running all weekend, and can confirm the volume changes after a crash/reboot. I'll continue to investigate.
Comment 7 Richard Titmuss 2009-07-06 01:07:35 UTC
I now have the three units running again:
1. no changes
2. no watchdog
3. no wireless driver

I'll be keeping a tally over the next couple of days to see how many times they reboot.
Comment 8 Pat Ransil 2009-07-20 14:41:42 UTC
Richard says: still investigating. It only crashes 2 or 3 times a day, and I'm 
trying to get some useful debug output from it. I have an angle of attack, 
but due to the infrequent crashes it's slow progress.
Comment 9 Richard Titmuss 2009-07-30 03:38:47 UTC
I've estimated a week, but it may only be hours. Just depends on that lucky break :)
Comment 10 SVN Bot 2009-08-11 12:20:28 UTC
 == Auto-comment from SVN commit #7013 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7013 ==

Bug #12602
Added a hack that might work around the random crashes/lockups. If this works it gives a clue to the error, if not it should be harmless.
Comment 11 Spies Steven 2009-08-12 11:29:17 UTC
Tried r7016 on 12 PB2 babies and no change to the random crashes.

 11:22:20 up  9:29, load average: 0.32, 0.41, 0.53
 11:22:28 up  8:01, load average: 0.31, 0.32, 0.30
 11:22:33 up  4:15, load average: 0.48, 0.35, 0.26
 11:22:34 up 17:14, load average: 0.36, 0.28, 0.26
 11:22:36 up 15:27, load average: 0.57, 0.35, 0.28
 07:42:06 up 20:36, load average: 0.20, 0.35, 0.32
 11:22:43 up 10:23, load average: 0.19, 0.26, 0.26
 11:22:47 up 15:27, load average: 0.48, 0.33, 0.28
 11:22:53 up  2:22, load average: 0.20, 0.28, 0.26
 11:22:56 up  7:43, load average: 0.61, 0.79, 0.86
 16:16:28 up 20:36, load average: 0.82, 0.89, 1.01
 11:23:00 up 20:36, load average: 1.16, 0.95, 0.80
 11:23:04 up 20:36, load average: 0.29, 0.37, 0.36

So far only 4 of the 12 babies (20:36) have not rebooted since loading r7016.  This is similar to previous tests so far.  Not sure why two of them have the wrong time.
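
The tally from the uptime listing above can be reproduced with a short script. This is a sketch under two assumptions: the longest uptime in the set marks the moment the firmware was loaded, and every line uses the `up H:MM,` or `up N days, H:MM,` form (uptimes under an hour print differently and are not handled). The function names are made up for illustration.

```python
import re

# Matches "up  9:29," and "up 2 days,  3:04," style uptime fields.
UPTIME_FIELD = re.compile(r"up\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+),")


def uptime_minutes(line):
    """Convert the uptime field of one `uptime` output line to minutes."""
    match = UPTIME_FIELD.search(line)
    if match is None:
        raise ValueError("unrecognised uptime line: %r" % line)
    days, hours, minutes = match.groups()
    return (int(days or 0) * 24 + int(hours)) * 60 + int(minutes)


def units_not_rebooted(lines, tolerance_minutes=1):
    """Count units whose uptime is within tolerance of the longest uptime."""
    uptimes = [uptime_minutes(line) for line in lines]
    longest = max(uptimes)
    return sum(1 for value in uptimes if longest - value <= tolerance_minutes)


SAMPLE = """\
 11:22:20 up  9:29, load average: 0.32, 0.41, 0.53
 11:22:28 up  8:01, load average: 0.31, 0.32, 0.30
 11:22:33 up  4:15, load average: 0.48, 0.35, 0.26
 11:22:34 up 17:14, load average: 0.36, 0.28, 0.26
 11:22:36 up 15:27, load average: 0.57, 0.35, 0.28
 07:42:06 up 20:36, load average: 0.20, 0.35, 0.32
 11:22:43 up 10:23, load average: 0.19, 0.26, 0.26
 11:22:47 up 15:27, load average: 0.48, 0.33, 0.28
 11:22:53 up  2:22, load average: 0.20, 0.28, 0.26
 11:22:56 up  7:43, load average: 0.61, 0.79, 0.86
 16:16:28 up 20:36, load average: 0.82, 0.89, 1.01
 11:23:00 up 20:36, load average: 1.16, 0.95, 0.80
 11:23:04 up 20:36, load average: 0.29, 0.37, 0.36
""".splitlines()

if __name__ == "__main__":
    print(units_not_rebooted(SAMPLE))  # prints 4
```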
Comment 12 SVN Bot 2009-08-18 09:32:54 UTC
 == Auto-comment from SVN commit #7131 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7131 ==

Bug #12602
Updates to non RT kernel patches (just for testing).
Comment 13 SVN Bot 2009-08-24 12:03:35 UTC
 == Auto-comment from SVN commit #7237 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7237 ==

Bug #12602
Possible workaround for random crashes.
Comment 14 SVN Bot 2009-08-25 08:55:46 UTC
 == Auto-comment from SVN commit #7260 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7260 ==

Bug #12602
Crasher workaround from trunk.
Comment 15 Richard Titmuss 2009-08-25 08:56:32 UTC
Significant progress has now been made toward understanding the cause of the crashes. Updating the hours left to reflect this.
Comment 16 SVN Bot 2009-08-26 15:45:27 UTC
 == Auto-comment from SVN commit #7273 to the jive repo by ccrome ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7273 ==

Bug #12602.  multi-read workaround for random crash problem.  Wastes a bunch of cycles, but should work.
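
The contents of r7273 are not shown in this bug, so the following is only a generic sketch of what a multi-read workaround for an unreliable hardware counter can look like: keep reading until two consecutive reads are consistent with each other. The name `read_counter_stable`, the `read_raw` callback, and the `max_step` threshold are hypothetical, and counter wraparound is ignored for brevity.

```python
def read_counter_stable(read_raw, max_step=4, max_attempts=8):
    """Return a counter value only once two consecutive raw reads agree.

    Two reads "agree" when the second is equal to, or a small forward step
    from, the first. A single corrupted read therefore costs extra reads
    instead of being returned to the caller.
    """
    previous = read_raw()
    for _ in range(max_attempts):
        current = read_raw()
        if 0 <= current - previous <= max_step:
            return current
        previous = current  # discard the suspect pair and try again
    raise RuntimeError("counter reads never agreed")


if __name__ == "__main__":
    # Simulated up-counter with one corrupted read in the middle.
    reads = iter([100, 0xFFFF0000, 101, 102])
    print(read_counter_stable(lambda: next(reads)))  # prints 102
```

Every call costs at least two reads, and more whenever a glitch occurs, which is the kind of cycle overhead the commit message mentions.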
Comment 17 SVN Bot 2009-08-26 16:44:40 UTC
 == Auto-comment from SVN commit #7275 to the jive repo by ccrome ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7275 ==

Bug #12602
Comment 18 Caleb Crome 2009-08-27 07:02:25 UTC
So far so good!  I've been pinging 8 babies for nearly 13 hours.  Not a single hang or reboot.

 12:45:51 up 12:45, load average: 0.32, 0.35, 0.38
 12:46:48 up 12:46, load average: 0.33, 0.42, 0.39
 12:46:43 up 12:46, load average: 0.21, 0.35, 0.36
 12:46:42 up 12:46, load average: 1.17, 1.73, 1.85
 12:46:40 up 12:46, load average: 1.05, 0.70, 0.61
 12:46:40 up 12:46, load average: 0.72, 0.77, 0.92
 12:46:42 up 12:46, load average: 1.28, 1.03, 1.04
 12:46:38 up 12:46, load average: 0.98, 0.69, 0.74
Comment 19 SVN Bot 2009-08-27 16:01:31 UTC
 == Auto-comment from SVN commit #7291 to the jive repo by ccrome ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7291 ==

Bug #12602.  Richard's 'real' fix for the crasher.  We still need to get verification from Freescale that the EPIT is in fact synchronous, and will never read incorrect values.
Comment 20 SVN Bot 2009-08-28 03:35:18 UTC
 == Auto-comment from SVN commit #7296 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7296 ==

Bug #12602
The GPT driver used as the kernel time source suffers from unreliable reads from the counter. This adds an EPIT 
driver that provides an alternative clock. Testing so far shows that the EPIT does not have the unreliable 
counter reads seen with the GPT.
Comment 21 SVN Bot 2009-08-28 03:41:39 UTC
 == Auto-comment from SVN commit #7297 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7297 ==

Bug #12602
Improved GPT driver (now unused)
Updated canary patch to work with previous changes.
Comment 22 SVN Bot 2009-08-28 04:05:56 UTC
 == Auto-comment from SVN commit #7298 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7298 ==

Bug #12602
Added comment based on Remy's suggestions.
Comment 23 SVN Bot 2009-08-28 05:18:38 UTC
 == Auto-comment from SVN commit #7299 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7299 ==

Bug #12602
Turn off debug.
Comment 24 SVN Bot 2009-09-01 04:06:43 UTC
 == Auto-comment from SVN commit #7352 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7352 ==

Bug #12602
Added EPIT clocksource patch to MP2 firmware; this fixes the random crashes.
Comment 25 Richard Titmuss 2009-09-02 01:04:39 UTC
I'm happy to say that this now seems to be FIXED!