Bug 13535 - wlan kernel oops (was sometimes baby UI is painfully slow)
Status: CLOSED FIXED
Product: SB Radio
Classification: Unclassified
Component: Menus
Version: Include FW version in comment
Hardware: PC Windows XP
Importance: P1 critical
Target Milestone: 7.4.0
Assigned To: Richard Titmuss
Depends on:
Blocks:
Reported: 2009-08-20 12:28 UTC by Ross Levine
Modified: 2010-05-27 14:46 UTC (History)
CC List: 8 users

See Also:
Category: ---


Attachments
Rico's log (98.11 KB, text/plain)
2009-08-24 11:09 UTC, Wadzinski Tom
2nd log (149.71 KB, text/x-log)
2009-08-24 14:36 UTC, Enrico Principi
Long log incl. kernel oops & watchdog (69.27 KB, text/x-log)
2009-08-24 16:27 UTC, Enrico Principi
console output (3.47 KB, text/plain)
2009-08-25 18:11 UTC, Ross Levine
packet capture (112.96 KB, application/octet-stream)
2009-08-25 18:18 UTC, Ross Levine

Description Ross Levine 2009-08-20 12:28:14 UTC
I see this maybe once per day, and so far haven't been able to tie it to any particular cause. Sometimes the UI takes more than a few seconds to respond to scrolling. Rebooting always fixes this. Checking top from serial shows nothing out of the ordinary, with jive using ~40% memory and 1-3% CPU.
Comment 1 James Richardson 2009-08-20 13:23:46 UTC
I have experienced this at times as well, but could not reproduce it reliably.
Comment 2 Ben Klaas 2009-08-20 18:13:19 UTC
It doesn't serve this bug well to have it in my court. Unassigning and tagging with bug_meeting.
Comment 3 Felix Mueller 2009-08-24 03:03:55 UTC
I've talked to Richard about this and he said it would be good if you could get a log when this happens and then talk to him.
Comment 4 James Richardson 2009-08-24 09:12:33 UTC
Give Richard a chat on IM when you see this happening, make sure you have serial cable attached.
Comment 5 Wadzinski Tom 2009-08-24 11:09:28 UTC
Created attachment 5676 [details]
Rico's log
Comment 6 Wadzinski Tom 2009-08-24 11:11:32 UTC
Rico saw lagginess (see attached log). He is running a private network with no connection to SN. The lagginess appears after a cold reboot. He said: "10:46:09 is when it recovers", which in the log is when the idle disconnects from the SCs occur.
Comment 7 Ross Levine 2009-08-24 13:32:13 UTC
tail -f /var/log/messages ? or some other logging?
Comment 8 Enrico Principi 2009-08-24 14:36:15 UTC
Created attachment 5678 [details]
2nd log

Happened again during boot & reconnect to wireless interface.  
Connected power supply, removed battery, connected serial, no shell prompt, yet there was feedback from plugging in ethernet.
Wading through the slow menus I'm able to get it to connect via ethernet, whereupon baby again recovers and the UI is snappy.
Looks like udhcpc dies several times and respawns, finally gets a proper lease around 7:40, and everything goes back to normal, except the serial console still won't put up a prompt.
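
One way to confirm the respawn pattern from a saved log is to count how often udhcpc starts and when a lease is finally obtained. A minimal sketch follows; the message strings are assumed from typical BusyBox udhcpc output and may not match the attached logs exactly, and the sample lines are made up for illustration.

```python
import re

# Patterns assumed from typical BusyBox udhcpc output; adjust to the real log.
STARTED = re.compile(r"udhcpc.*started")
LEASE = re.compile(r"Lease of (\S+) obtained")

def summarize_dhcp(lines):
    """Return (number of udhcpc starts, list of leased addresses) for log lines."""
    starts = 0
    leases = []
    for line in lines:
        if STARTED.search(line):
            starts += 1
        match = LEASE.search(line)
        if match:
            leases.append(match.group(1))
    return starts, leases

# Hypothetical sample lines, not taken from the attached logs.
sample = [
    "udhcpc (v1.13.3) started",
    "Sending discover...",
    "udhcpc (v1.13.3) started",
    "Sending discover...",
    "Lease of 192.168.1.23 obtained, lease time 86400",
]
starts, leases = summarize_dhcp(sample)
print(f"udhcpc started {starts} times; leases: {leases}")
```

More than one start before the first lease would be consistent with udhcpc dying and respawning.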
Comment 9 Enrico Principi 2009-08-24 16:27:33 UTC
Created attachment 5680 [details]
Long log incl. kernel oops & watchdog

Once more, this time serial tty was enabled at bootup.  I've seen this kernel oops before and not had any ill effects so I'm not sure it's related.
However line 811 shows watchdog finally kicking in before I had a chance to switch to wired networking.
For some reason baby cannot get a dhcp lease but wants to resume playback..?
Unfortunately as before top was not helpful here, maybe a wireshark capture would be handy if it happens again.
Comment 10 Ross Levine 2009-08-25 18:11:12 UTC
Created attachment 5692 [details]
console output

With the D-link WBR-1310 I'm able to reproduce this consistently from a factory reset state using WPA2, Cipher type Auto (TKIP / AES), PSK 8~63 ASCII.
Comment 11 Ross Levine 2009-08-25 18:18:39 UTC
Created attachment 5693 [details]
packet capture

Packet capture while reproducing this bug, see packets 512 through ~524.
Comment 12 Ross Levine 2009-08-25 18:20:11 UTC
Felix do you need anything else from QA? If you would like this router shipped to you just say the word. :-)

Thank you Rico!
Comment 13 Felix Mueller 2009-08-25 23:23:09 UTC
Rico: Thanks, not yet.

Richard: Some of the logs show the kernel oops you are looking into with Atheros.
Comment 14 Ross Levine 2009-09-03 17:19:01 UTC
r7360 reproduces this consistently while trying to connect via wireless to linksys WRT610N WPA2.
Comment 15 Ross Levine 2009-09-03 17:32:38 UTC
r7406 reproduces this consistently with D-link DIR655. 

It's tempting to upgrade the severity of this bug; in some ways it is blocking (or at least painfully slowing) router testing for baby.
Comment 16 Chris Owens 2009-09-03 17:59:24 UTC
I was just talking to Ross about how slow this makes router testing, and I'm raising the severity.  

I know that Richard and Felix have plenty of work, and it's all high priority, but this is my vote that this bug is definitely making QA's life much more difficult.

Please schedule your work with our pain in mind.  :)
Comment 17 SVN Bot 2009-09-04 10:21:51 UTC
 == Auto-comment from SVN commit #7416 to the jive repo by richard ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7416 ==

Bug #13535
Fix atheros driver for RT Linux.
Comment 18 Ross Levine 2009-09-04 15:01:07 UTC
Richard made some amazing progress on this; I can no longer reproduce it. He mentioned wanting to track this down further, so we'll leave this bug open.

Thanks Richard, this helps me a lot.
Comment 19 Felix Mueller 2009-09-05 01:26:12 UTC
Sorry for the late feedback, but I had a hard time reproducing this yesterday with r7406. For some strange reason I couldn't make it happen.

But this morning, when I re-tried with r7406, the issue occurred twice out of 20 attempts.

After updating to r7418 it did not happen once in 20 attempts. Looks good I think.
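
As a rough check on how much weight 20 clean attempts carry, here is a small sketch. It assumes the attempts are independent and that the 2-in-20 rate seen with r7406 is the true per-attempt failure rate; both are assumptions, not measured facts.

```python
# Rough sanity check, not a rigorous test: assumes independent attempts and
# that the 2/20 rate seen with r7406 is the true per-attempt failure rate.
def prob_no_failures(failure_rate: float, attempts: int) -> float:
    """Chance of seeing zero failures in `attempts` tries if nothing was fixed."""
    return (1.0 - failure_rate) ** attempts

observed_rate = 2 / 20  # r7406: 2 failures in 20 attempts
p_clean = prob_no_failures(observed_rate, 20)
print(f"P(0 failures in 20 | rate={observed_rate:.2f}) = {p_clean:.3f}")
```

Under those assumptions, 20 clean runs could still happen by chance roughly one time in eight, so further attempts would firm up the result.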
Comment 20 Richard Titmuss 2009-09-07 05:40:31 UTC
Further investigation appears to suggest the additional error I was seeing is caused by using LOCKDEP kernel debugging with loadable modules. I don't think this is a problem when the debugging is disabled.

Based on the comments from Ross and Felix this is now confirmed fixed.
Comment 21 James Richardson 2009-10-06 13:16:56 UTC
This bug has been marked as fixed in the 7.4.0 release version of Squeezebox Server!
    * SqueezeCenter: 28672
    * Squeezebox 2 and 3: 130
    * Transporter: 80
    * Receiver: 65
    * Boom: 50
    * Controller: 7790
    * Radio: 7790  

Please see the Release Notes for all the details: http://wiki.slimdevices.com/index.php/Release_Notes

If you haven't already, please download and install the new version from http://www.logitechsqueezebox.com/support/download-squeezebox-server.html

If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look.
Comment 22 Chris Owens 2010-05-27 14:46:11 UTC
These bugs have all been marked resolved and belong to a component which is being removed.  Therefore they have been moved to the most applicable of the new components.