Bug 8159 - Controller won't reconnect to network without reboot
Status: CLOSED FIXED
Product: SB Controller
Classification: Unclassified
Component: SB Server
Version: unspecified
Hardware: PC Windows XP
Importance: P1 critical with 2 votes
Target Milestone: 7.3
Assigned To: Ross Levine
URL: http://forums.slimdevices.com/showthr...
Depends on:
Blocks:
Reported: 2008-05-15 02:02 UTC by Patrick Dixon
Modified: 2009-09-08 09:21 UTC
CC List: 8 users

See Also:
Category: ---


Attachments
jive slimserver log (161.11 KB, application/octet-stream)
2008-09-25 17:00 UTC, Ross Levine
slimserver logging enabled, power management disabled, DHCP (98.99 KB, application/x-zip-compressed)
2008-10-16 15:46 UTC, Ross Levine
static IPs, no DHCP (77.53 KB, application/x-zip-compressed)
2008-10-16 15:47 UTC, Ross Levine

Description Patrick Dixon 2008-05-15 02:02:33 UTC
jive_7.1_r2435.bin

SqueezeCenter Version: 7.2 - 19660 @ Tue May 13 00:21:14 PDT 2008 - Debian - EN - utf8
Server IP address: 192.168.1.99
Perl Version: 5.8.8 x86_64-linux-gnu-thread-multi
MySQL Version: 5.0.45-Debian_1ubuntu3.3

Platform Architecture: x86_64-linux

If I leave the Controller handset in a part of the house with no or poor wireless coverage, it loses its connection to SC7 and the network. When I then bring it back within network range and try to use it, it attempts to reconnect but constantly fails. I have to power it off and back on to get it to reconnect to the network correctly, which it then does without any problem.

The wireless network has no security.
Comment 1 Ross Levine 2008-05-15 13:52:14 UTC
Patrick how long are you leaving the controller out of range? I tried but only for 1 minute, and it reconnected just fine for me. 
Comment 2 Patrick Dixon 2008-05-15 14:02:32 UTC
Much longer than that. I put it in the charging cradle, which is out of range, so it may be an hour or two, or overnight.
Comment 3 Ross Levine 2008-05-15 16:55:23 UTC
Is the wireless icon red or blue? I left the controller out of range for an hour, when I came back into range the icon remained blue until I tried to do anything. I went to browse music and the icon turned white, everything worked just fine. I'll investigate further but I'm still not seeing this as you describe. 
Comment 4 Patrick Dixon 2008-05-16 05:29:48 UTC
OK, this morning at 8:45am I put the handset in the cradle (out of wireless range) to recharge.  The Duet Rx was 'on' and with a playlist loaded, but not actually playing anything.

At 1.11pm I removed the handset from the cradle and pressed a key, and the display came up with a blue wireless icon.  I took the handset back into the room where the Duet Rx is, and pressed play.  The wireless icon remained blue.  The handset told me that the song was playing and gave me an updating progress bar, but nothing was playing on the Rx. I could (seemingly) pause and stop and start the song (and the handset told me the Rx was doing all this), but I couldn't turn the Duet 'off'.  Nothing was playing and the wireless icon remained blue throughout.

At 1.15pm I got bored of waiting and switched the handset off and then immediately back on. Pressing play played the song, and the Duet Rx could be turned 'off' and 'on', i.e. it all worked correctly. The wireless icon shows max white.

This is completely repeatable for me - the only other thing I can think of, is that there are no other wireless networks around here.  We live at the end of a 1/2 mile drive and some distance from any neighbours (hence the lack of wireless security), so when the handset goes out of range, it sees no wireless activity of any kind.
Comment 5 Patrick Dixon 2008-05-16 05:30:57 UTC
I'll try it for 30 minutes to see how short it needs to be out of range.
Comment 6 Patrick Dixon 2008-05-22 15:41:22 UTC
I've been trying to work out exactly what procedure will reproduce this, but all I can say so far is that it seems to have to be in the charging cradle when it's out of range.

It's actually right on the margin where my charging cradle is, and I can reproduce this bug consistently if I put it in the charging cradle and leave it for a while.  I'm not sure exactly how long, and I can't tell what state it has to be in because the blank screensaver kicks in, and then checking it changes the state.

I haven't been able to reproduce the bug if I don't leave the handset in the cradle while it's out of range; it seems to reconnect correctly each time.

HTH.
Comment 7 Richard Titmuss 2008-06-26 00:50:31 UTC
Ross, can you please test re Patrick's last comments? Thanks.
Comment 8 Ross Levine 2008-06-27 14:46:29 UTC
Patrick thanks for the additional details. I've tried leaving SBC in the charging cradle and out of range for 1 hour, 3 hours, and over night. Each time SBC reconnects just fine when I bring it back within range of the wireless network. 

Could you tell me more about your network? How many wireless access points, are there any repeaters?
Comment 9 KDF 2008-07-17 01:05:39 UTC
I've seen this problem. I only ever notice it when the controller is left in the charger overnight (it's the controller in the bedroom, so it's only used late at night and the following morning). I wonder if it may be the result of losing wireless AND a server restart. My setup restarts every night around 5am as part of the logrotate. Perhaps wireless kicks in while the server isn't there, so it fails to get a full update?

My other controller sits in the same room as the AP so never has this problem, as it's never out of range.
Comment 10 FredFredrickson 2008-07-31 08:30:02 UTC
I have replicated this bug, with the latest built controller software (can't recall the version off hand), here's a quick way to replicate the bug.

-Turn on controller, make sure it's connected and working.
-Unplug router.
-Wait 10 seconds.
-Plug router back in.

All my receivers quickly get back online, but the controller is useless until I reboot it. I do not get a red, or blue wireless icon. It remains white the entire time.
Comment 11 Ross Levine 2008-07-31 13:53:56 UTC
Thanks Fred, what router are you using? I just tried your steps and I'm not seeing the issue with my Belkin F5D8231-4 and 7.1 r2722
Comment 12 James Richardson 2008-09-17 09:47:11 UTC
Possibly related forum posting:

http://forums.slimdevices.com/showthread.php?p=338658#post338658
Comment 13 Ross Levine 2008-09-24 17:54:45 UTC
(In reply to comment #10)
> I have replicated this bug, with the latest built controller software (can't
> recall the version off hand), here's a quick way to replicate the bug.
> 
> -Turn on controller, make sure it's connected and working.
> -Unplug router.
> -Wait 10 seconds.
> -Plug router back in.
> 
> All my receivers quickly get back online, but the controller is useless until I
> reboot it. I do not get a red, or blue wireless icon. It remains white the
> entire time.
> 

Fred, are you still able to reproduce this reliably with these steps? I've now tried this with several routers and multiple Duets, and I'm still unable to. Every time I try this, all Receivers reconnect after about 30 seconds and Controllers after about 60 seconds. 
Comment 14 Ross Levine 2008-09-25 15:25:46 UTC
I reproduced it! Actiontec MI424 with a DHCP lease time of 10 minutes and WPA2, 5 controllers, 3 receivers. Receivers connected via Ethernet. 

Take Controller(s) out of wireless range (red wireless icon). Let them suspend; I set suspend to 60 minutes to speed this up. While Controller(s) are suspended, power off the router and power it back on a few minutes later. Then bring Controller(s) back in wireless range: all blue wireless icons. Powering a Controller off and back on reconnects it. Note that the receivers all reconnect just fine. This sure sounds a lot like what support and the forums are describing. 

I'll reproduce this one more time and attach a jive log. 
Comment 15 Ross Levine 2008-09-25 17:00:28 UTC
Created attachment 4063
jive slimserver log

Reproduced this with the steps above. This time I left 2 Controllers within wireless range, and this log is from a Controller that was not within range. All 3 were suspended; then I powered the router off and back on and woke the Controllers. The 2 within range reconnected fine; the third remains stuck with a blue wireless icon.
Comment 16 FredFredrickson 2008-09-26 08:44:19 UTC
I will give it a try tonight with the latest nightly and get back to you
Comment 17 Ross Levine 2008-09-26 18:48:44 UTC
Reproduced this with a Netgear WNDR3300 as well. 
Comment 18 Ross Levine 2008-09-30 18:02:23 UTC
Richard, does this reproducible case help? If you find time let me know what else you'd like from QA. 

Increasing severity based on our QA meeting. 
Comment 19 Ben Klaas 2008-10-03 13:31:43 UTC
*** Bug 9346 has been marked as a duplicate of this bug. ***
Comment 20 Ben Klaas 2008-10-03 13:36:13 UTC
somewhere...another bug, in the forums, can't find it... but somewhere there was a post about someone getting in this situation and powering down the controller (currently the only workaround for this ugliness). When the user powered down, they saw valid browse data from SC load behind the semi-transparent "goodbye" power down popup.

I just saw the same thing.

It's a very curious development, and at least to me hints that maybe it's not wifi that's the issue, but rather the UI losing its ability to refresh windows with data coming in via the network.
Comment 21 Ben Klaas 2008-10-03 14:05:56 UTC
One of the difficult issues for us here is reproducing this bug reliably. If anyone watching this bug wants to help out, esp. those that see this problem as reproducible, I have some suggestions.

When the controller is in the "blue icon" state:

1. post the output of `netstat -a` from both the SC server machine and from the controller

2. leave the controller in this state and restart SqueezeCenter. Does this do anything to the controller's behavior?

thanks in advance for any help you might provide. I'll keep my eye on this here and hope to reproduce the issue as well.
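The `netstat -a` check suggested above can be partly automated on a PC from pasted output. What follows is a minimal sketch in Python, not part of SqueezeCenter or the Controller firmware. It assumes the six-column `Proto Recv-Q Send-Q Local Address Foreign Address State` layout seen in the dumps in comment 22, and numeric addresses (for example from `netstat -an`, since plain `-a` may print hostnames such as `mediumspicy`). The helper names `connections_to` and `has_live_session` are hypothetical. It lists the TCP entries whose foreign address is the SC server on port 9000 and reports whether any of them is ESTABLISHED.

```python
from typing import List, Tuple


def connections_to(netstat_text: str, host: str, port: int = 9000) -> List[Tuple[str, str]]:
    """Return (local_address, state) for every TCP entry whose foreign address is host:port.

    Assumes the six-column netstat layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    target = f"{host}:{port}"
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skips header lines, UDP/UNIX sockets and lines without a state column.
        if len(fields) >= 6 and fields[0] == "tcp" and fields[4] == target:
            matches.append((fields[3], fields[5]))
    return matches


def has_live_session(netstat_text: str, host: str, port: int = 9000) -> bool:
    """True if at least one connection to host:port is in the ESTABLISHED state."""
    return any(state == "ESTABLISHED" for _, state in connections_to(netstat_text, host, port))


if __name__ == "__main__":
    # TCP lines copied from Controller #1 in comment 22 (stuck in the blue-icon state).
    controller_1 = """\
tcp        0      0 192.168.1.125:47529     nat.dc.squeezenetwork.com:9000 ESTABLISHED
tcp        0      0 192.168.1.125:47762     192.168.1.2:9000        TIME_WAIT
"""
    print(connections_to(controller_1, "192.168.1.2"))  # [('192.168.1.125:47762', 'TIME_WAIT')]
    print(has_live_session(controller_1, "192.168.1.2"))  # False
```

For the Controller #2 dump the function would return an empty list, because that controller has no entry to 192.168.1.2:9000 at all, only its SqueezeNetwork connection.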

Comment 22 Ben Klaas 2008-10-04 13:48:32 UTC
Don't know how, but I reproduced this today. Four controllers in the house, 3 of 4 in the "Blue Icon of Death" mode

SqueezeCenter is at 192.168.1.2 (for clarification, there is another development SC at 192.168.1.126. None of my controllers' selected players are connected to that SC). 

netstat output for server and two of the three SBCs stuck in BIOD

Server is @ 192.168.1.2
bklaas@mediumspicy:~$ netstat -a | grep 9000
tcp        0      0 *:9000                  *:*                     LISTEN     
tcp        0      0 mediumspicy:9000        192.168.1.127:25894     ESTABLISHED
tcp        0      0 mediumspicy:9000        192.168.1.189:26688     ESTABLISHED
tcp        0      0 mediumspicy:9000        192.168.1.120:60200     ESTABLISHED
tcp        0      0 mediumspicy:9000        192.168.1.125:47763     TIME_WAIT  

(FYI the last entry, which matches controller #1, disappeared the next time I invoked netstat a few minutes later)

Controller #1 @ 192.168.1.125
# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:netperf         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:time            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:echo            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:daytime         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:telnet          0.0.0.0:*               LISTEN      
tcp        0      0 192.168.1.125:ssh       192.168.1.126:56335     ESTABLISHED 
tcp        0      0 192.168.1.125:47529     nat.dc.squeezenetwork.com:9000 ESTABLISHED 
tcp        0      0 192.168.1.125:47762     192.168.1.2:9000        TIME_WAIT   
udp        0      0 0.0.0.0:32768           0.0.0.0:*                           
udp        0      0 0.0.0.0:32769           0.0.0.0:*                           
udp        0      0 0.0.0.0:32771           0.0.0.0:*                           
udp        0      0 0.0.0.0:32772           0.0.0.0:*                           
udp        0      0 0.0.0.0:echo            0.0.0.0:*                           
udp        0      0 0.0.0.0:32776           0.0.0.0:*                           
udp        0      0 0.0.0.0:32777           0.0.0.0:*                           
udp        0      0 0.0.0.0:daytime         0.0.0.0:*                           
udp        0      0 0.0.0.0:time            0.0.0.0:*                           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  7      [ ]         DGRAM                    1317   /dev/log
unix  4      [ ]         DGRAM                    1360   /var/run/wpa_supplicant/eth0
unix  2      [ ]         DGRAM                    1362   /tmp/wpa_ctrl_284-0
unix  2      [ ]         DGRAM                    1421   /tmp/wpa_ctrl_288-0
unix  2      [ ]         DGRAM                    38394  
unix  2      [ ]         DGRAM                    10733  
unix  2      [ ]         DGRAM                    1402   
unix  3      [ ]         STREAM     CONNECTED     1399   
unix  3      [ ]         STREAM     CONNECTED     1398   
unix  2      [ ]         DGRAM                    1348   
unix  2      [ ]         DGRAM                    1320   
# 

Controller #2 @ 192.168.1.165:
# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:netperf         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:time            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:echo            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:daytime         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:telnet          0.0.0.0:*               LISTEN      
tcp        0      0 192.168.1.165:37992     nat.dc.squeezenetwork.com:9000 ESTABLISHED 
tcp        0      0 192.168.1.165:ssh       192.168.1.126:56476     ESTABLISHED 
udp        0      0 0.0.0.0:32768           0.0.0.0:*                           
udp        0      0 0.0.0.0:32769           0.0.0.0:*                           
udp        0      0 0.0.0.0:32771           0.0.0.0:*                           
udp        0      0 0.0.0.0:32772           0.0.0.0:*                           
udp        0      0 0.0.0.0:echo            0.0.0.0:*                           
udp        0      0 0.0.0.0:daytime         0.0.0.0:*                           
udp        0      0 0.0.0.0:time            0.0.0.0:*                           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  7      [ ]         DGRAM                    1317   /dev/log
unix  4      [ ]         DGRAM                    1361   /var/run/wpa_supplicant/eth0
unix  2      [ ]         DGRAM                    1363   /tmp/wpa_ctrl_284-0
unix  2      [ ]         DGRAM                    1420   /tmp/wpa_ctrl_288-0
unix  2      [ ]         DGRAM                    38391  
unix  2      [ ]         DGRAM                    1407   
unix  2      [ ]         DGRAM                    1403   
unix  3      [ ]         STREAM     CONNECTED     1398   
unix  3      [ ]         STREAM     CONNECTED     1397   
unix  2      [ ]         DGRAM                    1349   
unix  2      [ ]         DGRAM                    1320   
# 
Comment 23 Ben Klaas 2008-10-04 13:57:14 UTC
I restarted my SC, and 2 of the 3 BIOD controllers re-established their connection to SC, albeit not immediately (within a few minutes of the restart). The third did not.


Here is the netstat data from that controller. As the ping below shows, it still has network connectivity to the SC machine:

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:netperf         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:time            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:echo            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:daytime         0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:telnet          0.0.0.0:*               LISTEN      
tcp        0      0 192.168.1.187:ssh       192.168.1.126:56898     ESTABLISHED 
tcp        0      0 192.168.1.187:36226     nat.dc.squeezenetwork.com:9000 ESTABLISHED 
udp        0      0 0.0.0.0:32768           0.0.0.0:*                           
udp        0      0 0.0.0.0:32769           0.0.0.0:*                           
udp        0      0 0.0.0.0:32771           0.0.0.0:*                           
udp        0      0 0.0.0.0:32772           0.0.0.0:*                           
udp        0      0 0.0.0.0:32773           0.0.0.0:*                           
udp        0      0 0.0.0.0:32774           0.0.0.0:*                           
udp        0      0 0.0.0.0:echo            0.0.0.0:*                           
udp        0      0 0.0.0.0:32776           0.0.0.0:*                           
udp        0      0 0.0.0.0:daytime         0.0.0.0:*                           
udp        0      0 0.0.0.0:time            0.0.0.0:*                           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  7      [ ]         DGRAM                    1365   /dev/log
unix  4      [ ]         DGRAM                    1410   /var/run/wpa_supplicant/eth0
unix  2      [ ]         DGRAM                    1412   /tmp/wpa_ctrl_289-0
unix  2      [ ]         DGRAM                    1473   /tmp/wpa_ctrl_293-0
unix  2      [ ]         DGRAM                    64804  
unix  2      [ ]         DGRAM                    1457   
unix  2      [ ]         DGRAM                    1448   
unix  3      [ ]         STREAM     CONNECTED     1441   
unix  3      [ ]         STREAM     CONNECTED     1440   
unix  2      [ ]         DGRAM                    1398   
unix  2      [ ]         DGRAM                    1370   
# ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: seq=0 ttl=64 time=5.851 ms
64 bytes from 192.168.1.2: seq=1 ttl=64 time=5.147 ms

--- 192.168.1.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 5.147/5.499/5.851 ms
# 
Comment 24 Chris Owens 2008-10-06 11:11:56 UTC
Hm.  So in addition to the case Ross is able to reproduce, there may be another case with a Squeezecenter element.  Interesting.
Comment 25 Ross Levine 2008-10-06 16:03:15 UTC
Ben and Richard do you need anything else from QA on this bug? 
Comment 26 Ross Levine 2008-10-07 17:50:40 UTC
This might be a little silly, but James suggested looking at this, and he is correct: there is another situation where Controller will fail to reconnect to the network. If Controller is suspended for longer than the DHCP lease time and, meanwhile, another device connects and takes the last address allowed by the DHCP server's maximum number of devices, Controller will only show the "please wait" spinning wheel and never show an error message. Separate bug?
Comment 27 Ben Klaas 2008-10-07 21:43:46 UTC
on the previous comment: separate bug I think, and not nearly as critical as this one to fix.

I'm afraid this bug might be in rudderless-ship territory right now... I reproduced it and provided some netstat data, but I don't think it provided any particular smoking gun, other than showing that the wifi on the controller is still up with IP connectivity. This may actually be a problem with SqueezeCenter, but I don't have strong enough data to support or refute that conclusively.

I'm really not sure where to go next with this. 

FWIW, I view this as the single most important bug to fix on the controller leading up to 7.3.
Comment 28 Chris Owens 2008-10-08 14:50:28 UTC
It seems to me like this should be assigned to someone other than Ross at this point, although of course he will continue working on it as needed.  Richard, however, is quite busy with another project.

Dean, can we get your input on the next step for this critical bug?
Comment 29 Blackketter Dean 2008-10-08 16:38:48 UTC
I agree, Ben, it is critical. I think Richard needs to look at this. 

Richard?
Comment 30 Ben Klaas 2008-10-14 07:49:42 UTC
another interesting data point to add:

last night the controller in my bedroom was in the "blue icon" state. 

by following these steps:

1. Settings->Advanced->Wireless Network->My currently selected network
2. Select "Forget Network"
3. Go back to Settings->Advanced->Wireless Network and reconnect to the same network

I was able to resume normal operation of my controller (white icon, SC communication worked)
Comment 31 Ross Levine 2008-10-16 15:46:53 UTC
Created attachment 4145
slimserver logging enabled, power management disabled, DHCP
Comment 32 Ross Levine 2008-10-16 15:47:33 UTC
Created attachment 4146
static IPs, no DHCP
Comment 33 Ross Levine 2008-10-16 15:49:42 UTC
Enabled slimserver in advanced - logging, and reproduced using my steps in comment #14. Ben I think if you can dedicate a wireless access point to this, using my steps you can reproduce this as often and quickly as you need. Please let me know if you'd like anything else from me. 
Comment 34 Blackketter Dean 2008-10-19 20:53:20 UTC
Thoughts on Ross' new discovery and log, Richard?
Comment 35 James Richardson 2008-10-20 09:22:37 UTC
Ben and Tom are working on this now; they cannot reproduce it every time. They are now attempting to narrow down where the error is.

Comment 36 Ross Levine 2008-10-20 16:28:24 UTC
Tom points out (and I've reproduced) that simply browsing to the music library, then artist, prompts the Controller to reconnect to the source. This fixes the problem. 

Does this help?
Comment 37 Richard Titmuss 2008-10-22 10:29:31 UTC
Case in comment #14 is fixed in 7.3 r3188.

I think it unlikely that the bug fixed would apply when the controller was in a cradle (comments #6 and #9) or to AP rebooting (#10). The latter works for me consistently.

Comment 38 Patrick Dixon 2008-10-24 09:27:52 UTC
7.3 r3188 doesn't seem to have made its way out to me yet (ubuntu repos), so I'm unable to test it.
Comment 39 Ben Klaas 2008-10-24 10:38:13 UTC
thanks Patrick and apologies...we're working out some logistical issues with our automated tests that qualify firmware for distribution to beta users.

The good news is that r3191 has some more fixes that _may_ solve this problem. We're hopeful enough that we're also going to be pushing these changes into 7.2.1 firmware so more people can get it. Stay tuned...
Comment 40 Patrick Dixon 2008-11-01 04:48:36 UTC
7.3 r3220

I still see the original bug with this firmware.  The handset shows blue wireless on removing from the charging cradle, and cannot find the network until it's rebooted.

The WAP/DHCP server in my case is a Linksys BEFW11S4 V4.
Comment 41 Ben Klaas 2008-11-05 12:08:32 UTC
Patrick, how often do you see this bug with your controller and >= r3191?

It's puzzling, because the consensus from other reports is that the issue appears resolved, and since we pushed r3191 out to 7.2.1 SC users, customer support calls on the issue have gone to nil. 
Comment 42 Mike Walsh 2008-11-05 12:28:45 UTC
I'd bet it's the router. I had that one, and frankly it's crappy, which isn't to say Slim shouldn't work with it, but when I went from Linksys to what I have now, it was a huge difference. Your router is B class, right?
Comment 43 Patrick Dixon 2008-11-05 13:14:05 UTC
I don't use it as a router, only as a WAP and DHCP server, but yes, it is pretty crap!

I have switched the DHCP server over to my main router, and I haven't seen the issue since, so I think it may just be a compatibility issue with the DHCP server on the Linksys, which isn't really worth worrying about.  I'll test a little more and report back within a couple of days.

Anyway, it's great that the support calls have dropped, well done on the fix(es)!

Comment 44 Mike Walsh 2008-11-05 13:17:44 UTC
I'd bet you are right. The problem for Slim is that those routers (I used to have one) are still very popular, or rather, pervasive.  

I swore off Linksys long ago. I have heard of people having success with open source firmware, though (meaning in general, not with Slim issues).
Comment 45 Ben Klaas 2008-11-05 13:21:11 UTC
with cautious optimism I am moving this bug to FIXED

if anyone recreates this with firmware > r3191, reopen and if at all possible please include as much information as possible on the environment and steps to reproduce.

----
on the router discussion: fwiw, I use open source DD-WRT on a Linksys WRT-54GL gateway router and it works really well (it's a sizable improvement over the Linksys firmware). I hear even better things about Tomato, but haven't gotten around to using that one. The GL version of that gateway is Linux-based with more memory, whereas the G version (or at least newer "G"s) is VxWorks-based and has less memory.
Comment 46 Richard Scholl 2008-11-13 15:06:04 UTC
(In reply to comment #45)

Hello. I'm not sure this is the same bug, but Tech Support directed me here. I have a system with four players: three v1 or v2 Squeezeboxes running firmware v40, and a Duet receiver running firmware v48. The Controller is running 7.2.r3191. I have a Linksys BEFSR84 router in the system, which provides DHCP services. My SqueezeCenter is running on a Linux box using Ubuntu 7.10 (kernel 2.6.22-14-server); the SC version is 7.2.1 - 23630 @ Mon Oct 20 19:54:08 PDT 2008 - Debian - EN - utf8. The Linux box is not running any firewall or antivirus software.

My older squeezeboxes run perfectly and never show any kind of problem unless I power down the router or something, and then they recover quickly and reliably.  The Duet receiver usually is not a problem either – its panel lamp remains white unless I reset it (or there is a major power outage or something, which is not what I’m suggesting is a problem).

The Controller is another story. Occasionally the Controller will appear to be OK until I change the selected player, and then it can hang in the “waiting” display, that is, the circle of blinking dots. It can stay in this mode for a very long time, and getting out of this condition is not always so simple. Below is a sequence I went through for Tech Support, taking notes as I tried things. Note that all players were on and running, connected happily to the Linux box running SC, and serving up music that I could listen to as I struggled with the Controller. The exception, of course, was the Duet Receiver, which Tech Support had me reset.

Here’s my note to Tech Support outlining the sequence I went through:
_____________________________
OK, David,
As I said before, I followed the directions and reset both the receiver and the controller (but remember I have three squeezeboxes and the duet receiver in the same system). When I did this, everything seemed to be OK - I checked the controller by selecting the various music sources, and all came up ok. Then, after about an hour, I went over to the controller and looked at it. It was set up with the first music source selected - one of the squeezeboxes. I pushed the "home" button and got the usual menu, went to Select Player, and tried to select the second squeezebox, called "Music Stream 2". It's been 15 minutes and it's still saying "connecting to Music Stream 2".

This is typical of the problem. I believe that it will never come out of this mode without help.*

I pushed the left arrow button for a few seconds and it said "Problem connecting with musicserver" ('musicserver' is the name of my linux box running SqueezeCenter). I am given two choices, "try again" and "Choose Player". If I choose "try again" it still hangs up in the same way. If I push the left arrow at that point, I get a choice of "Settings" and "choose player". If I select Choose Player, I get a blank screen titled "Choose Player". If I choose "Settings" I get a choice of "Screen", "Music Source", "Home Menu", or "Advanced". It's just too complicated for me to go too far into the menu and write about it, but if I select "Music Source", it gives me a choice of "SqueezeNetwork" or "other server". Other Server asks for an IP address. In short, the controller seems lost.

If I then go back to /settings/advanced/factory reset, it resets the controller, and after entering my wireless network WEP code, it says "Problem connecting to network". At that point I removed the battery. When I reinserted it, the system came up and asked to connect to a network. It found my wireless network and I selected it. It then came up at once to a page for me to select a music player. I selected Music Stream 2 and again it locked up. 

I then pushed the left arrow and got a home page saying "settings" and "choose player". Once again, "Choose Player" was a blank screen.

Once again I removed the battery, waited a few minutes, and tried again - same result. 

At this point I once again did a Factory Reset, by the 'left arrow'/Settings/Advanced method. This time it insisted I set up the receiver, so I went into the next room and pushed the front panel button until I got the red light, and it said (eventually) "problem connecting".

Once again I did a "factory reset" as above, so that I was starting from the point where the red light was already flashing on the receiver. This time it connected properly to the wireless network (after entering the WEP code again). Then it asked me to choose the music server, which I did as usual, and it said "problem connecting to music server".

At this point I am giving up and will wait until later on with the battery out of the controller for an even longer period.**

Note that during all this time, my computer could bring up the web page served up by the computer running the SqueezeCenter program, and all older players (named Music Stream 1 through 3) showed up, and all were running and serving up music which I was listening to as I wrote this. So the Squeezeboxes and Music Server running SqueezeCenter were all running fine and compatibly the entire time.  Even the Duet receiver was working fine and serving up audio until I reset it as part of the above process.
_______________________
* Just today, as I started to write this note for the bug 8159 status, I tried out the Controller and it connected just fine to the older Squeezeboxes, but when I asked to connect to the Duet Receiver it hung up in the wait circle display for nearly five minutes.  It did finally connect, however.  I have waited an hour or more without it connecting in the past.
**Later on that day, after a couple of hours, replacement of the battery caused the unit to come up OK, after going through the startup sequence to set up the receiver.

I was not asked to run, and so did not run, 'netstat -a' during all this.  If desired, I can be prepared to do anything anybody wants the next time it fails; if anybody wants me to do anything, let me know.  I’d be happy to collect information or test a later version of the Controller firmware than r3191.

Rich Scholl
Comment 47 Richard Scholl 2008-11-17 14:27:12 UTC
(In reply to comment #45)
> with cautious optimism I am moving this bug to FIXED
> 
> if anyone recreates this with firmware > r3191, reopen and if at all possible
> please include as much information as possible on the environment and steps to
> reproduce.
> 
> ----
> on the router discussion- fwiw, I use open source DD-WRT on a Linksys WRT-54GL
> gateway router and it works really well (and a sizable improvement over the
> Linksys firmware). I hear even better things about Tomato, but haven't gotten
> around to using that one. the GL version of that gateway is linux-based with
> more memory, whereas the G version (or at least newer "G"s) are vxWorks based
> and have less memory.
> 

I now have a definite case where the controller, using r3191 firmware, is locked up with the BIOD.

I have a Linksys BEFSR84 router, which is NOT wireless; I have a NetGear 16-port gigabit switch on its output which feeds four SlimDevices players of various models as well as four wireless access points made by AMX (the Home Automation people), all of them model FG2255-01 NXA WAP 200G.  Each of the four WAPs has a static IP (all different, of course).

The blue wireless icon appeared some time last night, as the unit was showing a "now playing" cover art last night when I last looked at it.  This morning the unit shows "Home" and offers selections "Settings" and "Choose Player".

As I write this, I will try Settings (since with the blue icon I probably won't be able to select a player).  I then selected Advanced/Wireless Network and the unit discovered my network and displayed the SSID.  Selecting the network showed the unit was "connected" and displayed the DHCP IP address.  The icon remains blue.  Going back to the home screen, I attempted to choose a player (I just chose the first on the list) and the unit locked up saying "Connecting to <player name>".

Holding the Home button for five seconds then turned off the controller.  I then restarted the controller by pressing the Home button again.  It gave me the usual splash screens and came up with the same screen it exited with: "Connecting to <player name>".

Hitting the left arrow key for a few seconds brought me to Home.  I then hit Settings/Advanced/Factory Reset/Continue and rebooted the controller.  At that point I selected the language, Wireless Region, Wireless Network, Wireless Encryption, and entered the WEP key.  After some delay, the unit connected and brought up "Choose Player", but no player showed up to select.  Because I did not want to go through the process of resetting the receiver (which is player 4 on my system), I switched the controller off by holding down the Home key and restarted it.  The unit came up again as a fresh startup (choose Language), so I went through the process once again, this time after resetting the receiver by pressing the front panel button until it blinked red.  This time the player was shown by the last digits of its MAC address.  After some delay the player connected to the ethernet properly and the controller looked for music sources.  It found my server and offered it as a selection.  I selected it, and got the message "problem connecting" with the options "try again" or "skip this step".  Meanwhile, my music server continues to serve up music and the web page at http://musicserver:9000.  The message at the bottom of the controller screen says "Couldn't connect your Squeezebox to your ethernet network (Lost Squeezebox)".

Resetting the player again, pressing the front panel button until it blinked fast red, and setting up the unit once again from the controller was successful the second time.

Obviously this is a trial and makes the system virtually unusable.  I hope you find a solution soon, as I have five of these controllers in various locations and all do this on a regular basis.

Thanks

Rich Scholl
Comment 48 Blackketter Dean 2008-11-17 17:07:59 UTC
Ross: Can you take a look at Rich's report and see if you can reproduce?
Comment 49 Ross Levine 2008-11-17 18:10:24 UTC
I agree, the failure Rich describes is critical. I wonder if this failure has something to do with bug 6085 since you have multiple wireless access points. 

Rich do your other 4 Controllers repeat this exact behavior? Are the other 4 networks also using AMX brand Access Points? 
Comment 50 Richard Scholl 2008-11-17 19:14:30 UTC
(In reply to comment #49)
> I agree, the failure Rich describes is critical. I wonder if this failure has
> something to do with bug 6085 since you have multiple wireless access points. 
> 
> Rich do your other 4 Controllers repeat this exact behavior? Are the other 4
> networks also using AMX brand Access Points? 
> 
Hello Ross,

I have just sent a longish email to LaRon Walker at Logitech Squeezebox Support which I think answers your questions.  To save time I'll cut and paste it in here:

__________________________________________
Hello LaRon,

The following information applies to my Hawaii home, which is where I am currently and where we are troubleshooting.

I have three squeezeboxes here and one Duet receiver, all wired directly into the LAN via ethernet cable.  The three squeezeboxes have S/Ns 200BD7A, 203CF2A and 209C9D0; they are running firmware version 40. The Duet receiver has a PID of LZ80426 (which I think corresponds to a serial number) and is running firmware version 48.  The controller is running version 7.2r3191 root@debian-build#111 Wed Oct 22 13:37:05 PDT 2008.  My music server is named "musicserver" and is an Intel Core Duo machine running Ubuntu Linux Server.  The following is from the SqueezeCenter status screen:

    SqueezeCenter Version: 7.2.1 - 23630 @ Mon Oct 20 19:54:08 PDT 2008 - Debian - EN - utf8
    Server IP address: 192.168.1.88
    Perl Version: 5.8.8 i486-linux-gnu-thread-multi
    MySQL Version: 5.0.45-Debian_1ubuntu3.3

    Platform Architecture: i686-linux

    Hostname: musicserver

    Server Port Number: 9000

    Total Players Recognized: 4

Regarding my Local Area Network:

I have an eight-port Linksys BEFSR84 router, which is NOT wireless. Plugged into this is a NetGear JGS524 24-port gigabit switch which feeds a number of items, including four SlimDevices players of various models. In addition, plugged into the switch are four wireless access points, made by AMX (the Home Automation people), all of them model FG2255-01 NXA WAP 200G. Each of the four WAPs has a static IP (all different, of course: 192.168.1.241, 242, 244 and 245) and they are set to different channels (1, 3, 6 and 11).  [By the way, I think I said in another email that I had a 16-port switch; this was a mistake.]


Regarding your other questions:

I have an entirely different situation in France.  There I have a D-Link combination DSL modem / wireless router.  Plugged into that is a powerline internet adapter, also by D-Link, which transmits over the power line to two D-Link adapters, in turn connected to a pair of Duet receivers.  A third Duet receiver is connected wirelessly to the router, and a fourth is connected by CAT-5e cable directly to the router.  In addition I have two wireless access points, also D-Link, which use the power line adapter method to connect to the router.  These two supply signals to the upstairs of the house and the pool area (which the modem/wireless router could not reach) for the use of the controllers.  I have three Controllers there, all of which are wirelessly connected, of course.

We are not troubleshooting that location, and that would be difficult anyway because there is nobody there at present.  I'm hoping we learn something in Hawaii that I can apply in France, but we'll see... if necessary I will troubleshoot that location later on when I am next there (in 2009).

I do not want to connect the Duet receiver to the system wirelessly because the 2.4 GHz signals disturb my audio system, which is in an equipment rack with the players.  This is why every player is connected via CAT5e cable to the switch.

I can turn off all but one of the access points for a test, but I do need them most of the time (the AMX touchpanels connect wirelessly as does my wife on her laptop).  We can hope that during the period there is a single WAP operating that the problem recurs.  It seems that it occurs about once a day but not every day.  I cannot do this today, but perhaps I can get it done tomorrow.  As it is earlier here in Hawaii this might mean that you don't get any information until Wednesday, your time.

Regards

Rich
Comment 51 Richard Scholl 2008-11-17 19:23:08 UTC
(In reply to comment #49)
> I agree, the failure Rich describes is critical. I wonder if this failure has
> something to do with bug 6085 since you have multiple wireless access points. 
> 
> Rich do your other 4 Controllers repeat this exact behavior? Are the other 4
> networks also using AMX brand Access Points? 
> 
Hello Ross,

I just sent a note in response to your comment #49, but only afterward did I look up bug 6085.  As I have multiple access points in my homes both here and in France, it could be relevant.  But I would like to note that I don't typically carry my Controller here in Hawaii out of range of the principal access point it would connect to, and a similar situation would be true in France.  So if the problem did turn out to be confusion due to the multiple APs, I could live with locking the controller to a single channel, in effect locking it to a particular AP.  Of course, this would not work for other people who have a large home and want to "roam", but I just thought I'd point out it would work for me.

Aloha

Rich
Comment 52 Richard Scholl 2008-11-20 00:07:52 UTC
Hello All,

This is an update:

The unit behaved properly for a couple of days.  During one day the system was running with a single WAP and the other day with all four on.  In either case the unit was synchronized with only one of the WAPs (that is, I didn’t try to “roam”), and it ran perfectly normally.  Just today I had a problem similar to one I’ve had before, which I’ll try to describe below.  Note that this is NOT the same as the BIOD, which I have also experienced recently with r3191 firmware.

The player was displaying cover art from Player 4 (Duet Receiver), but I cannot be sure as I write this that the display was current (that is, that the cover art was actually from the playing album).  I picked up the unit, and when I tried to switch to Player 1, the unit locked up.  The wireless icon was white, not blue.  Pushing left arrow and holding it brought up the Home page – Settings and Choose Player – but any player chosen caused a lockup in the Waiting splash screen.

I then turned the unit off with its power button and restarted it.  The unit came up in Home screen; the wireless icon was white with all bars showing.  The same thing happened when I pulled the battery and reinserted it.  The Home screen, as before, gave me Settings and Choose Player, but trying to actually choose a player caused a lockup.

Getting to the Home screen again, I walked away from the WAP, watching the icon signal level indicator go down and then back up again as I returned to the WAP.  I then walked far enough to be out of range of the WAP, so that the indicator went to a single point (i.e., zero).  At this point I was next to a different WAP (on a different channel).  The unit stayed at the “zero” indicator for a short time (maybe 30 seconds) and then showed full strength, indicating that it had locked on to the new WAP on the new channel.  Then I selected a player and the unit started working normally.

It appears that the issue comes up when for some reason it loses sync with the WAP, even though it still sees and indicates a strong signal.  I am guessing that, if one were to walk out of range of the WAP (without coming into range of another WAP), and then returned, the unit would resync, but I have yet to test that theory as it is difficult for me to turn off my other WAPs here in this environment.  I’ll see if I can do this tomorrow.  The problem of course is that the unit has to act up so I can run the test.

Rich

Comment 53 Ross Levine 2008-11-20 12:32:43 UTC
This isn't BIOD. Rich please follow up with LaRon in support; you're seeing an issue we would like to know about but this bug is specific to blue icon issues where the Controller needs to be rebooted in order to reconnect to the music source. 
Comment 54 Richard Scholl 2008-11-20 12:41:19 UTC
Hello Ross,

Yes, I know this last incidence is not BIOD, but I have definitely seen the BIOD bug on my controller here with r3191 firmware.  That is, I have seen the controller lock up with the wireless icon in the blue state.  See my comment #47 of 2008-11-17 14:27:12.

Sorry if I confused things by including an extraneous report of a separate problem in this bug log.

On the other issue I am working with LaRon as well.  If I see another incidence of the BIOD syndrome, do you want me to report it?  Is there any special sequence you think I might go through to verify it is this bug or a separate one?

Rich
Comment 55 James Richardson 2008-12-15 12:04:50 UTC
This bug has been fixed in the 7.3.0 release version of SqueezeCenter!

Please download the new version from http://www.slimdevices.com/su_downloads.html if you haven't already.  

If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look.
Comment 56 Richard Scholl 2008-12-16 06:39:17 UTC
(In reply to comment #55)
> This bug has been fixed in the 7.3.0 release version of SqueezeCenter!
> 
> Please download the new version from
> http://www.slimdevices.com/su_downloads.html if you haven't already.  
> 
> If you are still experiencing this problem, feel free to reopen the bug with
> your new comments and we'll have another look.
> 

Thanks, James.  I am no longer at that site; we have left for the holidays.  We'll be back towards the end of January and I'll download the latest SqueezeCenter then.  Meanwhile, I am headed for France (in two weeks) and I'll update that system and check things out there to make sure that all is fixed at that location.
Thanks again
Rich
Comment 57 Anoop Mehta 2009-01-23 17:15:33 UTC
I have a customer with a Rosewill RNX-N4PS router with firmware 1.0.0

He is seeing this same behavior with firmware r3476 on the Controller and SC 7.3.2 installed. 

 
Comment 58 Richard Scholl 2009-01-24 18:19:36 UTC
Hi,
I'm back in Hawaii, downloaded 7.3.2 - 24695 @ Mon Jan 19 17:13:58 PST 2009 and using r3476 on Controller.  After one week, no problems - so far, it looks like you've fixed it for me, at least!  Thanks and aloha
Rich