Bugzilla – Bug 11109
TCP changes in recent FW causing Network/DHCP issues.
Last modified: 2009-11-03 03:30:02 UTC
My post http://forums.slimdevices.com/showthread.php?t=60228 is copied below I'd like to suggest that the TCP change made in FW55 for the Duet is causing a DHCP error. Many users are seeing this as a Blue LED error, where the Duet gets hung while trying to renew its DHCP lease. It requires a power cycle to fix. I can't prove this but with FW59 my Duet would hang once every day or so. I remembered a TCP change somewhere around FW55 so I reverted to FW54. I've had no problems with the Duet in the 5 days since I reverted and I have my DHCP leases set to expire every 2 hours. BTW- It may be co-incidence, but my Belkin router also started hanging, requiring a power cycle around the same time. My SB3/Boom do not seem to have network issues, but they have static IP's. 99% of Diets are configured using DHCP. This and the Blue LED are why I believe the TCP change has caused DHCP issues. Also users reverting back to 7.2.1 say their issue goes away. Reverting to 7.2.1 would perform the FW downgrade I did as a test. Since my SB2/3 7 Boom worked fine with the TCP bug in place for the last 3 years, I think the TCP change should be reverted in FW60 as part of 7.3.3 until the proposed change is TRG'd with someone that has more low level TCP/IP knowledge then I do. Related Blue LED posts: http://forums.slimdevices.com/showth...light=blue+led http://forums.slimdevices.com/showth...light=blue+led http://forums.slimdevices.com/showth...light=blue+led http://forums.slimdevices.com/showth...light=blue+led http://forums.slimdevices.com/showth...light=blue+led Generally resulting in this thread http://forums.slimdevices.com/showthread.php?t=59863 with little or no reply from anyone. BTW. I'd like the TCP changes reverted in FW versions for the SB2/3/Boom & Transporter as well, since anyone using DHCP on these devices is probably also experiencing grief.
Thanks for taking the time to pull all reports together. QA: Could you please test if you can see a change in behavior between the mentioned firmware version numbers? Note: Some of the reports in the forums are using Net-UDAP to setup their SBR without an SBC which might be another reason for the blue LED. I retried here with an SBC and SBR setup wireless with a DHCP server (Apple Airport) set to renew every 10 minutes and I don't see the issue. SBR fw 59. Thanks Felix
Felix, The reason I waited 5-6 days to post is because this seems intermittent. FW59 will work for 1-2 days then I'll get the Blue LED. However my router was configured with much longer lease times then a present. FW54 hasn't choked on me yet, but I'll leave it on there to continue testing. BTW 1 more data point. My wireless signal strength on the receiver varies from 40% to 60%. I may not be the only person in that situation. This may or may not be a factor. I'm also wondering if the receiver is somehow losing the SC server's IP address, which causes the Blue LED to come on, and it's only when the box is power cycled that this IP pulled back in from the onboard flash. Thinking about this futher, I expect the Green LED to stay on if DHCP was the issue. I have a D-Link 802.11G router that does much better event logging then the Belkin N1 I have now. I might switch that back in to see if it captures anything. Or perhaps run Wireshark to see if the receiver is perhaps still sending out IP traffic while hung.
Created attachment 4825 [details] Wireshark screen grab I notice that all of the messages captured in the Wireshark screengrab attached have disappeared post FW54. Maybe that's by design. I've put FW59 back on my receiver and will see how long it lasts before getting a Blue LED
Hello Mick Thanks for the wireshark capture. Unfortunately I cannot see any DHCP messages in there. Is that your point? Rereading your initial post, it's not clear to me which 'TCP change' you are referring to checking SC's release notes (could you please post a bug number?). Anyways, DHCP is not using TCP but UDP which makes it very unlikely that the 'TCP change' has caused any additional DHCP issues. Felix
I'm sure I saw entry relating to TCP in a FW change, but can't find it anywhere. The only thing I can see that might be causing the Receiver (only) to hang with a blue LED is this. https://bugs-archive.lyrion.org/show_bug.cgi?id=8647 If the wireless signal is marginal and for some reason the Receiver cannot communicate with the server, would it believe it is asleep and try to send WOL? Would it sit there waiting for the users to press the button? My Receiver has been far more stable over last few days and was back on FW59, now FW60. It's been left running for 24hours and has worked fine, only stopping when the controller hung. Somehow switching off the hung controller got the Receiver to start playing music again. I've reverted the controller to r3476 which is far more stable then r3993
for the future, note that 'blocker' is for developer use to indicate that it is blocking work. Dean speculates that there's 'something wrong with DHCP and we don't understand it and it causes a crash in this particular case.' Related to bug 10774. Dean says we should focus on 10774 and see if this gets fixed automagically. This bug might be useful in finding a way to reproduce that bug. Mick, are you trying to use your router to constantly assign the same IP address to your squeezebox? Or are you just using regular DHCP to assign whatever address it wants?
Sorry about incorrectly rating the importance of this issue Chris. My router is a fairly basic Belkin job. It uses vanilla DHCP. You cannot even do static assignments by MAC address, however it would re-assign the same address over an over to the receiver. Since switching to 1/2 hour leases things seem more stable. To be honest things are frustratingly stable for me now on FW60 and r3476 on the controller, when at the time I logged the bug, I couldn't get more then 24 hours without having to power cycle the receiver which is on top of my kitchen units. I can't reproduce the issue, but I know something was wrong and others are seeing the exact same thing. I think in places where the wireless is marginal, the receiver is no longer recovering the way it used to. Perhaps something has changed which has just unmasked an issue I've had all along. I'm going away for a few days and will leave the receiver playing a playlist to see if it completes a torture test. If successful, I'll change the DHCP lease out to a week to see if that has an effect.
Created attachment 4844 [details] Server log during blue led Maybe I spoke too soon. White light on receiver. Turned on controller and hit play. No response. Now blue led on receiver. I pushed the front button, but nothing happened. Ran upstairs to capture logs and I hear music start downstairs. Attached is the log.
*What version of SqueezeCenter are you running, please include the Build Number. Go to Settings > Status. Cut and Paste the SqueezeCenter and Player Information sections to this bug *What is the brand of your Network Router *What is the Model number *What is the firmware revision in the router *What type of encryption (if any) is enabled *Is the SSID Hidden or Broadcast *Is Static or Dynamic IP address assignment used *Are your SB Receivers connecting to the network Wireless or are they Wired to your Router (or a hub then to the router) *What other devices do you have attaching to your network *What is the system hardware you have SqueezeCenter running on. Make, Model, CPU, Memory, Operating system, as much info as you can give me *When you saw this error, were you connected to SqueezeCenter or SqueezeNetwork
Created attachment 4878 [details] Server log files *What is the brand of your Network Router : Belkin 802.11n *What is the Model number : F5D8631-4 v3 (01) *What is the firmware revision in the router : 3.01.06 (Apr 10 2008 17:34:55) *What type of encryption (if any) is enabled : WEP *Is the SSID Hidden or Broadcast : Broadcast *Is Static or Dynamic IP address assignment used : DHCP (Receiver default) *Are your SB Receivers connecting to the network Wireless or are they Wired to your Router (or a hub then to the router) : Wireless *What other devices do you have attaching to your network : SB3, Boom, Jive controller, PS3, Thinkpad laptop *What is the system hardware you have SqueezeCenter running on. Make, Model, CPU, Memory, Operating system, as much info as you can give me : Dell Dimension 2.4Ghz P4 with 3Gb RAM, Win XP SP3 - same model since SC6.1 *When you saw this error, were you connected to SqueezeCenter or SqueezeNetwork : Only ever use SC. I think the following is significant. Receiver has an assigned IP 192.168.2.201, but is not visible in SC. SC is getting discovery requests but somehow failing to recognise the player. [09-03-01 22:24:17.4556] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:17.4566] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-03-01 22:24:17.4577] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk
Sorry post 10 was truncated. [09-03-01 22:24:18.9148] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-03-01 22:24:18.9156] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-03-01 22:24:18.9169] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-03-01 22:24:19.5613] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:22.0232] Slim::Networking::Discovery::gotTLVRequest (194) sending response [09-03-01 22:24:22.0241] Slim::Networking::Discovery::Server::gotTLVResponse (161) discovery response packet: [09-03-01 22:24:23.1723] Slim::Networking::Slimproto::slimproto_close (243) connection closed [09-03-01 22:24:24.4533] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:27.4606] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:27.4612] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-03-01 22:24:27.4617] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-03-01 22:24:27.4626] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-03-01 22:24:28.1556] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31001] [09-03-01 22:24:28.1562] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection [09-03-01 22:24:28.1573] Slim::Networking::Slimproto::client_readable (427) Error reading from client: Unknown error [09-03-01 22:24:28.1578] Slim::Networking::Slimproto::slimproto_close (243) connection closed [09-03-01 22:24:28.2055] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:28.2060] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-03-01 22:24:28.2066] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk [09-03-01 22:24:37.4524] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-03-01 22:24:37.4530] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-03-01 22:24:37.4535] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-03-01 22:24:37.4544] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-03-01 22:24:38.1657] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31002] [09-03-01 22:24:38.1667] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection [09-03-01 22:24:38.1680] Slim::Networking::Slimproto::client_readable (427) Error reading from client: Unknown error [09-03-01 22:24:38.1688] Slim::Networking::Slimproto::slimproto_close (243) connection closed [09-03-01 22:24:38.1829] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31003] [09-03-01 22:24:38.1837] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection [09-03-01 22:24:38.1851] Slim::Networking::Slimproto::_hello_handler (946) Killing bogus player timer. [09-03-01 22:24:38.1873] Slim::Networking::Slimproto::_hello_handler (980) Squeezebox says hello: Deviceid: 7 revision: 60 mac: 00:04:20:16:01:b8 uuid: f7977586ed9da685c379841e359b1a8b bitmapped: 32768 reconnect: 0 wlan_channellist: 0000 lang: EN [09-03-01 22:24:38.1890] Slim::Networking::Slimproto::_hello_handler (992) Squeezebox also says: bytes_received: 0 [09-03-01 22:24:38.1903] Slim::Networking::Slimproto::_hello_handler (1171) Hello from existing client: 00:04:20:16:01:b8 on ipport: 192.168.2.201:31003 [09-03-01 22:24:38.1926] Slim::Player::Squeezebox::stream (934) strm-q [09-03-01 22:24:38.1955] Slim::Player::Squeezebox::stream (934) strm-q [09-03-01 22:24:38.2129] Slim::Networking::Discovery::gotTLVRequest (194) sending response [09-03-01 22:24:38.2142] Slim::Networking::Discovery::Server::gotTLVResponse (161) discovery response packet:
I'm trying to figure out when this issue started https://bugs-archive.lyrion.org/show_bug.cgi?id=11109 and like others have found that 7.2.1 fixes the issue. I'm prepared to do some root cause analysis, but to do this I'll need a least 1 nightly for each week from Oct to Feb. I'm pretty certain that my 7.3 system was stable at Xmas when it was used quite a bit. Does anyone have a 7.3 Windows nightly from around the 20th Dec? I'm also interested in testing before and after this bug https://bugs-archive.lyrion.org/show_bug.cgi?id=10634 was fixed. I wonder does this have some unintended consequences for people like me with a receiver that has 45%-60% signal strength.
Ross: does this sound familiar to anything you are already investigating?
May I suggest a test case for QA to try to reproduce this problem? Set up a Duet system using on a SC that is programmed to go to sleep/suspend after 15 minutes inactivity. Add in a SB3 as a control. With the system working properly allow the SC to go to sleep. Wake up the SC using a WoL packet (or otherwise). Check whether the blue light on the SBR turns white and check if the SB3 comes alive. If only the SB3 comes alive, you know you have a problem. (In general, it would be good if standard testing included the suspend-awake sequence for the SC, which many of your users take for granted will be supported.)
I went back to 7.21 some time as consensus on the forums is that that version does not see the Blue LED issue on the receiver. In my case, it seems to make little difference, so I went back to the latest 7.33. I've also assigned static IP's to both the controller and receiver. What I see now is that I don't get a blue LED, the led stays white. So the issue wouldn't appear to be DHCP related. However the receiver has been "forgotten" by SC and is no longer visible in the web UI. The player also no longer responds to commands sent by the controller. So I think this is due to marginal wireless signal and how SC copes with players that momentarily "disappear" off a wireless network. There's obviously some reason why SC won't accept a reconnect. I still see all of these messages while the player has been forgotten..... 00:04:20:16:01:b8 [09-05-01 06:53:22.4416] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-05-01 06:53:22.4422] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-05-01 06:53:22.4431] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-05-01 06:53:25.4407] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 [09-05-01 06:53:25.4413] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-05-01 06:53:25.4418] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-05-01 06:53:25.4427] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-05-01 06:53:26.1568] Slim::Networking::Slimproto::slimproto_accept (190) Accepted connection from: [192.168.2.7:26721] [09-05-01 06:53:26.1574] Slim::Networking::Slimproto::slimproto_accept (193) Setting timer in 5 seconds to close bogus connection [09-05-01 06:53:26.1592] Slim::Networking::Slimproto::client_readable (430) Error reading from client: Unknown error [09-05-01 06:53:26.1597] Slim::Networking::Slimproto::slimproto_close (246) connection closed [09-05-01 06:53:26.2085] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 ... repeated over and over in the server log file until hours later the server allows the receiver to reconnect. [09-05-01 12:35:09.3204] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox [09-05-01 12:35:09.3210] Slim::Networking::Discovery::serverHostname (53) calculated oreillymj-desk length: 17 [09-05-01 12:35:09.3219] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response. [09-05-01 12:35:09.4237] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: grfe, length: 1284 [09-05-01 12:35:09.9387] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: grfe, length: 644 [09-05-01 12:35:10.0399] Slim::Networking::Slimproto::slimproto_accept (190) Accepted connection from: [192.168.2.7:26880] [09-05-01 12:35:10.0409] Slim::Networking::Slimproto::slimproto_accept (193) Setting timer in 5 seconds to close bogus connection [09-05-01 12:35:10.0419] Slim::Networking::Slimproto::client_readable (391) Slimproto frame: HELO, len: 36 [09-05-01 12:35:10.0425] Slim::Networking::Slimproto::_hello_handler (949) Killing bogus player timer. [09-05-01 12:35:10.0433] Slim::Networking::Slimproto::_hello_handler (983) Squeezebox says hello: Deviceid: 7 revision: 60 mac: 00:04:20:16:01:b8 uuid: f7977586ed9da685c379841e359b1a8b bitmapped: 32768 reconnect: 0 wlan_channellist: 0000 lang: EN [09-05-01 12:35:10.0438] Slim::Networking::Slimproto::_hello_handler (995) Squeezebox also says: bytes_received: 0 [09-05-01 12:35:10.0445] Slim::Networking::Slimproto::_hello_handler (1174) Hello from existing client: 00:04:20:16:01:b8 on ipport: 192.168.2.7:26880 [09-05-01 12:35:10.0454] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: vers, length: 5 [09-05-01 12:35:10.0470] Slim::Player::Squeezebox::stream (931) strm-q [09-05-01 12:35:10.0481] Slim::Player::Squeezebox::stream (976) sending strm frame of length: 24
Felix do you see anything useful in the log?
(In reply to comment #15) > I've also assigned static IP's to both the controller and receiver. > What I see now is that I don't get a blue LED, the led stays white. > So the issue wouldn't appear to be DHCP related. I don't get your logic there: if changing from DHCP to static IP's changed the behavior from blue light to staying white, that seems like a massive clue to me. Yet you say, the issue wouldn't appear to be DHCP related? My observation is that IP parameters obtained by DHCP appear to get corrupted whenever a discover request goes from the SBR goes unanswered because the server has gone to sleep. I can't actually peek inside the RAM of the SBR to prove it is the case, but symptoms are definitely consistent with that theory. Namely, SBR obtains good IP parameters using DHCP on first power up and will play fine for hours, but the light turns blue and stays blue if the SC is allowed to suspend/resume. Other SD players(e.g. SB3) cope with the same scenerio just fine.
Felix: your thoughts?
No new ideas, really. The initial bug talks about a fw change that might have caused this issue (and only on SBR) which is hard to believe, since all TCP related fw changes would affect all ip3k player. Also I wasn't able to reproduce. I am going to close this bug for now and suggest that people having this issue please retry with the latest soft- and firmware and if the problem still exists please reopen this bug. Thanks Felix