Bug 11109 - TCP changes in recent FW causing Network/DHCP issues.
: TCP changes in recent FW causing Network/DHCP issues.
Status: RESOLVED WORKSFORME
Product: SB Receiver
Classification: Unclassified
Component: General
: 56
: PC All
: -- normal (vote)
: ---
Assigned To: Felix Mueller
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-17 02:24 UTC by Mick
Modified: 2009-11-03 03:30 UTC (History)
2 users (show)

See Also:
Category: ---


Attachments
Wireshark screen grab (98.22 KB, image/jpeg)
2009-02-18 01:20 UTC, Mick
Details
Server log during blue led (5.40 KB, application/zip)
2009-02-23 13:40 UTC, Mick
Details
Server log files (7.93 KB, application/zip)
2009-03-01 14:43 UTC, Mick
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mick 2009-02-17 02:24:02 UTC
My post http://forums.slimdevices.com/showthread.php?t=60228 is copied below

I'd like to suggest that the TCP change made in FW55 for the Duet is causing a DHCP error.

Many users are seeing this as a Blue LED error, where the Duet gets hung while trying to renew its DHCP lease. It requires a power cycle to fix.

I can't prove this but with FW59 my Duet would hang once every day or so.
I remembered a TCP change somewhere around FW55 so I reverted to FW54.
I've had no problems with the Duet in the 5 days since I reverted and I have my DHCP leases set to expire every 2 hours.

BTW- It may be co-incidence, but my Belkin router also started hanging, requiring a power cycle around the same time.

My SB3/Boom do not seem to have network issues, but they have static IP's.
99% of Diets are configured using DHCP.
This and the Blue LED are why I believe the TCP change has caused DHCP issues. Also users reverting back to 7.2.1 say their issue goes away. Reverting to 7.2.1 would perform the FW downgrade I did as a test.

Since my SB2/3 7 Boom worked fine with the TCP bug in place for the last 3 years, I think the TCP change should be reverted in FW60 as part of 7.3.3 until the proposed change is TRG'd with someone that has more low level TCP/IP knowledge then I do.


Related Blue LED posts:

http://forums.slimdevices.com/showth...light=blue+led
http://forums.slimdevices.com/showth...light=blue+led
http://forums.slimdevices.com/showth...light=blue+led
http://forums.slimdevices.com/showth...light=blue+led
http://forums.slimdevices.com/showth...light=blue+led


Generally resulting in this thread

http://forums.slimdevices.com/showthread.php?t=59863

with little or no reply from anyone.

BTW. I'd like the TCP changes reverted in FW versions for the SB2/3/Boom & Transporter as well, since anyone using DHCP on these devices is probably also experiencing grief.
Comment 1 Felix Mueller 2009-02-17 04:22:17 UTC
Thanks for taking the time to pull all reports together.

QA: Could you please test if you can see a change in behavior between the mentioned firmware version numbers?

Note: Some of the reports in the forums are using Net-UDAP to setup their SBR without an SBC which might be another reason for the blue LED.

I retried here with an SBC and SBR setup wireless with a DHCP server (Apple Airport) set to renew every 10 minutes and I don't see the issue. SBR fw 59.

Thanks
Felix
Comment 2 Mick 2009-02-17 05:39:46 UTC
Felix,

The reason I waited 5-6 days to post is because this seems intermittent.

FW59 will work for 1-2 days then I'll get the Blue LED. However my router was configured with much longer lease times then a present.
FW54 hasn't choked on me yet, but I'll leave it on there to continue testing.


BTW 1 more data point. My wireless signal strength on the receiver varies from 40% to 60%. I may not be the only person in that situation. This may or may not be a factor.


I'm also wondering if the receiver is somehow losing the SC server's IP address, which causes the Blue LED to come on, and it's only when the box is power cycled that this IP pulled back in from the onboard flash.

Thinking about this futher, I expect the Green LED to stay on if DHCP was the issue.

I have a D-Link 802.11G router that does much better event logging then the Belkin N1 I have now. I might switch that back in to see if it captures anything. Or perhaps run Wireshark to see if the receiver is perhaps still sending out IP traffic while hung.
Comment 3 Mick 2009-02-18 01:20:34 UTC
Created attachment 4825 [details]
Wireshark screen grab

I notice that all of the messages captured in the Wireshark screengrab attached have disappeared post FW54.

Maybe that's by design. I've put FW59 back on my receiver and will see how long it lasts before getting a Blue LED
Comment 4 Felix Mueller 2009-02-21 20:34:36 UTC
Hello Mick

Thanks for the wireshark capture. Unfortunately I cannot see any DHCP messages in there. Is that your point?

Rereading your initial post, it's not clear to me which 'TCP change' you are referring to checking SC's release notes (could you please post a bug number?).

Anyways, DHCP is not using TCP but UDP which makes it very unlikely that the 'TCP change' has caused any additional DHCP issues.

Felix
Comment 5 Mick 2009-02-23 04:07:36 UTC
I'm sure I saw entry relating to TCP in a FW change, but can't find it anywhere.

The only thing I can see that might be causing the Receiver (only) to hang with a blue LED is this.

https://bugs-archive.lyrion.org/show_bug.cgi?id=8647

If the wireless signal is marginal and for some reason the Receiver cannot communicate with the server, would it believe it is asleep and try to send WOL? Would it sit there waiting for the users to press the button?

My Receiver has been far more stable over last few days and was back on FW59, now FW60.


It's been left running for 24hours and has worked fine, only stopping when the controller hung. Somehow switching off the hung controller got the Receiver to start playing music again.

I've reverted the controller to r3476 which is far more stable then r3993
Comment 6 Chris Owens 2009-02-23 09:11:16 UTC
for the future, note that 'blocker' is for developer use to indicate that it is blocking work.

Dean speculates that there's 'something wrong with DHCP and we don't understand it and it causes a crash in this particular case.'

Related to bug 10774.  Dean says we should focus on 10774 and see if this gets fixed automagically.  This bug might be useful in finding a way to reproduce that bug.

Mick, are you trying to use your router to constantly assign the same IP address to your squeezebox?  Or are you just using regular DHCP to assign whatever address it wants?
Comment 7 Mick 2009-02-23 09:48:57 UTC
Sorry about incorrectly rating the importance of this issue Chris.

My router is a fairly basic Belkin job. It uses vanilla DHCP. You cannot even do static assignments by MAC address, however it would re-assign the same address over an over to the receiver.

Since switching to 1/2 hour leases things seem more stable. To be honest things are frustratingly stable for me now on FW60 and r3476 on the controller, when at the time I logged the bug, I couldn't get more then 24 hours without having to power cycle the receiver which is on top of my kitchen units.

I can't reproduce the issue, but I know something was wrong and others are seeing the exact same thing. I think in places where the wireless is marginal, the receiver is no longer recovering the way it used to. Perhaps something has changed which has just unmasked an issue I've had all along.

I'm going away for a few days and will leave the receiver playing a playlist to see if it completes a torture test.

If successful, I'll change the DHCP lease out to a week to see if that has an effect.
Comment 8 Mick 2009-02-23 13:40:06 UTC
Created attachment 4844 [details]
Server log during blue led

Maybe I spoke too soon.
White light on receiver. Turned on controller and hit play. No response. Now blue led on receiver. I pushed the front button, but nothing happened. Ran upstairs to capture logs and I hear music start downstairs.

Attached is the log.
Comment 9 James Richardson 2009-02-23 16:08:58 UTC
*What version of SqueezeCenter are you running, please include the Build
Number.  Go to Settings > Status.  Cut and Paste the SqueezeCenter and Player
Information sections to this bug

*What is the brand of your Network Router

*What is the Model number

*What is the firmware revision in the router

*What type of encryption (if any) is enabled

*Is the SSID Hidden or Broadcast

*Is Static or Dynamic IP address assignment used

*Are your SB Receivers connecting to the network Wireless or are they Wired to
your Router (or a hub then to the router)

*What other devices do you have attaching to your network

*What is the system hardware you have SqueezeCenter running on.  Make, Model,
CPU, Memory, Operating system, as much info as you can give me

*When you saw this error, were you connected to SqueezeCenter or SqueezeNetwork
Comment 10 Mick 2009-03-01 14:43:22 UTC
Created attachment 4878 [details]
Server log files

*What is the brand of your Network Router : Belkin 802.11n

*What is the Model number : F5D8631-4 v3 (01) 

*What is the firmware revision in the router : 3.01.06 (Apr 10 2008 17:34:55)

*What type of encryption (if any) is enabled : WEP

*Is the SSID Hidden or Broadcast : Broadcast

*Is Static or Dynamic IP address assignment used : DHCP (Receiver default)

*Are your SB Receivers connecting to the network Wireless or are they Wired to
your Router (or a hub then to the router) : Wireless

*What other devices do you have attaching to your network : SB3, Boom, Jive controller, PS3, Thinkpad laptop

*What is the system hardware you have SqueezeCenter running on.  Make, Model,
CPU, Memory, Operating system, as much info as you can give me : Dell Dimension 2.4Ghz P4 with 3Gb RAM, Win XP SP3 - same model since SC6.1

*When you saw this error, were you connected to SqueezeCenter or SqueezeNetwork
 : Only ever use SC.


I think the following is significant.

Receiver has an assigned IP 192.168.2.201, but is not visible in SC. SC is getting discovery requests but somehow failing to recognise the player.


[09-03-01 22:24:17.4556] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8
[09-03-01 22:24:17.4566] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-03-01 22:24:17.4577] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk
Comment 11 Mick 2009-03-01 14:51:32 UTC
Sorry post 10 was truncated.

[09-03-01 22:24:18.9148] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-03-01 22:24:18.9156] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk length: 17
[09-03-01 22:24:18.9169] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-03-01 22:24:19.5613] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8
[09-03-01 22:24:22.0232] Slim::Networking::Discovery::gotTLVRequest (194) sending response
[09-03-01 22:24:22.0241] Slim::Networking::Discovery::Server::gotTLVResponse (161) discovery response packet:
[09-03-01 22:24:23.1723] Slim::Networking::Slimproto::slimproto_close (243) connection closed
[09-03-01 22:24:24.4533] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8

[09-03-01 22:24:27.4606] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8
[09-03-01 22:24:27.4612] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-03-01 22:24:27.4617] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk length: 17
[09-03-01 22:24:27.4626] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-03-01 22:24:28.1556] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31001]
[09-03-01 22:24:28.1562] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection
[09-03-01 22:24:28.1573] Slim::Networking::Slimproto::client_readable (427) Error reading from client: Unknown error
[09-03-01 22:24:28.1578] Slim::Networking::Slimproto::slimproto_close (243) connection closed
[09-03-01 22:24:28.2055] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8
[09-03-01 22:24:28.2060] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-03-01 22:24:28.2066] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk


[09-03-01 22:24:37.4524] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8
[09-03-01 22:24:37.4530] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-03-01 22:24:37.4535] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk  length: 17
[09-03-01 22:24:37.4544] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-03-01 22:24:38.1657] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31002]
[09-03-01 22:24:38.1667] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection
[09-03-01 22:24:38.1680] Slim::Networking::Slimproto::client_readable (427) Error reading from client: Unknown error
[09-03-01 22:24:38.1688] Slim::Networking::Slimproto::slimproto_close (243) connection closed
[09-03-01 22:24:38.1829] Slim::Networking::Slimproto::slimproto_accept (187) Accepted connection from: [192.168.2.201:31003]
[09-03-01 22:24:38.1837] Slim::Networking::Slimproto::slimproto_accept (190) Setting timer in 5 seconds to close bogus connection
[09-03-01 22:24:38.1851] Slim::Networking::Slimproto::_hello_handler (946) Killing bogus player timer.
[09-03-01 22:24:38.1873] Slim::Networking::Slimproto::_hello_handler (980) Squeezebox says hello:  Deviceid: 7 revision: 60 mac: 00:04:20:16:01:b8 uuid: f7977586ed9da685c379841e359b1a8b bitmapped: 32768 reconnect: 0 wlan_channellist: 0000 lang: EN
[09-03-01 22:24:38.1890] Slim::Networking::Slimproto::_hello_handler (992) Squeezebox also says: bytes_received: 0
[09-03-01 22:24:38.1903] Slim::Networking::Slimproto::_hello_handler (1171) Hello from existing client: 00:04:20:16:01:b8 on ipport: 192.168.2.201:31003
[09-03-01 22:24:38.1926] Slim::Player::Squeezebox::stream (934) strm-q
[09-03-01 22:24:38.1955] Slim::Player::Squeezebox::stream (934) strm-q
[09-03-01 22:24:38.2129] Slim::Networking::Discovery::gotTLVRequest (194) sending response
[09-03-01 22:24:38.2142] Slim::Networking::Discovery::Server::gotTLVResponse (161) discovery response packet:
Comment 12 Mick 2009-03-06 01:42:14 UTC
I'm trying to figure out when this issue started https://bugs-archive.lyrion.org/show_bug.cgi?id=11109 and like others have found that 7.2.1 fixes the issue.

I'm prepared to do some root cause analysis, but to do this I'll need a least 1 nightly for each week from Oct to Feb. I'm pretty certain that my 7.3 system was stable at Xmas when it was used quite a bit. 

Does anyone have a 7.3 Windows nightly from around the 20th Dec?

I'm also interested in testing before and after this bug https://bugs-archive.lyrion.org/show_bug.cgi?id=10634 was fixed. I wonder does this have some unintended consequences for people like me with a receiver that has 45%-60% signal strength.
Comment 13 James Richardson 2009-03-28 10:12:43 UTC
Ross: does this sound familiar to anything you are already investigating?
Comment 14 Charles Razzell 2009-04-26 09:58:18 UTC
May I suggest a test case for QA to try to reproduce this problem?  

Set up a Duet system using on a SC that is programmed to go to sleep/suspend after 15 minutes inactivity.  Add in a SB3 as a control. With the system working properly allow the SC to go to sleep. Wake up the SC using a WoL packet (or otherwise). Check whether the blue light on the SBR turns white and check if the SB3 comes alive. If only the SB3 comes alive, you know you have a problem.

(In general, it would be good if standard testing included the suspend-awake sequence for the SC, which many of your users take for granted will be supported.)
Comment 15 Mick 2009-05-01 09:00:56 UTC
I went back to 7.21 some time as consensus on the forums is that that version does not see the Blue LED issue on the receiver.

In my case, it seems to make little difference, so I went back to the latest 7.33.

I've also assigned static IP's to both the controller and receiver.
What I see now is that I don't get a blue LED, the led stays white.
So the issue wouldn't appear to be DHCP related.

However the receiver has been "forgotten" by SC and is no longer visible in the web UI. The player also no longer responds to commands sent by the controller.

So I think this is due to marginal wireless signal and how SC copes with players that momentarily "disappear" off a wireless network. There's obviously some reason why SC won't accept a reconnect. I still see all of these messages while the player has been forgotten.....


00:04:20:16:01:b8
[09-05-01 06:53:22.4416] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-05-01 06:53:22.4422] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk length: 17
[09-05-01 06:53:22.4431] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-05-01 06:53:25.4407] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8 
[09-05-01 06:53:25.4413] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-05-01 06:53:25.4418] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk length: 17
[09-05-01 06:53:25.4427] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-05-01 06:53:26.1568] Slim::Networking::Slimproto::slimproto_accept (190) Accepted connection from: [192.168.2.7:26721]
[09-05-01 06:53:26.1574] Slim::Networking::Slimproto::slimproto_accept (193) Setting timer in 5 seconds to close bogus connection
[09-05-01 06:53:26.1592] Slim::Networking::Slimproto::client_readable (430) Error reading from client: Unknown error
[09-05-01 06:53:26.1597] Slim::Networking::Slimproto::slimproto_close (246) connection closed
[09-05-01 06:53:26.2085] Slim::Networking::Discovery::gotDiscoveryRequest (87) gotDiscoveryRequest: deviceid = 7, revision = 3.12, MAC = 00:04:20:16:01:b8

... repeated over and over in the server log file until hours later the server allows the receiver to reconnect.

[09-05-01 12:35:09.3204] Slim::Networking::Discovery::gotDiscoveryRequest (99) It's a Squeezebox
[09-05-01 12:35:09.3210] Slim::Networking::Discovery::serverHostname (53)  calculated oreillymj-desk length: 17
[09-05-01 12:35:09.3219] Slim::Networking::Discovery::gotDiscoveryRequest (110) gotDiscoveryRequest: Sent discovery response.
[09-05-01 12:35:09.4237] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: grfe, length: 1284
[09-05-01 12:35:09.9387] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: grfe, length: 644
[09-05-01 12:35:10.0399] Slim::Networking::Slimproto::slimproto_accept (190) Accepted connection from: [192.168.2.7:26880]
[09-05-01 12:35:10.0409] Slim::Networking::Slimproto::slimproto_accept (193) Setting timer in 5 seconds to close bogus connection
[09-05-01 12:35:10.0419] Slim::Networking::Slimproto::client_readable (391) Slimproto frame: HELO, len: 36
[09-05-01 12:35:10.0425] Slim::Networking::Slimproto::_hello_handler (949) Killing bogus player timer.
[09-05-01 12:35:10.0433] Slim::Networking::Slimproto::_hello_handler (983) Squeezebox says hello:  Deviceid: 7 revision: 60 mac: 00:04:20:16:01:b8 uuid: f7977586ed9da685c379841e359b1a8b bitmapped: 32768 reconnect: 0 wlan_channellist: 0000 lang: EN
[09-05-01 12:35:10.0438] Slim::Networking::Slimproto::_hello_handler (995) Squeezebox also says: bytes_received: 0
[09-05-01 12:35:10.0445] Slim::Networking::Slimproto::_hello_handler (1174) Hello from existing client: 00:04:20:16:01:b8 on ipport: 192.168.2.7:26880
[09-05-01 12:35:10.0454] Slim::Player::Squeezebox::sendFrame (1055) sending squeezebox frame: vers, length: 5
[09-05-01 12:35:10.0470] Slim::Player::Squeezebox::stream (931) strm-q
[09-05-01 12:35:10.0481] Slim::Player::Squeezebox::stream (976) sending strm frame of length: 24
Comment 16 Ross Levine 2009-05-01 12:14:45 UTC
Felix do you see anything useful in the log?
Comment 17 Charles Razzell 2009-05-02 09:44:24 UTC
(In reply to comment #15)

> I've also assigned static IP's to both the controller and receiver.
> What I see now is that I don't get a blue LED, the led stays white.
> So the issue wouldn't appear to be DHCP related.

I don't get your logic there: if changing from DHCP to static IP's changed the behavior from blue light to staying white, that seems like a massive clue to me. Yet you say, the issue wouldn't appear to be DHCP related?

My observation is that IP parameters obtained by DHCP appear to get corrupted whenever a discover request goes from the SBR goes unanswered because the server has gone to sleep.  I can't actually peek inside the RAM of the SBR to prove it is the case, but symptoms are definitely consistent with that theory. Namely, SBR obtains good IP parameters using DHCP on first power up and will play fine for hours, but the light turns blue and stays blue if the SC is allowed to suspend/resume. Other SD players(e.g. SB3) cope with the same scenerio just fine.
Comment 18 James Richardson 2009-10-30 13:56:00 UTC
Felix: your thoughts?
Comment 19 Felix Mueller 2009-11-03 03:29:50 UTC
No new ideas, really. The initial bug talks about a fw change that might have caused this issue (and only on SBR) which is hard to believe, since all TCP related fw changes would affect all ip3k player. Also I wasn't able to reproduce.

I am going to close this bug for now and suggest that people having this issue please retry with the latest soft- and firmware and if the problem still exists please reopen this bug.

Thanks
Felix