Bug 13237 - DNS issues persist after removing OpenDNS
Status: NEW
Product: SB Receiver
Classification: Unclassified
Component: General
: 62
: All All
: P3 major with 3 votes
: Future
Assigned To: Felix Mueller
: Support-Important
Depends on:
Blocks:
Reported: 2009-08-04 17:10 UTC by Dan Evans
Modified: 2011-01-19 09:56 UTC

See Also:
Category: Task


Description Dan Evans 2009-08-04 17:10:21 UTC
Since the release of 7.3.3 firmwares we no longer have issues with setting up Duets where the Controller and Receiver end up on different servers.

But we do have a new kind of failure where the player cannot resolve DNS to SN.  We see this on:

 * SB Duet
 * SB Boom
 * SB Classic

On IP3K players, setup fails with an error about "cannot resolve IP address for SN".  Resetting the Squeezebox does not fix this.  However, in most cases simply power cycling the router and trying setup again solves the problem.

There is no pattern with routers.  We see this with multiple router makes and models.  We also do not see any pattern with OS.

During our weekly software meeting when this was brought up, Brandon and Andy speculated that our devices are timing out too early: that our DNS code is not as robust as it should be and fails under conditions that other devices are able to weather.

More investigation is needed here.
Comment 1 James Richardson 2009-09-11 14:40:38 UTC
*** Bug 13981 has been marked as a duplicate of this bug. ***
Comment 2 henrik.ekman 2009-09-11 21:47:15 UTC
Dear Andy, Brandon and the Squeezebox development team,

First of all, I'm amazed and positively surprised by the openness of your bug report system, and I would like to thank you for the possibility, as an end user, to file a bug in your bugzilla system!

Yesterday I filed bug 13981, and after seeing the analysis of this case (mine, 13981, being a duplicate of it) I would like to add the following:

* I can confirm that the problems appeared right after upgrading the Logitech Squeezebox to firmware 127. No observations at all prior to that.

* I am not a SW expert, but I can confirm that when the Squeezebox boots from a cold start (mains voltage disconnected), the normal bootup sequence (obtaining IP address...) succeeds about 50% of the time. The rest of the time, very shortly after receiving the IP address, the system says it is unable to connect to the Squeezenetwork. This makes me think that the timeout for connecting to the Squeezenetwork is far too short. I say this because if the system fails to connect, then after I reconfigure the connection by simply accepting all the previous settings and letting the Squeezebox connect again, the connection is always established.

* Something has changed in the last one or two months, because the system worked perfectly for 6 months with my current broadband network and DLink DIR-855 wireless router (which has not been upgraded lately).

* In case you would like to receive a video clip of the problem appearing, please let me know and I will try to capture it with my digital camera.

* If there is any way to downgrade my Squeezebox Classic back to level 125, I am willing to do it and to report back whether the problem disappears. I searched for how to do it, but I could not find any information on doing it myself.

Rgrds, henrik.ekman@kolumbus.fi , bug 13981
Comment 3 henrik.ekman 2009-09-13 10:08:21 UTC
Dear all Squeezebox developers,

While you and I were having a wonderful weekend, I made some further observations related to bug 13981 (a duplicate of this one).

1) When the Squeezebox can't find the Squeezenetwork, waiting an extra 30 seconds lets it finally find the Squeezenetwork.

2) It seems that a short power-off does not trigger the problem, but a 0.5h disconnection does.

In case you are interested, I captured a short 2-minute video of how it goes. Due to the file size (more than 230 Mbytes), I uploaded it to my personal website for you to download. It may take quite some time to download, so I would recommend downloading it on Monday during business hours in the USA. I will delete the file from the folder on Tuesday morning (European time).

The file can be found here. Because the connection may be slow, download the file to your hard drive.

http://www.kolumbus.fi/ekman/squeezebox

You can see everything that happens when I start up the device by connecting it to the mains voltage. At the end I scroll through certain menus on the display of the Squeezebox with the remote controller so you can gather more information (I was not aware of any way to export error logs for you...).

Henrik Ekman
henrik.ekman@kolumbus.fi
http://www.henrikekman.com
Comment 4 Dan Evans 2009-10-01 10:44:40 UTC
Since the 7.4 release, we are getting reports of Booms, Duets and Transporters that cannot connect to mySB.com.  This is after they updated to the latest 7.4 firmware.  We've tried power-cycling the router and player when we can, but it did not fix the problem.

Per Brandon's request, we're going to go back to these customers and try cycling again multiple times, to rule out an intermittent failure.

While still investigating these issues, I want to bump this bug up in priority so it gets reviewed and/or gets attention.
Comment 5 Andy Grundman 2009-10-01 10:55:19 UTC
We won't be able to do new ip3k firmware for 7.4.1.  I think the only solution may be to rewrite the ip3k DNS client.
Comment 6 Felix Mueller 2009-10-02 03:04:00 UTC
There is a DNS timeout value currently set to 5 seconds in the ip3k DNS client.

The help text for that value reads as follows: "This is the value of the amount of seconds that the DNS server must respond by."

Andy / Brandon: Do you think increasing the DNS timeout would help in this particular situation? I.e. do you think the issue is that ip3k DNS client is not waiting for an answer long enough?

Or do you think ip3k DNS client is struggling with some particular answer packet from a DNS server and that is why you, Andy, suggested to rewrite it?

If we increase the DNS timeout value, what would be a sensible value - 10 seconds or even more?

All: How can we test if whatever we change actually fixes the issue? Do we have a test case other than reports from users / support?
Comment 7 Andy Grundman 2009-10-02 05:06:02 UTC
5 seconds seems long enough.  From looking at the code I think the issue could be that the query is not retried if it's lost.
Comment 8 Brandon Black 2009-10-02 08:25:51 UTC
When in doubt, emulate the behavior of known-working stuff.  I'd suggest we take a peek at the standard behavior of the DNS backend for glibc's gethostbyname() as a model for how to retry and timeout.
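
A minimal sketch of the retry behavior discussed in comments 7 and 8, written in Python for illustration only; it is not the ip3k firmware code. It assumes the glibc resolver defaults documented in resolv.conf(5) (a 5 second timeout and 2 attempts per lookup) and resends the query when no reply arrives. The names build_query and query_with_retry are hypothetical.

```python
import socket
import struct


def build_query(hostname, query_id):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query_with_retry(hostname, servers, timeout=5.0, attempts=2,
                     port=53, query_id=0x1234):
    """Try every server in turn, repeating the whole list `attempts` times.

    Returns the raw reply bytes, or None if every try timed out or failed.
    The defaults mirror the glibc resolver defaults (assumption for this
    sketch; the ip3k client may need different values).
    """
    packet = build_query(hostname, query_id)
    expected_id = packet[:2]
    for _ in range(attempts):
        for server in servers:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.settimeout(timeout)
                try:
                    sock.sendto(packet, (server, port))
                    reply, _addr = sock.recvfrom(512)
                except OSError:
                    # Lost query, timeout, or unreachable server: move on
                    # to the next server or the next attempt.
                    continue
                if reply[:2] == expected_id:
                    return reply
    return None
```

With one server and these defaults, a single lost query costs 5 seconds and is then resent, instead of failing the whole lookup. Pointing the function at a local UDP server that deliberately drops the first query is one possible way to get the reproducible test case asked for in comment 6.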
Comment 9 Chris Owens 2009-10-02 12:02:36 UTC
If the DNS lookup of the network is not working, then the PC should also not be able to ping mysqueezebox.com, true?  That would be a diagnostic support could perform.
Comment 10 Dan Evans 2009-10-02 12:55:11 UTC
We have been testing this.  80% of the time the computer _can_ ping mySB.com, and it can talk to mySB.com in their web browser.  

We've had one or two cases where it couldn't, in which case that's likely not our problem but the users' networks.
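
As a rough illustration of the PC-side diagnostic from comment 9, the check can be scripted instead of using ping. This is a sketch in Python using the operating system's resolver via socket.getaddrinfo; the function name can_resolve is hypothetical. It only shows that the PC's resolver works, which, as noted above, does not prove that the player's own DNS client does.

```python
import socket


def can_resolve(hostname):
    """Return the sorted addresses the OS resolver gives for hostname.

    Returns an empty list when the lookup fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    addresses = can_resolve("www.mysqueezebox.com")
    print(addresses if addresses else "lookup failed")
```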
Comment 11 henrik.ekman 2009-10-06 12:05:08 UTC
Dear Sirs in Squeezebox team,

Firstly, I would still like to thank you for opening your bug database for customers to follow your investigations.

At the moment, I can confirm that my Logitech Squeezebox Classic, the one I reported earlier (bug 13981), still behaves systematically as before. Every time I wake the unit with a cold start (connecting the mains plug), it says most of the time "Cannot connect to mysqueezebox.com...". But what I do is just leave the unit like that for a while with the error message. After some time, it blanks the display (completely black), and after a few seconds it suddenly illuminates the display and everything is OK. And the unit works.

A short power interruption does not trigger the problem.

Since last reporting this, I upgraded my Squeezebox to the new firmware 130, but it does not seem to correct the issue.

As I said before, these problems appeared after the upgrade I reported in bug 13981, and there were no firmware upgrades for the Dlink DIR-855 WLAN router at that time. The system worked perfectly for several months.

If I knew of any way to export an error log for you, I would do it if that would help your investigation. Or please let me know if you want me to run the test at a certain precise time so you can track the events/errors on the server side.

Rgrds, Henrik
Comment 12 Felix Mueller 2009-11-18 23:09:21 UTC
Dan: It would be helpful to gather the following information in the "Current Settings" menu when a customer gets the "Cannot connect to mysqueezebox.com...":

- ethernet / wireless
- static ip / DHCP
- ip address
- subnet mask
- gateway
- DNS server 1 and 2
- Squeezebox Server
- mac address
- host name
- firmware version

BTW: I tried with a Boom firmware 50 using DHCP against every router I have to test (about 7) and also connected Boom directly to the cable modem (i.e. without router) but wasn't able to make it fail.

As I said in comment #6 it would be helpful to have a reproducible case.

Thanks
Felix
Comment 13 Felix Mueller 2009-11-21 10:58:56 UTC
7.5 P1's are only for Touch related bugs. Moving to P2.
Comment 14 henrik.ekman 2009-12-24 00:29:35 UTC
Dear Sirs,

May I ask whether there will be a fix for this issue? I'm still running firmware 130 on a Logitech Squeezebox Classic and systematically facing this problem. Unfortunately, every day.

In case you want to set up something special (test SW, online chat/call while reproducing the problem), I am always willing to help you with that.

Merry Christmas!

Rgrds, Henrik
Comment 15 Chris Owens 2010-01-04 16:00:36 UTC
Changing priorities due to management guidance.
Comment 16 Chris Owens 2010-03-08 11:28:09 UTC
Moving lower-priority bugs to next target
Comment 17 henrik.ekman 2010-03-09 10:32:40 UTC
An update to this issue: with no updates to my Dlink WLAN router or the SB (firmware 130), the issue has nearly disappeared now, without any fix or change to any settings.

Rgrds, Henrik
Comment 18 Thomas Lackey 2010-05-17 08:44:46 UTC
I noticed the last updates mentioned the Touch.

I'm having consistent problems accessing any mySB dependent services (Pandora, Sirius, etc.) with SB 7.5.0 and an SB Touch.  In fact, I've never succeeded in using any of them.

The server reports:

Couldn't resolve IP address for: www.mysqueezebox.com

And the logs say:

[10-05-17 09:56:37.5043] Slim::Formats::XML::gotErrorViaHTTP (332) Error: getting http://www.mysqueezebox.com/api/pandora/v1/opml
Couldn't resolve IP address for: www.mysqueezebox.com
[10-05-17 09:56:39.0516] Slim::Networking::SqueezeNetwork::_error (477) Unable to login to SN: Couldn't resolve IP address for: www.mysqueezebox.com
[10-05-17 09:56:39.0519] Slim::Networking::SqueezeNetwork::_init_error (204) Unable to login to mysqueezebox.com, sync is disabled: Couldn't resolve IP address for: www.mysqueezebox.com (http://www.mysqueezebox.com)
[10-05-17 09:56:39.0524] Slim::Networking::SqueezeNetwork::_init_error (218) mysqueezebox.com sync init failed: Couldn't resolve IP address for: www.mysqueezebox.com, will retry in 300 (http://www.mysqueezebox.com)
[10-05-17 09:57:11.3317] Slim::Networking::SqueezeNetwork::_error (477) Unable to login to SN: Couldn't resolve IP address for: www.mysqueezebox.com
[10-05-17 10:01:51.0602] Slim::Networking::SqueezeNetwork::_error (477) Unable to login to SN: Couldn't resolve IP address for: www.mysqueezebox.com
[10-05-17 10:01:51.0610] Slim::Networking::SqueezeNetwork::_init_error (204) Unable to login to mysqueezebox.com, sync is disabled: Couldn't resolve IP address for: www.mysqueezebox.com (http://www.mysqueezebox.com)
[10-05-17 10:01:51.0616] Slim::Networking::SqueezeNetwork::_init_error (218) mysqueezebox.com sync init failed: Couldn't resolve IP address for: www.mysqueezebox.com, will retry in 600 (http://www.mysqueezebox.com)

The DNS setup is fine.  I can run nslookup against either of the configured DNS servers and get the address for www.mysqueezebox.com promptly every time.  I can also paste any of the URLs ever printed in the SB log in a web browser on the same computer and open them without issue.  I can telnet to www.mysqueezebox.com on all the relevant ports, and the control panel tab confirms that as well.

I've tried restarting without success.

Info:

Version: 7.5.0 - r30464 @ Thu Apr 1 05:51:56 PDT 2010
Hostname: XXXXXXXXX
Server IP Address: XXXXXX.118
Server HTTP Port Number: 9000
Operating system: Windows Server 2003 - EN - cp1252 
Platform Architecture: 586
Perl Version: 5.10.0 - MSWin32-x86-multi-thread
MySQL Version: 5.0.22-community-nt
Total Players Recognized: 1

Player Model: Squeezebox Touch
Firmware: 7.5.0-r8673
Player IP Address: XXXXXXXXX.133
Player MAC Address: XXXXXXXXXXX
Wireless Signal Strength: 56%


If you'd like to have someone "in the wild" to test, I'd be glad to.  I'm a software engineer, so this is pretty much daily stuff for me too.
Comment 19 Thomas Lackey 2010-05-26 10:05:04 UTC
This was irritating me, so I dug a little deeper.

I sniffed the traffic and could see no evidence that it was even attempting a DNS query for www.mysqueezebox.com.

I decided to see how it read the DNS settings (since it seems it doesn't just call gethostbyname or getaddrinfo or some other OS-based DNS lookup).

I noticed in DNS.pm that even on Windows it supports reading a resolv.conf file when one is specified in the environment.

I added one at c:/etc/resolv.conf with my nameservers, set the env variable on my account, and started up the server.  Everything worked immediately.

I didn't track down the exact location of the problem, but it must be failing to read the server info from Perl and/or from ipconfig in such a way that it gives up entirely.

At first glance in the code you'd think it would use the hard-coded fallback servers, but I saw no evidence in the packet capture that it actually tried to do so.
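
For reference, a resolv.conf of the kind described above contains one "nameserver" line per server. The sketch below, in Python for illustration only (it is not the code in DNS.pm), shows the expected behavior: read the nameserver entries and use a hard-coded fallback list when none are found. The function names and the example addresses (taken from the reserved documentation ranges) are assumptions.

```python
EXAMPLE_RESOLV_CONF = """\
# example c:/etc/resolv.conf (addresses are placeholders)
nameserver 192.0.2.1
nameserver 192.0.2.2
"""


def parse_nameservers(resolv_conf_text):
    """Return the addresses from all 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Treat anything after '#' or ';' as a comment.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def choose_nameservers(resolv_conf_text, fallback):
    """Use the configured servers, or the fallback list if none were found."""
    return parse_nameservers(resolv_conf_text) or list(fallback)
```

The point of choose_nameservers is that an empty or unreadable configuration should still leave the resolver with servers to query, rather than sending no query at all as seen in the packet capture.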
Comment 20 y360 2010-09-04 01:36:12 UTC
I have not been able to use my local SC since I upgraded to 7.3.3
Direct access via mysb.com is fine

I can see no misconfiguration in my network

I'm surprised this is still an open bug with only P3 severity
For those of us affected this means no SC
Comment 21 Felix Mueller 2011-01-19 09:56:17 UTC
Thomas Lackey, y360,

may I kindly ask you to open separate bugs for your DNS issues as this bug is strictly about ip3k devices (SB Classic, Boom, Receiver and TR).

Thanks
Felix