Bug 12475 - DNS name server testing causes DNS queries to fail consistently
Status: CLOSED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: Misc
Version: 7.3.3
Hardware: PC Windows XP
Importance: -- normal
Target Milestone: 7.3.4
Assigned To: Andy Grundman
Depends on: 12487
Blocks:
Reported: 2009-06-20 20:50 UTC by Jan Mikkelsen
Modified: 2009-10-05 14:33 UTC
CC List: 7 users

See Also:
Category: ---


Attachments
server.log with network.asyncdns set to "debug" (3.17 KB, text/x-log)
2009-06-24 10:58 UTC, tc_hansen

Description Jan Mikkelsen 2009-06-20 20:50:56 UTC
After upgrading to SqueezeCenter 7.3.3, I found that last.fm scrobbling was not working. Looking at the current and old logs, I see that DNS was also failing on the earlier version, but it was falling back to OpenDNS, which then resolved the names successfully.

Stopping and restarting SqueezeCenter resolves the problem with 7.3.3.

DNS on the local network is provided by a caching resolver on a Netcomm NB5 ADSL modem. It is connected via ethernet, and the interface is configured using DHCP.

Looking at the source code of /7.3/trunk/server/Slim/Networking/Async/DNS.pm in the online svn repository, I found a block of code that provides a DNS name cache and also caches resolver IP addresses after "testing" them. The cache never deals with changes to the list of configured resolvers. It also treats failures of local resolvers during this testing as configuration errors, rather than considering that they could just be transient failures.

This attempt to work around user configuration errors actually causes problems on correctly configured systems that experience transient failures. Examples include network interfaces that are still being configured during startup, or nameservers that have no Internet connectivity at the time of the test. These problems were previously masked by a second level of "assistance", namely the fallback to OpenDNS.

This module needs to handle changes to the resolver list during operation. Once it does, there is no need to test resolvers, because it already queries all known resolvers in parallel: a failing resolver doesn't matter, and a resolver that had a transient failure during initial testing will not be wrongly ignored.
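The argument relies on querying every known resolver in parallel and taking the first answer, so a dead resolver costs nothing. Below is a minimal sketch of that idea in Python (not the server's actual asynchronous Perl code), assuming a hypothetical blocking `query(server, name)` function that returns an address or raises on failure:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional

# Hypothetical lookup: query(server, name) -> address string, raising on failure.
Query = Callable[[str, str], str]


def resolve_any(name: str, servers: List[str], query: Query) -> Optional[str]:
    """Ask every server in parallel; return the first successful answer, or None."""
    if not servers:
        return None
    pool = ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = [pool.submit(query, server, name) for server in servers]
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception:
                continue  # a failing resolver is ignored; the others may still answer
        return None  # every resolver failed
    finally:
        pool.shutdown(wait=False)  # don't block on slower resolvers once we have an answer


if __name__ == "__main__":
    def fake_query(server: str, name: str) -> str:
        if server == "10.0.0.1":
            raise OSError("timed out")  # simulated dead resolver
        return "192.0.2.10"

    print(resolve_any("www.example.com", ["10.0.0.1", "10.0.0.2"], fake_query))  # prints 192.0.2.10
```

Because no answer depends on any single resolver, a resolver that was down at startup simply contributes nothing until it recovers, which is why a separate up-front test adds little.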

The test code can consider a server "failed" for many reasons. For example, it queries for the names a.root-servers.net through m.root-servers.net. This succeeds whenever the resolver is reachable, even if the resolver itself has no Internet connectivity, but only if the resolver is configured to answer for those names locally. That is true for BIND, but not for some other DNS servers such as djbdns.
Comment 1 Jim McAtee 2009-06-20 23:01:08 UTC
Why doesn't Logitech run its own resolving DNS servers for either fallback or primary use by Squeezeboxes?
Comment 2 Brandon Black 2009-06-21 00:48:35 UTC
(In reply to comment #1)
> Why doesn't Logitech run its own resolving DNS servers for either fallback or
> primary use by Squeezeboxes?

We did at one time, and it's a bad idea because then we're hardcoding IP addresses in software and/or firmware that might outlive our address assignments. This has already bitten us once, forcing us to keep a machine alive in a datacenter we no longer use for anything else, just to serve that outdated IP address for old firmware. Beyond that inconvenience and cost, in theory the ISP and/or IANA could reclaim that IP at any time if they needed it, which would really screw those firmwares.

In more recent SCs and firmwares, the old hardcoded slimdevices fallback IP for service and DNS resolution mentioned above was replaced with just a DNS fallback to OpenDNS's public servers. Now, in 7.3.3, we've removed that as well, because it causes us headaches with split resolution of our datacenters (OpenDNS uses anycast addressing to reach geographically dispersed DNS caches, so depending on random internet latency/drop factors, the customer can appear to be just about anywhere from a DNS point of view).

Removing the OpenDNS fallback has exposed DNS issues in our code that were previously masked by both fallbacks and now need to be addressed. A working local DNS resolver configuration of some kind is essential to functioning in any reasonable capacity on the internet, so I don't think it's unreasonable for us to require functioning DNS resolution on the customer's network. We just need to make sure we don't have bugs in how we use it.
Comment 3 James Richardson 2009-06-22 10:43:03 UTC
*** Bug 12486 has been marked as a duplicate of this bug. ***
Comment 4 James Richardson 2009-06-22 10:46:00 UTC
maybe related to bug 12487
Comment 5 Andy Grundman 2009-06-23 05:09:55 UTC
Please try 7.3.4 change 27205. As long as /etc/resolv.conf is available when SC starts, it should fix the problem.
Comment 6 Andy Grundman 2009-06-23 05:10:13 UTC
*** Bug 12487 has been marked as a duplicate of this bug. ***
Comment 7 tc_hansen 2009-06-23 13:56:00 UTC
I upgraded SC to 7.3.4 change 27205 as suggested, but it didn't solve the problem for me. Log entries after a fresh reboot:

2009-06-23 22:08:47 squeezecenter_safe started.
[09-06-23 22:08:52.7661] main::init (270) Starting SqueezeCenter (v7.3.4, r27205, Tue Jun 23 11:50:45 PDT 2009)
[09-06-23 22:08:57.4069] Slim::Networking::Async::DNS::init (93) No DNS servers responded, you may have problems with network requests.
[09-06-23 22:09:06.4903] Slim::Utils::Firmware::downloadAsyncError (546) Warning: Firmware: Failed to download http://update.squeezenetwork.com/upd...4/jive.version (Couldn't resolve IP address for: update.squeezenetwork.com), will try again in 10 minutes.
[09-06-23 22:09:26.2598] Slim::Utils::Scanner::Remote::__ANON__ (223) Error: Can't connect to remote server to retrieve playlist: Couldn't resolve IP address for: www.dr.dk.
[09-06-23 22:09:31.2776] Slim::Utils::Scanner::Remote::__ANON__ (223) Error: Can't connect to remote server to retrieve playlist: Couldn't resolve IP address for: wmscr1.dr.dk.
[09-06-23 22:09:31.4908] Slim::Networking::SqueezeNetwork::_error (455) Unable to login to SN: Couldn't resolve IP address for: www.squeezenetwork.com
[09-06-23 22:09:31.4910] Slim::Networking::SqueezeNetwork::_init_error (182) Unable to login to SqueezeNetwork, sync is disabled: Couldn't resolve IP address for: www.squeezenetwork.com
[09-06-23 22:09:31.4913] Slim::Networking::SqueezeNetwork::_init_error (192) SqueezeNetwork sync init failed: Couldn't resolve IP address for: www.squeezenetwork.com, will retry in 1200

The contents of my /etc/resolv.conf:

# Generated by NetworkManager
nameserver 89.150.129.4
nameserver 89.150.129.10

System:
Ubuntu 9.04
Version: 7.3.4 - 27205 @ Tue Jun 23 11:50:45 PDT 2009
Hostname: thomas-desktop
Server IP Address: 192.168.1.35
Server HTTP Port Number: 9000
Operating system: Debian - EN - utf8
Platform Architecture: i686-linux
Perl Version: 5.10.0 - i486-linux-gnu-thread-multi
MySQL Version: 5.0.75-0ubuntu10.2
Total Players Recognized: 1
Comment 8 yeswork 2009-06-23 13:58:29 UTC
7.3.4 - 27205

I still have DNS errors

Fedora 11  

ping works fine
Comment 9 Andy Grundman 2009-06-23 14:12:40 UTC
Please post a log with "--debug network.asyncdns" enabled.
Comment 10 Jan Mikkelsen 2009-06-23 15:59:52 UTC
Thanks for the update.  I will probably not be able to test whether this resolves the problem on XP until tomorrow Australian time.

However, after a quick inspection of the change, I don't think it will resolve my problem, and I can speculate about why it could still fail for others.  The response says "as long as /etc/resolv.conf is available when SC starts".  A big part of the problem is that /etc/resolv.conf (or its XP equivalent) is not available when SC starts.

There are these basic cases on SC startup:

1. Resolvers are correct and functional - No problem.

2. Resolvers are correct and non-functional - Previous problem.

3. Resolvers are not configured - Problem

4. Resolvers are incorrectly configured (or validly change during SC operation) - Problem.

This patch solves case 2, but does not solve case 3 or case 4.

My view of the correct change is to get rid of $LocalDNS entirely, and to stop passing the "nameservers" parameter when initialising Net::DNS::Resolver. I find it hard to imagine that caching resolvers like that has any measurable performance benefit, and the caching itself leads to incorrect operation. The cost of the resolver library doing a stat(2) on /etc/resolv.conf to see whether it has changed is not worth worrying about on a modern machine.

You could argue that if there are no DNS servers at startup, the DNS servers could be re-detected and rechecked later during operation, thereby preserving the $LocalDNS mechanism. That would solve case 3, but not case 4.
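The alternative described here, re-testing later when no resolver passed at startup, might look like the following Python sketch. `read_servers`, `test`, and the 60-second `retry_interval` are hypothetical choices for illustration, not SqueezeCenter code. The usage lines show both sides of the argument: an empty list recovers once the retry interval elapses (case 3), but once some resolver has passed, a later change to the configured list is never seen (case 4).

```python
import time
from typing import Callable, List


class RetestingResolverList:
    """Caches tested resolvers, but re-tests while the cached list is empty."""

    def __init__(self,
                 read_servers: Callable[[], List[str]],   # hypothetical: current configured list
                 test: Callable[[List[str]], List[str]],  # hypothetical: returns working servers
                 retry_interval: float = 60.0,            # hypothetical retry period, seconds
                 clock: Callable[[], float] = time.monotonic):
        self.read_servers = read_servers
        self.test = test
        self.retry_interval = retry_interval
        self.clock = clock
        self._retest()

    def _retest(self) -> None:
        self.working = self.test(self.read_servers())
        self.last_test = self.clock()

    def servers_for_lookup(self) -> List[str]:
        # Re-test only when nothing passed last time and the retry interval has elapsed.
        if not self.working and self.clock() - self.last_test >= self.retry_interval:
            self._retest()
        return self.working


if __name__ == "__main__":
    now = [0.0]
    config = {"servers": [], "up": False}  # no resolvers configured yet at startup
    r = RetestingResolverList(lambda: config["servers"],
                              lambda servers: servers if config["up"] else [],
                              clock=lambda: now[0])
    config.update(servers=["192.168.1.1"], up=True)
    print(r.servers_for_lookup())  # prints [] (retry interval not yet elapsed)
    now[0] = 61.0
    print(r.servers_for_lookup())  # prints ['192.168.1.1'] (case 3 recovered)
    config["servers"] = ["10.0.0.1"]  # resolver list changes during operation
    now[0] = 500.0
    print(r.servers_for_lookup())  # prints ['192.168.1.1'] (case 4: change never seen)
```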

The correct approach is to let the DNS resolver library do its job and just call it when you need a name resolved.
Comment 11 Brandon Black 2009-06-23 16:53:22 UTC
(In reply to comment #10)

> The correct approach is to let the DNS resolver library do its job and just
> call it when you need a name resolved.

If by "the DNS resolver library" you mean gethostbyname(), the whole reason we're down this path is because gethostbyname() is not asynchronous, and the rest of our code is.
Comment 12 Jan Mikkelsen 2009-06-23 18:43:06 UTC
No, I didn't mean gethostbyname, I meant Net::DNS::Resolver, although I take your point about asynchronous operation. Asynchronous operation is a separate issue from keeping cached copies of system configuration details.

(It is a long time since I used Perl regularly and I don't know the implementation details of their resolver; a quick look at the Perl resolver source shows it doesn't cache configuration details.  I don't think that really matters; I shouldn't have raised it in my previous comment.)

My basic suggestion is to populate the list of nameservers for each query from the result of the nameservers() method on a default-constructed Net::DNS::Resolver object, rather than keeping a cached copy from some earlier point in the process execution.

A change on line 148 from:

		my $servers = $args->{servers} || $LocalDNS;

to:

		my $servers = $args->{servers} || Net::DNS::Resolver->new->nameservers();

Is the essence of my suggestion.

There are probably issues with Perl scalar vs. list context rules; I really can't remember those anymore. I would also suggest removing the internal name cache. That kind of thing can surprise administrators who expect restarting their resolver or running "ipconfig /flushdns" on Windows to clear things out. But I'm not really concerned about that because it isn't causing me a problem.
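The proposed one-line change can be sketched outside Perl to show the intended behaviour: explicit servers win, and otherwise the nameserver list is read from the system configuration on every lookup, so a changed list is picked up immediately. This minimal Python sketch assumes a resolv.conf-style file and only understands `nameserver` lines; `parse_nameservers`, `servers_for`, and the `read_config` callback are hypothetical names, and Windows, which has no /etc/resolv.conf, is not covered. It also sidesteps the Perl scalar/list-context question by always returning a list.

```python
from pathlib import Path
from typing import Callable, Dict, List


def parse_nameservers(resolv_conf: str) -> List[str]:
    """Extract addresses from `nameserver` lines; comments and other directives are skipped."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def read_resolv_conf() -> str:
    return Path("/etc/resolv.conf").read_text()


def servers_for(args: Dict[str, List[str]],
                read_config: Callable[[], str] = read_resolv_conf) -> List[str]:
    # Mirrors `$args->{servers} || <current system nameservers>`: no startup cache.
    return args.get("servers") or parse_nameservers(read_config())


if __name__ == "__main__":
    conf = {"text": ""}  # e.g. DHCP has not written the file yet
    read = lambda: conf["text"]
    print(servers_for({}, read))  # prints []
    conf["text"] = "# Generated by NetworkManager\nnameserver 89.150.129.4\nnameserver 89.150.129.10\n"
    print(servers_for({}, read))  # prints ['89.150.129.4', '89.150.129.10']
    print(servers_for({"servers": ["192.0.2.53"]}, read))  # prints ['192.0.2.53']
```

Re-reading a small file on each lookup is the kind of cost comment 10 argues is negligible on a modern machine.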
Comment 13 Andy Grundman 2009-06-23 19:00:45 UTC
I'll see if there's a way to check for valid resolvers at post-startup time.
Comment 14 tc_hansen 2009-06-24 10:58:45 UTC
Created attachment 5360 [details]
server.log with network.asyncdns set to "debug"

Attached is a server.log copied immediately after reboot with network.asyncdns debug enabled. I hope I did it right (Settings - Advanced - Logging).
Comment 15 Jan Mikkelsen 2009-06-25 05:37:04 UTC
I have just tested 7.3.4/27205 on XP. There are DNS failure entries in the log for 10 seconds after SqueezeCenter startup, and then things settle down. It looks up last.fm fine and scrobbles correctly.

So: this fixes my immediate problem, thanks.  My comments about nameservers changing are still valid, and I suspect there are other cases where the problem will still be present.
Comment 16 James Richardson 2009-10-05 14:33:50 UTC
This bug has been marked as fixed in the 7.4.0 release version of Squeezebox Server!
    * SqueezeCenter: 28672
    * Squeezebox 2 and 3: 130
    * Transporter: 80
    * Receiver: 65
    * Boom: 50
    * Controller: 7790
    * Radio: 7790  

Please see the Release Notes for all the details: http://wiki.slimdevices.com/index.php/Release_Notes

If you haven't already, please download and install the new version from http://www.logitechsqueezebox.com/support/download-squeezebox-server.html

If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look.