Bugzilla – Bug 14345
Baby loses ability to use menus intermittently on SqueezeNetwork (DNS-related)
Last modified: 2009-10-22 11:23:40 UTC
I am seeing this issue with baby hardware version 4 and r7779 connected directly to squeezenetwork running off battery. I don't think running off battery matters but I wanted to mention it just the same. I am on my home network served by Comcast and baby is wirelessly connected via an Apple Airport. There is no local squeezecenter running on the network. For example I can successfully navigate to Pandora and start a station playing without issue. The Pandora station will continue to play without issue however after a time accessing the more menu will result in a blank more screen. While baby is in this state navigating to any squeezenetwork served menu does not work and the user is presented with a spinny next to the menu selection. Pandora will continue to play and update successfully while in this state. The diagnostics screen does not reveal anything out of order. After a time baby will recover on its own and the menus will function properly again.
This may also be a server problem but I can't spot anything wrong.
Try to get a full /var/log/messages when this is seen.
I saw a few snippets of logs in cf from Saturday. Can you provide a full /var/log/messages when you see this occur?
I am only able to reproduce this at home. When I get a chance I will also switch out access points to see if it makes a difference in my home environment. Is there any useful debug options that I can enable that might help diagnose the issue? The other thing is I don't see these issues when connected to a local server.
I'm seeing this here in the office right now.
Debug to try on baby, thanks Richard! return { appender={ stdout="debug", syslog="debug" }, category={ ["net.comet"]="debug", ["net.slimproto"]="DEBUG" } } and put that in /etc/squeezeplay/userpath/logconf.lua on your baby
Created attachment 6041 [details] Baby log taken while menu usability varied I was not able to find or notice anything in the log that seem to correlate with the ability to use menus or not. Are there any other debug option I can try? Perhaps a video of the device while the log is running might help. I was also wondering if having a large number of players attached to the MySb account might make a difference.
Created attachment 6048 [details] More baby logs including netstat There is some interesting port activity in the netstat logging. It seems at least two 9000 ports are always open and when these two ports are connected to different MySb is when I can no longer navigate menus. I originally was thinking maybe it had something to do with port 3483 and 9000 being on different MySb but that did not seem to cause issues with menus. I was I have no idea why baby would be connected to different MySb at the same time or even why we need more than one port open on 9000. Netstat logging was suggested by Felix, thanks Felix!
So the suspicion at this point is that we're seeing the same basic problem as the past OpenDNS issues, but caused by widely-distributed ISP DNS caching infrastructure. In short, your local router might get two different datacenters as answers to the DNS query for mysb.com 5 minutes apart, because the requests were routed differently by the ISP. We can test this theory by using the debugging hostname "reflect.mysqueezebox.com". This is a special hostname which resolves (with 10 second TTL) to the address our DNS servers perceive your DNS cache exit point to be at. In other words, your DNS requests may route as player -> your_router -> ISP cache A -> ISP cache B -> ISP cache C -> SN DNS servers. The reflect lookup tells you the public IP address of "ISP cache C". Pinging this address or using it in a web browser will not work, you need to do an nslookup on it. If you do "nslookup reflect.mysqueezebox.com" every 30 seconds or so and monitor the returned IP address, and the theory above about the source of the problem is correct, the reflect address will occasionally change (probably no more often than once every 5 minutes). By plugging these addresses into http://www.maxmind.com/app/locate_ip and looking at the GeoIP locations of them, the problem should become apparent (that your DNS requests appear to be moving cross-country from one request to the next at times). The short term fixup (assuming the above makes sense as the cause) for 7.4.1 is to have SP cache the IP it gets for the long-term connection and use that for the parallel short-term connections, so that it always uses one IP for all connections at any given time. You should still do a fresh lookup every time the long-term connection is disconnected/reconnected. The long term fix is already covered by some things being planned for altering the SN architecture to scale better, which as a side effect guarantees players/users persistently connecting to an assigned datacenter rather than connecting to a GeoIP-based guess at the closest available one.
Created attachment 6063 [details] Log of nslookup and resulting GeoIP This is a log of running nslookup on baby every 15 seconds or so over a short period of time. It also includes GeoIP lookup of the resulting IP. Notice that nslookup changed to a completely different IP five out of six consecutive tries at one point. The result is that at any given moment MySb, as a result of GeoIP, thinks my player is in San Jose, CA or Marlton, NJ!
== Auto-comment from SVN commit #7841 to the jive repo by tom == == https://svn.slimdevices.com/jive?view=revision&revision=7841 == Fixed Bug: 14345 Description: - cache chttp address so rhttp isn't on different host
*** Bug 14492 has been marked as a duplicate of this bug. ***
This bug has been marked as fixed in the 7.4.1 release version of SqueezeBox Server! Please see the Release Notes for all the details: http://wiki.slimdevices.com/index.php/Release_Notes If you haven't already, please download and install the new version from http://www.logitechsqueezebox.com/support/download-squeezebox-server.html If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look.