Bug 14345 - Baby loses ability to use menus intermittently on SqueezeNetwork (DNS-related)
: Baby loses ability to use menus intermittently on SqueezeNetwork (DNS-related)
Status: CLOSED FIXED
Product: SB Radio
Classification: Unclassified
Component: SqueezeNetwork
: Include FW version in comment
: PC Other
: P1 major (vote)
: 7.4.1
Assigned To: Wadzinski Tom
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-09-27 10:59 UTC by Spies Steven
Modified: 2009-10-22 11:23 UTC (History)
6 users (show)

See Also:
Category: Bug


Attachments
Baby log taken while menu usability varied (274.27 KB, text/plain)
2009-10-06 11:11 UTC, Spies Steven
Details
More baby logs including netstat (240.24 KB, text/plain)
2009-10-07 08:15 UTC, Spies Steven
Details
Log of nslookup and resulting GeoIP (4.67 KB, text/plain)
2009-10-07 22:36 UTC, Spies Steven
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Spies Steven 2009-09-27 10:59:22 UTC
I am seeing this issue with baby hardware version 4 and r7779 connected directly to squeezenetwork running off battery.  I don't think running off battery matters but I wanted to mention it just the same.

I am on my home network served by Comcast and baby is wirelessly connected via an Apple Airport.  There is no local squeezecenter running on the network.

For example I can successfully navigate to Pandora and start a station playing without issue.  The Pandora station will continue to play without issue however after a time accessing the more menu will result in a blank more screen.  While baby is in this state navigating to any squeezenetwork served menu does not work and the user is presented with a spinny next to the menu selection.  Pandora will continue to play and update successfully while in this state.  The diagnostics screen does not reveal anything out of order.  After a time baby will recover on its own and the menus will function properly again.
Comment 1 Andy Grundman 2009-09-27 11:02:44 UTC
This may also be a server problem but I can't spot anything wrong.
Comment 2 Wadzinski Tom 2009-09-27 13:50:12 UTC
Try to get a full /var/log/messages when this is seen.
Comment 3 Wadzinski Tom 2009-09-28 09:48:14 UTC
I saw a few snippets of logs in cf from Saturday.  Can you provide a full
/var/log/messages when you see this occur?
Comment 4 Spies Steven 2009-10-01 11:29:13 UTC
I am only able to reproduce this at home.  When I get a chance I will also switch out access points to see if it makes a difference in my home environment.

Is there any useful debug options that I can enable that might help diagnose the issue?  The other thing is I don't see these issues when connected to a local server.
Comment 5 Ross Levine 2009-10-01 16:11:15 UTC
I'm seeing this here in the office right now.
Comment 6 Spies Steven 2009-10-02 18:53:56 UTC
Debug to try on baby, thanks Richard!

return {
appender={ stdout="debug", syslog="debug" },
category={ ["net.comet"]="debug", ["net.slimproto"]="DEBUG" } 
}
  
and put that in /etc/squeezeplay/userpath/logconf.lua
  
on your baby
Comment 7 Spies Steven 2009-10-06 11:11:03 UTC
Created attachment 6041 [details]
Baby log taken while menu usability varied

I was not able to find or notice anything in the log that seem to correlate with the ability to use menus or not.  Are there any other debug option I can try?  Perhaps a video of the device while the log is running might help.  I was also wondering if having a large number of players attached to the MySb account might make a difference.
Comment 8 Spies Steven 2009-10-07 08:15:00 UTC
Created attachment 6048 [details]
More baby logs including netstat

There is some interesting port activity in the netstat logging.  It seems at least two 9000 ports are always open and when these two ports are connected to different MySb is when I can no longer navigate menus.  I originally was thinking maybe it had something to do with port 3483 and 9000 being on different MySb but that did not seem to cause issues with menus.  I was I have no idea why baby would be connected to different MySb at the same time or even why we need more than one port open on 9000.

Netstat logging was suggested by Felix, thanks Felix!
Comment 9 Brandon Black 2009-10-07 11:39:41 UTC
So the suspicion at this point is that we're seeing the same basic problem as the past OpenDNS issues, but caused by widely-distributed ISP DNS caching infrastructure.  In short, your local router might get two different datacenters as answers to the DNS query for mysb.com 5 minutes apart, because the requests were routed differently by the ISP.

We can test this theory by using the debugging hostname "reflect.mysqueezebox.com".  This is a special hostname which resolves (with 10 second TTL) to the address our DNS servers perceive your DNS cache exit point to be at.  In other words, your DNS requests may route as player -> your_router -> ISP cache A -> ISP cache B -> ISP cache C -> SN DNS servers.  The reflect lookup tells you the public IP address of "ISP cache C".

Pinging this address or using it in a web browser will not work, you need to do an nslookup on it.

If you do "nslookup reflect.mysqueezebox.com" every 30 seconds or so and monitor the returned IP address, and the theory above about the source of the problem is correct, the reflect address will occasionally change (probably no more often than once every 5 minutes).  By plugging these addresses into http://www.maxmind.com/app/locate_ip and looking at the GeoIP locations of them, the problem should become apparent (that your DNS requests appear to be moving cross-country from one request to the next at times).

The short term fixup (assuming the above makes sense as the cause) for 7.4.1 is to have SP cache the IP it gets for the long-term connection and use that for the parallel short-term connections, so that it always uses one IP for all connections at any given time.  You should still do a fresh lookup every time the long-term connection is disconnected/reconnected.

The long term fix is already covered by some things being planned for altering the SN architecture to scale better, which as a side effect guarantees players/users persistently connecting to an assigned datacenter rather than connecting to a GeoIP-based guess at the closest available one.
Comment 10 Spies Steven 2009-10-07 22:36:03 UTC
Created attachment 6063 [details]
Log of nslookup and resulting GeoIP

This is a log of running nslookup on baby every 15 seconds or so over a short period of time.  It also includes GeoIP lookup of the resulting IP.  Notice that nslookup changed to a completely different IP five out of six consecutive tries at one point.  The result is that at any given moment MySb, as a result of GeoIP, thinks my player is in San Jose, CA or Marlton, NJ!
Comment 11 SVN Bot 2009-10-12 06:12:51 UTC
 == Auto-comment from SVN commit #7841 to the jive repo by tom ==
 == https://svn.slimdevices.com/jive?view=revision&revision=7841 ==

Fixed Bug: 14345
Description:
- cache chttp address so rhttp isn't on different host
Comment 12 Chris Owens 2009-10-12 10:17:04 UTC
*** Bug 14492 has been marked as a duplicate of this bug. ***
Comment 13 James Richardson 2009-10-22 11:23:40 UTC
This bug has been marked as fixed in the 7.4.1 release version of SqueezeBox
Server! 

Please see the Release Notes for all the details:
http://wiki.slimdevices.com/index.php/Release_Notes

If you haven't already, please download and install the new version from
http://www.logitechsqueezebox.com/support/download-squeezebox-server.html

If you are still experiencing this problem, feel free to reopen the bug with
your new comments and we'll have another look.