Bugzilla – Bug 4067
HTTP server: Address already in use
Last modified: 2011-03-16 04:34:11 UTC
Solaris SPARC 9 U5
SlimServer 6.5 2006-09-08
Perl v5.8.7

With the latest nightly build, SlimServer startup fails with the message

    can't setup the listening port 6998 for the HTTP server: Address already in use at /usr/local/src/SlimServer_6.5_v2006-09-08/server/Slim/Web/HTTP.pm line 167.

The interesting thing is that this error is generated regardless of which TCP port I tell it to use, even when running as user root. Additionally, "netstat -a | grep 6998" shows nothing bound to that port, listening or otherwise. I've tried about eight different ports, all above 4096, with no luck. After switching back to the 2006-09-03 build, all is well (i.e., it can bind to all of those ports just fine, and repeatedly too).
BTW, looking at Slim/Web/HTTP.pm, I don't see any changes that seem relevant. That would then point a finger at my Perl's HTTP::Daemon. However, I have not changed that in over a month. But I notice that the directory structure for the SlimServer install has changed a bit. When I unpacked the nightly, I had SlimServer_6.5_v2006-09-08/server/CPAN, and after I ran SlimServer_6.5_v2006-09-08/server/Bin/build-perl-modules.pl I had the new directory SlimServer_6.5_v2006-09-08/CPAN/. So I have two CPAN directories now. Moreover, when I tried to start slimserver.pl, it couldn't find YAML's dump.al anywhere. I dealt with that. However, all of this leads me to wonder whether something is cockeyed with how SlimServer is building and/or looking for its CPAN dependencies. If so, it could mean that SlimServer is now seeing the wrong version of HTTP::Daemon. Thoughts?
P.S. When I reran build-perl-modules, I gave it the path SlimServer_6.5_v2006-09-08 rather than SlimServer_6.5_v2006-09-08/server as the path to my "SlimServer directory". I've since rerun it with SlimServer_6.5_v2006-09-08/server and now I don't get the extra CPAN directory. However, the underlying issue remains: I am told that it cannot bind to a port that is most definitely not in use according to netstat.
Our local Solaris gurus suggest netstat is lying, and to run 'lsof -i :6998'
I'm inclined to agree. The more I've looked at things this morning, the more I'm convinced that the problem isn't with SlimServer or Perl but rather with the TCP/IP stack. I have written C code to bind to the port, ran it, and it succeeded, so I simply cannot see why the Perl code is failing. Net, net, I suspect the TCP/IP stack, especially since I'm running it with multipathing and virtual interfaces -- two more layers of potential trouble. I suppose I could peruse the bug databases (I work for Sun as a software engineer); however, I'll take the more expedient path of 1. pulling a copy of lsof (not shipped with Solaris) to see if it sheds any light, and 2. if I don't get satisfaction from 1, scheduling a reboot for tonight. I suppose I could also update the patches on the system too.
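For anyone who wants to repeat a standalone bind test like the one mentioned above, here is a minimal sketch in Python. It is not the reporter's C program (that was never posted); the function name try_bind is made up for illustration, and setting SO_REUSEADDR is an assumption about what a typical server asks for, not something confirmed for SlimServer.

```python
import errno
import socket


def try_bind(port, host=""):
    """Try to bind a TCP socket to host:port.

    Returns None if bind() succeeds, otherwise the errno of the failure.
    The socket is always closed again, so the port is not held.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: request address reuse, as many servers do.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    port = 6998  # the port from this report
    err = try_bind(port)
    if err is None:
        print(f"port {port}: bind succeeded")
    elif err == errno.EADDRINUSE:
        print(f"port {port}: Address already in use")
    else:
        print(f"port {port}: bind failed: {errno.errorcode.get(err, err)}")
```

If a probe like this succeeds on a port where the server fails, the difference is more likely in how the server sets up its socket (interface, options) than in the port itself.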
FWIW, lsof turns up nothing on those ports. For example,

    # lsof -i :6998
    #

But lsof is working, as shown by running it when SlimServer 6.5 2006-09-03 is running and bound to port 8998:

    # lsof -i :8998
    COMMAND   PID     USER   FD  TYPE        DEVICE SIZE/OFF NODE NAME
    perl    23149 slimbeta  10u  IPv4 0x300044e1b70      0t0  TCP *:8998 (LISTEN)
    perl    23149 slimbeta  15u  IPv4 0x30609e8e1f0 0t618836  TCP mtbaldy.us:8998->dhcp-10.30.0.4.mtbaldy.us:49569 (ESTABLISHED)
    #

And, yes, I tried running SlimServer 6.5 2006-09-08 on 8998 also: as with the other ports I tried, it couldn't bind. (No other SlimServer was running at that time.) Beats me why the 2006-09-08 build cannot bind to any ports and the 2006-09-03 build can. At this point I'll probably try a reboot this evening.
In light of http://sunsolve.sun.com/search/document.do?assetkey=1-26-101834-1&searchclause=101834 I'll be installing Solaris patch 118305-05 tonight. Since that patch needs to be followed by a reboot, it won't be obvious what the "real" fix was for this issue with SlimServer as I will have changed two variables at once: installed a patch AND rebooted the system. I'll update this bug after the reboot.
Closing as INVALID. This is indeed a (known) Solaris 9 (and 10) bug. I spoke with a responsible engineer at Sun, stopped and restarted some interfaces, and all works now (but the system may get into a bad state again until I apply patches 112233-12 and 118305-08). My situation was apparently exacerbated by the use of virtual interfaces and IP multipathing, so hopefully others are less likely to see this.
Wanted to add one additional bit of debugging info. If a machine is running the network backup system Legato Networker / EMC Networker / Sun EBS then the backup system may be usurping TCP ports 7937-9936 such that processes may not be able to bind to them even if they don't show up in use with lsof or netstat. After I installed the Sun patches mentioned previously in this report, I still had difficulty with some ports in that range. Use nsrports -S to resolve any issues with that port range.
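Since lsof and netstat may show nothing for ports in that range, one rough way to find the affected ones is to try binding each port in turn. The following is only a sketch in Python: blocked_ports is a hypothetical helper name, the range 7937-9936 is taken from the comment above, and SO_REUSEADDR is set on the assumption that the server you care about does the same. It reports which ports refuse a bind, not who is holding them.

```python
import errno
import socket


def blocked_ports(first, last, host=""):
    """Return the ports in [first, last] whose bind() fails with EADDRINUSE."""
    blocked = []
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Assumption: request address reuse, as many servers do.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                blocked.append(port)
        finally:
            # Close immediately so the probe itself never holds a port.
            sock.close()
    return blocked


if __name__ == "__main__":
    # Port range quoted in this report for Legato/EMC Networker / Sun EBS.
    ports = blocked_ports(7937, 9936)
    if ports:
        print("cannot bind:", ", ".join(str(p) for p in ports))
    else:
        print("all ports in 7937-9936 could be bound")
```

Ports that show up here but not in lsof or netstat output are the ones worth checking against the backup system's port configuration.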
Thanks for the additional info, Dan!