Bugzilla – Bug 11569
Fab4 fails to connect to last known connection if AP is turned on after booting (same for ethernet)
Last modified: 2009-09-08 09:21:20 UTC
How to reproduce:
- Set up Fab4 to use wireless
- Power down the wireless router
- Reboot Fab4

Fab4 tries to reconnect, fails, and eventually reboots repeatedly.

Observations:
If, after booting up, I kill the jive process via "/etc/init.d/squeezeplay stopwdog", the rebooting does _not_ happen. Restarting the jive process (w/o watchdog) also doesn't show reboots.

Assumption:
Something in the jive process is taking a bit too long, and that triggers the watchdog.

Next step:
Richard suggests debugging Task.lua to see what takes too long.

Version used: r5015
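Richard's suggestion amounts to timing each task's slice of the main loop and flagging the one that runs too long before handing control back. The Python sketch below is only an analogue of that idea, not Task.lua (whose internals are not shown in this report): generator-based tasks are resumed round-robin, and any resume longer than a budget is recorded. It assumes the watchdog is serviced from that same loop, so a long slice would delay it. The names `run_cooperative`, `ui_task`, `connect_task` and the 0.1 s budget are hypothetical.

```python
import time
from typing import Dict, Generator, List, Tuple

# A task is a generator: each next() runs one slice of work up to its next yield.
Task = Generator[None, None, None]

def run_cooperative(tasks: Dict[str, Task], budget_s: float) -> List[Tuple[str, float]]:
    """Resume tasks round-robin until all finish; return (name, seconds) for
    every slice that ran longer than budget_s before yielding.
    Assumption: the watchdog is kicked from this same loop, so any slice over
    budget_s also delays the watchdog."""
    slow: List[Tuple[str, float]] = []
    active = dict(tasks)
    while active:
        for name in list(active):
            start = time.monotonic()
            try:
                next(active[name])
            except StopIteration:
                del active[name]
            elapsed = time.monotonic() - start
            if elapsed > budget_s:
                slow.append((name, elapsed))
    return slow

def ui_task() -> Task:
    for _ in range(3):
        yield  # well-behaved: hands control back after each short slice

def connect_task() -> Task:
    time.sleep(0.2)  # stands in for a blocking call, e.g. a synchronous DNS lookup
    yield

if __name__ == "__main__":
    for name, secs in run_cooperative({"ui": ui_task(), "connect": connect_task()}, budget_s=0.1):
        print(f"task {name!r} blocked the loop for {secs:.2f}s")
```

Only the blocking `connect` slice is reported; the UI task, which yields often, stays under the budget.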
Retried with r5037 (and after a factory reset), and now it doesn't happen anymore. So maybe it was a false alarm, or some watchdog timeout right on the edge?
OK, I do see this with r5066. This looks like the problem:

003919:41422 INFO (SlimProto.lua:460) - connect to fab4.squeezenetwork.com
The slimproto code was not using async DNS lookups. The crasher is fixed in r5067. However, it seems that Fab4 can't connect to anything after the AP is turned on. Forcing the interface up using 'ifup wlan0 -f' seems to fix it. I can't spot anything obviously wrong, and the network problem is recreatable even if the jive process is not started in rcS (testing using ping fab4.squeezenetwork.com). The DNS lookups work; it just seems to be routing to other networks that fails. Felix, any ideas?
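The point of the async lookup is that a resolver waiting on an unreachable DNS server must not stall the loop that keeps everything else (including, presumably, the watchdog) alive. A minimal Python sketch of that pattern, not the actual Jive/SlimProto fix: the blocking lookup runs on a daemon thread while the caller keeps calling a `tick()` callback. `resolve_while_ticking`, `tick`, and the default interval and timeout are hypothetical.

```python
import queue
import socket
import threading
import time
from typing import Any, Callable

def resolve_while_ticking(host: str, port: int, tick: Callable[[], None],
                          resolver: Callable[..., Any] = socket.getaddrinfo,
                          tick_interval_s: float = 0.05,
                          timeout_s: float = 5.0) -> Any:
    """Run a blocking resolver on a daemon thread while the caller's loop keeps
    calling tick(), so a slow or unreachable DNS server cannot stall the loop."""
    results: "queue.Queue[tuple]" = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            results.put(("ok", resolver(host, port)))
        except Exception as exc:  # hand the error back to the loop thread
            results.put(("err", exc))

    # daemon=True: a lookup that never returns does not keep the process alive
    threading.Thread(target=worker, daemon=True).start()
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            status, value = results.get_nowait()
        except queue.Empty:
            if time.monotonic() > deadline:
                raise TimeoutError(f"lookup of {host!r} timed out")
            # Assumption: this is where the main loop would service UI events
            # and kick the watchdog.
            tick()
            time.sleep(tick_interval_s)
            continue
        if status == "err":
            raise value
        return value
```

For example, `resolve_while_ticking("fab4.squeezenetwork.com", 80, tick=kick_watchdog)` (with a hypothetical `kick_watchdog`) returns the `getaddrinfo` result, or raises `TimeoutError`, without the loop ever going silent for more than one tick interval.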
Hmm, what about the ARP tables on the router/AP side? How would they get filled in again properly? Can we do something about that?
I have no idea what's going wrong here. The same behavior can be seen when using wired instead of wireless. (192.168.144.1 is my router.)

Regardless of whether I boot with Ethernet connected or connect it after booting up, the following looks the same and sane to me:

# cat /etc/resolv.conf
nameserver 192.168.144.1
#
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.144.0   *               255.255.255.0   U     0      0        0 eth0
default         192.168.144.1   0.0.0.0         UG    0      0        0 eth0
#
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:04:20:22:01:21
          inet addr:192.168.144.79  Bcast:192.168.144.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:248 errors:0 dropped:0 overruns:0 frame:0
          TX packets:313 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45699 (44.6 KiB)  TX bytes:42914 (41.9 KiB)
          Base address:0x8000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr 00:04:20:22:01:21
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# arp
? (192.168.144.1) at 00:1E:2A:55:F4:33 [ether]  on eth0
====
Also, pinging the router at 192.168.144.1 works in both scenarios:

# ping -c 4 192.168.144.1
PING 192.168.144.1 (192.168.144.1): 56 data bytes
64 bytes from 192.168.144.1: seq=0 ttl=64 time=0.501 ms
64 bytes from 192.168.144.1: seq=1 ttl=64 time=0.437 ms
64 bytes from 192.168.144.1: seq=2 ttl=64 time=0.367 ms
64 bytes from 192.168.144.1: seq=3 ttl=64 time=0.433 ms
====
However, if I add Ethernet after booting, I cannot reach anything behind my router, although DNS resolves just fine.
Doing an 'ifdown eth0' followed by 'ifup eth0' fixes the issue.
Richard: I think I found the issue in the networking scripts that caused the reconnection failure. I fixed the zcip_action script, and now I can ping the world again. However, Fab4 doesn't reconnect every time, but the same is true if I just reboot a fully set-up Fab4 (wired or wireless): sometimes it reconnects and sometimes it does not. Any ideas?
With r5156 and a wireless connection, I was dealing with this last night. Even with an established connection, if you power down the WAP and then power it up again, Fab4 fails to reconnect. If you go into the "enter a new password" screen after a wireless failure, it doesn't give you a keyboard; it just goes directly to attempting to connect. Then, after rebooting, it fails to connect. I think the wireless password for the SSID is being erased.
Felix, I've tested with r5217 and don't see any problems. Can you still recreate this?
Retested with r5247, and I don't see the issue anymore either.