Bug 11569 - Fab4 fails to connect to last known connection if AP is turned on after booting (same for ethernet)
Status: RESOLVED FIXED
Product: SB Touch
Classification: Unclassified
Component: OS
Version: unspecified
Hardware: PC Other
Importance: -- normal
Target Milestone: MP
Assigned To: Felix Mueller
Keywords: TestCase
Depends on:
Blocks:
Reported: 2009-03-31 06:42 UTC by Felix Mueller
Modified: 2009-09-08 09:21 UTC (History)

See Also:
Category: ---


Description Felix Mueller 2009-03-31 06:42:48 UTC
How to reproduce:

- Set up Fab4 to use wireless
- Power down wireless router
- Reboot Fab4

Fab4 tries to reconnect, fails and eventually reboots repeatedly.

Observations:

If, after booting up, I kill the jive process via "/etc/init.d/squeezeplay stopwdog", the rebooting does _not_ happen.

Restarting the jive process (w/o watchdog) also doesn't show reboots.

Assumption:

Something in the jive process is taking a bit too long and that triggers the watchdog.

Next step:
Richard suggests debugging Task.lua to see what takes too long.

Version used:
r5015
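
The next step above (finding out what takes too long) can be sketched as follows. This is a minimal illustration in Python, not the Lua code in Task.lua: it assumes tasks are run cooperatively, times each step a task runs before yielding, and flags any step that exceeds a budget. The function `find_slow_steps`, the task names, and the 1-second budget are hypothetical; the real watchdog timeout is not stated in this report.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple


def find_slow_steps(
    tasks: Dict[str, Iterator[None]],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Run generator-based tasks round-robin and record every step that
    runs longer than budget_s before yielding back to the scheduler."""
    slow: List[Tuple[str, float]] = []
    pending = dict(tasks)
    while pending:
        for name in list(pending):
            start = clock()
            try:
                next(pending[name])      # resume the task until its next yield
            except StopIteration:
                del pending[name]        # task finished
            elapsed = clock() - start
            if elapsed > budget_s:
                slow.append((name, elapsed))
    return slow


def demo() -> List[Tuple[str, float]]:
    """Simulated run with a fake clock, so no real waiting is involved."""
    now = [0.0]

    def clock() -> float:
        return now[0]

    def ui_task() -> Iterator[None]:
        for _ in range(3):
            now[0] += 0.25               # short step, within the budget
            yield

    def network_task() -> Iterator[None]:
        now[0] += 5.0                    # stand-in for a blocking call
        yield

    tasks = {"ui": ui_task(), "network": network_task()}
    return find_slow_steps(tasks, budget_s=1.0, clock=clock)
```

In the simulated run, only the "network" task is reported, because its single step holds the scheduler for 5 seconds while every "ui" step stays under the budget.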
Comment 1 Felix Mueller 2009-03-31 07:38:09 UTC
Retried with r5037 (and after a factory reset) and now it doesn't happen anymore.

So maybe it was a false alarm, or some watchdog timeout that is just on the edge?
Comment 2 Richard Titmuss 2009-04-01 03:44:53 UTC
OK, I do see this with r5066. This looks like the problem:

003919:41422 INFO (SlimProto.lua:460) - connect to fab4.squeezenetwork.com
Comment 3 Richard Titmuss 2009-04-01 04:45:36 UTC
The slimproto code was not using async DNS lookups. The crasher is fixed in r5067.
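
To illustrate why a synchronous lookup matters here, the following Python sketch (not the actual Lua fix from r5067) contrasts a blocking resolver call made directly on an event loop with the same call offloaded via `run_in_executor`. The `slow_resolver` function is a hypothetical stand-in that sleeps instead of querying a DNS server, and the heartbeat task is an assumed analogue of whatever periodic work keeps the watchdog satisfied.

```python
import asyncio
import time
from typing import Callable, Tuple


def slow_resolver(host: str) -> str:
    """Hypothetical stand-in for a blocking DNS call that takes a while."""
    time.sleep(0.3)
    return "192.0.2.1"  # documentation address (RFC 5737), not a real result


async def lookup_with_heartbeat(
    host: str, resolver: Callable[[str], str], offload: bool
) -> Tuple[str, int]:
    """Resolve host while a heartbeat task runs; return (address, ticks)."""
    ticks = 0
    done = False

    async def heartbeat() -> None:
        nonlocal ticks
        while not done:
            ticks += 1
            await asyncio.sleep(0.02)

    beat = asyncio.ensure_future(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat run its first tick
    loop = asyncio.get_running_loop()
    if offload:
        # The lookup runs in a worker thread; the loop keeps ticking.
        address = await loop.run_in_executor(None, resolver, host)
    else:
        # The lookup runs on the loop's own thread; nothing else can run.
        address = resolver(host)
    done = True
    await beat
    return address, ticks
```

Calling `asyncio.run(lookup_with_heartbeat("fab4.squeezenetwork.com", slow_resolver, offload=False))` leaves the heartbeat stuck for the whole lookup, while `offload=True` lets it keep ticking. With an unreachable DNS server the blocking variant could stall for far longer than the 0.3 seconds simulated here.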

However, it seems that Fab4 can't connect to anything after the AP is turned on. Forcing the interface up using 'ifup wlan0 -f' seems to fix it. I can't spot anything obviously wrong, and the network problem can be recreated even if the jive process is not started in rcS (tested using ping fab4.squeezenetwork.com). The DNS lookups work; it just seems to be routing to other networks that fails.

Felix, any ideas?
Comment 4 Felix Mueller 2009-04-01 07:56:33 UTC
Hmm, what about the arp tables on the router/AP side? How would they get filled in again properly? Can we do something in that regard?
Comment 5 Felix Mueller 2009-04-03 05:05:52 UTC
I have no idea what's going wrong here.

Same behavior can be seen when using wired instead of wireless. (192.168.144.1 is my router.)

Regardless of whether I boot with ethernet connected or connect it after booting up, the following looks the same and sane to me.

# cat /etc/resolv.conf
nameserver 192.168.144.1
#
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.144.0   *               255.255.255.0   U     0      0        0 eth0
default         192.168.144.1   0.0.0.0         UG    0      0        0 eth0
#
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:04:20:22:01:21
          inet addr:192.168.144.79  Bcast:192.168.144.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:248 errors:0 dropped:0 overruns:0 frame:0
          TX packets:313 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45699 (44.6 KiB)  TX bytes:42914 (41.9 KiB)
          Base address:0x8000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr 00:04:20:22:01:21
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# arp
? (192.168.144.1) at 00:1E:2A:55:F4:33 [ether]  on eth0

====

Also pinging the router at 192.168.144.1 works in both scenarios:

# ping -c 4 192.168.144.1
PING 192.168.144.1 (192.168.144.1): 56 data bytes
64 bytes from 192.168.144.1: seq=0 ttl=64 time=0.501 ms
64 bytes from 192.168.144.1: seq=1 ttl=64 time=0.437 ms
64 bytes from 192.168.144.1: seq=2 ttl=64 time=0.367 ms
64 bytes from 192.168.144.1: seq=3 ttl=64 time=0.433 ms

====

However, if I add ethernet after booting, I cannot reach anything behind my router, although DNS resolves just fine.

Doing an 'ifdown eth0' followed by 'ifup eth0' fixes the issue.
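
For comparing the two scenarios mechanically rather than by eye, the `route` output could be parsed and the default route extracted. The Python sketch below is only an illustration: the helper names `parse_route_table` and `default_route` are hypothetical, and it assumes the column layout shown in the transcript above. In this report the routing table looked identical in both scenarios, so such a check would confirm that observation rather than reveal the fault.

```python
from typing import Dict, List, Optional

# Routing table as captured in the transcript above.
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.144.0   *               255.255.255.0   U     0      0        0 eth0
default         192.168.144.1   0.0.0.0         UG    0      0        0 eth0
"""


def parse_route_table(text: str) -> List[Dict[str, str]]:
    """Turn `route` output into one dict per row, keyed by column name."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Destination":
            header = fields              # column names line
            continue
        if header is None or len(fields) != len(header):
            continue                     # banner or malformed line
        rows.append(dict(zip(header, fields)))
    return rows


def default_route(rows: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first gateway route for the default destination, if any."""
    for row in rows:
        if row["Destination"] in ("default", "0.0.0.0") and "G" in row["Flags"]:
            return row
    return None
```

Running `default_route(parse_route_table(SAMPLE))` on captures from both scenarios and comparing the results gives a quick equality check on the gateway and interface.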
Comment 6 Felix Mueller 2009-04-07 02:41:46 UTC
Richard: I think I found the issue in the networking scripts that caused the reconnection to fail. I fixed the zcip_action script. Now I can ping the world again.

However, Fab4 doesn't reconnect all the time, but the same is true if I just reboot a fully set-up Fab4 (wired or wireless): sometimes it reconnects and sometimes it does not.

Any ideas?
Comment 7 Jim McAtee 2009-04-07 13:19:26 UTC
With r5156 and a wireless connection I was dealing with this last night. Even if you have an established connection, then power the WAP down and up again, it fails to reconnect.

If you go into the "enter a new password" screen after a wireless failure, it doesn't give you a keyboard; it just goes directly to attempting to connect. Then after rebooting it fails to connect. I think the wireless password for the SSID is being erased.
Comment 8 Richard Titmuss 2009-04-08 02:44:58 UTC
Felix, I've tested with 5217 and don't see any problems. Can you still recreate this?
Comment 9 Felix Mueller 2009-04-08 12:56:24 UTC
Retested with r5247 and I don't see the issue anymore either.