Bugzilla – Bug 11455
DHCP takes a long time to reconnect (was Firmware update does not recover if WiFi is switched off)
Last modified: 2010-04-08 17:25:47 UTC
(using r4875) After successfully connecting Fab4 to my WiFi, I started the firmware update, but midway through the update I switched off the WiFi in my router. Approx. 60 seconds later I got an error message about this. That's fine. But after switching the WiFi back on, pressing "Try Again..." continues to fail. I would expect Fab4 to reconnect once the WiFi is back. Is this not true? ...Update... Several minutes later I tried again and it succeeded. I had left the Fab4 in that state, waiting to continue, so it must have reconnected during that time. But it really was several minutes. Is it slow to reconnect under those conditions?
I have tested this here, and within 30 seconds after turning the router back on, the firmware update can be restarted. It probably takes at least 15-20 seconds for the router to boot and for Fab4 to reconnect, probably longer if the router also supports DSL.
(using r5156) I am still seeing this as broken. I switched OFF the WiFi and the FW update failed. No problem. I switched ON the WiFi and waited until I knew it was live and working again (other devices connected successfully). Then I pressed "Try again" and it still failed with "There was a problem installing this update. Please try again..." Pressing "Try again" just jumps immediately back to this error. I tried this for about a minute. ...Several minutes later... Again, I left it sitting for a few minutes and tried again; it reconnected and re-downloaded the FW update. I guess I just don't understand why Fab4 takes several minutes to renegotiate the connection.
---Update--- I tested this on a different Fab4, to be sure this wasn't a hardware issue, and I saw the same behavior. This time I timed it. First, I began the firmware update, and then:
- 02:42:00 - Turned OFF the WiFi
- 02:43:00 - Fab4 correctly detected a problem (the time-out seems long though: 60 s?)
- 02:43:30 - Turned ON the WiFi
- 02:44:00 - Other SBs successfully connected to WiFi
- 02:44:30 - I began pressing "Try Again" on Fab4
- 02:46:00 - Firmware update restarted on Fab4 and went smoothly from there
Granted, it does recover, but only after 2 full minutes. If I were a customer, I'd have given up much earlier.
Dan, I need logs to understand this. Have you tried with a different router?
I think this bug is caused by the increasing interval between DHCP requests. So if Fab4 is not connected (AP turned off, Ethernet disconnected) for several minutes, it can take a while for DHCP to resolve itself. For wireless, I wonder if the wpa_cli action script is still needed to send a SIGUSR1 to udhcpc to restart things quicker. For wired, udhcpc probably needs to monitor the eth0 link status and restart when the link is attached. Felix, what do you think? Any easier solutions? We should probably at least fix wireless for MP.
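A minimal Python sketch of the wired-side idea, assuming a Linux system where link state is readable from sysfs (`/sys/class/net/<if>/carrier`) and busybox udhcpc was started with `-p` so its PID sits in a pidfile. `read_carrier`, `renew_lease` and `LinkWatcher` are hypothetical helpers, and the pidfile path in the usage comment is an assumption. BusyBox udhcpc treats SIGUSR1 as "renew lease", so sending it on a link down -> up transition is intended to make it try again right away instead of waiting out its retry timer.

```python
import os
import signal
from typing import Callable


def read_carrier(ifname: str = "eth0") -> bool:
    """True if the kernel reports a link on `ifname` via sysfs 'carrier'."""
    try:
        with open(f"/sys/class/net/{ifname}/carrier") as f:
            return f.read().strip() == "1"
    except OSError:
        # Missing interface, or 'carrier' read while the interface is down.
        return False


def renew_lease(pidfile: str) -> None:
    """Send SIGUSR1 (renew lease) to the udhcpc whose PID is in `pidfile`."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR1)


class LinkWatcher:
    """Calls `on_link_up` once per down -> up transition of the link."""

    def __init__(self, link_up: Callable[[], bool],
                 on_link_up: Callable[[], None]) -> None:
        self._link_up = link_up
        self._on_link_up = on_link_up
        self._was_up = link_up()

    def poll(self) -> bool:
        """Sample the link once; return True if `on_link_up` was fired."""
        up = self._link_up()
        fired = up and not self._was_up
        if fired:
            self._on_link_up()
        self._was_up = up
        return fired


# Usage (hypothetical pidfile path; udhcpc must be started with -p):
#   watcher = LinkWatcher(lambda: read_carrier("eth0"),
#                         lambda: renew_lease("/var/run/udhcpc.eth0.pid"))
#   while True:
#       watcher.poll()
#       time.sleep(1)
```

Polling keeps the sketch short; listening for netlink link events would react without a polling delay.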
BusyBox's udhcpc defaults to a fixed 20 s retry timeout, i.e. the interval does not increase. This means it can take up to 20 s after the network is available until Fab4 has an IP address again. The retry timeout can be modified with the '-A' parameter. Do you suggest decreasing the retry timeout?
I am not sure reducing the retry timeout makes a lot of sense here. Something else must account for the roughly 2 minutes it takes to recover. If I reduce the retry timeout from the standard 30 s to, let's say, 10 s, that only brings the above scenario down to about 1 min 40 s, which isn't really better, is it?
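To see how much the DHCP retry timer alone can contribute, here is a small Python model with hypothetical helper names. It assumes attempts happen on a fixed grid 0, T, 2T, ... after the outage starts (ignoring the short burst of discover packets udhcpc sends in each round), and compares that with the increasing-interval behavior first suspected in the thread. With a fixed interval the wait after the network returns is always less than T, so cutting T from 30 s to 10 s can remove at most 20 s from a recovery of roughly 2 minutes.

```python
import math
from typing import Optional


def wait_fixed(network_up_at: float, interval_s: float) -> float:
    """Delay from the network returning to the next DHCP attempt when
    attempts run at 0, T, 2T, ... (fixed retry interval T)."""
    next_attempt = math.ceil(network_up_at / interval_s) * interval_s
    return next_attempt - network_up_at


def wait_backoff(network_up_at: float, first_s: float,
                 factor: float = 2.0, cap_s: Optional[float] = None) -> float:
    """Same delay when each retry interval grows by `factor` (optionally capped)."""
    t, delay = 0.0, first_s
    while t < network_up_at:
        t += delay
        delay *= factor
        if cap_s is not None:
            delay = min(delay, cap_s)
    return t - network_up_at
```

For a network that comes back 61 s into the outage, `wait_fixed(61, 20)` gives 19 and `wait_fixed(61, 10)` gives 9, while an uncapped doubling schedule starting at 4 s, `wait_backoff(61, 4)`, gives 63.0.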
So far I found this: if I power down the router during the firmware update, the first failed DNS lookup takes 20 s and all following ones fail instantly. When the network is back, it takes about 2 minutes until DNS succeeds again. Richard's comment, taken from Campfire: "Ah, yes. The DNS timeout in libc is long. OK, so I don't think there is an easy solution to that then."
Another thing I saw related to this: using SN, I took it out of wireless range. After returning to wireless range, it couldn't reconnect to SN for a few minutes, logging:
SlimProto.lua:532 dns lookup failed for fab4.squeezenetwork.com
I tested again while watching the Diagnostics screen. Here is the order of what I see:
a) first, wireless comes back, then
b) DHCP completes, with the SN DNS lookup failing, then
c) the SC ping succeeds, then, after a few minutes,
d) the SN DNS lookup succeeds.
The thinking is that when a DNS lookup occurs with no connection, it doesn't time out for 120 seconds. Suggestions from Campfire regarding the DNS timeout:
1) Only do a toip when the network is up.
2) Hold onto a "backup cache", so that the last good DNS value is saved and, if a DNS lookup fails, the "last good" value is used.
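The two Campfire suggestions can be sketched together in Python. `LastGoodResolver` and its `toip` method are hypothetical: the method is only named after the `toip` call mentioned above and is not the SqueezePlay or LuaSocket API, and `network_up` stands in for whatever link/DHCP state the player already tracks. A lookup is skipped entirely while the network is down, and a failed lookup falls back to the last address that resolved for that host.

```python
import socket
from typing import Callable, Dict, Optional


class LastGoodResolver:
    """Hypothetical wrapper combining the two Campfire suggestions."""

    def __init__(self,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 network_up: Callable[[], bool] = lambda: True) -> None:
        self._resolve = resolve
        self._network_up = network_up
        self._last_good: Dict[str, str] = {}

    def toip(self, host: str) -> Optional[str]:
        # Suggestion 1: don't start a lookup while the network is down, so
        # nothing sits in the resolver waiting for a long timeout.
        if not self._network_up():
            return self._last_good.get(host)
        try:
            ip = self._resolve(host)
        except OSError:  # socket.gaierror / socket.herror subclass OSError
            # Suggestion 2: fall back to the last address that resolved.
            return self._last_good.get(host)
        self._last_good[host] = ip
        return ip
```

The trade-off is that a cached address can be stale; this sketch never expires entries.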
This is an administrative shuffle of priority fields to help make better judgments at the top end of the priority list: P4->P5, P3->P4, and P2->P3.
Administrative move of 7.5 bugs. All P2, P3, P4 being downgraded one level. Will then split P1s.
The bug meeting team's consensus is that this bug may be related to a number of issues.
== Auto-comment from SVN commit #8319 to the jive repo by felix ==
== https://svn.slimdevices.com/jive?view=revision&revision=8319 ==
Bug: 11455
Description:
- Reduced DNS timeout to 10 seconds (was 2 minutes). My tests on Jive, Baby and Touch showed that a 10-second timeout is still enough to allow the message pipe to empty even if a lot of DNS resolve requests are queued up while the network is down.
- Fixed a slow memory leak in the DNS resolver thread (occurred while the network was down).
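A rough Python model of why the timeout length matters, using a hypothetical function rather than the actual Jive resolver code. It assumes requests go through one resolver thread in FIFO order and, as a worst case, that every lookup issued while the network is down blocks for the full timeout. The backlog built up during the outage then delays the first lookup that could succeed: with requests every 5 s during a 60 s outage, a 10 s timeout has the queue caught up when the network returns, while a 2-minute timeout holds the first possible success back to 120 s.

```python
from typing import Iterable, Optional


def first_success_start(request_times: Iterable[float], network_up_at: float,
                        timeout_s: float) -> Optional[float]:
    """Start time of the first lookup that can succeed, for a single resolver
    thread that serves queued requests strictly in order."""
    busy_until = 0.0
    for t in sorted(request_times):
        start = max(t, busy_until)
        if start >= network_up_at:
            # Network is back: assume this lookup is answered promptly.
            return start
        # Lookup sent while offline: assume it blocks for the full timeout.
        busy_until = start + timeout_s
    return None  # no request was made after the network came back
```

For example, `first_success_start(range(0, 61, 5), 60, 10)` returns 60, while the same requests with `timeout_s=120` return 120.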
Update hours worked.
Change is in 7.4 r8319 and 7.5 r8321.
This bug has been marked fixed in a released version of Squeezebox Server or the accompanying firmware or mysqueezebox.com release. If you are still seeing this issue, please let us know!