Bug 11826 - Fails to connect - stuck
Status: RESOLVED FIXED
Product: SB Touch
Classification: Unclassified
Component: Setup
Version: unspecified
Hardware: PC Windows XP
Importance: -- critical
Target Milestone: MP
Assigned To: Wadzinski Tom
: TestCase
Depends on:
Blocks:
 
Reported: 2009-04-16 10:51 UTC by James Richardson
Modified: 2009-09-08 09:31 UTC
CC List: 6 users

See Also:
Category: ---


Attachments
serial capture (371.02 KB, text/plain)
2009-04-16 10:51 UTC, James Richardson
wireshark packet capture (16.45 KB, text/plain)
2009-04-16 13:33 UTC, Ross Levine
Error Log (77.56 KB, application/octet-stream)
2009-04-17 10:05 UTC, James Richardson

Description James Richardson 2009-04-16 10:51:48 UTC
Created attachment 5133
serial capture

The Fab4 gets stuck at 'connecting to mysqueezebox.com' before or after a firmware upgrade and does not continue past that state.

While testing poor internet connection conditions, I was able to make the Fab4 fail with a constant spinny on 'connecting to mysqueezebox.com'.

See the attached log for details.

To reproduce:

1) Set up an AP with an open connection (no encryption).
2) Factory reset the Fab4 running r5316.
3) During the setup process, at various stages, disconnect the WAN (internet) cable from the AP.
4) Continue with Fab4 setup; the 'problem connecting' page appears.
5) Reconnect the WAN cable to the AP.
6) Continue with Fab4 setup.
NOTE: at this point, one of the following may happen:
 a) setup continues to the next step with no errors
 b) setup halts and goes back to the 'problem connecting' page
 c) the Fab4 gets into a bad state, with the 'connecting to..' spinny forever

If (a), continue from step 3.
If (b), soft reboot and continue.
If (c), factory reset and start over.
Comment 1 Ross Levine 2009-04-16 13:33:50 UTC
Created attachment 5134
wireshark packet capture

Packets 64-75 occur while the Fab4 shows 'update failed', immediately after I select 'try again'.
Comment 2 Wadzinski Tom 2009-04-16 17:17:24 UTC
I am confident that the "forever spinny" will not occur for real users at pld using the factory firmware. On a restart after network setup has completed successfully, the inSetup hook that tells SN to upgrade is not set, because the setup process doesn't keep track of the FW upgrade status.

Currently SN doesn't look for a FW version difference when deciding whether to force an upgrade, so the code assumes the FW is good and moves on to the SN registration step. At pld, SN will use version checking and will therefore force an upgrade, and the spinny will be replaced by the FW upgrade page.

In the current version, where SN says no forced upgrade is needed (because inSetup is false), there is a bug at that point: the spinny should go away and be replaced by the SN registration page. However, by then the user will have received a FW upgrade (with this issue fixed), so I don't consider this a showstopper bug for MP. It would still be nice to prove out that the FW upgrade based on SN's FW comparison works, since that is the untested path I'm relying on.
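Tom's argument rests on SN forcing the upgrade by comparing firmware versions instead of relying on the inSetup flag. Below is a minimal Lua sketch of such a check, for illustration only: the function names, the assumption that versions arrive as bare revision strings like "r5316" or "r5325", and the choice not to force on an unparseable version are all hypothetical, not SN's actual logic.

```lua
-- Hypothetical sketch, not SN's real implementation.
-- Assumes firmware versions are plain revision strings such as "r5316".

-- Return the numeric revision from "rNNNN", or nil if it does not parse.
function revision(version)
  local digits = string.match(version or "", "^r(%d+)$")
  return digits and tonumber(digits)
end

-- Force an upgrade whenever the player's revision is older than the one
-- SN offers, regardless of whether the player reported inSetup.
function shouldForceUpgrade(playerVersion, offeredVersion)
  local have = revision(playerVersion)
  local want = revision(offeredVersion)
  if have == nil or want == nil then
    return false -- assumption: never force on an unrecognized version
  end
  return have < want
end

print(shouldForceUpgrade("r5316", "r5325")) -- true
print(shouldForceUpgrade("r5325", "r5325")) -- false
```

Keying the decision on the version alone means a restart after network setup still gets the upgrade page, which is the path Tom wants proven out.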
Comment 3 Ben Klaas 2009-04-16 17:57:52 UTC
Just had a discussion with Tom on the phone about this one, and I am in agreement with his conclusions.

What's happening here is this:

The "forever spinny" popup has a timer associated with it that should, but in some cases does not, push another window and cause the popup (and its timer) to leave the screen:

popup:addTimer(1000, function()
        -- runs once per second (1000 ms) while the popup is on screen
        -- wait until we know if the player is linked
        if _squeezenetworkConnected(self, squeezenetwork) then
                step8(self, squeezenetwork)
        end

        -- timeout counts timer ticks, i.e. roughly seconds
        timeout = timeout + 1

        if timeout > 30 then
                _squeezenetworkFailed(self, squeezenetwork)
        end
end)

The key part of that timer is this:

        if _squeezenetworkConnected(self, squeezenetwork) then
                step8(self, squeezenetwork)
        end

When this bug is hit, the Fab4 *is* in fact connected to SN, but step8() fails to push anything onto the screen.


function step8(self, squeezenetwork)
        local url, force = squeezenetwork:getUpgradeUrl()
        local pin = squeezenetwork:getPin()

        log:info("squeezenetwork pin=", pin, " url=", url)

        if force then
                log:info("firmware upgrade from SN")
                appletManager:callService("firmwareUpgrade", squeezenetwork)
        elseif pin then
                self:_registerRequest(squeezenetwork)
        else
                self:_resetRequest(squeezenetwork)
        end
end

That is because when step8() is hit in this situation, the variable force is not true (and a PIN is present), so _registerRequest() is called.

_registerRequest() appears to return without doing anything because of the guard clause at the top of the method, which returns immediately if self.registerRequest is already set:

function _registerRequest(self, squeezenetwork)
        if self.registerRequest then
                return
        end
        self.registerRequest = true

        log:info("registration on SN")
        appletManager:callService("squeezeNetworkRequest", { 'register', 0, 100, 'service:SN' })

        self.locked = true -- don't free applet
        jnt:subscribe(self)
end

Working back up the ladder, the popup spinny never goes away, and after 30 seconds _squeezenetworkFailed() is called, even though we are in fact connected to SN.


So the fix here is to make step8() always push something onto the screen.
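The mechanism Ben describes can be simulated with a small Lua model. This is a sketch under simplifying assumptions: each loop iteration stands in for one firing of the 1000 ms popup timer, the `pushed` table stands in for the window stack, and `newApplet`, `runPopup`, and the "registrationPending" window are hypothetical names, not Jive APIs. With the registerRequest guard already set by an earlier attempt, the original step8() pushes nothing, so the timeout fires on the 31st tick (about 30 seconds) even though SN is connected; with the proposed change, something always replaces the popup.

```lua
-- Hypothetical model of the setup popup; none of these names are Jive APIs.

-- Build a fake applet. registerRequestAlreadySet mimics the guard flag left
-- over from an earlier registration attempt; fixed selects the proposed fix.
function newApplet(registerRequestAlreadySet, fixed)
  return { registerRequest = registerRequestAlreadySet, fixed = fixed, pushed = {} }
end

-- Simplified step8(): force and pin mimic getUpgradeUrl() and getPin().
function step8(self, force, pin)
  if force then
    table.insert(self.pushed, "firmwareUpgrade")
  elseif pin then
    if self.registerRequest then
      -- Original _registerRequest() returns here without pushing a window.
      if self.fixed then
        -- Proposed fix (hypothetical window name): always push something.
        table.insert(self.pushed, "registrationPending")
      end
      return
    end
    self.registerRequest = true
    table.insert(self.pushed, "register")
  else
    table.insert(self.pushed, "reset")
  end
end

-- Run the timer until a window covers the popup or the timeout fires.
-- Returns what ended the popup and the tick on which it happened.
function runPopup(self, connected, pin)
  local timeout, tick = 0, 0
  while true do
    tick = tick + 1
    if connected then
      step8(self, false, pin)
    end
    if #self.pushed > 0 then
      return self.pushed[1], tick
    end
    timeout = timeout + 1
    if timeout > 30 then
      return "squeezenetworkFailed", tick
    end
  end
end

print(string.format("%s on tick %d", runPopup(newApplet(true, false), true, "1234")))
-- squeezenetworkFailed on tick 31
print(string.format("%s on tick %d", runPopup(newApplet(true, true), true, "1234")))
-- registrationPending on tick 1
```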
Comment 4 Ben Klaas 2009-04-16 18:02:11 UTC
The other thing Tom and I discussed on the phone was that this won't be fixed tonight, and that it would benefit from Richard's insight.

Assigning to Richard for discussion tomorrow.

Although we're arguing that this is unlikely to be a problem for MP firmware, it has been deemed an MPQ bug and is holding up a qualified build for the assembly line. Targeting accordingly.
Comment 5 Blackketter Dean 2009-04-16 18:16:02 UTC
I spoke to Chris and Randy today, and we're going to go ahead and release the MPQ firmware without this fix so that SZ can prepare for the build. There will need to be another MP version that actually goes to customers, and the MPQ devices may need to be upgraded.
Comment 6 Felix Mueller 2009-04-17 00:10:59 UTC
I wonder whether part of the failed-reconnection issue is related to the long DNS timeout in libc that Richard mentioned.

Also see comment #7 of bug 11455.
Comment 7 Wadzinski Tom 2009-04-17 04:09:06 UTC
There is talk of a case where this happens even before the first reboot. All of the logs I've seen point to the issue being what I described, and I am unable to replicate the no-reboot scenario (though I have several SCs on my network, which may be interfering), so I ask that this "no-reboot case" be shown again on a factory-reset unit running the candidate firmware.


I noticed in one of the logs on Campfire (5316.spinny2.txt) that the firmware being used actually appears to be a post-MP build (r5325), because the "currentStep" code on line 390 of that log was added in r5325.

Felix suggested DNS might be a factor here, but looking at the code in _squeezenetworkFailed(), I believe we'd see a log message to that effect. I think what's really happening in that method is that SN really is connected, so this section is called:

		-- have we connected while looking up the DNS?
		if _squeezenetworkConnected(self, squeezenetwork) then
			return
		end

which returns without ever pushing another window over the popup.
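The failure path Tom describes can be modeled the same way. In this hypothetical Lua sketch, the original shape returns silently when SN turns out to be connected, so nothing covers the popup; the alternative hands off to the next setup step instead. The function names, the "problemConnecting" window, and the `nextStep` callback are assumptions for illustration, not the applet's real code.

```lua
-- Hypothetical sketch; window names and nextStep are illustrative only.

-- Shape of the original check: if SN connected while DNS was being looked
-- up, return early. Returning nil means no window is pushed, so the
-- "connecting to mysqueezebox.com" popup stays on screen.
function squeezenetworkFailedOriginal(connected)
  if connected then
    return nil
  end
  return "problemConnecting"
end

-- Possible alternative: when already connected, continue with the next
-- setup step (for example a step8-like function) so that some window
-- always replaces the popup.
function squeezenetworkFailedFixed(connected, nextStep)
  if connected then
    return nextStep()
  end
  return "problemConnecting"
end

local function register() return "register" end
print(squeezenetworkFailedOriginal(true))         -- nil
print(squeezenetworkFailedFixed(true, register))  -- register
print(squeezenetworkFailedFixed(false, register)) -- problemConnecting
```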
Comment 8 James Richardson 2009-04-17 10:05:36 UTC
Created attachment 5141
Error Log

Steps taken in the attached log:

1) Insert an SD card with a /log directory.
2) Factory reset the Fab4.
3) Walk through the setup procedure.
4) Attach to a wireless AP, with or without encryption.
5) At the Firmware Upgrade page, disconnect the WAN (internet) connection from the access point.
6) Click 'firmware upgrade' on the Fab4.
7) At the failure notice, reconnect the WAN connection to the AP.
8) Click continue or back.
Comment 9 James Richardson 2009-04-17 10:38:35 UTC
Could bug 11766 be related to this issue?
Comment 10 Ben Klaas 2009-04-17 10:43:30 UTC
IMO, this is unrelated to bug 11766.
Comment 11 Chris Owens 2009-04-20 10:08:40 UTC
Tom reports that he can't reproduce this.  James, please talk to Tom to understand the issue.

Steven notes he has seen a similar issue, but will open a new bug for it.
Comment 12 James Richardson 2009-04-20 13:34:52 UTC
OK, regarding comment 8: if I wait ~2 min (for my AP), the AP reconnects and I can click continue. The firmware upgrade then goes through. Maybe we need to improve the help text or increase the timeout between retries.

The other thing to note: once on the firmware upgrade page, I cannot go back to 'select an AP'.
Comment 13 Spies Steven 2009-04-20 15:56:28 UTC
Well, I am no longer able to reproduce the similar issue I was seeing before. Perhaps an update on test.sn fixed it? If I see it again, I will report it here.
Comment 14 Wadzinski Tom 2009-04-21 08:58:17 UTC
Dean, reassigning this to you to make a call about what's left that we care about for MP.

This bug turned out to be a multi-bug; the main issues that came out of it are:

1) On restart, infinite spinny due to inSetup not being passed to setup. The theory is that this won't happen in production because SN will be doing a FW version check. We decided on the bug call yesterday that Andy would add that check in now so we are truly testing that pathway. This is being tracked in bug 11861, so I no longer consider it part of this bug.

2) Can't get past the FW upgrade, even without a reboot, when the internet is disconnected and then reconnected. I found that after the internet connection is restored, you can still connect after waiting long enough. James confirmed this as well (or at least was not able to reproduce any other scenario).

So, it seems the only remaining question on this bug that James had was:

Is it an MP bug that the FW upgrade process offers no way to go back to network selection? Presumably it is rare that a user would need to do this. At this point, a user who wanted to change to a different AP (or switch to wired; not sure about that, actually) would need to factory reset (or escape setup and go into settings).

I went over this with Richard, and he suggested that if Dean wants something different to happen here, Dean should discuss it with Richard.
Comment 15 Wadzinski Tom 2009-04-21 14:22:22 UTC
Good news (sort of). It turns out that Ben had already made 'back' return you to the network selection page, but due to an apparent branch snafu, those changes aren't in the fab4-MP branch. The branch issue is being tracked in bug 11874.
Comment 16 Blackketter Dean 2009-05-11 09:27:28 UTC
So, Tom, is this all resolved?
Comment 17 Wadzinski Tom 2009-05-27 06:01:00 UTC
Resolved a while back, but never cleared.