Bugzilla – Bug 16617
Controller (SqueezePlay) may never reconnect fully to MySB.com after connection loss
Last modified: 2011-05-18 13:37:36 UTC
If SqueezeNetwork responds with a reconnect=none advice to a Comet request, then the Comet module will go into state DISCONNECTED but the SlimServer module, which manages connections, will think it is in state CONNECTING. Not even a new user request, such as "Try again" from the UI, will force a reconnection attempt.
Created attachment 7011 [details] Proposed patch
Fix in special firmware to try out
Note, this bug is specifically about the blue-icon problem when connected (connecting) to mysqueezebox.com and when the player network-level connection remains ok. That is, it is possible to ping the player and to connect to it over ssh. Bug 16575 covers the case where the network connection no longer works.
Created attachment 7013 [details] Log where SB Radio gets into this state
Some observations from the log just attached: 1. SlimProto reconnected at 17:53: Oct 18 17:53:37 squeezeplay: INFO net.slimproto - SlimProto.lua:754 connection error: closed, reconnecting in 2.17 seconds Oct 18 17:53:39 squeezeplay: INFO net.slimproto - SlimProto.lua:577 connect to baby.squeezenetwork.com (79.125.21.137) Oct 18 17:53:39 squeezeplay: INFO net.slimproto - SlimProto.lua:135 Send HELO: reconnect-bit=true bytesReceived(H,L)=0,75755591 2. 18:01:51 was probably the next time I initiated any user action (Pause). I do not recall observing the state of the wireless icon. At that time we had: Oct 18 18:01:51 squeezeplay: INFO applet.SlimMenus - SlimMenusApplet.lua:441 _menuSink(1) nil menuDirective: add isCurrentServer:true Oct 18 18:01:51 squeezeplay: INFO applet.SlimMenus - SlimMenusApplet.lua:713 hiding any 'connecting to server' popup after menu response from current server, SlimServer {mysqueezebox.com} Oct 18 18:01:51 squeezeplay: INFO applet.ChooseMusicSource - ChooseMusicSourceApplet.lua:543 Hiding popup, exists?: nil Oct 18 18:02:01 squeezeplay: INFO applet.ScreenSavers - ScreenSaversApplet.lua:292 activating Clock screensaver Oct 18 18:02:51 squeezeplay: WARN net.thread - NetworkThread.lua:146 network thread timeout for Task(SocketHttp {mysqueezebox.com_Request}(R)) Oct 18 18:02:53 squeezeplay: WARN net.thread - NetworkThread.lua:146 network thread timeout for Task(SocketHttp {baby.squeezenetwork.com}(R)) Oct 18 18:04:58 squeezeplay: INFO applet.ScreenSavers - ScreenSaversApplet.lua:292 activating Clock screensaver I think that this suggests that the Comet connection has been lost. 3. Over the next 2h 45m various message which are probably consistent with not being connected are logged. 4. At 20:51 we get: Oct 18 20:51:22 squeezeplay: WARN net.comet - Comet.lua:876 Comet {mysqueezebox.com}: _response, /meta/reconnect id=nil failed: invalid clientId That is the first "invalid clientId" message. 5. At 21:04 SlimProto reconnects again: Oct 18 21:04:36 squeezeplay: INFO net.slimproto - SlimProto.lua:754 connection error: closed, reconnecting in 4.743 seconds Oct 18 21:04:41 squeezeplay: INFO net.slimproto - SlimProto.lua:577 connect to baby.squeezenetwork.com (79.125.21.137) Oct 18 21:04:41 squeezeplay: INFO net.slimproto - SlimProto.lua:135 Send HELO: reconnect-bit=0 bytesReceived(H,L)=0,86593044 6. There is little activity for several hours until another "/meta/reconnect id=nil failed: invalid clientId", and then another a few hours later. Then there is: Oct 19 20:27:03 squeezeplay: WARN net.comet - Comet.lua:876 Comet {mysqueezebox.com}: _response, /meta/reconnect id=nil failed: invalid clientId Oct 19 20:27:06 squeezeplay: WARN net.comet - Comet.lua:876 Comet {mysqueezebox.com}: _response, /slim/subscribe id=nil failed: invalid clientId Oct 19 20:28:06 squeezeplay: WARN net.thread - NetworkThread.lua:146 network thread timeout for Task(SocketHttp {mysqueezebox.com_Request}(R)) 7. From then on, the following recurs with 60s between each sequence: Oct 19 20:28:06 squeezeplay: WARN net.thread - NetworkThread.lua:146 network thread timeout for Task(SocketHttp {mysqueezebox.com_Request}(R)) Oct 19 20:28:13 squeezeplay: WARN net.thread - NetworkThread.lua:146 network thread timeout for Task(SocketHttp {mysqueezebox.com_Chunked}(R)) Oct 19 20:28:14 squeezeplay: INFO applet.SlimMenus - SlimMenusApplet.lua:441 _menuSink(50) SlimServer {oz} menuDirective: nil isCurrentServer:nil Oct 19 20:28:14 squeezeplay: INFO applet.SlimMenus - SlimMenusApplet.lua:556 Eliminated myMusic node from server, since now handled locally Oct 19 20:28:14 squeezeplay: WARN net.comet - Comet.lua:876 Comet {mysqueezebox.com}: _response, /slim/subscribe id=nil failed: invalid clientId 8. Finally, we get: Oct 19 21:14:31 squeezeplay: WARN net.comet - Comet.lua:876 Comet {mysqueezebox.com}: _response, nil id=nil failed: POST body too large (max=51200) Oct 19 21:14:31 squeezeplay: WARN net.comet - Comet.lua:956 Comet {mysqueezebox.com}: _response, unknown error: POST body too large (max=51200) I think that this results in a reconnect=none message from SN and so no more retry attempts occur.
Related to bug 15919
*** Bug 15919 has been marked as a duplicate of this bug. ***
In addition to the earlier error of the possibility of getting into the advice reconnect=none state, with no way out, several others have been found, including one that will provoke that state. 1. If the packet containing the /meta/connect and /meta/subscribe requests is lost (presumably due to a network error or server process failure)then the following /meta/reconnect request does not include the /meta/subscribe. As a consequence, none of the /slim/subscribe subscriptions are honoured, and the connection repeatedly times out due to no network traffic. 2.1 If the packet containing the /slim/subscribe subscription requests that follows on after a successful handshake, or their acknowledgements, are lost (presumably due to a server process failure) and a new handshake is then required, then the set of active subscriptions are added as new /slim/subscribe requests to the set of outstanding requests (which already contains the old /slim/subscribe requests) to be sent when the new connection is established. That is, the /slim/subscribe requests are duplicated. 2.2. After 2.1, when the new connection is established, the clientId strings in the outstanding requests are replaced with the new clientId from the successful handshake. The regular expression used to do this is insufficient to handle the syntax of clientId strings used by SqueezeNetwork (SN, mysqueezebox.com) and so, in this case (there is no such problem with Squeezebox Server), the requests are sent with invalid clientIds. This causes SN to reject the requests and force a new handshake. This leads to further duplicates /slim/subscribe requests in the next connection attempt. 2.3 The build-up of duplicate /slim/subscribe requests caused by 2.1 and 2.2 results in a request that is too large for the server (the limit is 5 KiB) which in turn results in it sending a error contain advice for reconnect=none, the originally reported problem.
Created attachment 7018 [details] Proposed patch Fix incorrect use of self.state (should be oldstate) in setEndpoint(). Not actually provoking a bug at present because the only call to setEndpoint() is bracketed by disconnect()/connect() calls. Server may send reconnection advice of reconnect=none under certain error conditions. This causes Comet.lua to enter UNCONNECTED state and not attempt a retry, but SlimServer.lua will assume that it is CONNECTING state (retrying). In order for it to be possible to escape from this situation, in the case that whatever provoked the reconnect=none was actually a transient condition, we allow a new request to force a reconnection attempt. When re-enabling subscriptions that were previously active as part of a new handshake, also remove them from the set of sent requests that have not yet been acknowledged, and that will get resent when the new connection is established; otherwise duplicate subscription requests can be sent. The RE used to match clientId strings in the response channels of subscription requests in the queue was insufficient. It needs also to match 'X' as well as hexadecimal characters. Include /meta/subscribe along with /meta/reconnect in case this reconnect is happening after a successful handshake but failed /meta/connect.
== Auto-comment from SVN commit #9464 to the network repo by agrundman == == http://svn.slimdevices.com/network?view=revision&revision=9464 == Bug 16617, handle the case where a /meta/reconnect comes in without an existing /meta/subscribe having been setup
== Auto-comment from SVN commit #9470 to the network repo by agrundman == == http://svn.slimdevices.com/network?view=revision&revision=9470 == Bug 16617, workaround for bad /slim/subscribe packet issue
*** Bug 16645 has been marked as a duplicate of this bug. ***
== Auto-comment from SVN commit #9209 to the jive repo by ayoung == == http://svn.slimdevices.com/jive?view=revision&revision=9209 == Fixed bug 16617: Controller (SqueezePlay) may never reconnect fully to MySB.com after connection loss Fix incorrect use of self.state (should be oldstate) in setEndpoint(). Not actually provoking a bug at present because the only call to setEndpoint() is bracketed by disconnect()/connect() calls. Server may send reconnection advice of reconnect=none under certain error conditions. This causes Comet.lua to enter UNCONNECTED state and not attempt a retry, but SlimServer.lua will assume that it is CONNECTING state (retrying). In order for it to be possible to escape from this situation, in the case that whatever provoked the reconnect=none was actually a transient condition, we allow a new request to force a reconnection attempt. When re-enabling subscriptions that were previously active as part of a new handshake, also remove them from the set of sent requests that have not yet been acknowledged, and that will get resent when the new connection is established; otherwise duplicate subscription requests can be sent. The RE used to match clientId strings in the response channels of subscription requests in the queue was insufficient. It needs also to match 'X' as well as hexadecimal characters. Include /meta/subscribe along with /meta/reconnect in case this reconnect is happening after a successful handshake but failed /meta/connect.
Update?
Not seeing messages in server log 7.6.0 32398. Going to close.