Bugzilla – Bug 2081
SB2 fails to respond to remote when Slimserver busy
Last modified: 2009-09-08 09:31:47 UTC
I guess this is an 'enhancement' rather than a bug, but to me it's the single biggest ball-ache with the SB/Slimserver interface. If SS is busy on a task and the user presses a key on the remote, the command is queued at the SB/SS interface. However, nothing happens, so the user has no idea whether the remote key press was seen by the SB or not - so he presses again ... and again ... Eventually SS becomes free and the 85 remote commands now queued up are executed in sequence. SS goes busy again (sometimes for a l-o-n-g time if the random sequence of commands happens to do something like generate a random playlist of the whole library), and the SB display shuttles up, down, left and right at great speed until the final command is executed. The problem is more severe if the user is running multiple SBs, or using a low powered server. It's a 'really bad thing' (TM) from the user's point of view. The average user thinks the device is broken and IME some hit the reset on the server rather than wait. As a sometime RT programmer in a previous life, I understand that the problem stems from the single threaded nature of SS, and that steps are being taken to address this in the future, but I suggest that in the meantime there needs to be some kind of handshake, or a 'ready' signal from SS to SB to prevent commands being queued whilst SS is busy. This could be coupled with the Slim Devices version of the Linux watch or the MS egg timer on the SB display, so that the user knows he has to be patient.
couldn't agree more. we need to switch to a multithreaded model for this to happen.
Err - no, that's my point! It's too important to wait for the multithread approach, so I'm proposing that the SB2 firmware / SS interface be modified to handshake the commands. Yes it's a bodge (and multithread is the real solution), but it's totally unacceptable (IMHO) for a consumer product to have a user interface that queues commands for several minutes without even notifying the user! I'm suggesting that when SB2 sends a command to SS, it goes 'busy' and (after a second) displays a busy icon; when SS completes the command it sends 'ready' to SB2, and SB2 clears 'busy' and the icon.
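[Editorial sketch] The handshake proposed in this comment could look roughly like the following. It is only an illustration in Python, not SlimServer or Squeezebox firmware code: the class name, the method names and the choice to swallow key presses while busy are all assumptions made for the example.

```python
class PlayerBusyIndicator:
    """Hypothetical client-side state for the proposed busy/ready handshake."""

    def __init__(self, icon_delay=1.0):
        self.icon_delay = icon_delay  # seconds to wait before showing the busy icon
        self.busy_since = None        # time the outstanding command was sent, if any

    def send_command(self, now):
        """Return True if the command may be sent, False if it is swallowed."""
        if self.busy_since is not None:
            return False              # still waiting for 'ready', so do not queue
        self.busy_since = now
        return True

    def server_ready(self):
        """The server reports completion: clear the busy state (and the icon)."""
        self.busy_since = None

    def show_icon(self, now):
        """The icon appears only once the wait has lasted icon_delay seconds."""
        return self.busy_since is not None and now - self.busy_since >= self.icon_delay
```

With this shape, a second key press made while a command is outstanding is dropped rather than queued, and the icon only appears for waits longer than a second, so quick responses look the same as they do today.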
patches welcome
I can't patch it because: 1) it probably requires a firmware change and the firmware is not open source. 2) the learning curve for me to figure out a) PERL and b) the slimserver code would render the timescale too long. If you'd written it in C I might have more of a chance (but not much more). I'm more of a 'management' type you see ...
Patrick's experience is exactly mine. Two clear problems: 1) Certain Browse Music menu items cause Slimserver to become unresponsive for long periods of time (e.g. Browse Music > Genres > A Genre With a Lot of Albums > All Albums) 2) The user cannot escape from the wait once SlimServer "goes off on one" and is not informed as to what is happening. I can't see how this problem could be worse when using the remote (and Windows?), but to me it seems as if it is. It does almost entirely ruin the SlimServer experience for me.
Simon - what version of SlimServer are you running?
Dan, at the time I was running this: 813 Flac Albums Windows XP SlimServer Version: 6.1.2 - 4429 - Windows XP - EN - cp1252 Squeezebox 2
In general, the response to button presses is indeed somewhat "random". Even when the server is not particularly busy, the response runs from immediate to a good second or so, and it does create a disconcerting feeling (that funny anticipation of "did it get the button press or not this time?"). Some button presses seem to get lost entirely, but of course those could be due to other factors. And of course SoftSqueeze does the same thing. I agree that this has an effect on the experience of using an otherwise very very cool product. I still love the SB, and no one could pry it from my hands, but I'd like to see this fixed somehow.
Best to fix this in the server. Worst case, throw away queued button presses more than a few seconds old
this sounds a bit like a dupe of another bug that triode has been working against (bug 2957)
The problem is that key response is totally dependent on server performance and server load. Hence when the server is under-powered or busy (or both), the user is left not knowing whether the button-press is being actioned, or has not been seen. It is surprisingly easy to unwittingly select a playlist of 'ALL' songs (over 10,000 in my case), whereupon the server will go away and build the playlist for several minutes, without any opportunity to interrupt or cancel the operation! Even for someone who understands what is actually happening this is quite frustrating .... ;-) .... most users will think the server or player has crashed. There are a couple of approaches that could be taken to make the 'uneven' responsiveness more acceptable to the user: 1) Some kind of 'Busy' or 'Hourglass' indication on the display, to show that slimserver has accepted the key press and is doing something about it, and 2) The opportunity to cancel erroneously entered commands and/or commands that are taking a long time to implement.
I think Patrick is highlighting a slightly different issue. The recent change I made to 6.5 was to attempt to ensure that the server processed key presses rapidly if it was able to do so as part of the main execution loop. Previous code could occasionally take some time to process a set of queued key presses, as in some corner cases it would only look at the IR queue relatively infrequently even if it was idle. Please try the latest 6.5 to see if this improves responsiveness for normal key presses for you. I'd also be interested in posts of the IR response time graph from the health page in 6.5 if you think it is taking a long time. Patrick's issue sounds like he is most concerned about the times when the server blocks processing something complex and hence is not visiting the main loop. This could occur due to a complex database query, page build etc or if something is loaded via the http api in Slim::Formats::RemoteStream. In this case we can't do much with the current architecture as the main loop is not visited during this time. I think the problem with the idea of an hourglass indication is that it's difficult to predict when it is necessary to display it. With a single threaded design, we would probably need to display the hourglass for every action just in case the code which is about to be executed takes some time. I don't really see this as viable. Once we have a multithreaded architecture this would definitely be something to add. As Dean mentions, the one thing we could add is something to discard IR presses which have been stored for more than X seconds. I would propose measuring this in the same way as the health IR response time graph is calculated. Could you provide feedback on how long you think X should be - ideally based on what you are currently seeing from the health graph.
[Patrick - this does need to be implemented in the server as the 'slim' client architecture means the player doesn't even know when a key has been pressed and released - this is all done in the server, the player just sends ir codes to the server to decode. Hence it can't really be added to the client without a major change.]
My prediction is that this is going to cause increasing user frustration as the SB3 hits a less techie/tolerant user base! I think the hourglass would have to be implemented in the client, because in multi-SB environments, SS processing a request for one client blocks ANY response to ALL other clients. Thus, when sending an IR request the slim client would have to acknowledge (to the user) that it was waiting for a server response, and the server would clear the icon when the request had been processed. From an ergonomic point of view, how long do you think the average, man-in-the-street user will wait after pressing the remote before something happens on the display or to the audio? One second maybe? After that, he/she will press it again, and then quite possibly some other buttons too - before walking away in frustration! Even rebooting the client doesn't seem to dump pending actions at the server, so the only way to kill a request that's taking a long time is to kill the server - which is a real pita! It would seem reasonable to expect that SS running on the minimum SD recommended HW spec would be able to respond in all situations and circumstances within (say) one second to any client request, and so even with multi-threading, SS will need to send a "building list ..." , "searching db ..." , "building random playlist ..." type response to the client - for those requests that can't be quickly responded to. On my (fairly under-powered) server, with a single SB2 client and ~10K songs, it takes 4-5 seconds to respond to a 'Browse Artists' request, and it can take 10 minutes to build a playlist of 'All' songs! So it would help if playing 'All' songs or even 'All' Various Artists songs was less easy to do by mistake ;-) and also if the 'Browse' lists could be cached in some way to avoid re-building the list each time, since they can't have changed if the db hasn't been updated.
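[Editorial sketch] The caching idea at the end of this comment (reuse a 'Browse' list until the database changes) can be illustrated as below. This is a generic Python sketch, not SlimServer code: `BrowseListCache`, `build_list` and `db_version` are hypothetical names, and it assumes the server can cheaply obtain some value that changes whenever the library database is updated.

```python
class BrowseListCache:
    """Cache slow browse lists, rebuilding only when the database has changed."""

    def __init__(self, build_list):
        self.build_list = build_list  # function(name) -> list; the slow query
        self.cache = {}               # name -> (db_version, items)

    def get(self, name, db_version):
        cached = self.cache.get(name)
        if cached is not None and cached[0] == db_version:
            return cached[1]          # database unchanged: reuse the old list
        items = self.build_list(name) # database changed or first request
        self.cache[name] = (db_version, items)
        return items
```

A second 'Browse Artists' request against an unchanged database would then return immediately instead of taking the 4-5 seconds reported above.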
OK - let's try to deal with the case of pending IR key codes if they are stored too long - I would suggest we modify the server to drop IR key codes that have been stored for more than 2 seconds. Does this seem reasonable? Patrick - I totally agree with you from a user point of view, but for the moment we need to do stuff which fits within the current slimserver architecture. [Well that's all people like me can do!] It's useful to understand that the client doesn't know the difference between a new key press and a repeat of the existing key press with the current architecture. Hence whatever we do needs to be done in the server. At present I believe the easy win is to avoid prolonged storage of IR key presses.
Adrian - I completely understand. It's SD's product, and really they are the only people who can take the lead with the server/client comms structure. Ditching 'old' IR requests should definitely help a little and is (presumably) relatively easy to do. What about also adding some display messages to those ops that are obviously server intensive? Starting Random play and anything which involves building a long playlist spring to mind: I guess some DB operations too. AFAICS, these will be necessary even with a multi-threaded arch. Maybe even ditch all pending IR requests for that particular client on their completion too? I appreciate that the wide variety of server hardware and setups makes it difficult to judge where the performance bottlenecks will occur, but from conversations I've had off-list, I don't reckon I'm the only one who has problems with the man/machine interface! Thanks for looking into it.
Latest 6.5 (svn 6283) should discard IR presses if queued for more than 5 seconds. Let me know if this is an improvement.
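[Editorial sketch] The age-based discard described in this comment amounts to something like the following. It is a Python illustration only, not the Perl in Slim/Hardware/IR.pm: the function name and the (timestamp, code) queue layout are assumptions, and the constant simply mirrors the 5-second threshold mentioned above.

```python
MAX_IR_QUEUE_TIME = 5.0  # seconds; mirrors the threshold quoted in the comment above

def drain_ir_queue(queue, now, max_age=MAX_IR_QUEUE_TIME):
    """Split queued (timestamp, code) pairs into codes to execute and codes to drop."""
    to_execute, dropped = [], []
    for stamped_at, code in queue:
        if now - stamped_at > max_age:
            dropped.append(code)      # too old: the user has probably moved on
        else:
            to_execute.append(code)
    return to_execute, dropped
```

For example, with presses stamped at 0.0, 4.0 and 9.5 seconds and the queue drained at 10.0 seconds, only the last press is executed.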
Does anyone have feedback on Triode's changes in 6.5?
I guess Triode's change probably helps in the extreme cases where the IR request queue builds up, but it doesn't really impact on the fundamental issue of the bug: that server busy-ness prevents feedback to the user. The real killer is if you accidentally play 'All Songs' (which on my system takes about 10 mins to build the playlist) - because the remote is non-responsive, there is no way to cancel the command, and no indication of what's happening. I'm now such a seasoned SS user that I know what menu options will take a while to respond ;-), but when I demonstrate SBs to unfamiliar people, I always find myself apologising for the lack of 'snappiness' of the interface.
I've been thinking more on this, and it strikes me that Triode's change might be better if it ditched anything q'd for more than about 1 second (rather than 5). Most users will expect to see a response within a second of pressing a key, and if they don't they'll probably just press it again! If it's an easy patch to a 6.5 nightly, I'm happy to try it/test it here offline and report back.
Patrick - assuming you are happy to edit the code: Edit Slim/Hardware/IR.pm - look for the line saying "my $maxIRQTime = 5.0;" and change the 5.0 to 1.0. NB if you were using windows (which I don't think you are) then you would need to run the server manually using ActivePerl rather than using the slim.exe file.
Thanks, I will try it over the next few days and let you know how I get on (I'm on Linux)
Having played with this for a few days, I'm not convinced it's ditching IR commands older than 1 second. I've changed 'my $maxIRQTime' to 1, but I still seem (sometimes) to get q'd commands happening after more than a second. Which debug would be useful for me to run?
What does the IR graph in the Health plugin show? This should log the time between receiving the IR code and executing the function. I would hope you notice a difference on this graph when you change $maxIRQTime.
The IR graph in the Health plugin shows:-

Control Connection
This graph shows the number of messages queued up to send to the player over the control connection. A measurement is taken every time a new message is sent to the player. Values above 1-2 indicate potential network congestion or that the player has become disconnected.

< 1  : 118 : 100% ##################################################
< 2  :   0 :   0%
< 5  :   0 :   0%
< 10 :   0 :   0%
< 20 :   0 :   0%
>=20 :   0 :   0%
max  : 0.000000
min  : 0.000000
avg  : 0.000000

as you'd expect. However, if I 'Browse Artists' it typically takes several seconds to build the list, and multiple key presses (after more than 1 second of waiting) are still being q'd and actioned once the 'Browse Artist' command returns. So my experience doesn't quite tally with the numbers for some reason.
Further debug from Patrick has shown that the addition in svn 6283 does not discard old key presses on all occasions. This is because it adds a timestamp when the server receives the key presses and checks this against current time to see how old the key code is. Hence queueing prior to the key press being processed by the server is not noticed. Will look at again...
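[Editorial sketch] The failure described in this comment can be reproduced with a small model. It is written in Python purely for illustration and is not the svn 6283 code: `stale_presses` and its parameters are hypothetical, and the `stamp_at_press=True` case (measuring age from the moment of the press, for instance using a time value supplied by the player) is an assumed remedy that the thread does not confirm.

```python
def stale_presses(press_times, server_free_at, max_age, stamp_at_press):
    """Return indices of presses that would be discarded once the server is free."""
    now = server_free_at
    stale = []
    for i, pressed_at in enumerate(press_times):
        if stamp_at_press:
            stamped_at = pressed_at   # age measured from the key press itself
        else:
            # A blocked server only reads (and stamps) the press once it is free
            stamped_at = max(pressed_at, server_free_at)
        if now - stamped_at > max_age:
            stale.append(i)
    return stale
```

With presses at 0, 1 and 2 seconds and a server blocked until 10 seconds, stamping on read makes every press look brand new, so none is discarded; stamping at press time marks all three as stale.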
Patrick - I've put an updated version of the test code I sent you in svn r7456. This currently has $maxIRQTime set to 3 seconds. I would like feedback from people on setting this any lower.
Thanks Adrian. I've been trying it set to 1 second - on a 300MHz/382MB PII server and an Athlon 2GHz/512MB. Anything greater than 1 second still gives me 'repeats' on the slow server (when trying to act like a normal user), but still seems fine on the fast server. I think, providing 1 sec doesn't cause anybody else a problem, it's preferable for slow servers. I think it will help those on NAS boxes with reasonable size libraries.
I think 1 second is best if the round trip time from client to server is low, but I worry about setting it too low for users accessing their server over slow internet links, as additional delay due to network queuing could cause IR commands to be lost. One for Dean - do we want to optimise for slow servers or remote internet connections over slow links? I can't think of a way of automatically doing this, so we may want a server pref?
Dean? Thoughts? Should I be the owner of this? Adrian has done most of the work/investigation.
First, let's get 6.5 with the splitscanner and new DB going and see how much of a problem this remains to be. Then we can tune the age threshold for discarding the IR.
split-scanner has been merged. I've been hearing reports that interactivity while scanning is a lot better. Can anyone confirm? Thanks.
*** Bug 2630 has been marked as a duplicate of this bug. ***
Is there any chance this could get into 6.3 too? IIUC, it also fixes a Network Health bug ...
Patrick - the split-scanner changes are the crux of 6.5. They can't be backported to 6.3. Sorry.
Dan, I understand that. I wasn't clear, but I actually meant Triode's patches to ditch 'old' key-presses. Could that code get into 6.3? I think it would be worthwhile if it's do-able.