Bugzilla – Bug 9393
Buffer underun doesn't recover itself & SB doesn't respond anymore
Last modified: 2011-11-06 23:25:20 UTC
output SB3: "rebuffering failed" press&hold left key recovers the SB (it reboots)
*grr* pressed the wrong key and entered a rather short first message... the full version: I'm running a SB3 through a 192kBit/s link. If something else is stealing bandwidth it occurs that the player's internal buffer doesn't receive any more data. The player then tries to rebuffer but fails if there's no bandwidth available for a longer time. SB3 Output (Now Playing screensaver display buffer fullness): "rebuffering failed" Being in this state my SB3 doesn't respond to IR codes anymore (even if bandwidth is available again). I have to press&hold the left key to recover the player (a automatic reset is performed) Don't know if this is related to transcoding and/or the new streaming code. kind regards, Markus
Created attachment 3945 [details] server.log - player.source=debug / player.streaming=debug the first 10 lines show normal traffic from a playing track. a few lines later i've skipped to the next track and immediately started to steal as much bandwidth as possible (e.g. browsing through my music library). After the buffer has reached about 8% the SB3 starts playing but not for very long. Soon the buffer is down at 0% and the Player tries to "rebuffer" but fails...
Looking at your logfile (thank you) it looks like this is another manifestation of the problems with transcoding (bug 9473). You would probably have seen the SC process soaking up a whole CPU in a spin loop on select. I think that I have just checked in sufficient change to fix this. Please try again and let us know how you get on.
Created attachment 4022 [details] server.log - player.source=debug / player.streaming=debug thanks for working on this case! I'm sorry to report that i still have this problem although it seems more "robust" now. The SB3 skipped/repeated a track two times until it finally ended in "rebuffering failed" and required a reset. chronicals of the logfile 1) music was playing (buffer of the SB3: about 20%) 2) started logging with player.source=debug/player.streaming=debug 3) waited for about 10 seconds 4) started to use bandwidth for other things 5) buffer dropped down to 0% 6) SB3 tried to rebuffer, but failed 7) skipped the track (without any manual intervention) ...repeat 6+7 once more 8) SB3 out-of-order (rebuffering failed) I'm running SC 7.3r23212. I didn't notice any unusual CPU usage in this case, but maybe your fix will help my other bug 9482 (will have a look tomorrow) thanks! Markus
Hmm, did this ever really work (with SC 7.2)? I get the impression that the link is so congested that there is no chance of streaming really working, but I guess it should fail better ... Anyway, I could do with some more logging to diagnose just what is going on. Can you also enable network.protocol.slimproto=info (not debug) when you next reproduce the problem please? Also, I cannot work out what might have triggered the following log entry at the end of you last log: [08-09-18 00:20:03.0079] Slim::Player::StreamingController::_Stop (518) ...
Created attachment 4026 [details] server.log - player.source/streaming=debug / player.streaming=debug
Created attachment 4027 [details] server.log SC 7.2 - player.source+streaming=debug / network.slimproto=debug
Created attachment 4028 [details] server.log SC 7.3 - player.source+streaming=debug / network.slimproto=debug
(In reply to comment #5) > Hmm, did this ever really work (with SC 7.2)? I get the impression that the > link is so congested that there is no chance of streaming really working, but I > guess it should fail better ... i have compared 7.2r23218 and 7.3r23218, logs attached, with slimproto=info, sorry for using with the wrong mime type - next time i wont use auto-detect :-( chronicals of my 7.3 testrun 1) music was playing (buffer of the SB3: about 10%) 2) logging started 3) waited for about 10 seconds 4) started to use bandwidth for other things 5) buffer dropped down to 0% 6) SB3 tried to rebuffer, went up to about 10%... 7) and then failed (rebuffering failed) - reset of my SB3 was required chronicals of my 7.2 testrun 1) music was playing (buffer of the SB3: about 10%) 2) logging started 3) waited for about 10 seconds 4) started to use bandwidth for other things 5) buffer dropped down to 0% 6) SB3 tried to rebuffer 7) and then failed (rebuffering failed) BUT 8) restarted the track, after some time it plays again 9) ...not for very long until the buffer went down to 0% again 10) repeat 6-9... summary: SC 7.2 didn't left the SB3 in state which needed a manual reset. As soon as bandwidth was available again the recovery was done automatically. > > Anyway, I could do with some more logging to diagnose just what is going on. > Can you also enable network.protocol.slimproto=info (not debug) when you next > reproduce the problem please? done. > > Also, I cannot work out what might have triggered the following log entry at > the end of you last log: > > [08-09-18 00:20:03.0079] Slim::Player::StreamingController::_Stop (518) ... > i've no idea either. it happened again in this logfile...
Thanks very much for all the info. I cannot really see what is happening that is different. I believe you that it is different, but I cannot see why. It is clearly the case that the network connection is very distressed and that the (slimproto) connection gets dropped (apparently by the player). In the 7.2 case it immediately reconnects but in the 7.3 case it does not. Can you describe in more detail the 'hung' state that the SB gets into. Are you able to use press-and-hold-left on the IR-remote to get into the setup menu, or can you press-hold-power to force a reconnect, or do you have to power-cycle? With the SC 7.2 log, you would need to set network.protocol.slimproto=debug to really see the full interaction (I changed the log-levels around with 7.3) but I am not sure that it would really help much. The player firmware should not lock up, whatever.
(In reply to comment #10) Thanks you for your ongoing support! I really appreciate it. > Can you describe in more detail the 'hung' state that the SB gets into. Are you > able to use press-and-hold-left on the IR-remote to get into the setup menu, or > can you press-hold-power to force a reconnect, or do you have to power-cycle? The SB3 just sits there with "rebuffering failed" in the upper line of text and the second line displays the current track. Brightness of the display has changed to "idle-level". Being in this state i cannot control the unit anymore using the remote or SC Webinterface. press-and-hold-left a few seconds, changes the brighness level to "poweron/active" and again a few seconds later the SB3 reboots (logitech logo, squeezebox logo, setup menu, ...). I never got to the setup menu which otherwise works very reliable. I'm away from my SB3 this weekend so i can't test if press-hold-power works too.
I will try and reproduce this.
With those modifications to the streaming engine done today (09/24/08) the handling of a buffer underun has improved quite a lot! (decreased the severity to minor now) what's different to before: - i've not seen SB showing the "rebuffer x%" message anymore - SB doesn't lock up anymore - if the congestion is really high, SB may loose the connection to my SC (display darkens) for a few seconds Performing a stress test while playing a transcoded flac-track either - plays the track with a lot of stuttering (with long pauses sometimes) - eventually skips the track and moves to the next one, buffers the usual 8% and then starts playback (until the buffer's flushed again) thanks for the good work so far. I'll do some more tests the next days.
after a few tests and some days of normal operation i haven't run into this problem again. i'll leave it to @slimdevices.com to close the bug in case you'd still like to perform some tests yourself. regards, Markus btw. since 09/24 some "anomalies(*)" i had with my boom disappeared, too :-) *snooze mode didn't reliably unpause a transcoded bitrate limited stream.
Change 23346 checked in yesterday should provide a more-comprehensive fix to the issue about knowing the stream-bitrate and thereby using the appropriate value for the rebuffering threshold.
Created attachment 4088 [details] server.log SC 7.3 - player.source+streaming=debug / network.slimproto=info After updating to 7.3r23371 the behaviour from comment #4 has returned again. (i've done two test-runs so far - with idendical results) difference: 7) resume track from the current position as long as data is available/buffer ist filled p.s. my "anomaly" is back again, too...(experienced this morning)
just in case you ask: what lousy network connection does this guy have? my SC is connected to a DSL line with max 192kbit/s upload speed, all those clients (including this SB3 and my boom, transcoded to 128kbit/s) are connected through a VPN tunnel. Browsing SC and playing music in parallel is too much so i try to avoid it whenever possible - although it happens. Unfortunately my carrier doesn't support any higher (upload) speeds in this area :-(
Andy added a change to limit the expected buffer size when bitrate limiting might be in effect. Actually this change was probably in error as Prefs::maxRate returns a value kb/s but the code treated it as b/s (that is, forgot to multiply by 1000). This change seemed to help you but would have had the effect of unpausing streaming, after an output-buffer underrun, after only a few bytes have been received by the player, and not the intended 5s-worth of data. Somehow this seems to mask the underlying problem. In change 23346, I put in a more-comprehensive, and hopefully correct, calculation of the stream-bitrate in use and this is now used in the rebuffering calculation. Your log confirms that the correct rebuffering threshold is being used (80000 bytes = 5s at 128kb/s). Your problem has returned. So I still think that there is an underlying problem in the player firmware which causes it to lock up under these conditions. Andy is looking at this.
Markus: Please test the latest 7.3, it contains many transcoding changes.
Created attachment 4181 [details] server.log SC 7.3r23734 - player.source+streaming=debug / network.slimproto=info Unfortunately the problem didn't go. I've attached an updated server.log, result: SB3 hangs with "rebuffering failed". press&hold left and/or press&hold power resets the box. player mac address: 00:04:20:06:73:61. thanks for not forgetting this issue :-) Markus
I've tested the latest version (r23742) and must try really, really hard (it buffers, plays a few seconds, rebuffers, and so on) to enter the feared "rebuffering failed" state and even then it didn't hang. I'll continue to do some testing...(i hope it's not just a coincidence) Has anything changed between r23734 and r23742 which would explain this? I know some "maybe related" bugs (#9517, #9565) have been fixed on the firmware side, but does this affect transcoding, too? thanks, Markus
> I know some "maybe related" bugs (#9517, #9565) have been fixed on the firmware > side, but does this affect transcoding, too? No, I don't think that those should be relevant to transcoding to MP3.
you're right, although i've have to work really hard to get this problem now. previously it was really easy: one, max two, buffer underruns and the player was down with "rebuffering failed", now it takes at least 5-10 rebuffering sessions to hang up the player...hmm.
Could not reproduce this. I didn't duplicate your transcoded files at 192k situation, because it's more difficult to test, but I did test with a radio stream: Routed SB3 through a VM running Linux with 2 interfaces doing IP masq. Hacked Slim::Player::Player line 1131 so it would give up rebuffering after 1 second. This would force the server into the "Rebuffering failed" case. Played a RadioIO stream using direct streaming. Force a rebuffer by using tc on the Linux VM to reduce the quality of the network connection. Specifically, I used this command: "tc qdisc add dev eth0 root handle 1:0 netem delay 300ms 100ms 50%" which kills the stream in a matter of seconds. The player was able to go through multiple "rebuffering failed" cases without a problem. The stream is auto-restarted when this happens. So, if there is a problem, it may be something in the server code that only happens when transcoding.
I guess we need to nail down whether this "player-lockup" is actually a player firmware problem, or an SC problem. When the player gets into this state, is it possible to use SC in general, for other players or via the Web-UI? What about using the Web-UI to try to control the errant player?
(In reply to comment #25) > I guess we need to nail down whether this "player-lockup" is actually a player > firmware problem, or an SC problem. When the player gets into this state, is it > possible to use SC in general yes you can > for other players or via the Web-UI? both, i can control other players and the Web-UI works just fine > What about using the Web-UI to try to control the errant player? after a little the player is removed from SC activ player's list, i.e. you can't control it anymore - neither by IR...
Markus, is this still an issue for you?
Created attachment 5317 [details] server.log SC 7.4trunk r26939 - player.source+streaming=debug / network.slimproto=info (In reply to comment #27) > Markus, is this still an issue for you? yes it is. attached is an updated server.log. use case: - started to play music on the player "stuttgart", mac: 00:04:20:06:73:61 - waited until the buffer was filled a bit (about 5%) - loaded squeezecenter, aeh squeezebox server and selected "full album list with covers" (player and server share the same network connection) - bufffer dropped rather fast to 0% and the messages "rebuffering X%" appeared - it went up to 92% until the SB3 hung and required a reset. kind regards, Markus
I was having WiFi bandwidth problems caused by too many co-channel WiFi signals in the 2.4GHz ISM band. Twice during one listening session the SBR locked up hard after a buffer under-run. The light on the front panel was still bright white, but the impacted SBR had dropped from the SC list of recognized players. Holding the button down for more than 6 seconds did not cause a reset to occur - I had to remove the power to get the player back on the network. Watching a realtime network bandwidth graph showed that data stopped flowing to the SBR in the seconds before the hard lock-up occurred. I don't have any SC logs to show for it, but it sounds very much like the situation reported in this bug. I have since switched to 5GHz 802.11n draft 2 connections using Apple Airport Express as the wireless client and an the SBRs are now wired via short lengths of CAT5e to the the Airport Express units in each location. No further problems, due to very high network bandwidth and lack of co-channel interference.
I should have mentioned the lock-ups occurred with the following SC version. Version: 7.3.4 - 27488 @ Sat Jul 18 03:01:58 PDT 2009 Hostname: shuttle Server IP Address: 192.168.1.60 Server HTTP Port Number: 9000 Operating system: Windows XP - EN - cp1252 Platform Architecture: 586 Perl Version: 5.8.8 - MSWin32-x86-multi-thread MySQL Version: 5.0.22-community-nt Total Players Recognized: 2 And this receiver: Player Model: receiver Firmware: 62 Player IP Address: 192.168.1.68 Player MAC Address: 00:04:20:16:xx:yy (Full MAC address obfuscated)
Unassigned bugs cannot have a priority.