Bug 9393 - Buffer underun doesn't recover itself & SB doesn't respond anymore
: Buffer underun doesn't recover itself & SB doesn't respond anymore
Status: NEW
Product: Logitech Media Server
Classification: Unclassified
Component: Streaming From SlimServer
: 7.3.0
: PC Linux (other)
: -- minor with 1 vote (vote)
: ---
Assigned To: Unassigned bug - please assign me!
: new_streaming
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-09-04 10:20 UTC by Markus Schiegl
Modified: 2011-11-06 23:25 UTC (History)
4 users (show)

See Also:
Category: Bug


Attachments
server.log - player.source=debug / player.streaming=debug (38.52 KB, text/plain)
2008-09-04 10:48 UTC, Markus Schiegl
Details
server.log - player.source=debug / player.streaming=debug (209.05 KB, text/plain)
2008-09-17 15:47 UTC, Markus Schiegl
Details
server.log - player.source/streaming=debug / player.streaming=debug (23.24 KB, application/octet-stream)
2008-09-18 13:14 UTC, Markus Schiegl
Details
server.log SC 7.2 - player.source+streaming=debug / network.slimproto=debug (23.24 KB, application/octet-stream)
2008-09-18 13:16 UTC, Markus Schiegl
Details
server.log SC 7.3 - player.source+streaming=debug / network.slimproto=debug (162.15 KB, application/octet-stream)
2008-09-18 13:16 UTC, Markus Schiegl
Details
server.log SC 7.3 - player.source+streaming=debug / network.slimproto=info (255.62 KB, text/plain)
2008-10-01 10:48 UTC, Markus Schiegl
Details
server.log SC 7.3r23734 - player.source+streaming=debug / network.slimproto=info (140.56 KB, text/plain)
2008-10-29 12:45 UTC, Markus Schiegl
Details
server.log SC 7.4trunk r26939 - player.source+streaming=debug / network.slimproto=info (179.52 KB, application/octet-stream)
2009-06-09 11:41 UTC, Markus Schiegl
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Schiegl 2008-09-04 10:20:01 UTC
output SB3:
"rebuffering failed"

press&hold left key recovers the SB (it reboots)
Comment 1 Markus Schiegl 2008-09-04 10:36:28 UTC
*grr* pressed the wrong key and entered a rather short first message...

the full version:

I'm running a SB3 through a 192kBit/s link. If something else is stealing bandwidth it occurs that the player's internal buffer doesn't receive any more data. The player then tries to rebuffer but fails if there's no bandwidth available for a longer time.

SB3 Output (Now Playing screensaver display buffer fullness): 
"rebuffering failed"

Being in this state my SB3 doesn't respond to IR codes anymore (even if bandwidth is available again). I have to press&hold the left key to recover the player (a automatic reset is performed)

Don't know if this is related to transcoding and/or the new streaming code.

kind regards,
Markus
Comment 2 Markus Schiegl 2008-09-04 10:48:32 UTC
Created attachment 3945 [details]
server.log - player.source=debug / player.streaming=debug

the first 10 lines show normal traffic from a playing track. a few lines later i've skipped to the next track and immediately started to steal as much bandwidth as possible (e.g. browsing through my music library).
After the buffer has reached about 8% the SB3 starts playing but not for very long. Soon the buffer is down at 0% and the Player tries to "rebuffer" but fails...
Comment 3 Alan Young 2008-09-17 09:55:22 UTC
Looking at your logfile (thank you) it looks like this is another manifestation of the problems with transcoding (bug 9473). You would probably have seen the SC process soaking up a whole CPU in a spin loop on select. I think that I have just checked in sufficient change to fix this. Please try again and let us know how you get on.
Comment 4 Markus Schiegl 2008-09-17 15:47:59 UTC
Created attachment 4022 [details]
 server.log - player.source=debug / player.streaming=debug

thanks for working on this case! I'm sorry to report that i still have this problem although it seems more "robust" now. The SB3 skipped/repeated a track two times until it finally ended in "rebuffering failed" and required a reset.

chronicals of the logfile
1) music was playing (buffer of the SB3: about 20%)
2) started logging with player.source=debug/player.streaming=debug
3) waited for about 10 seconds
4) started to use bandwidth for other things
5) buffer dropped down to 0%
6) SB3 tried to rebuffer, but failed
7) skipped the track (without any manual intervention)
...repeat 6+7 once more
8) SB3 out-of-order (rebuffering failed)

I'm running SC 7.3r23212. I didn't notice any unusual CPU usage in this case, but maybe your fix will help my other bug 9482 (will have a look tomorrow)

thanks!
Markus
Comment 5 Alan Young 2008-09-18 02:48:33 UTC
Hmm, did this ever really work (with SC 7.2)? I get the impression that the link is so congested that there is no chance of streaming really working, but I guess it should fail better ...

Anyway, I could do with some more logging to diagnose just what is going on. Can you also enable network.protocol.slimproto=info (not debug) when you next reproduce the problem please?

Also, I cannot work out what might have triggered the following log entry at the end of you last log:

[08-09-18 00:20:03.0079] Slim::Player::StreamingController::_Stop (518) ...
Comment 6 Markus Schiegl 2008-09-18 13:14:06 UTC
Created attachment 4026 [details]
 server.log - player.source/streaming=debug / player.streaming=debug
Comment 7 Markus Schiegl 2008-09-18 13:16:02 UTC
Created attachment 4027 [details]
server.log SC 7.2 - player.source+streaming=debug / network.slimproto=debug
Comment 8 Markus Schiegl 2008-09-18 13:16:52 UTC
Created attachment 4028 [details]
server.log SC 7.3 - player.source+streaming=debug / network.slimproto=debug
Comment 9 Markus Schiegl 2008-09-18 13:41:41 UTC
(In reply to comment #5)
> Hmm, did this ever really work (with SC 7.2)? I get the impression that the
> link is so congested that there is no chance of streaming really working, but I
> guess it should fail better ...

i have compared 7.2r23218 and 7.3r23218, logs attached, with slimproto=info, sorry for using with the wrong mime type - next time i wont use auto-detect :-(

chronicals of my 7.3 testrun
1) music was playing (buffer of the SB3: about 10%)
2) logging started 
3) waited for about 10 seconds
4) started to use bandwidth for other things
5) buffer dropped down to 0%
6) SB3 tried to rebuffer, went up to about 10%...
7) and then failed (rebuffering failed) - reset of my SB3 was required

chronicals of my 7.2 testrun
1) music was playing (buffer of the SB3: about 10%)
2) logging started
3) waited for about 10 seconds
4) started to use bandwidth for other things
5) buffer dropped down to 0%
6) SB3 tried to rebuffer
7) and then failed (rebuffering failed) BUT
8) restarted the track, after some time it plays again
9) ...not for very long until the buffer went down to 0% again
10) repeat 6-9...

summary: SC 7.2 didn't left the SB3 in state which needed a manual reset. As soon as bandwidth was available again the recovery was done automatically.

> 
> Anyway, I could do with some more logging to diagnose just what is going on.
> Can you also enable network.protocol.slimproto=info (not debug) when you next
> reproduce the problem please?

done.

> 
> Also, I cannot work out what might have triggered the following log entry at
> the end of you last log:
> 
> [08-09-18 00:20:03.0079] Slim::Player::StreamingController::_Stop (518) ...
> 

i've no idea either. it happened again in this logfile...
Comment 10 Alan Young 2008-09-19 08:36:05 UTC
Thanks very much for all the info. I cannot really see what is happening that is different. I believe you  that it is different, but I cannot see why. It is clearly the case that the network connection is very distressed and that the (slimproto) connection gets dropped (apparently by the player). In the 7.2 case it immediately reconnects but in the 7.3 case it does not.

Can you describe in more detail the 'hung' state that the SB gets into. Are you able to use press-and-hold-left on the IR-remote to get into the setup menu, or can you press-hold-power to force a reconnect, or do you have to power-cycle?

With the SC 7.2 log, you would need to set network.protocol.slimproto=debug to really see the full interaction (I changed the log-levels around with 7.3) but I am not sure that it would really help much. The player firmware should not lock up, whatever.
Comment 11 Markus Schiegl 2008-09-19 11:10:34 UTC
(In reply to comment #10)

Thanks you for your ongoing support! I really appreciate it.

> Can you describe in more detail the 'hung' state that the SB gets into. Are you
> able to use press-and-hold-left on the IR-remote to get into the setup menu, or
> can you press-hold-power to force a reconnect, or do you have to power-cycle?

The SB3 just sits there with "rebuffering failed" in the upper line of text and the second line displays the current track. Brightness of the display has changed to "idle-level". Being in this state i cannot control the unit anymore using the remote or SC Webinterface.

press-and-hold-left a few seconds, changes the brighness level to "poweron/active" and again a few seconds later the SB3 reboots (logitech logo, squeezebox logo, setup menu, ...).
I never got to the setup menu which otherwise works very reliable. I'm away from my SB3 this weekend so i can't test if press-hold-power works too.
Comment 12 Andy Grundman 2008-09-24 10:33:53 UTC
I will try and reproduce this.
Comment 13 Markus Schiegl 2008-09-24 14:46:06 UTC
With those modifications to the streaming engine done today (09/24/08) the handling of a buffer underun has improved quite a lot! (decreased the severity to minor now)

what's different to before:
- i've not seen SB showing the "rebuffer x%" message anymore
- SB doesn't lock up anymore
- if the congestion is really high, SB may loose the connection to my SC (display darkens) for a few seconds

Performing a stress test while playing a transcoded flac-track either
- plays the track with a lot of stuttering (with long pauses sometimes)
- eventually skips the track and moves to the next one, buffers the usual 8% and then starts playback (until the buffer's flushed again)

thanks for the good work so far. I'll do some more tests the next days.
Comment 14 Markus Schiegl 2008-09-26 12:44:42 UTC
after a few tests and some days of normal operation i haven't run into this problem again. i'll leave it to @slimdevices.com to close the bug in case you'd still like to perform some tests yourself.

regards,
Markus

btw. since 09/24 some "anomalies(*)" i had with my boom disappeared, too :-)

*snooze mode didn't reliably unpause a transcoded bitrate limited stream.
Comment 15 Alan Young 2008-10-01 06:24:16 UTC
Change 23346 checked in yesterday should provide a more-comprehensive fix to the issue about knowing the stream-bitrate and thereby using the appropriate value for the rebuffering threshold.
Comment 16 Markus Schiegl 2008-10-01 10:48:58 UTC
Created attachment 4088 [details]
server.log SC 7.3 - player.source+streaming=debug / network.slimproto=info

After updating to 7.3r23371 the behaviour from comment #4 has returned again. (i've done two test-runs so far - with idendical results)

difference:
7) resume track from the current position as long as data is available/buffer ist filled

p.s. my "anomaly" is back again, too...(experienced this morning)
Comment 17 Markus Schiegl 2008-10-01 12:00:30 UTC
just in case you ask: what lousy network connection does this guy have?

my SC is connected to a DSL line with max 192kbit/s upload speed, all those clients (including this SB3 and my boom, transcoded to 128kbit/s) are connected through a VPN tunnel. Browsing SC and playing music in parallel is too much so i try to avoid it whenever possible - although it happens.

Unfortunately my carrier doesn't support any higher (upload) speeds in this area :-(
Comment 18 Alan Young 2008-10-02 03:04:48 UTC
Andy added a change to limit the expected buffer size when bitrate limiting might be in effect. Actually this change was probably in error as Prefs::maxRate returns a value kb/s but the code treated it as b/s (that is, forgot to multiply by 1000).

This change seemed to help you but would have had the effect of unpausing streaming, after an output-buffer underrun, after only a few bytes have been received by the player, and not the intended 5s-worth of data. Somehow this seems to mask the underlying problem.

In change 23346, I put in a more-comprehensive, and hopefully correct, calculation of the stream-bitrate in use and this is now used in the rebuffering calculation. Your log confirms that the correct rebuffering threshold is being used (80000 bytes = 5s at 128kb/s). Your problem has returned.

So I still think that there is an underlying problem in the player firmware which causes it to lock up under these conditions. Andy is looking at this.
Comment 19 Andy Grundman 2008-10-29 08:36:51 UTC
Markus: Please test the latest 7.3, it contains many transcoding changes.
Comment 20 Markus Schiegl 2008-10-29 12:45:57 UTC
Created attachment 4181 [details]
 server.log SC 7.3r23734 - player.source+streaming=debug / network.slimproto=info

Unfortunately the problem didn't go. I've attached an updated server.log, result: SB3 hangs with "rebuffering failed". press&hold left and/or press&hold power resets the box. player mac address: 00:04:20:06:73:61.

thanks for not forgetting this issue :-)
Markus
Comment 21 Markus Schiegl 2008-10-30 11:01:30 UTC
I've tested the latest version (r23742) and must try really, really hard (it buffers, plays a few seconds, rebuffers, and so on) to enter the feared "rebuffering failed" state and even then it didn't hang. I'll continue to do some testing...(i hope it's not just a coincidence)

Has anything changed between r23734 and r23742 which would explain this?

I know some "maybe related" bugs (#9517, #9565) have been fixed on the firmware side, but does this affect transcoding, too?

thanks,
Markus
Comment 22 Alan Young 2008-10-31 01:23:16 UTC
> I know some "maybe related" bugs (#9517, #9565) have been fixed on the firmware
> side, but does this affect transcoding, too?

No, I don't think that those should be relevant to transcoding to MP3.
Comment 23 Markus Schiegl 2008-11-02 13:36:00 UTC
you're right, although i've have to work really hard to get this problem now. previously it was really easy: one, max two, buffer underruns and the player was down with "rebuffering failed", now it takes at least 5-10 rebuffering sessions to hang up the player...hmm.
Comment 24 Andy Grundman 2008-11-03 11:44:29 UTC
Could not reproduce this.  I didn't duplicate your transcoded files at 192k situation, because it's more difficult to test, but I did test with a radio stream:

Routed SB3 through a VM running Linux with 2 interfaces doing IP masq.
Hacked Slim::Player::Player line 1131 so it would give up rebuffering after 1 second.  This would force the server into the "Rebuffering failed" case.
Played a RadioIO stream using direct streaming.
Force a rebuffer by using tc on the Linux VM to reduce the quality of the network connection.  Specifically, I used this command: "tc qdisc add dev eth0 root handle 1:0 netem delay 300ms 100ms 50%" which kills the stream in a matter of seconds.

The player was able to go through multiple "rebuffering failed" cases without a problem.  The stream is auto-restarted when this happens.

So, if there is a problem, it may be something in the server code that only happens when transcoding.
Comment 25 Alan Young 2008-11-14 01:41:23 UTC
I guess we need to nail down whether this "player-lockup" is actually a player firmware problem, or an SC problem. When the player gets into this state, is it possible to use SC in general, for other players or via the Web-UI? What about using the Web-UI to try to control the errant player?
Comment 26 Markus Schiegl 2008-11-14 09:28:35 UTC
(In reply to comment #25)
> I guess we need to nail down whether this "player-lockup" is actually a player
> firmware problem, or an SC problem. When the player gets into this state, is it
> possible to use SC in general
yes you can

> for other players or via the Web-UI?
both, i can control other players and the Web-UI works just fine

> What about using the Web-UI to try to control the errant player?

after a little the player is removed from SC activ player's list, i.e. you can't control it anymore - neither by IR...
Comment 27 Alan Young 2009-06-09 02:07:43 UTC
Markus, is this still an issue for you?
Comment 28 Markus Schiegl 2009-06-09 11:41:31 UTC
Created attachment 5317 [details]
server.log SC 7.4trunk r26939 - player.source+streaming=debug / network.slimproto=info

(In reply to comment #27)
> Markus, is this still an issue for you?
yes it is. attached is an updated server.log.

use case:
- started to play music on the player "stuttgart", mac: 00:04:20:06:73:61
- waited until the buffer was filled a bit (about 5%)
- loaded squeezecenter, aeh squeezebox server and selected "full album list with covers" (player and server share the same network connection)
- bufffer dropped rather fast to 0% and the messages "rebuffering X%" appeared
- it went up to 92% until the SB3 hung and required a reset.

kind regards,
Markus
Comment 29 Charles Razzell 2009-07-28 01:49:10 UTC
I was having WiFi bandwidth problems caused by too many co-channel WiFi signals in the 2.4GHz ISM band.

Twice during one listening session the SBR locked up hard after a buffer under-run. The light on the front panel was still bright white, but the impacted SBR had dropped from the SC list of recognized players. Holding the button down for more than 6 seconds did not cause a reset to occur - I had to remove the power to get the player back on the network.

Watching a realtime network bandwidth graph showed that data stopped flowing to the SBR in the seconds before the hard lock-up occurred. I don't have any SC logs to show for it, but it sounds very much like the situation reported in this bug.

I have since switched to 5GHz 802.11n draft 2 connections using Apple Airport Express as the wireless client and an the SBRs are now wired via short lengths of CAT5e to the the Airport Express units in each location. No further problems, due to very high network bandwidth and lack of co-channel interference.
Comment 30 Charles Razzell 2009-07-28 02:05:58 UTC
I should have mentioned the lock-ups occurred with the following SC version.

Version: 7.3.4 - 27488 @ Sat Jul 18 03:01:58 PDT 2009
Hostname: shuttle
Server IP Address: 192.168.1.60
Server HTTP Port Number: 9000
Operating system: Windows XP - EN - cp1252
Platform Architecture: 586
Perl Version: 5.8.8 - MSWin32-x86-multi-thread
MySQL Version: 5.0.22-community-nt
Total Players Recognized: 2

And this receiver:

Player Model: receiver
Firmware: 62
Player IP Address: 192.168.1.68
Player MAC Address: 00:04:20:16:xx:yy (Full MAC address obfuscated)
Comment 31 Alan Young 2011-11-06 23:25:20 UTC
Unassigned bugs cannot have a priority.