Bugzilla – Bug 277
Resume from pause broken
Last modified: 2009-09-08 09:24:03 UTC
For me resuming from a pause of over about 25 seconds works for a further 4-5 seconds then stutters, jumps back to a couple of seconds before pause started then stops (audio & counter). Track can be restarted from the beginning by pressing play or rewind. Wires-only DB, firmware 21, server 5.1.4 with >192kbit stereo mp3, not tried any other file format (no interest). Server 2GHz Celery, 1GB RAM, XP Pro SP1. Server and SB cabled to the same router (Buffalo G54 Wireless).
*** Bug 278 has been marked as a duplicate of this bug. ***
Occured under following combinations (others not tested): Version 5.1.1 with firmware 8, Version 5.1.2 (nightly builds 26.3., 2.4.) with firmware 10 and 20, 5.1.3 with firmware 21, 5.1.5 with firmware 21 WinXP + SP1, Athlon XP 2600+ 512MB; Epia 6000 + 512MB mp3 / 56kbit Stereo / Sampling @32kHz Pause for 30 secs is ok, but pausing for 45 secs not mp3 / 32kbit Stereo / 32kHz: 55 secs is ok, 60 secs not, buffer empty within 58 secs. see comment for bug #32
Confirmed by Rich Siegel, this is still happening in firmware 23.
*** Bug 335 has been marked as a duplicate of this bug. ***
I did some debugging on this, since I was able to replicate it occasionally. One potential problem is that the slim TCP stack is not replying to window probes with ACKs, causing the server TCP stack to think that the slim device is dead: Last packets flowing: 22:09:17.755113 squeezebox.34470 > slimserver.9000: . ack 998741 win 1416 22:09:22.753548 slimserver.9000 > squeezebox.34470: . 998741:1000157(1416) ack 301 win 58400 (DF) 22:09:22.859484 squeezebox.34470 > slimserver.9000: . ack 1000157 win 0 Window probes start: 22:09:27.856150 slimserver.9000 > squeezebox.34470: . 1000157:1000158(1) ack 301 win 58400 (DF) 22:09:32.858744 slimserver.9000 > squeezebox.34470: . 1000157:1000158(1) ack 301 win 58400 (DF) 22:09:37.861319 slimserver.9000 > squeezebox.34470: . 1000157:1000158(1) ack 301 win 58400 (DF) 22:09:42.863885 slimserver.9000 > squeezebox.34470: . 1000157:1000158(1) ack 301 win 58400 (DF) note no ACKs from the squeezebox, so the dead timers on the slimserver's tcp stack are running. Finally, the server gives up - first packet is a window probe, second is a RST: 22:17:16.894214 slimserver.9000 > squeezebox.34470: . 1000157:1000158(1) ack 301 win 58400 (DF) 22:17:55.513794 slimserver.9000 > squeezebox.34470: R 1000158:1000158(0) ack 301 win 58400 (DF) 22:17:55.617780 squeezebox.34470 > slimserver.9000: . ack 1000157 win 0 22:17:55.617831 slimserver.9000 > squeezebox.34470: R 98469381:98469381(0) win 0 When I unpause, the squeezebox now re-establishes the connection; however, when I was able to replicate this problem the squeezebox did not re-establish the connection. I wasn't able to determine why the squeezebox did not re-establish the connection when the resume from pause failed; however, if the squeezebox responded to window probes with the proper ACK, the connection wouldn't need to die in the first place, possibly avoiding this problem altogether.
BTW, I see that I forgot to mention the environment under which I was replicating the pause problem. FreeBSD with slimserver from CVS, and I have tcpdump's of it happening with firmware 23; with firmware 24 I see the reset occur but then the slim successfully re-establishes the connection. I can't be sure that there was any true difference in the behavior because of the firmware version, since I am not able to reliably replicate this, just occasionally. Neither 23 nor 24 acks window probes, though, so they are definitely misbehaving.
I still see this with firmware 26. The squeezebox window goes to zero, the slimserver starts sending window probes and getting no ACKs so resets the connection, and resuming from pause fails after the connection is reset. Player Firmware Version: 26 The IP address for this player is: 135.197.17.55:27173 I hit pause around here, so the squeeze window goes to zero: 07:32:50.453345 slimserver.9000 > squeezebox.27397: . 1194927:1196387(1460) ack 301 win 58400 (DF) 07:32:50.475230 squeezebox.27397 > slimserver.9000: . ack 1196387 win 566 07:32:50.495370 squeezebox.27397 > slimserver.9000: . ack 1196387 win 932 07:32:55.474482 slimserver.9000 > squeezebox.27397: . 1196387:1197319(932) ack 301 win 58400 (DF) 07:32:55.498611 squeezebox.27397 > slimserver.9000: . ack 1197319 win 0 server starts sending probes and getting no responses: 07:33:00.494364 slimserver.9000 > squeezebox.27397: . 1197319:1197320(1) ack 301 win 58400 (DF) 07:33:05.494267 slimserver.9000 > squeezebox.27397: . 1197319:1197320(1) ack 301 win 58400 (DF) ... 07:41:12.644951 slimserver.9000 > squeezebox.27397: . 1197319:1197320(1) ack 301 win 58400 (DF) 07:41:21.804769 slimserver.9000 > squeezebox.27397: . 1197319:1197320(1) ack 301 win 58400 (DF) 07:41:30.964609 slimserver.9000 > squeezebox.27397: R 1197320:1197320(0) ack 301 win 58400 (DF) 07:41:30.986705 squeezebox.27397 > slimserver.9000: . ack 1197319 win 0 07:41:30.986750 slimserver.9000 > squeezebox.27397: R 2464605313:2464605313(0) win 0 server gives up; unpause results in buffer emptying and no more data flowing. I'd suggest looking at the other side of this bug too - namely, why doesn't the squeeze reconnect to the stream URL after the connection is reset? If you fix either half, then you should be able to ship without the pause bug. (Of course, it'd be ideal to fix both bugs, in case there are other reasons that the connection gets closed unexpectedly!)
Bill, thanks very much for the packet traces. Very helpful. I confirmed the behavior on Windows XP, and found a bug in our TCP stack in how it handles out-of-window data (in this case one byte of data past the right edge). Fixed it to do the correct calculations, and now it reaches the proper (existing) case which handles it as a window probe (sends ack immediately). Unpause is working fine on XP for me now.
Sean, was this fixed in firmware 27?
bug still present for me on WinXP in 5.2 with firmware 27.
Sorry, the fix went into version 28, which will be available in tonight's nightly, it didn't make it into 5.2.
Well done - the ACKs come now, and resuming from a long pause works peachy. 10:42:18.830547 10.42.77.100.9000 > 10.42.77.80.37635: . 1002271:1002272(1) ack 301 win 58400 (DF) 10:42:18.856549 10.42.77.80.37635 > 10.42.77.100.9000: . ack 1002271 win 0 10:42:23.830462 10.42.77.100.9000 > 10.42.77.80.37635: . 1002271:1002272(1) ack 301 win 58400 (DF) 10:42:23.856299 10.42.77.80.37635 > 10.42.77.100.9000: . ack 1002271 win 0 10:42:28.830378 10.42.77.100.9000 > 10.42.77.80.37635: . 1002271:1002272(1) ack 301 win 58400 (DF) 10:42:28.856041 10.42.77.80.37635 > 10.42.77.100.9000: . ack 1002271 win 0 10:42:33.830284 10.42.77.100.9000 > 10.42.77.80.37635: . 1002271:1002272(1) ack 301 win 58400 (DF) 10:42:33.861561 10.42.77.80.37635 > 10.42.77.100.9000: . ack 1002271 win 0 I do think that there's still a second bug here (that the squeezebox doesn't establish a new connection to the server when the old one went away), and actually this fix makes the other one harder to invoke but certain network problems (e.g., having the wireless network go away for just the right amount of time) would invoke it. However, it's a low enough probability that it's worth just keeping in mind when diagnosing network problems but may not be worth trying to track down without a specific trigger mechanism.
Indications are on XP with 5.2 and firmware 28 the pause is fixed. Volume up/down seems a little more erratic but otherwise ok.
resume from paused fixed. Bill: please file another bug for the reconnect issue you described.
This bug appears to have been fixed in the latest release! If you are still experiencing this problem, feel free to reopen the bug with your new comments and we'll have another look. Make sure to include the version number of the software you are seeing the error with.