Bug 16881 - Sync is less tolerant of network or server interruptions in 7.5.3 - r31792
: Sync is less tolerant of network or server interruptions in 7.5.3 - r31792
Status: ASSIGNED
Product: Logitech Media Server
Classification: Unclassified
Component: Sync
: 7.5.3
: PC Windows Home Server
: P2 major with 4 votes (vote)
: 7.5.x
Assigned To: Andy Grundman
:
Depends on: 16898
Blocks:
  Show dependency treegraph
 
Reported: 2011-02-02 23:29 UTC by Kurt Poulsen
Modified: 2011-03-13 08:48 UTC (History)
6 users (show)

See Also:
Category: ---


Attachments
Logfile with sync-info attached (13.49 KB, text/plain)
2011-02-02 23:29 UTC, Kurt Poulsen
Details
Logfile with new sync-info (5.31 KB, text/plain)
2011-02-17 01:55 UTC, Kurt Poulsen
Details
Synclog from 2.18.2011 (24.68 KB, text/plain)
2011-02-18 01:24 UTC, Kurt Poulsen
Details
Logfile from 7.5.2 (599.63 KB, text/plain)
2011-02-20 09:07 UTC, Kurt Poulsen
Details
Logfile with sync-info and info from player.source and network.slimproto (5.45 MB, text/plain)
2011-02-24 10:49 UTC, Kurt Poulsen
Details
Possible patch (792 bytes, patch)
2011-02-25 02:32 UTC, Alan Young
Details | Diff
Log file before/after un-sync occured. (16.37 KB, text/plain)
2011-03-02 05:58 UTC, sle118
Details
Screenshot of plugins (10.54 KB, application/pdf)
2011-03-02 08:06 UTC, Kurt Poulsen
Details
Logfile from 2.3.2011 with some plugins disabled (5.60 MB, text/plain)
2011-03-02 13:30 UTC, Kurt Poulsen
Details
Log File with Slimproto and StreamingController active (449.88 KB, text/plain)
2011-03-03 05:15 UTC, sle118
Details
Active plugins (120.73 KB, image/png)
2011-03-03 05:26 UTC, sle118
Details
Network setup (23.88 KB, image/png)
2011-03-03 05:53 UTC, sle118
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kurt Poulsen 2011-02-02 23:29:42 UTC
Created attachment 7105 [details]
Logfile with sync-info attached

When I sync either my boom or my second receiver with my other receiver the
sync-slave stops after some minutes of synchronisation. I have attached a
logfile with information.
Comment 1 Kurt Poulsen 2011-02-02 23:35:06 UTC
Addtional info: All players is wired. This is the radioprogram I am syncing: http://www.dr.dk/netradio/Metafiler/asx/dr_p3_128.asx
Comment 2 Kurt Poulsen 2011-02-03 01:05:20 UTC
More info: Even though the slave-synced-player is not playing they are still sync'ed. If I pause either one of them then both are paused and so on.
Comment 3 Alan Young 2011-02-03 02:07:12 UTC
I'm not quite sure why one player goes silent but it appears that you have something odd going on with your server clock.

1. The large number of 'adjust jiffies epoch' log messages with quite large values (> 10ms) suggests that your server clock is not running at a smooth speed that reflects real time.

2. The set of adjust jiffies epoch messages at 08:23:33 of around +1296622298s (41.1 years) suggests that a large clock adjustment was made (setting system time properly from its default startup value of 01 Jan 1970?). This probably confused things.

Sync will only work if the server clock is stable and runs at real time speed.
Comment 4 Kurt Poulsen 2011-02-03 03:02:55 UTC
I can see in the eventlog of the server that for the past few days, the server has had some trouble with connecting to a time server.It says that it hasn't
had any contact with a timeserver for 86400 sec or 24 hours. Don't know why it suddenly has this problem. Could this also affect the alarmclock?
Comment 5 Alan Young 2011-02-03 04:12:44 UTC
yes
Comment 6 Kurt Poulsen 2011-02-03 04:23:41 UTC
Okay. Thanks. That explains why we didn't get up today in time :) I will close this bug.
Comment 7 Kurt Poulsen 2011-02-17 01:55:37 UTC
Created attachment 7161 [details]
Logfile with new sync-info
Comment 8 Kurt Poulsen 2011-02-17 01:57:22 UTC
This still happens. And I am sure that the servers time is correct.
Comment 9 Kurt Poulsen 2011-02-18 01:24:07 UTC
Created attachment 7162 [details]
Synclog from 2.18.2011

This is more of the same. The sync'ing is going fine for several hours and then suddenly the slave stops and is completely dead until it is turned off.
Comment 10 sle118 2011-02-18 04:41:04 UTC
I am having similar issues since the last update was installed.

This is causing major issues, since this is also affecting our rooster alarm clock.

We have 2 duet receivers synchronized permanently on a wired network. 

Yesterday, synchronization failed and sound dropped from one of the receivers while streaming some web content.  I removed synchronization, synchronized again and streamed/listened to music all day.  This morning, the alarm clock didn't go on.  Attempting to stream our usual web source also failed to produce any sound.  The fix was to remove synchronization and synchronize again.

Nothing special is showing in the log.
Comment 11 Kurt Poulsen 2011-02-20 09:07:06 UTC
Created attachment 7168 [details]
Logfile from 7.5.2

I downgraded to 7.5.2 because I was tired of the sync doesn't work in 7.5.3. As you can see there are still some logentries where one of the players says it is bailing, but in 7.5.2 the players stays synced and just continues as nothing happened. So something is wrong in 7.5.3!!
Comment 12 Kurt Poulsen 2011-02-23 05:17:21 UTC
Is there any chance that this will be looked at soon? My wimp app stopped working when I downgraded.
Comment 13 Alan Young 2011-02-24 08:43:05 UTC
Kurt, if you could add player.source=info and networking.slimproto=info to your logging then that might help see what is happening. Either your 7.5.2 or 7.5.3 installation would probably do.
Comment 14 Kurt Poulsen 2011-02-24 08:58:41 UTC
Okay. I have set up on my 7.5.2 installation I am using for now. For how long do you want it to run?
Comment 15 Kurt Poulsen 2011-02-24 10:49:14 UTC
Created attachment 7169 [details]
Logfile with sync-info and info from player.source and network.slimproto

Here is the new logfile as requested.
Comment 16 Alan Young 2011-02-25 02:17:41 UTC
Kurt, I am pretty sure that I see what is going on.

In the various logs we see evidence that, from time to time, there is some sort of short-term disruption in the network connectivity between your players and your server. The status packets that should arrive from each playing player every second dry up for a while, and then come all at once. It is clear from the data in the burst of packets that they are the ones that should have come every second.

The change in the player firmware in 7.5.3 makes the player disconnect after 10s of not receiving anything. This replaces a different keep-alive mechanism which, evidently, was more tolerant of network interruptions. Also, the other keep-alive mechanism in place in SbS has an effective timeout of 15-20s. After such a disconnect, whether initiated by the player or SbS, the player will try to reconnect automatically.

The code in SbS that handles reconnection of players while playing will not restart playback for a player in a sync-group if another player was still considered connected when it disconnected.

Obviously, if you could solve the problem that causes the network interruptions, then that would be ideal. I will see if we can do something about restarting reconnecting synced players.
Comment 17 Alan Young 2011-02-25 02:22:46 UTC
Andy, I think you should increase the firmware timeout to at least 20s for the SbS case. Probably best to use the same value as for the SN case.
Comment 18 Alan Young 2011-02-25 02:32:15 UTC
Created attachment 7170 [details]
Possible patch

This patch would improve matters some of the time. In the case that a single player in a sync-group disconnects and reconnects, then it will restart playback for the sync-group.

If an output underrun occurs while the network is blocked, and the player had disconnected already, then it is likely that the player will enter the paused state but the server will not be notified of the underrun and will never unpause the player. SP-based players do not suffer from this problem (from 7.6) because they resent status packets that were dropped.
Comment 19 Andy Grundman 2011-02-25 05:41:37 UTC
There is a 5s cushion for the firmware timeout.  SBS sends a stat packet every 5s, the firmware will disconnect after 10s of inactivity.  I think this is pretty fair, if your LAN is this slow, sync is probably the least of your problems.
Comment 20 sle118 2011-02-25 05:49:09 UTC
I just thought it would be a good idea for me to write some news about my install. 

The first week after the update to 7.5.3 was horrible, with multiple failures of the sync.  I am not sure if this is related or not, but this is around the time when I re-wired my entire network, adding a patch panel and patch cables. Reading the symptoms below, I tend to think this could be the root cause.

I did reboot the server since and the sync problems seem to be gone now. 

Regardless, the receivers should be more fault resistant to the network going down.  It may happen when a router is rebooted, etc.
Comment 21 Andy Grundman 2011-02-25 06:12:37 UTC
Actually they are *more* tolerant than before. Previously they would disconnect within 2 seconds (by using an extremely low TCP keep-alive threshold), now it's 10.
Comment 22 Kurt Poulsen 2011-02-25 11:39:13 UTC
I must admit that I don't have any idea how to solve this problem. Every player I have is connected with wire. After I installed 7.5.3 and the problems began I restarted the server, my switch and the router from my ISP, but that didn't help. So I have absolutely no clue on how to solve this.
The only difference there is between my 7.5.2 and 7.5.3 installation was I had a wimp account set up for 7.5.3. Obviously that can't cause any problems for 7.5.2 since it is not in use.
Comment 23 Andy Grundman 2011-02-25 11:42:39 UTC
Do you have any symptoms of a poor network? It should be pretty obvious on Boom.  How is the response time when you use the remote or front panel?
Comment 24 Kurt Poulsen 2011-02-25 13:27:02 UTC
Well the only thing I have noticed is that my old SB3 sometimes shows that it has lost connection with the server. It is only for a few seconds and then it is gone. But after I have switched back to 7.5.2 that too has gone.
My boom works like it should, although I did have a problem with the alarm clock on 7.5.3 one morning. But again that went away when I downgraded.
Comment 25 Kurt Poulsen 2011-02-25 13:30:03 UTC
And no. My Boom is as responsive as ever.
Comment 26 Kurt Poulsen 2011-02-25 14:11:50 UTC
How do you apply the patch you have attached?
Comment 27 sle118 2011-03-02 05:58:05 UTC
Created attachment 7179 [details]
Log file before/after un-sync occured.

2 players were synchronized when the slave failed.  Quickly after, I had to un-sync and re-sync to get the slave zone to play again.
Comment 28 sle118 2011-03-02 05:59:03 UTC
Comment on attachment 7179 [details]
Log file before/after un-sync occured.

Hi.

I need to come back on what I wrote in comment 20.  Synchronization failed multiple times since then.  Typically, our "slave" zone stops playing while master continues.  Both zones are receivers on firmware version 67. 

My network is wired and both zones are on 2 different gigabit switches.

Adding attachment "Un-sync log - sle118 11-03-02.txt"

I just activated additional log info as suggested in comment 13.

Thank you
Comment 29 Andy Grundman 2011-03-02 07:12:44 UTC
Can anyone with this problem please provide the following?

System specs (platform/cpu speed)
List of 3rd party plugins in use
Comment 30 Kurt Poulsen 2011-03-02 08:06:41 UTC
Created attachment 7180 [details]
Screenshot of plugins

My server is a Fujitsu Scaleo Intel Celeron CPU 420 1,6 GHz with 2 GB RAM. 
I have attached a screenshot of plugins in use. Hope it helps.
Comment 31 Andy Grundman 2011-03-02 08:12:15 UTC
Try disabling all 3rd party plugins, some of the ones you are using might be poorly written and could cause this problem.
Comment 32 Kurt Poulsen 2011-03-02 13:30:19 UTC
Created attachment 7182 [details]
Logfile from 2.3.2011 with some plugins disabled

I have diasbled Dynamic Playlists, Playlist Manager and SQL Playlists and this is the logfile after 2 hours of playing. Only one bailing logged
Comment 33 Kurt Poulsen 2011-03-02 13:33:28 UTC
sle118@hotmail.com: If you look at my plugins do you have the same plugins installed?
Comment 34 Andy Grundman 2011-03-02 13:51:58 UTC
So would you say one or more of those plugins was the problem?
Comment 35 Kurt Poulsen 2011-03-02 23:53:55 UTC
it is difficult to say. In the last logfile there is only one bailing-message, but the one there is, have two Checksync messages before it with 3 seconds between them. So there is still a problem.

I think it is a problem that SBS suddenly can't work with plugins that have been around for several years. I have had these installed for a long time without any problems before now.

I will try to test some more today.
Comment 36 SVN Bot 2011-03-03 00:02:29 UTC
 == Auto-comment from SVN commit #31979 to the slim repo by ayoung ==
 == http://svn.slimdevices.com/slim?view=revision&revision=31979 ==

bug 16881: Sync is less tolerant of network interruptions in 7.5.3 - r31792 
Ensure that reconnecting players, with the reconnect bit set, are made active (if powered on).
For synced players, this will mean that the stream is restarted when the player reconnects.
Comment 37 sle118 2011-03-03 04:47:30 UTC
(In reply to comment #33)
> sle118@hotmail.com: If you look at my plugins do you have the same plugins
> installed?

I will attach my own list.  I have very few third party plugins activated. Most are from Logitech, and none of the third party match yours.
Comment 38 sle118 2011-03-03 05:15:54 UTC
Created attachment 7184 [details]
Log File with Slimproto and StreamingController active

The slave receiver stopped playing at around 18:35 in the log.
Comment 39 sle118 2011-03-03 05:26:06 UTC
Created attachment 7185 [details]
Active plugins

Here is a list of active plugins. Few are third party, and no recent change was made to that list.   After taking that snapshot, I noticed that the Album Review Plugin has an update and I installed it. I will see if this makes any difference.
Comment 40 sle118 2011-03-03 05:53:54 UTC
Created attachment 7186 [details]
Network setup

(In reply to comment #29)
> Can anyone with this problem please provide the following?
> 
> System specs (platform/cpu speed)
> List of 3rd party plugins in use

List of plugins attached.  

3rd party are:
Biography (v2.9.4) 
iPeng (v0.5.4) 
Network Test (v1.0) 
Song Scanner (v2.0) 
Tablet Skin (v1.1) 
Windows: Prevent System Standby (v2.0) 

System Specs
Intel Core i3 530 @ 2.93HGz (dual core)
Windows 7 Home Premium 64 bits 
6GB Ram

Attached is a drawing of the network setup
Comment 41 Kurt Poulsen 2011-03-03 06:17:21 UTC
(In reply to comment #39)
> Created an attachment (id=7185) [details]
> Active plugins
> 
> Here is a list of active plugins. Few are third party, and no recent change was
> made to that list.   After taking that snapshot, I noticed that the Album
> Review Plugin has an update and I installed it. I will see if this makes any
> difference.

Actually I don't have many third party plugins either (6 or 7). And I haven't installed any new ones for a very long time. I think i have had these plugins for at least 2 years! But never the less it seems that after I have disabled Dynamic Playlists and SQL playlist it has made the system more stable. At least according to the logfile. 
I will try to upgrade to 7.5.3 this weekend to see if it works now.

But even if it has helped I think there are a major problem i SBS if plugins that have worked for years suddenly breaks the whole SB system, because Logitech have tried to minimize network traffic. Maybe you went to far?
Comment 42 sle118 2011-03-03 07:21:34 UTC
After updating the Album review plugin, the server was unable to connect to
MySqueezeBox.com, and I could not access the server anymore.   I had to save my
prefs folder, uninstall 7.5.3 and reinstall completely.  What a pain.  

Not sure if this is related to the sync issue, but I will leave this plugin
disabled for now and see if the sync issue is still present after a complete
reinstall.
Comment 43 sle118 2011-03-03 07:43:56 UTC
I performed a network test on both receivers and they stayed in the greeen at up to 3000kbps
Comment 44 Eric Jung 2011-03-04 21:53:41 UTC
I'm on Ubuntu Linux. Same problem.

No non-logitech plugins, and many out-of-the-box logictech plugins are disabled.

squeezeboxserver version is 7.5.3 - r31792 @ Mon Jan 24 07:13:35 PST 2011
(4) Squeezebox Booms - firmware 52
(1) Squeezebox Radio - firmware 7.5.3-r9283
Comment 45 Kurt Poulsen 2011-03-13 08:48:09 UTC
Since my upgrade to 7.5.4 my sync problems has gone away. Should I close this bug or is there anyone who wants to keep it open?