Bug 2336 - XPL enabled causes "Insane number of timers" crash
: XPL enabled causes "Insane number of timers" crash
Status: RESOLVED FIXED
Product: Logitech Media Server
Classification: Unclassified
Component: xPL
: 6.1.2
: PC SuSE Linux
: P4 normal (vote)
: ---
Assigned To: Blackketter Dean
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2005-10-22 07:00 UTC by Bryan Boatright
Modified: 2008-09-15 14:37 UTC (History)
1 user (show)

See Also:
Category: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bryan Boatright 2005-10-22 07:00:52 UTC
When XPL support is enabled, I've noticed that I eventually get an "Insane
number of timers: 501" error which causes slimserver to die.

I added the following debug code to the end of sub setTimer:

	my $bryanb_timediff = $when - time;
	msg ("T: [$client] [$subptr] (@args) $bryanb_timediff $numtimers\n") unless
($bryanb_lasttimer == $numtimers);
	$bryanb_lasttimer = $numtimers;

This produced the following output:
2005-10-22 07:23:41.3115 T: [] [CODE(0x8d4c0dc)] () 300.310846090317 141
2005-10-22 07:28:41.3161 T: [] [CODE(0x8d4c0dc)] () 300.315644025803 142
2005-10-22 07:33:41.3211 T: [] [CODE(0x8d4c0dc)] () 300.320498943329 143
2005-10-22 07:38:41.3266 T: [] [CODE(0x8d4c0dc)] () 300.326057910919 144
2005-10-22 07:43:41.3327 T: [] [CODE(0x8d4c0dc)] () 300.332314968109 145
2005-10-22 07:48:41.3380 T: [] [CODE(0x8d4c0dc)] () 300.337065935135 146

Notice that there is no client id (the first set of brackets is empty).  And
notice the last number is $numtimers.  $numtimers normally decrements after the
timer expires for all timers except the one shown above.  This causes a
non-monotonically increasing $numtimers which eventually invokes the "Insane
number of timers" error.

I searched the source for "300" and this eventually led nowhere.  :-)  I then
decided to search for " 5" (5 minutes, 300 secods) and found this sub: 

# Sends an xPL heartbeat from all clients that are currently connected
sub sendxplhbeat {
	my @clients = Slim::Player::Client::clients();

	foreach my $client (@clients) {
		sendXplHBeatMsg($client);
	}

	Slim::Utils::Timers::setTimer("", Time::HiRes::time() + ($xpl_interval*60),
\&sendxplhbeat);
}

$xpl_interval was previously set to 5.  This is the source of the 300 second
timer (verified by disabling XPL support).

From what I understand looking at the other timer subs, calling setTimer without
a client id ("" as above) will guarantee that the timer is never removed from
the timer stack (or whatever it is called) and this will eventually trigger the
error.

I recall reading somewhere that XPL support is experimental.  I am not actually
using it (I turned it on once to monitor the packets it sent), but thought
someone might like to know about this.

My wife is relieved that I figured out why "the gadget" stopped crashing and
that's really all I need to accomplish.  :-)
Comment 1 Adrian Smith 2005-10-22 10:50:38 UTC
Bryan,

I don't think a timer with no client should cause your problem.  It should be 
removed from the timer list when it fires (see 
Slim::Utils::Timers::checkTimers).  Hence there should only be one instance of 
this timer on the list at once assuming xPL::init is only called once.  There 
are several other timers in the code with no client set (most set this to 0 
rather than "" though)

It does sound like you have a timer leak, could it be with another timer?  
What happens when you turn on --d_time?  Does this suggest the same code is 
continually being added to the timer stack?  Does it say timers are always 
blocked?
Comment 2 Bryan Boatright 2005-10-23 05:48:17 UTC
Ah.  You are correct that the timer I thought was causing the problem is itself
working correctly.  My debugging approach was flawed.  I replaced my original
debugging code with this to dump the timer array each time a timer is added:

  	my $bryanb_timediff = $when - time;
  	msg ("T: New timer: [$client] [$subptr] (@args) $bryanb_timediff $numtimers\n");
  	for (my $i = 0; $i <= $#normalTimers; $i++) {
  	  $bryanb_timediff = $normalTimers[$i]->{when} - time;
  	  msg ("   T: [$i] ($normalTimers[$i]->{client})
($normalTimers[$i]->{subptr}) $bryanb_timediff (@{$normalTimers[$i]->{args}})\n");
  	}

That code results in this output after about 30 minutes of runtime (with XPL
enabled and a playlist being played):

2005-10-23 07:00:07.6870 T: New timer: [Slim::Player::SLIMP3=ARRAY(0x9f9852c)]
[CODE(0x9fbfaac)] () 0.736871004104614 19
2005-10-23 07:00:07.6873    T: [0] (Slim::Player::SLIMP3=ARRAY(0x9f9852c))
(CODE(0x9193308)) 0.71684193611145 ()
2005-10-23 07:00:07.6875    T: [1] (Slim::Player::SLIMP3=ARRAY(0x9f9852c))
(CODE(0x9fbfaac)) 0.736871004104614 ()
2005-10-23 07:00:07.6878    T: [2] (Slim::Player::SLIMP3=ARRAY(0x9f9852c))
(CODE(0x8ef0934)) 1.28964781761169 (Slim::Player::SLIMP3=ARRAY(0x9f9852c))
2005-10-23 07:00:07.6882    T: [3]
(Slim::DataStores::DBI::DBIStore=HASH(0xa079b10)) (CODE(0x89e4f80))
1.30803203582764 ()
2005-10-23 07:00:07.6887    T: [5]
(Slim::DataStores::DBI::DBIStore=HASH(0xa076990)) (CODE(0x89e4f80))
8.90686202049255 ()
2005-10-23 07:00:07.6889    T: [6]
(Slim::DataStores::DBI::DBIStore=HASH(0xa075bec)) (CODE(0x89e4f80))
16.6173961162567 ()
2005-10-23 07:00:07.6892    T: [7]
(Slim::DataStores::DBI::DBIStore=HASH(0xa035d78)) (CODE(0x89e4f80))
20.7026860713959 ()
2005-10-23 07:00:07.6893    T: [8]
(Slim::DataStores::DBI::DBIStore=HASH(0xa075bf8)) (CODE(0x89e4f80))
20.7172288894653 ()
2005-10-23 07:00:07.6895    T: [9]
(Slim::DataStores::DBI::DBIStore=HASH(0xa03366c)) (CODE(0x89e4f80))
20.7911450862885 ()
2005-10-23 07:00:07.6899    T: [10]
(Slim::DataStores::DBI::DBIStore=HASH(0xa073444)) (CODE(0x89e4f80))
20.8605389595032 ()
2005-10-23 07:00:07.6903    T: [11]
(Slim::DataStores::DBI::DBIStore=HASH(0x9bdfcc0)) (CODE(0x89e4f80))
21.2709050178528 ()
2005-10-23 07:00:07.6905    T: [12]
(Slim::DataStores::DBI::DBIStore=HASH(0x976be78)) (CODE(0x89e4f80))
21.7198550701141 ()
2005-10-23 07:00:07.6906    T: [13]
(Slim::DataStores::DBI::DBIStore=HASH(0xa079b94)) (CODE(0x89e4f80))
22.7797019481659 ()
2005-10-23 07:00:07.6908    T: [14]
(Slim::DataStores::DBI::DBIStore=HASH(0xa0759b8)) (CODE(0x89e4f80))
24.7895228862762 ()
2005-10-23 07:00:07.6911    T: [15] (0) (CODE(0x9bcc320)) 53.2884349822998 ()
2005-10-23 07:00:07.6913    T: [16] (0) (CODE(0x9055064)) 53.5564730167389 ()
2005-10-23 07:00:07.6914    T: [17] () (CODE(0x8d4dbc8)) 200.705127000809 ()
2005-10-23 07:00:07.6916    T: [18] (0) (CODE(0x991b29c)) 29706.0501899719 ()

Timer 17 is the one with client id of "" which proves I was wrong... I thought
those were somehow being accumulated.

Without uploading the full log of the debug output suffice it to say that the
timers that are accumlating are the ones that look like this:

2005-10-23 07:00:07.6884    T: [4] (Slim::Player::SLIMP3=ARRAY(0x9f9852c))
(CODE(0x8de8620)) 1.58605909347534 ()

And by "accumulating" I mean it looks like the total number of active timers
like the one shown above increases by 1 each time the 300 second timer quoted in
my original post fires.  So, I still feel pretty strongly that the 300 second
timer in xPL.pm is causing something bad to happen even if indirectly.

Also, I did enable the --d_time option as suggested and this never resulted in a
"blocked timer" message.  I didn't mention this before, but to sensitize the
problem a playlist must be playing.  (I am not saying this is the only way to
trigger the problem, though, but SlimServer doesn't exhibit symptoms that I've
noticed while in the "off" state.)

If you can suggest the next debugging step, I'll be glad to try it.  
Comment 3 Adrian Smith 2005-10-23 06:05:34 UTC
Bryan,

There's a sub in Slim/Utils/Misc called deparseCodeRef.  I would suggest using 
a cut down version of this (which does not return the body of the coderef) to 
turn the coderef into a readable name...

Alternatively if you add bt(); into setTimer it will tell you who is calling 
setTimer for the problem case.  
Comment 4 Bryan Boatright 2005-10-30 04:14:05 UTC
I added bt to my debug code as suggested.  I found this interesting sequence
occurring right before the addition of the 300 second timer:

   frame 0: Slim::Utils::Timers::setTimer
(/usr/local/slimserver/Slim/DataStores/DBI/DBIStore.pm line 1033)
   frame 1: Slim::DataStores::DBI::DBIStore::_commitDBTimer
(/usr/local/slimserver/Slim/DataStores/DBI/DBIStore.pm line 92)
   frame 2: Slim::DataStores::DBI::DBIStore::new
(/usr/local/slimserver/Slim/Control/xPL.pm line 265)
   frame 3: Slim::Control::xPL::sendXplHBeatMsg
(/usr/local/slimserver/Slim/Control/xPL.pm line 310)
   frame 4: Slim::Control::xPL::sendxplhbeat
(/usr/local/slimserver/Slim/Utils/Timers.pm line 100)
   frame 5: Slim::Utils::Timers::checkTimers
(/usr/local/slimserver/slimserver.pl line 641)
   frame 6: main::idle (/usr/local/slimserver/slimserver.pl line 579)
   frame 7: main::main (/usr/local/slimserver/slimserver.pl line 1182)

sendXplHBeatMsg has this section of code:

	if ($playmode eq 'play') {
		$playmode = "playing";
		my $currentDB = Slim::DataStores::DBI::DBIStore->new();
		my $url = Slim::Player::Playlist::song($client);
		my $track = ref $url ? $url : $currentDB->objectForUrl($url, 1, 1);
		if (defined($track->album())) {
			if (defined($track->album()->title())) {
				$album = $track->album()->title();
			}
		}


Notice that this calls the new method of Slim::DataStores::DBI::DBIStore.

Slim::DataStores::DBI::DBIStore->new() has this code:

	$self->_commitDBTimer();

and _commitDBTimer() has this code:

Slim::Utils::Timers::setTimer($self, Time::HiRes::time() + DB_SAVE_INTERVAL,
\&_commitDBTimer);

Note the reference that this timer has back to itself making this a
self-renewing timer.

So here's what I believe is happening:

Every 300 seconds the XPL heartbeat code calls sendXplHBeatMsg.  This calls
Slim::DataStores::DBI::DBIStore->new() which sets up a recurring timer via the
call to $self->_commitDBTimer();

This explains my observation that every 300 seconds that the net number of
timers grows by one when xPL is enabled and the player is playing--it's the
accumulation of calls to Slim::DataStores::DBI::DBIStore->new() every 300
seconds that's causing the problem.

Agree?
Comment 5 Adrian Smith 2005-10-30 05:12:35 UTC
Good catch - definately looks the problem with the code you are running.

This was resolved by a change to xPL.pm at version 2343:
http://svn.slimdevices.com/trunk/server/Slim/Control/xPL.pm?
rev=4243&r1=3315&r2=4243

(and explains why I didn't see it as I tend to have quite a recent version 
running)

Could you try a 6.2 nightly to check this is fixed?

Comment 6 Blackketter Dean 2005-10-30 10:38:39 UTC
Bryan, I'm going to mark this as Fixed.  Please reopen if it's still an issue. Thanks.
Comment 7 Chris Owens 2006-06-16 14:41:37 UTC
There are 536 bugs in the database with targets of '---' that were fixed prior to new year 2006.  I am setting them to targets of 6.2.1 to keep them from showing up in my queries.