Bug 3094 - SN Timer issues
: SN Timer issues
Status: NEW
Product: MySqueezebox.com
Classification: Unclassified
Component: SlimService
: Prod
: All All
: P2 normal (vote)
: Future
Assigned To: Andy Grundman
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-03-02 11:55 UTC by Dan Sully
Modified: 2009-04-15 09:45 UTC (History)
3 users (show)

See Also:
Category: ---


Attachments
Excerpt of server.log (209.03 KB, application/octet-stream)
2009-04-14 11:26 UTC, Sébastien Phélep
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Sully 2006-03-02 11:55:16 UTC
2006-03-02 10:29:14.3469 can't open /home/svcprod/ss/Orbital-48/service/../base/../service/SQL/mysql/sql.version                                            
2006-03-02 10:29:14.5186 Auto-detecting IP: 192.168.254.4                                                                                                   
2006-03-02 10:29:14.5188 Disconnecting all clients from 192.168.254.4:3607                                                                                  
2006-03-02 10:45:52.5971 Couldn't resolve IP address for: 8Q2BpKKQ2BaPQ7CQ2BQ7Dk@Q22SCQ3EQ22rL@8                                                            
Can't use an undefined value as a SCALAR reference at /home/svcprod/ss/Orbital-48/service/Slim/Player/SqueezeboxG.pm line 750.                              
2006-03-02 11:34:23.2898 Disconnecting all clients from 192.168.254.4:3607
Comment 1 Andy Grundman 2006-03-03 08:16:43 UTC
*** Bug 3099 has been marked as a duplicate of this bug. ***
Comment 2 Dan Sully 2006-03-11 10:49:09 UTC
Andy - any thoughts on this one? It's causing 95% of our crashes right now.
Comment 3 Andy Grundman 2006-03-11 11:15:08 UTC
Not really, I don't know much of anything about the scrolling code where it's crashing. :(  Let me check in some more debug code, not sure if it will be helpful though.
Comment 4 Dan Sully 2006-03-11 11:34:05 UTC
Adrian, any thoughts on this?

SqueezeNetwork just pulls from trunk for most of it's code.
Comment 5 Adrian Smith 2006-03-11 13:01:02 UTC
Dan, Andy - afraid I can't look at until Monday night (UK time) as I am away and reading mail by web at present.

Could you confirm the actual line and subroutine the problem is at?  Line 750 in trunk doesn't look like it could cause this..
Comment 6 Andy Grundman 2006-03-11 13:32:36 UTC
It's scrollUpdateDisplay:

# update display for graphics scrolling
sub scrollUpdateDisplay {
	my $client = shift;
	my $scroll = shift;

	my $frame = $scroll->{scrollHeader} . 
		(${$scroll->{bitsref}} | substr(${$scroll->{scrollbitsref}}, $scroll->{offset}, $scroll->{overlay2start}));

	# check for congestion on slimproto socket and send update if not congested
	if ((Slim::Networking::Select::writeNoBlockQLen($client->tcpsock) == 0) && (length($frame) == $scroll->{scrollFrameSize})) {
		Slim::Networking::Select::writeNoBlock($client->tcpsock, \$frame);
	}
}

750 is the line starting with 'my $frame'.  One of the 2 scalar refs is not being defined.
Comment 7 Adrian Smith 2006-03-11 14:23:44 UTC
That should never happen!

scrollUpdateDisplay will be called from Player::Player::scrollUpdate to send an updated scrolling display to the player.  All the relavent state is stored in $client->scrollData, which is passed to scrollUpdateDisplay as $scroll.

Would it be possible to check the two refs and dump the contents of $scroll if either are undef just before line 750?

bitsref is a reference to the static portion of the display, scrollbitsref is a reference to the scrolling portion.  Both are created by sub render in SqueezeboxG.

I can think of two possible causes:
- scrollUpdate being called by a timer even though scrolling has been cancelled (and $client->scrollData cleared)
- render not setting one of the bitsrefs but leaving scrolling set [can't see how it could do this]

Do you have way to reproduce this?
Comment 8 Andy Grundman 2006-03-11 14:51:52 UTC
I've added some debug code to check the refs and print a bt and dump $scroll, but it's not pushed out yet.

We can't reproduce but it seems to happen fairly often.
Comment 9 Andy Grundman 2006-03-13 06:41:43 UTC
My debug code didn't help much, it's clear that $client->scrollData is null when the scrollUpdate timer is firing.  Maybe it's just a case of a client disconnecting right before the scroll timer fires?

   frame 0: Slim::Player::SqueezeboxG::scrollUpdateDisplay (/home/svcprod/ss/Orbital-52/service/Slim/Player/Player.pm line 1103)
   frame 1: Slim::Player::Player::scrollUpdate (/home/svcprod/ss/Orbital-52/service/Slim/Utils/Timers.pm line 100)
   frame 2: Slim::Utils::Timers::checkTimers (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 374)
   frame 3: main::idle (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 325)
   frame 4: main::main (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 591)

2006-03-12 22:55:13.2164 $VAR1 = {};
Can't use an undefined value as a SCALAR reference at /home/svcprod/ss/Orbital-52/service/Slim/Player/SqueezeboxG.pm line 756.

Is it safe to check that $client->scrollData is defined in Player::scrollUpdate and just kill the timer if it is not defined?
Comment 10 Andy Grundman 2006-03-13 06:43:46 UTC
Or maybe there is a bug in the timer optimizations...
Comment 11 Adrian Smith 2006-03-13 11:08:37 UTC
Andy - I would suspect the timer code based on this.

Player::scrollStop should kill the timer and set scrollData to undef at the same time.  This occurs when a client's display changes from scolling to non scrolling state.  I suspect your debug code caused the undef to become and empty hash?

Is killTimers correctly killing a timer on the high priority queue (asssuming you are using high priority queues)?

You could definately add a return straight away from Player::scrollUpdate if $client->scrollData is undef, but this is fixing the symptom not the problem...
Comment 12 Andy Grundman 2006-03-13 12:08:06 UTC
You're probably right.  SN has some timer optimizations that improve performance when many clients are connected.  Each client keeps a list of their own timers because it's faster to delete timers this way rather than going through the list of all timers.

This was written before my time and I'm not sure I fully understand how all the timer stuff works yet.

We were having problems with many "insane number of timers" deaths and it appeared that there were lots of timers not being killed properly.  I added some code to try to clean up these killed timers and I think this may be when we started seeing the scalar ref error.
Comment 13 Adrian Smith 2006-03-13 12:21:40 UTC
Not sure how SN has implemented it, but there are other uses of killTimers in the display code, so it would be good to sort out the root cause.

If you want a second pair of eyes - mail me Timer.pm off line if you are happy to.
Comment 14 Adrian Smith 2007-04-09 15:46:18 UTC
Andy - didn't we fix this one by the timer code rewrite to use the POE::Queue::Array?
Comment 15 Blackketter Dean 2008-02-06 09:38:39 UTC
Will review the outstanding SN bugs after 7.0 ships.
Comment 16 Sébastien Phélep 2009-04-14 11:26:13 UTC
Created attachment 5116 [details]
Excerpt of server.log

Excerpt from server.log showing the bug arising in an hour time, until SC halts.
Comment 17 Sébastien Phélep 2009-04-14 11:35:57 UTC
Hi all,

Reviving this old bug report, since one french fellow encountered the "Insane number of timers" error last week-end (http://forums.slimdevices.com/showthread.php?t=62399) while demoing his Duet (Finagle's or Murphy's laws, anybody?).

It looks like it's related to SqueezeCenter trying to sync player prefs from SqueezeNetwork at startup, while SN was down (or looked like so).

Hope this'll help.
Comment 18 Chris Owens 2009-04-15 09:45:32 UTC
I've changed the description and added Andy to the cc list, but left the target on 'future' for now.