Bugzilla – Bug 3094
SN Timer issues
Last modified: 2009-04-15 09:45:32 UTC
2006-03-02 10:29:14.3469 can't open /home/svcprod/ss/Orbital-48/service/../base/../service/SQL/mysql/sql.version 2006-03-02 10:29:14.5186 Auto-detecting IP: 192.168.254.4 2006-03-02 10:29:14.5188 Disconnecting all clients from 192.168.254.4:3607 2006-03-02 10:45:52.5971 Couldn't resolve IP address for: 8Q2BpKKQ2BaPQ7CQ2BQ7Dk@Q22SCQ3EQ22rL@8 Can't use an undefined value as a SCALAR reference at /home/svcprod/ss/Orbital-48/service/Slim/Player/SqueezeboxG.pm line 750. 2006-03-02 11:34:23.2898 Disconnecting all clients from 192.168.254.4:3607
*** Bug 3099 has been marked as a duplicate of this bug. ***
Andy - any thoughts on this one? It's causing 95% of our crashes right now.
Not really, I don't know much of anything about the scrolling code where it's crashing. :( Let me check in some more debug code, not sure if it will be helpful though.
Adrian, any thoughts on this? SqueezeNetwork just pulls from trunk for most of it's code.
Dan, Andy - afraid I can't look at until Monday night (UK time) as I am away and reading mail by web at present. Could you confirm the actual line and subroutine the problem is at? Line 750 in trunk doesn't look like it could cause this..
It's scrollUpdateDisplay: # update display for graphics scrolling sub scrollUpdateDisplay { my $client = shift; my $scroll = shift; my $frame = $scroll->{scrollHeader} . (${$scroll->{bitsref}} | substr(${$scroll->{scrollbitsref}}, $scroll->{offset}, $scroll->{overlay2start})); # check for congestion on slimproto socket and send update if not congested if ((Slim::Networking::Select::writeNoBlockQLen($client->tcpsock) == 0) && (length($frame) == $scroll->{scrollFrameSize})) { Slim::Networking::Select::writeNoBlock($client->tcpsock, \$frame); } } 750 is the line starting with 'my $frame'. One of the 2 scalar refs is not being defined.
That should never happen! scrollUpdateDisplay will be called from Player::Player::scrollUpdate to send an updated scrolling display to the player. All the relavent state is stored in $client->scrollData, which is passed to scrollUpdateDisplay as $scroll. Would it be possible to check the two refs and dump the contents of $scroll if either are undef just before line 750? bitsref is a reference to the static portion of the display, scrollbitsref is a reference to the scrolling portion. Both are created by sub render in SqueezeboxG. I can think of two possible causes: - scrollUpdate being called by a timer even though scrolling has been cancelled (and $client->scrollData cleared) - render not setting one of the bitsrefs but leaving scrolling set [can't see how it could do this] Do you have way to reproduce this?
I've added some debug code to check the refs and print a bt and dump $scroll, but it's not pushed out yet. We can't reproduce but it seems to happen fairly often.
My debug code didn't help much, it's clear that $client->scrollData is null when the scrollUpdate timer is firing. Maybe it's just a case of a client disconnecting right before the scroll timer fires? frame 0: Slim::Player::SqueezeboxG::scrollUpdateDisplay (/home/svcprod/ss/Orbital-52/service/Slim/Player/Player.pm line 1103) frame 1: Slim::Player::Player::scrollUpdate (/home/svcprod/ss/Orbital-52/service/Slim/Utils/Timers.pm line 100) frame 2: Slim::Utils::Timers::checkTimers (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 374) frame 3: main::idle (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 325) frame 4: main::main (/home/svcprod/ss/Orbital-52/service/slimservice.pl line 591) 2006-03-12 22:55:13.2164 $VAR1 = {}; Can't use an undefined value as a SCALAR reference at /home/svcprod/ss/Orbital-52/service/Slim/Player/SqueezeboxG.pm line 756. Is it safe to check that $client->scrollData is defined in Player::scrollUpdate and just kill the timer if it is not defined?
Or maybe there is a bug in the timer optimizations...
Andy - I would suspect the timer code based on this. Player::scrollStop should kill the timer and set scrollData to undef at the same time. This occurs when a client's display changes from scolling to non scrolling state. I suspect your debug code caused the undef to become and empty hash? Is killTimers correctly killing a timer on the high priority queue (asssuming you are using high priority queues)? You could definately add a return straight away from Player::scrollUpdate if $client->scrollData is undef, but this is fixing the symptom not the problem...
You're probably right. SN has some timer optimizations that improve performance when many clients are connected. Each client keeps a list of their own timers because it's faster to delete timers this way rather than going through the list of all timers. This was written before my time and I'm not sure I fully understand how all the timer stuff works yet. We were having problems with many "insane number of timers" deaths and it appeared that there were lots of timers not being killed properly. I added some code to try to clean up these killed timers and I think this may be when we started seeing the scalar ref error.
Not sure how SN has implemented it, but there are other uses of killTimers in the display code, so it would be good to sort out the root cause. If you want a second pair of eyes - mail me Timer.pm off line if you are happy to.
Andy - didn't we fix this one by the timer code rewrite to use the POE::Queue::Array?
Will review the outstanding SN bugs after 7.0 ships.
Created attachment 5116 [details] Excerpt of server.log Excerpt from server.log showing the bug arising in an hour time, until SC halts.
Hi all, Reviving this old bug report, since one french fellow encountered the "Insane number of timers" error last week-end (http://forums.slimdevices.com/showthread.php?t=62399) while demoing his Duet (Finagle's or Murphy's laws, anybody?). It looks like it's related to SqueezeCenter trying to sync player prefs from SqueezeNetwork at startup, while SN was down (or looked like so). Hope this'll help.
I've changed the description and added Andy to the cc list, but left the target on 'future' for now.