Bugzilla – Bug 9102
Netgear NAS Hibernation problem.
Last modified: 2010-01-25 04:09:59 UTC
RAIDiator 4.1.3-T120 [1.00a042] and SqueezeCenter 7.2~22491

Hi

I've just uploaded 7.2~22491 to the ReadyNAS NV+. It all works well, but whenever the Logitech remote is powered up (and sitting in its cradle) the NAS refuses to hibernate. I've highlighted this issue before (both in the Slim forum and the Netgear forum) and have noticed that other people are having the exact same problems, but nobody has commented on why this might be happening. It really needs to be addressed, as it's ecologically very poor practice to have the disks continuously spinning for something which is only occasionally used.

Incidentally, 7.0.0 and 7.0.1 caused the NAS to hibernate for 20 minutes then wake up, then hibernate for 20 minutes, and so on. 7.2~22491 permitted hibernation for 4 minutes, then after the hour (the set hibernation period) it permitted hibernation for 15 minutes. After the 15 minutes, it spun the disks up and kept them spun up for 3 hours (until now). When the Duet remote is powered off, the NAS hibernates without issue.

For information, I've tried 7.0.0 and 7.0.1 on my ReadyNAS NV+ and on two ReadyNAS Duos; all three experience similar hibernation problems when the Logitech remote is switched on, and all three hibernate fine with it switched off (it doesn't matter if the receiver is on or off).

Bri
Matt: Can you pass this on to our friends at Netgear?
Justin: Not sure where this problem is, it could be in the controller. What happens in the ReadyNAS to make the disk spin down/up?
Hi I wondered if it's something silly like the Logitech remote updating its clock every 20 minutes and either directly or indirectly waking the NAS (via SqueezeCentre) in the process? This is just a wild guess though! Bri
The ReadyNAS just tells the disks to spin down, then queues up all write requests while the disks are down. A read request that misses buffer cache will go down to the disks to automatically spin them up. We're pre-reading in a lot of commonly-used files so that they'll be in buffer cache when the OS needs them. SqueezeCenter doesn't appear to hurt the cause much, except when a Duet remote is on and connected. Any idea why SC7 is doing uncached reads only with Duet?
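The behaviour described here can be sketched as a toy model (illustrative only, not ReadyNAS code): writes are queued while the disks are down, and any read that misses the buffer cache forces a spin-up. It makes clear why a single uncached read per hour is enough to defeat hibernation.

```python
# Toy model (illustrative only, not ReadyNAS code) of the behaviour described
# above: writes are queued while the disks are down, and any read that misses
# the buffer cache forces a spin-up.
class ToyNas:
    def __init__(self, cached_files):
        self.cache = set(cached_files)  # files pre-read into buffer cache
        self.spinning = False
        self.write_queue = []

    def write(self, path, data):
        if self.spinning:
            return "written"
        self.write_queue.append((path, data))  # held until the next spin-up
        return "queued"

    def read(self, path):
        if path in self.cache:
            return "cache hit"    # served from RAM, disks stay down
        self.spinning = True      # cache miss: disks must spin up
        self.write_queue.clear()  # queued writes get flushed on spin-up
        return "spun up"

nas = ToyNas(cached_files={"/etc/hosts"})
assert nas.read("/etc/hosts") == "cache hit"
assert nas.write("/tmp/x", b"...") == "queued"
assert nas.read("/media/song.flac") == "spun up"  # one uncached read wakes the disks
```

So for hibernation to hold, every read SC issues while idle has to hit the pre-read cache; the question is which read is missing.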
Could this be related to bug 8300? Ben?
I really doubt bug 8300 has anything to do with this. Briain's comment #3 sounds like a good guess though, although the Controller polls once per hour for clock alignment, not once every 20 mins... FYI, date request being a poll and not a SC subscription push is also a bug, bug# 8646
Hi

For your info, I have looked at bug 8300 and it probably isn't the same issue. In all my tests, I ensured that the playlist was empty; I even tried emptying the playlist and selecting a radio feed to 'distance' it as far as possible from the process of accessing FLAC files on the NAS.

Bri
Is there any way for me to stop the handset polling at all (or change it to several days) just to eliminate that from the list of possibilities? The NAS was set to 60mins hibernation so I've just dropped it to 20 minutes. Once I've checked it hibernates stably for 2 hours at that setting, I'll switch on the Controller, then view the logs after 3 hours and report back.
Looks like it is the clock update polling which interrupts hibernation. Set to 20 minute hibernation, the NAS hibernated for 2 hours. Switching on the Controller spun the disks up after a few minutes, it then hibernated after 20 minutes (as expected) and spun them up after a further 40 minutes which points to a 1 hour interrupt cycle. I'll monitor it over the next few hours to get a few more cycles then report back.
Hi

Below are some accurately recorded activities showing the relative event timings when the Controller is active (in charging cradle), removed and placed on the bench, lifted and placed in the cradle, then finally switched off. It's very interesting to see how it impacts on the hibernation cycle.

Test conditions:
Updated to RAIDiator-4.1.3-T136 and SqueezeCenter_7.2~22588
NAS hibernation set to 20 minutes
Controller playlist empty
Controller switched on and sitting in charging cradle (until Event 8 + 03 minutes)
Drive mappings disconnected
NAS rebooted and Frontview closed

All times below are in minutes and relate to the previous event on the line above (as opposed to being referenced to T=0):

T0=0     Everything fully settled after the above reboot.
T0+20    Spin down (Event 1)
E1+00    Spin up (E2)
E2+20    Spin down (E3)
E3+26    Spin up (E4)
E4+20    Spin down (E5)
E5+18    Spin up (E6)
E6+20    Spin down (E7)
E7+17    Spin up (E8)
(E8+03   Logitech Controller lifted out of cradle and placed on shelf)
E8+20    Spin down (E9)
E9+17    Spin up (E10)
E10+20   Spin down (E11)
E11+103  Hibernated for 103 minutes, then Logitech Controller lifted and placed into cradle.
E11+104  Connected to NAS and gone into screensaver mode.
E11+107  Spin up (E12)
E12+03   Controller switched off
E12+20   Spin down
E12+840  NAS hibernated until now (woken by myself re-mapping drives)
Note that after just over the first hour (Event 4 onwards), it settles into a very steady pattern, accessing the NAS drives every ~37 minutes, until I break the cycle by changing something (probably the Controller going into standby mode just after E10):

E4 ~ E6 = 38 mins
E6 ~ E8 = 37 mins
E8 ~ E10 = 37 mins

Can anyone think of any process which could be linked to that period?
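For reference, the ~37-minute figure follows directly from the deltas in the table in the previous comment; a quick check (delta values transcribed from that table):

```python
# Minute deltas between consecutive events E1..E10, transcribed from the
# table in the previous comment (E1->E2, E2->E3, ..., E9->E10).
deltas = [0, 20, 26, 20, 18, 20, 17, 20, 17]

# Spin-ups are the even-numbered events; the gap between successive spin-ups
# (E4->E6, E6->E8, E8->E10) is the sum of the two deltas between them.
spinup_gaps = [deltas[i] + deltas[i + 1] for i in range(3, 8, 2)]
print(spinup_gaps)
```

That reproduces the 38/37/37-minute intervals, consistent with an hourly trigger overlaid on the 20-minute hibernation timeout.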
I don't use a ReadyNAS but have my own home-brew server and am following this topic with interest. I've been capturing network traffic to/from the Squeezebox Controller with tcpdump and inspecting it with Wireshark. So far I've observed playlist information being sent by SqueezeCenter every 30 seconds (that'll be bug 8300), and something else which looks like a settings update every minute, but these are clearly not causing the problem here. I've also noted that once every 100 minutes, the Controller makes a POST request to SqueezeCenter:

POST /cometd HTTP/1.1
User-Agent: Jive/7.0.1 r2448
Content-Length: 129
Host: 192.168.121.33:9000
Content-Type: text/json
Accept-Language: en

[{"id":559,"data":{"request":["00:04:20:16:58:ab",["date"]],"response":"\/c26fa268\/slim\/request"},"channel":"\/slim\/request"}]

and gets the reply:

HTTP/1.1 200 OK
Server: SqueezeCenter (7.0.1 - 19705)
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 72
Content-Type: application/json
Expires: -1
X-Time-To-Serve: 0.00476694107055664

[{"clientId":null,"id":559,"channel":"/slim/request","successful":true}]

Now, if there's a time update happening every 60 minutes (which I haven't yet spotted in the packet trace), and the above every 100 minutes, and both cause HDD spin-ups, I can well imagine the pattern of spin-ups looks like something every 20 or 40 minutes, and perhaps we have two bugs to fix here.
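The date poll above is easy to pick apart; a small sketch showing what the Controller is actually asking for (the MAC address and request id are the ones from the trace):

```python
import json

# The date poll from the capture above, verbatim (MAC and id are from the trace).
payload = r'[{"id":559,"data":{"request":["00:04:20:16:58:ab",["date"]],"response":"\/c26fa268\/slim\/request"},"channel":"\/slim\/request"}]'
msgs = json.loads(payload)
req = msgs[0]["data"]["request"]
print(req[0])                       # player MAC the request is addressed to
print(req[1])                       # the command being polled: ['date']
print(msgs[0]["data"]["response"])  # per-client channel the reply comes back on
```

In other words, it is a plain "date" query wrapped in a cometd envelope, with the reply routed back on the Controller's private channel.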
OK, now I've spotted the time update once an hour. Also on a 60-minute period, but not at the same time as the time update, SqueezeCenter is sending the Controller what looks like a notification of a firmware update:

[{"data":{"relativeFirmwareUrl":"/firmware/jive_7.0.1_r2448.bin","firmwareUpgrade":0},"id":"393","channel":"/c26fa268/slim/firmwarestatus/00:04:20:16:58:ab","ext":{"priority":""}}]

So I guess it's possible SqueezeCenter is looking for updated firmware once an hour as well, which could certainly cause the discs to spin up.
Is it possible that this is because of the swap space on the ReadyNAS? Does the ReadyNAS ever use its swap if the system is not running SC?
Hi Matt No, it works fine even with SC fired up: testing has shown the NV+ (1GB memory / 4.1.3 T136) hibernating for over 80 hours with SqueezeCentre and TwonkyMedia running (and indeed with Sonos, Linn and Logitech Duet receivers all switched on, but, of course, not streaming any music). As soon as the Logitech Controller is switched on, the hibernation is interrupted, then follows the wake-up cycles as described above. Switching the Controller off again enables the hibernation to function normally (without a NAS re-boot). Incidentally, the 80h mentioned above isn't any sort of technical limit; it is simply the time limit I put on the test. Bri
Hi I’m just setting up the latest nightly build of SC7.3 and I noticed that there is an option to turn off software updates (it might have been there before, I probably just didn't notice it) so I'll try that and see how it impacts on the hibernation cycles. It'll be interesting to see if it changes into an hourly wake-up cycle (as per the time updates). Bri
7.3~23122 beta accesses the disks every few minutes (even with the Controller switched off), but the scanner and Controller navigation are fantastically fast compared to 7.2; it will be an excellent product when finalized and the hibernation issues are addressed.
I think that the issue with 7.3 was simply that I didn't have the correct firmware and that it was writing to the logs every 10 minutes. I ought to have used the 7.2.1 Jive firmware to see if it cured the issue but to be honest, 7.2.1 runs so well that I don't see the need to change until the Jive/hibernation issue has been sorted.
I don't believe this bug is properly assigned. Unassigning and tagging for discussion in next bug meeting
Hi

Priority: This probably needs to be assigned quite a high priority, as it will impact all systems where HDD hibernation is deployed. It's not just a power/cost issue but a significant disk-stress and thus disk/PSU reliability issue (see below). Incidentally, I've installed 4 NAS/Duet systems and all have this problem (with all users requesting a solution).

Suggestion: If it is caused by the Jive time updates, a facility to disable them (maybe at the expense of the clock having to be switched off – to hide the displayed time drift – and a blank screen displayed instead) would be a welcome compromise.

Current status: To update where we are at, the latest 7.3 nightly causes no problem with Jive switched off (and the Receiver left on); the NAS hibernates fine for days. If the Controller is left switched on (and in its charge cradle), the interruptions are still very regular (with a few exceptions, as shown). Below are the results with the NAS hibernation set to 30 minutes. Incidentally, the NAS hibernates properly as soon as (well, 30 minutes after) the Controller is switched off again:

Oct 21 20:56:43 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 21 20:56:45 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 21 21:09:07 BriServ noflushd[685]: Spinning up /dev/hdc after 12 minutes.
Oct 21 21:09:07 BriServ noflushd[685]: Spinning up /dev/hde after 12 minutes.
Oct 21 21:39:26 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 21 21:39:28 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 21 21:46:54 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 21 21:46:54 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 21 22:17:13 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 21 22:17:15 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 21 22:25:12 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 21 22:25:12 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 21 22:55:30 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 21 22:55:32 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 21 22:55:41 BriServ noflushd[685]: Spinning up /dev/hdc after 0 minutes.
Oct 21 22:55:42 BriServ noflushd[685]: Spinning up /dev/hde after 0 minutes.
Oct 21 23:26:00 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 21 23:26:02 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 21 23:40:39 BriServ noflushd[685]: Spinning up /dev/hdc after 14 minutes.
Oct 21 23:40:39 BriServ noflushd[685]: Spinning up /dev/hde after 14 minutes.
Oct 22 01:06:42 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 01:06:44 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 01:33:58 BriServ noflushd[685]: Spinning up /dev/hdc after 27 minutes.
Oct 22 01:33:58 BriServ noflushd[685]: Spinning up /dev/hde after 27 minutes.
Oct 22 02:04:16 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 02:04:18 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 02:05:14 BriServ noflushd[685]: Spinning up /dev/hdc after 0 minutes.
Oct 22 02:05:14 BriServ noflushd[685]: Spinning up /dev/hde after 0 minutes.
Oct 22 02:35:33 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 02:35:35 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 02:49:57 BriServ noflushd[685]: Spinning up /dev/hdc after 14 minutes.
Oct 22 02:49:57 BriServ noflushd[685]: Spinning up /dev/hde after 14 minutes.
Oct 22 03:20:16 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 03:20:17 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 03:27:14 BriServ noflushd[685]: Spinning up /dev/hdc after 6 minutes.
Oct 22 03:27:14 BriServ noflushd[685]: Spinning up /dev/hde after 6 minutes.
Oct 22 03:57:33 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 03:57:34 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 04:05:31 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 22 04:05:31 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 22 04:35:50 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 04:35:52 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 04:42:43 BriServ noflushd[685]: Spinning up /dev/hdc after 6 minutes.
Oct 22 04:42:43 BriServ noflushd[685]: Spinning up /dev/hde after 6 minutes.
Oct 22 05:30:47 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 05:30:49 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 05:58:48 BriServ noflushd[685]: Spinning up /dev/hdc after 28 minutes.
Oct 22 05:58:48 BriServ noflushd[685]: Spinning up /dev/hde after 27 minutes.
Oct 22 06:55:45 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 06:55:47 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 07:14:50 BriServ noflushd[685]: Spinning up /dev/hdc after 19 minutes.
Oct 22 07:14:50 BriServ noflushd[685]: Spinning up /dev/hde after 19 minutes.
Oct 22 07:45:08 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 07:45:10 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 07:52:02 BriServ noflushd[685]: Spinning up /dev/hdc after 6 minutes.
Oct 22 07:52:02 BriServ noflushd[685]: Spinning up /dev/hde after 6 minutes.
Oct 22 08:22:20 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 08:22:22 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 08:30:19 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 22 08:30:19 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 22 09:00:38 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 09:00:39 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 09:07:36 BriServ noflushd[685]: Spinning up /dev/hdc after 6 minutes.
Oct 22 09:07:36 BriServ noflushd[685]: Spinning up /dev/hde after 6 minutes.
Oct 22 09:37:55 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 09:37:56 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 09:45:53 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 22 09:45:53 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 22 10:16:12 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 10:16:14 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 10:23:35 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 22 10:23:35 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 22 10:53:54 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 10:53:56 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 11:01:22 BriServ noflushd[685]: Spinning up /dev/hdc after 7 minutes.
Oct 22 11:01:22 BriServ noflushd[685]: Spinning up /dev/hde after 7 minutes.
Oct 22 12:59:25 BriServ noflushd[685]: Spinning down /dev/hdc.
Oct 22 12:59:26 BriServ noflushd[685]: Spinning down /dev/hde.
Oct 22 13:00:48 BriServ noflushd[685]: Spinning up /dev/hdc after 1 minutes.
Oct 22 13:00:57 BriServ noflushd[685]: Spinning up /dev/hde after 1 minutes.
This appears to be an issue with the SBC firmware. It's not clear to me from the discussion what the controller is doing to trigger the wake up. Ben?
It's already been flagged bug_meeting. I'd like to discuss this in our Monday meeting.
Briain, if you're willing and able, I have a test you can try to see if the issue is the date polling (which, incidentally, is bug 8646).

1. Turn remote login on in the Controller: Settings->Advanced->Remote login. The IP address of your Controller will be listed at the bottom after enabling.
2. ssh to the Controller as user root, password 1234.
3. vi the file /usr/share/jive/applets/SqueezeboxJive/SqueezeboxJiveApplet.lua and find the line that reads:
   self.clockTimer:start()
   Comment it out by adding two dashes in front of it, like so:
   --self.clockTimer:start()
4. Save the file and reboot the Controller.
5. Re-run your hibernation tests to see if that fixes the HDD spin-up problem.

Apologies for pushing work back to you, but I don't have a setup that can effectively test this. If this is more than you can or want to do, let me know and I'll get one of our ReadyNAS support help to assist.
FYI, I just fixed the date polling bug 8646, moving it to a comet subscription. I'm not sure if that will fix this problem, but to test it requires an update on both SC and Controller. In SC 7.3, change 23685 paired with Controller firmware 7.3 r3213. The 7.3 firmware should get pushed out to beta users soon, either tomorrow or early next week.
Hi

Sorry for the delay, I missed the prompting email! I'll pick up any responses immediately next time and act upon them directly. I've looked at the file but cannot find that line (see below). Has Jive been radically changed (and thus this line removed) since you asked the question? Again, sorry for the delay!

# vi /usr/share/jive/applets/SqueezeboxJive/SqueezeboxJiveApplet.lua

-- stuff we use
local ipairs, pcall, tostring = ipairs, pcall, tostring

local oo = require("loop.simple")
local string = require("string")
local math = require("math")
local os = require("os")
local io = require("io")

local jiveBSP = require("jiveBSP")
local Watchdog = require("jiveWatchdog")
local Wireless = require("jive.net.Wireless")

local Applet = require("jive.Applet")
local Sample = require("squeezeplay.sample")
local Checkbox = require("jive.ui.Checkbox")
local Choice = require("jive.ui.Choice")
local Font = require("jive.ui.Font")
local Framework = require("jive.ui.Framework")
local Group = require("jive.ui.Group")
local Icon = require("jive.ui.Icon")
local Label = require("jive.ui.Label")

- /usr/share/jive/applets/SqueezeboxJive/SqueezeboxJiveApplet.lua 1/1204 0%
Sorry, ignore last email (not used to Vi) trying this now
(In reply to comment #23)
> Briain, if you're willing and able, I have a test you can try to see if the
> issue is the date polling (which, incidentally, is bug 8646).
>
> 1. Turn remote login on in the controller, Settings->Advanced->Remote login.
> The IP address of your controller will be listed at the bottom after enabling
> 2. ssh to the controller as user root, password 1234
> 3. vi the file /usr/share/jive/applets/SqueezeboxJive/SqueezeboxJiveApplet.lua
> and find the line that reads:
> self.clockTimer:start()
> comment it out by adding two dashes in front of it like so:
> --self.clockTimer:start()
> 4. save the file and reboot the controller
> 5. re-run your hibernation tests to see if that fixes the HDD spin up problem

Hi

I've found local clockSynctimer:start but not self.clockTimer:start; has it been changed?
It appears that change 23685 does not improve the disk sleeping issue on ReadyNAS. Attempting to gather logs to pinpoint the issue.
Created attachment 4233 [details] ReadyNAS log
Created attachment 4234 [details] Jive log
Unfortunately I'm not seeing any smoking gun in the comparison between those two logs. I looked at 5 separate occurrences of the HDD spinning up on the ReadyNAS, at timestamps of:

9:42:01
9:56:09
10:05:36
10:16:39
10:27:51

At 9:42, I see Jive getting playerstatus and serverstatus responses from SC.
At 9:56, I see more of the same, along with a playerDelete and playerNew notification.
At 10:05, Jive syncs its software clock to its hardware clock, which should not involve the ReadyNAS at all, plus more status responses from SC.
At 10:16, more status responses.

So the common thread is playerstatus and serverstatus messages, but I should note that serverstatus messages are delivered once every 60 seconds and playerstatus messages once every 30 seconds. Those events show up in the log at every one of those intervals, most of which do not cause an HDD spin-up event. To put it another way: no matter where I look in the log, a playerstatus message is within 30 seconds of any given timestamp. I wouldn't quite rule out the playerstatus/serverstatus messages spinning up the HD, but it looks like a red herring to me right now.
r3376, checked in to fix bug# 8300, may help this issue. I moved the playerstatus subscription from a once/30 second request to one where SC generates it on demand. Previously with a controller connected to an SC-connected player, SC would generate a playerstatus message every 30 seconds regardless of the player's state. There is some possibility that these messages would eventually cause a NAS HD to spin up. Steven to test with controller firmware >=r3376
Ping steven
I have not been able to determine what in SqueezeCenter 7.3 is preventing drive spin-down and/or waking up the drives on ReadyNAS systems. Since bug 8300 may be one of the causes of this bug and it will most likely not make the 7.3 release, I am changing the target to 7.3.1.
Bug 8300 is interestingly similar, however, it's worth noting that these hibernation interruptions happen whether there is anything in the playlist or not and thus might be throwing us a 'curved ball'. I looked at bug 8300 some time back and have always left the playlist empty; I can thus confirm that the NAS still cycles with the playlist empty. Hope that helps.
Michael, since you have been looking at bug 10137 for the ReadyNAS Pro, do you have any insight on what in SqueezeCenter might be causing the disks to wake from sleep on the ReadyNAS products?
I have no idea so far (haven't looked at it yet). But I'm a bit confused: Ben said he had checked in a change which _might_ be helpful. But nobody said whether this changed anything. Did you test a recent build?
Changing target to next release
It does not look like this will be resolved before the next release.
Steven, what's the state of this here? Are you trying to reproduce?
Dean, I was able to reproduce this issue easily when I last looked at it. Unfortunately I was having extreme difficulty pinpointing the cause and the logs don't seem to show a SC correlation for the disk spin up. Michael, do you have any time to look into this?
Steven - did you test this recently? See my comment #37
Steven? Did you test this after Ben's comment #37?
FYI, my fix for bug 8300 had to be reverted (see bug 10064), so if that fix helped this bug, you won't see it now.
I'm testing as I write this. Will report soon.
I am still seeing the issue with nightly 7.3.2 - 24638. With this build the drives are spun up every 30 minutes or so. This is with r3476 on the controller. Ben, what build can I use to see if your fix for bug 8300 would also fix this issue?
Created attachment 4768 [details] New Duet ReadyNAS log
Created attachment 4769 [details] New Duet ReadyNAS wireshark capture
I used SqueezeCenter 7.3.2 - 24695 and Controller 7.3 r3993 when creating the Wireshark capture. The capture was done with the filter "host 192.168.1.102", so it only captured Controller traffic. The network consisted of a ReadyNAS Duo, Receiver, Controller, Netgear Ethernet hub and a Linksys WRT54GS. The Duet was set up in the bridged configuration. Please let me know if there are any other logs and/or logging options that would help pinpoint the issue while I have this test rig set up.
From the ReadyNAS log I identified several timestamps of when the NAS spun up its hard drive, then looked in the Wireshark log for traffic from the Controller at or just before the spin-up. There is nothing in the network traffic that shows traffic from the Controller at these spin-up times. For example, the NAS hard drive spun up at 16:37:06; in the Wireshark capture, there is nothing at that timestamp, or for up to 23 seconds before it, from the Controller:

16:36:43  serverstatus notification sent from NAS to Controller, and an ACK back from the Controller (serverstatus messages are sent every minute and are found throughout the capture at the expected intervals)
16:36:48  ARP from Controller to NAS, with a response from the NAS
16:36:48  2 UDP broadcast packets from the Controller (filenet-tms, filenet-rpc); I don't know the purpose of these
16:36:48  UDP packet (filenet-rpc) from NAS to Controller

23 seconds then pass before the HD spins up. The packets described above are also found throughout the capture at times when the hard drive did not spin up. I'm at a loss... there isn't any smoking gun here.
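A hypothetical helper for automating the correlation done by hand above; the function name and the 60-second window are my own choices, and the packet list is the handful of events quoted in this comment:

```python
from datetime import datetime, timedelta

# Hypothetical helper automating the by-hand correlation above: list the
# packets seen in the window just before each disk spin-up. The 60 s default
# window deliberately over-covers the spin-up delay discussed below.
def packets_before(spinups, packets, window_s=60):
    fmt = "%H:%M:%S"
    hits = {}
    for s in spinups:
        t = datetime.strptime(s, fmt)
        lo = t - timedelta(seconds=window_s)
        hits[s] = [(ts, what) for ts, what in packets
                   if lo <= datetime.strptime(ts, fmt) <= t]
    return hits

packets = [("16:36:43", "serverstatus + ACK"),
           ("16:36:48", "ARP request/response"),
           ("16:36:48", "UDP filenet broadcast")]
hits = packets_before(["16:37:06"], packets)
print(hits["16:37:06"])
```

Run over the full capture and log, this would show at a glance whether any packet type appears before every spin-up but not elsewhere.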
That 23 seconds number may be a bit misleading, because if you're basing the time on log entries, it will be off by however long it takes the disks to spin up. With two disks, that could be as long as ~20 seconds.
Created attachment 4773 [details] 02052008 ReadyNAS log
Created attachment 4775 [details] 02052008 ReadyNAS unfiltered wireshark capture
New overnight Wireshark capture. The only change I made was to capture without a filter. What I don't understand is that the wake interval has changed: last night's capture log shows the drives spun up about every 30 minutes, while the previous log showed about every 46 minutes. I'm at a loss.
Created attachment 4787 [details] Archive of Duet and ReadyNAS logs This archive contains server.log, syslog and a wireshark capture. The server.log and wireshark capture were taken with all logging enabled in hopes that they might reveal something and are very large.
Not sure what else I can do about this one. I am open to suggestions.
Andy suggests turning on prefs logging to see if SC is writing information to the prefs file and preventing hibernation.
We are now planning to make a 7.3.3 release. Please review your bugs (all marked open against 7.3.3) to see if they can be fixed in the next few weeks, or if they should be retargeted for 7.4 or future. Thanks!
Justin, do you have any guidance for where we should look next on this bug?
Maybe wait until the disks are about to spin down, and then

echo 1 > /proc/sys/vm/block_dump

to figure out which process is reading from which block to cause the spin-up. The thing is, in this case we pretty much know it's squeezecenter.pl. But it might help to see whether it's consistently reading from the same block. Alternatively, if it's a write that's causing the spin-up, that means somebody somewhere is doing a sync().
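If eyeballing the kernel log gets tedious, the block_dump output can be tallied per process and per block; the parsing sketch below assumes the usual "process(pid): READ block N on device" line format, and the sample lines are made up for illustration:

```python
import re
from collections import Counter

# block_dump (echo 1 > /proc/sys/vm/block_dump) makes the kernel log each
# disk access as "process(pid): READ block N on device". This tallies saved
# dmesg output; the sample lines are illustrative, not from a real trace.
sample = """\
squeezecenter.pl(685): READ block 5678 on hdc1
squeezecenter.pl(685): READ block 5678 on hdc1
kjournald(212): WRITE block 1024 on hdc1
"""
pat = re.compile(r"(\S+)\((\d+)\): (READ|WRITE) block (\d+) on (\S+)")
by_proc = Counter()
by_block = Counter()
for line in sample.splitlines():
    m = pat.match(line)
    if m:
        proc, _pid, op, block, _dev = m.groups()
        by_proc[(proc, op)] += 1
        by_block[block] += 1

print(by_proc.most_common(1))   # busiest (process, operation) pair
print(by_block.most_common(1))  # most frequently touched block
```

If the same block keeps recurring just before spin-ups, that would point at one specific file or inode being read uncached.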
Moving to the product SqueezePlay because this bug appears to apply to any player based on that application code. Feel free to move it back if it's specific to the single original product.
Is this related to bug 8300?
(In reply to comment #62)
> Is this related to bug 8300?

I don't know; I can't see bug 8300 any more, but Briain did say in comment #7 that he thought not, and again in comment #35. As for it being related to a player rather than SqueezeCenter's servicing of players generally, that hasn't been identified either.
Reset priority before triage.
Just to clarify, this only happens when the Controller is left switched on. Most people do leave the Controller in its base charger, and thus this issue will impact most installations. I just visited a friend's house and discovered they'd left their Controller on for over a year (I told them not to, but they like the clock display on the handset); their NAS has spun down and up roughly every hour, which is over 8,000 disk cycles. I'd have thought this should be raised to a P1 and thus fixed in a more timely manner; it impacts all ReadyNAS Duo and NV+ installations. This was reported a year ago, and that's way too long to fix a major bug like this. I've fitted 3 other Logitech/ReadyNAS systems and need an answer to this soon; the users are really hacked off about it! Energy efficiency is a major issue these days, and they are really annoyed at me telling them to leave their NAS spinning 24/7; the NAS hibernation feature was a major consideration when they bought these systems. I know it doesn't save that much power, but the psychological impact seems to be huge. Sonos and Twonky based systems don't cause any hibernation problems, so this really needs to be fixed quickly.
I estimate that having 2 drives spinning for 18 hours/week instead of 168 (24x7) would currently save £14 of electricity per year at UK prices. I think some current consumer-level drives are specced for 8x5 use, not 24x7. Current drives claim a life of 50,000 start/stop or load/unload cycles. Either way - running 24x7 or loading/unloading hourly - is putting a fair dent in the drives' specified life, and it's obviously unnecessary. Please get this fixed asap, it's a silly, irritating flaw in what is otherwise a neat system (by which I mean NAS+Duet).
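For what it's worth, the £14/year figure checks out under plausible assumptions (roughly 7 W per idle 3.5" drive and 13 p/kWh, neither of which is stated above):

```python
# Sanity check of the ~GBP 14/year estimate, under assumed numbers that do
# not appear in the comment: ~7 W per idle 3.5" drive and ~13 p/kWh.
drives = 2
watts_per_drive = 7.0
hours_saved_per_week = 168 - 18        # spinning 18 h/week instead of 24x7
kwh_per_year = drives * watts_per_drive * hours_saved_per_week * 52 / 1000
cost_gbp = kwh_per_year * 0.13
print(round(kwh_per_year, 1), round(cost_gbp, 2))
```

That comes to roughly 109 kWh, or about £14 per year, consistent with the estimate above.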
Agreed on the figures. The customers like that a [hibernating] NAS is approximately equivalent to leaving on a CFL bulb; an upper limit they feel comfortable with, not for cost reasons, but simply for environmental impact. Exceeding this breaches the waste limit they have deemed acceptable, and they end up switching their NAS off. This means the music is no longer instantly available; they have to go to their cupboards and switch it on, then wait for it to boot up before playing music. That significantly detracts from the product concept and is thus something that needs to be addressed. Incidentally, hourly start/stop cycling not only stresses the drives (particularly since they will also be temperature cycling) but puts two surge loads on the SMPS (one to start each drive), which it should handle, but it would be better if it didn't have to do so every hour. This all takes some explanation to the non-technical users; it's just that this is all so unnecessary, and it has now been an issue for a year, which irritates people so much. PS: Over this last year, I've installed quite a few Sonos systems, but I will certainly not recommend any more Duet systems to anyone until this is addressed; Logitech are losing money over this.
Can one of the SqueezeCenter developers tell me how to change the period with which the time update and firmware version notifications - which I noted were hourly in comment #13 - are sent to the Controller? If varying or disabling these updates were found to affect the issue, that might help more clearly identify the problem.
The 'date' query is scheduled based on the call from the Controller. You could disable it by commenting out the following lines in Slim/Control/Jive.pm:

    # manage the subscription
    if (defined(my $timeout = $request->getParam('subscribe'))) {
        $request->registerAutoExecute($timeout, \&dateQuery);
    }

Firmware update is only sent every 12 hours, I believe, and only if you need one.
Well, whatever version we were using last August when I traced with WireShark, it was sending a notification of the latest available version every hour, and only setting firmwareUpgrade=1 if it was actually a newer version, and it looks like that's still the behaviour now from looking at the 7.3.3 code today. Thanks for confirming for me that that's the place; I'll try with these commented out. Now where does the $timeout value come from?
$timeout comes from the request from the Controller.
Moving 7.4 bugs to 8.0.