Bug 15846 – USB drive disconnects during stress testing

Bug 15846 - USB drive disconnects during stress testing

Summary:

USB drive disconnects during stress testing

Status:	RESOLVED WONTFIX

Product:	SB Touch
Classification:	Unclassified
Component:	TinySC
Version:	7.5.0
Platform:	PC Windows XP

Importance:	-- normal (vote)
Target Milestone:	Investigating
Assigned To:	Unassigned bug - please assign me!

URL:
Keywords:

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2010-03-06 15:40 UTC by Mickey Gee
Modified:	2019-01-25 10:11 UTC (History)
CC List:	4 users (show)

See Also:
Category:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mickey Gee 2010-03-06 15:40:45 UTC

In 3 instances with different versions of Touch firmware (r8622, 8627) on two different Touch setups, I've experienced this after running tests for about 24 hours. Here's both of the setups:

1: Fab4 with USB drive, playing flac tracks continuously. Not a server for other players.
2: Fab4 with USB-IDE converter drive, playing MP3 tracks continuously. Serving MP3 tracks to 5 other players (4 IP3K, one Baby)

Not sure whether rebooting is needed, since I always get a firmware update prompt. Also not sure if I have to restart TinySC, since firmware update always starts a rescan.

Here's the log info. By the time I see it, the log just has this stuff repeating.  Will try getting turning on more switches, but not sure.

Mar  5 10:56:59 squeezeplay: stack traceback:
Mar  5 10:56:59 squeezeplay: 	/usr/share/jive/jive/net/SocketHttp.lua:388: in function 'pump'
Mar  5 10:56:59 squeezeplay: 	/usr/share/jive/jive/net/SocketTcp.lua:200: in function 'writePump'
Mar  5 10:56:59 squeezeplay: 	/usr/share/jive/jive/net/Socket.lua:186: in function </usr/share/jive/jive/net/Socket.lua:184>
Mar  5 10:56:59 squeezeplay: INFO   net.comet - Comet.lua:804 Comet {Mickeys Fab4 in Office (USB)}: _getEventSink error: connection refused
Mar  5 10:56:59 squeezeplay: INFO   net.comet - Comet.lua:997 Comet {Mickeys Fab4 in Office (USB)}: handleAdvice state=CONNECTING
Mar  5 10:56:59 squeezeplay: INFO   squeezebox.server - SlimServer.lua:714 disconnected Mickeys Fab4 in Office (USB) idleTimeoutTriggered: nil
Mar  5 10:56:59 squeezeplay: INFO   applet.AlarmSnooze - AlarmSnoozeApplet.lua:310 notify_serverDisconnected: SlimServer {Mickeys Fab4 in Office (USB)} is now disconnected
Mar  5 10:56:59 squeezeplay: WARN   applet.AlarmSnooze - AlarmSnoozeApplet.lua:323 notify_serverDisconnected: SlimServer {Mickeys Fab4 in Office (USB)} - disconnected, but no server alarm in progress : nil
Mar  5 10:56:59 squeezeplay: INFO   net.comet - Comet.lua:1038 Comet {Mickeys Fab4 in Office (USB)}: advice is retry, connect in 2.679 seconds
Mar  5 10:57:01 squeezeplay: INFO   net.slimproto - SlimProto.lua:599 connect to 172.19.120.150 (172.19.120.150)
Mar  5 10:57:01 squeezeplay: INFO   net.slimproto - SlimProto.lua:773 connection error: closed, reconnecting in 1.297 seconds
Mar  5 10:57:02 squeezeplay: ERROR  net.http - SocketHttp.lua:388 SocketHttp {Mickeys Fab4 in Office (USB)_Chunked}:t_sendRequest.pump: connection refused

Comment 1 Mickey Gee 2010-03-06 15:42:22 UTC

Can someone look and see whether latest Linux bug fixes for storage and USB drivers have been applied? Most likely they've not been updated in one year.

Comment 2 Mickey Gee 2010-03-08 12:02:55 UTC

Switching to latest Fab4 units and not using PB1 or PB2 units.

Comment 3 Vahid Fereydouny 2010-03-09 13:53:26 UTC

Enabled the kernel logs and running the test to reproduce the problem and to capture the logs. Meanwhile I will be looking into the driver and the latest changes in the kernel.

Comment 4 Vahid Fereydouny 2010-03-10 14:18:38 UTC

I have been running the test for more than 24 hours now and have not seen it happen. I looked at the kernel changes and there has been a lot of modifications. As soon as the bug happens I should be able to understand the nature of it and to fix it.

Comment 5 Vahid Fereydouny 2010-03-10 14:39:36 UTC

I think this bug should be targeted for post 7.5.0. It is too late to make drastic changes in the USB driver in the kernel. I think the risk exceeds the rewards.

Comment 6 Vahid Fereydouny 2010-03-10 18:12:22 UTC

The changes in the kernel related to USB and SCSI support after 2.6.26 has been a lot. To find the right changes and apply them I need to reproduce the bug and have the logs.
Here is the links to the kernel changes:
http://www.kernel.org/
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

Comment 7 Vahid Fereydouny 2010-03-11 14:06:14 UTC

My system is still running after almost three days. I will load and run some test tools that put a lot of load on the device and I will check the performance/reliability of the system. This bug's target date should be postponed to after 7.5.0

Comment 8 Vahid Fereydouny 2010-03-11 16:03:29 UTC

Here is the link to the dt test that could be used to run stress tests on the device. I will run the test and it should give us good indication of the performance and the reliability of the USB/SCSI connections.
http://www.scsifaq.org/RMiller_Tools/index.html

Comment 9 Vahid Fereydounkolahi 2010-03-15 23:00:07 UTC

Here is an example of the way I would run the dt on my system.
I believe we should use the dt as part of our DVT process if we are planning to support the external drives.
#dt of=/dev/sda1 capacity=124m iotype=random bs=512 pattern=iot aios=14 dlimit=1024 prefix='logitech' dtype=disk runtime=03d00h00m0s enable=aio,debug records=26000 oncerr=abort enable=aio,debug records=26000 oncerr=abort enable=noprog alarm=10s noprogt=20s

I think we should also run the vmstat at the same time to monitor IO, Memory, and CPU load while this operation is in progress:

#vmstat 10

Here is a sample result of running dt:
"
Write Statistics:
     Total records processed: 26000 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 13312000 (13000.000 Kbytes, 12.695 Mbytes)
      Average transfer rates: 48124 bytes/sec, 46.996 Kbytes/sec
     Number I/O's per second: 93.992
      Total passes completed: 0
       Total errors detected: 0/1
          Total elapsed time: 04m36.62s
           Total system time: 00m06.49s
             Total user time: 00m02.54s
               Starting time: Mon Mar 15 22:33:07 2010
                 Ending time: Mon Mar 15 22:37:43 2010

dt: All requests completed before cancel...
dt: Closing file '/dev/sda1', fd = 3...
dt: Attempting to reopen file '/dev/sda1', open flags = 0 (0)...
dt: File '/dev/sda1' successfully reopened, fd = 3
dt: All requests completed before cancel...

Read Statistics:
     Total records processed: 26000 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 13312000 (13000.000 Kbytes, 12.695 Mbytes)
      Average transfer rates: 363716 bytes/sec, 355.191 Kbytes/sec
     Number I/O's per second: 710.383
      Total passes completed: 1
       Total errors detected: 0/1
          Total elapsed time: 00m36.60s
           Total system time: 00m07.14s
             Total user time: 00m02.71s
               Starting time: Mon Mar 15 22:33:07 2010
                 Ending time: Mon Mar 15 22:38:55 2010
"

Here is a sample result of running vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  3      0  12008  70388  11544    0    0   352   342 1416 5549  5 13  0 82
 0  2      0  10032  71820  11544    0    0   143   453 1244 4925  9 10  0 81
 0  2      0  10412  71960  11544    0    0    14   427 1105 4173  2  7  0 91
 1  2      0  10440  71960  11544    0    0     0   311  974 3968  2  5  0 93
 0  1      0  10440  71960  11544    0    0     0   232  916 3535  3 42  0 55
 1  1      0  67256  16584  11544    0    0  1532     0 2588 12807 53 40  0  8
 1  0      0  47696  36696  11544    0    0  2011     0 3130 16724 17 52  0 31
 1  0      0  27060  55848  11544    0    0  1915     0 2994 17108 22 53  0 25
 1  1      0  81040   2840  11544    0    0  1769     0 2847 17386 44 42  0 14
 0  2      0  73004  10804  11544    0    0   796   192 1782 7877  8 24  0 67
 1  3      0  70456  13344  11544    0    0   254   230 1192 4459  3 10  0 88
 1  3      0  67684  16580  11544    0    0   324   250 1300 5086  5 12  0 83

Comment 10 Paul Kovitz 2010-12-20 16:12:57 UTC

In my home setup, I am seeing my Western Digital Elements 500 GB drive disconnect under 0 stress conditions. The drive was connected with no activity for most of the day. Three times, I had to power of the SB, plug the drive into a PC, delete the Squeezebox folder using a computer, power up the SB Touch, plug in the drive, and wait for the server to re-find all the files. Sometimes when this happens, the SB seems to find 2x the number of songs than what I have on the drive. I cannot reliably repeat this test. I have a SB Boom and a SB Radio hooked to the server, but I have seen this issue happen when nothing else is attached. As far as I know, I am using the latest FW version.

Comment 11 Michael Herger 2012-03-12 23:50:24 UTC

Vahid is no longer working for us.

Comment 12 Michael Herger 2019-01-25 10:11:41 UTC

TinySC will not be updated any more.