Reflections on Sans Digital TR5UT External USB/eSATA RAID Enclosure

http://www.sansdigital.com/images/stories/products/TR5UT/tr5ut_2.jpg

After using this enclosure for about two years, I feel qualified to comment on my experiences with it and the practices I would recommend.

Disk choice and reliability

The enclosure does a decent job of cooling the disks with a large, relatively quiet fan at the rear.

I would recommend using at least two different models of disk when building the array, for two reasons.  First, disks of the same model tend to share failure patterns, so mixing models reduces the probability of experiencing a multi-disk failure.  Second, it makes it more likely that, down the road, you can find a replacement of the same nominal size with at least the minimum number of LBAs needed to rebuild into the array.

I started with Hitachi 2TB 7K2000 and WDC 2TB EADS drives.  I am not impressed with the 7K2000, since two of them have failed.  One of the WDC EADS drives has also failed.  I replaced one of the failed 7K2000s with a 7K3000, which I am much more impressed with.  When RMAing the EADS (512 byte block), I received an EARS (4KB block) back from WDC.  Perhaps WDC considers these equivalent, but I do not.  This enclosure was designed in the days of 512 byte blocks, and I do not know whether using 4KB block disks, much less mixing 4KB block disks with 512 byte block disks, is a good idea.

Note: WDC disks such as the EADS frequently ship with a power-saving setting enabled by default which needs to be removed with the WDIDLE utility.  This setting causes frequent head load cycles, which shortens the life of the disk and causes erratic latency issues.  The disk needs to be removed from the array to check and to disable this setting.
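
For what it's worth, the open-source idle3-tools package provides a Linux alternative to WD's DOS utility; if memory serves, its idle3ctl command can report and disable the timer.  A minimal sketch, assuming the disk has been pulled from the array and shows up as /dev/sdX (a hypothetical device name):

    # Check and disable the WD idle3 timer (flags from memory; verify against
    # the tool's own help output before relying on them).
    idle3ctl -g /dev/sdX    # report the current idle3 timer value
    idle3ctl -d /dev/sdX    # disable the timer entirely
    # The drive must be power cycled before the new setting takes effect.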

Performance

With a 4-disk 2TB 7200RPM RAID5 (a 6TB logical volume), I get around 200MB/s read performance with an Athlon II X3 and DDR2 RAM.  Not too bad for a single eSATA cable and a single SATA controller port.  This is with the JMB393 in RAID mode.  If it were used in PMP mode, performance would likely be much slower due to having only the single USB 2.0 or eSATA connection.
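
For reference, the 200MB/s figure is the sort of number a simple sequential-read test produces; a minimal sketch, assuming the exported RAID volume shows up as /dev/sdX (a hypothetical device name):

    # Read 4GB straight off the block device, bypassing the page cache, and
    # let dd report the throughput when it finishes.
    dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct

    # hdparm gives a quicker, rougher estimate of buffered read speed.
    hdparm -t /dev/sdX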

Monitoring RAID health

The enclosure monitors SMART health of the member disks but does not perform periodic media scans.  This is a problem because, on large disks holding infrequently accessed data, surface defects can grow unnoticed, leading to a potentially catastrophic situation when the same thing happens on more than one disk.  I have a cron job set up every two weeks to use dd (niced to 10) to block-read the entire array at a time of day when load is typically minimal.  This way, a failing disk will be kicked out of the array and replaced before the problem grows larger.
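
A minimal sketch of such a job; the device path and exact schedule here are illustrative rather than the literal entry from my crontab:

    # /etc/cron.d/raid-scrub: block-read the whole array at 03:30 on the 1st,
    # 15th and 29th (roughly every two weeks), at reduced priority, discarding
    # the data.  A disk with unreadable sectors will throw errors during the
    # read and get kicked out of the array.
    30 3 */14 * *  root  nice -n 10 dd if=/dev/sdX of=/dev/null bs=1M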

JMicron’s “HWRaidManager” utility for the JMB393 really sucks.  It is a GUI application, necessitating the setup of a headless X server just to host it.  It takes a long time to start, since it scans every possible SCSI device that could exist on a Linux system checking for the presence of the JMB393.  I have also had sporadic problems with it failing to start on certain kernels; currently, on 3.x kernels, it seems to be working.  It sends private, undocumented commands through the Linux ‘sg’ interface to configure the RAID and to check status.  JMicron refused to release any details of programming this interface to me, so at present there is no Free Software way to deal with the JMB393 for monitoring purposes.  There is likewise no way to access the individual disks behind the JMB393 for SMART monitoring without that programming interface.
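
One way to avoid maintaining a full headless X setup just for this one tool is a virtual framebuffer; a sketch, assuming the Xvfb package's xvfb-run wrapper is installed and that the binary is invoked as HWRaidManager (the path is illustrative):

    # Run the GUI management tool against a throwaway virtual X server.
    # xvfb-run allocates a free display, starts Xvfb on it, runs the command,
    # and tears the server down afterwards.
    xvfb-run -a /path/to/HWRaidManager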

Problems

I have had several disk failures over the course of years.  Some were made worse by poor software choices (i.e. using btrfs before it was production-ready, while it was still suffering from other kernel bugs).  For the most part, the enclosure has no problem figuring out which disk is the spare, automatically failing over to it and rebuilding when necessary.  The JMB393 GUI monitoring app will, most of the time, alert you via email to problems it detects.

Occasionally, if you re-use disks, the JMB393 will not be able to figure out that a disk with old data on it is intended to be a spare.  I have found that wiping just the start and end of the disk is not enough in this case; the entire disk had to be wiped.  I am not sure where on the disk the JMB393 physically keeps its metadata.
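
A sketch of the kind of full wipe that eventually worked; the device name is hypothetical, and it goes without saying that you want to be certain it points at the disk you intend to erase:

    # Zero the entire disk so the JMB393 sees it as blank.  Expect this to take
    # several hours on a 2TB drive; bs=1M keeps the write throughput sensible.
    dd if=/dev/zero of=/dev/sdX bs=1M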

If a multi-disk failure is experienced, where one disk is kicked out and a rebuild to the spare is attempted while yet another source disk has unreadable blocks, the RAID rebuild can take a very long time, even a week.  I recommend that you allow this process to continue as long as the lights on the front of the case indicate a rebuild in progress.  I attempted to work around it by cloning the disk with bad sectors to a new disk using dd_rescue, then inserting the cloned disk in its place.  However, the JMB393 would not accept the cloned disk as a source for rebuilding the RAID to the spare.  I do not know what the JMB393 uses to identify a unique disk, perhaps the serial number, which would foil such a substitution attempt.
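
For reference, the clone itself was the easy part; a sketch of the basic dd_rescue invocation, with hypothetical device names (the failing disk as the source, the new disk as the destination):

    # Copy everything readable from the failing disk to the new one.  dd_rescue
    # keeps going past read errors, retrying in smaller blocks, rather than
    # aborting at the first bad sector.
    dd_rescue /dev/sdX /dev/sdY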

Finally, the internal Mini-ITX format power supply failed.  It was a model CFI-250AT-1U.  The enclosure would not power on.  I replaced it temporarily with an external AT-style PC power supply.  There are two Molex power connectors on the TR5UT mainboard which power the logic and all of the disks.

Conclusion

I think this enclosure was worth the money.  I paid $179.98 shipped in 2010.  It is fast enough for a network fileserver, torrents, and networked streaming media, and it spared me from having to build another server just to house a large number of disks.  The only drawbacks are that the power supply is not of good quality and that there is no scriptable, command-line RAID management and monitoring tool of the kind Unix admins are used to having.  JMicron’s refusal to provide any documentation for the JMB393 management and monitoring interface is also a turn-off.

