RAIDz 4TB Box Build Thread



#1 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 08 May 2013 - 03:39 PM

Just ordered 3x WD Green 2TB drives for a RAIDz project, so I'm going to start a build thread! I'll post pics of the hardware along with specs and progress. Right now there are a few questions to answer:

 

-- Which is more important, a good NIC or a good SATA controller?

-- Is onboard SATA 1 sufficient?

-- What is the performance gain from aggregated gigabit Broadcom NICs to hosts with Intel adapters?

-- Do jumbo frames pay off, given that the network has to be segmented to use them?

-- Which network sharing protocol gives the best results in a mostly-*NIX environment?

 

I'll be using surplus server hardware from Rackable Systems and FreeBSD-CURRENT for its ZFS/RAID-Z support.



#2 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 09 May 2013 - 08:30 AM

Initial pics of the hardware:

Attached File: DSC03046.JPG (115.78KB)

 

The base system is an older Rackable Systems 2U half-depth chassis. These showed up ultra-cheap around a year ago on eBay after Rackable Systems bought SGI. I think I paid $35 + shipping for the base system, which included:

 

2x dual core Opteron 270 processors (4 cores total)

16 GB ECC DDR2

3Ware 9550S SATA-1 RAID controller

4 hot swap trays

 

Attached File: DSC03047.JPG (122.77KB)

 

With the top off, you can see the drive trays are a swappable unit. These came in SCSI and SAS flavors too, so if you get one cheaply with the wrong tray, you can swap it out.

 

Attached File: DSC03048.JPG (130.09KB)

 

The tray folds up after you loosen two thumb screws. The motherboard is a standard ATX server board, so if/when I decide to upgrade, I don't have to find a special motherboard for it.

 

Attached File: DSC03049.JPG (142.28KB)

 

The single expansion slot is full 3.3V PCI-X. Currently there's a 3Ware SATA-2 RAID controller in there. The controller's RAID capabilities aren't important as we'll be running the disks in JBOD mode, letting ZFS handle the RAID.

 

Attached File: DSC03050.JPG (83.68KB)

 

The hot swap trays are fairly spartan. This is the boot drive; I'm still waiting on the new drives to arrive.



#3 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 11 May 2013 - 05:28 PM

Drives arrived Friday:

 

Attached File: DSC03051.JPG (115.4KB)

 

I ordered three Western Digital WD20EARX 2 TB drives from TigerDirect. I also had to order some #6-32 1/4" long flat head screws for the drive trays. The Rackable Systems boxes were used equipment and had the drives removed prior to sale, so they didn't come with any of the screws. I ordered mine from McMaster-Carr for less than $4. Here's everything on my desk at work:

 

Attached File: DSC03053.JPG (101.91KB)

 

And here are the new drives mounted in their trays. The numbers are just what happened to come with my chassis; they'll be relabeled later:

 

Attached File: DSC03054.JPG (108.88KB)

 

I already have FreeBSD 9.1-RELEASE installed on another drive in slot 0 of the RAID box. It's just a vanilla 64-bit setup, no tuning or anything. I don't think anything except Midnight Commander was installed from packages/ports. RAID-Z setup is pretty straightforward if you just want to add a disk to the pool or create a mirrored/raidz volume for playing around. My initial plan was to create a raidz volume from the three drives with no tuning, just to get a feel for baseline performance.
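
For anyone following along, the pool creation itself is only a couple of commands. This is a rough sketch with placeholder device names (da1 through da3 stand in for whatever your controller actually exposes), not my exact command line:

# Create a three-disk RAID-Z pool named "storage" -- device names are placeholders
zpool create storage raidz da1 da2 da3

# Check the layout and health of the new pool
zpool status storage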

 

Of course, that wasn't going to happen. Right away I had what I initially suspected to be a drive failure: one of the drives would drop out of the array and report communication errors. This isn't super-uncommon for new hard drives, unfortunately. So I created a single-disk ZFS volume on one of the other drives for testing.

 

Initial tests showed that something was definitely wrong. Writes to the disk were coming in at less than 5 MB/sec. I tried destroying the ZFS volume and rebuilding it on a GPT partition aligned for the 4K sectors that modern drives actually use (almost all of them lie to the OS and report 512-byte sectors); little to no improvement. Reformatting with a standard FreeBSD UFS filesystem resulted in better write performance, around 20 MB/sec, but that's still far slower than I would expect from modern drives on a SATA-II controller in a PCI-X 133 MHz slot.
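
For reference, the usual FreeBSD recipe for 4K alignment at the time looks roughly like this. The device name and GPT label are placeholders, and the gnop step is just the common trick for forcing ashift=12, so treat it as a sketch rather than my exact commands:

# Create a GPT partition aligned on a 4K boundary (device name is a placeholder)
gpart create -s gpt da1
gpart add -t freebsd-zfs -a 4k -l testdisk da1

# Temporary 4K-sector gnop device so ZFS picks ashift=12 at pool creation
gnop create -S 4096 /dev/gpt/testdisk
zpool create test /dev/gpt/testdisk.nop

# Export, drop the gnop wrapper, and re-import; the pool keeps the 4K ashift
zpool export test
gnop destroy /dev/gpt/testdisk.nop
zpool import test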

 

To isolate the problem, I built a test system from an Intel D252MW Atom Mini-ITX board and some older spare SATA drives I had kicking around. It's a 1.6 GHz dual-core 64-bit Atom with 4 GB RAM, so it should be a much less capable system than the Rackable Systems box. FreeBSD 9.1-RELEASE was installed in the same manner on the boot drive. Here's the pile-o-hardware sitting on my desk at home:

 

Attached File: DSC03055.JPG (181.12KB)

 

Surprisingly enough, ZFS performance on this setup was actually very good. With an old Seagate Momentus 5400 RPM 120 GB laptop hard drive, I was getting 50-60 MB/sec throughput, which is about what I'd expect from any modern FS with that drive. I brought the new 2 TB WD drives home from work this morning to try in the test setup. The drive that consistently reported errors works fine!
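
A quick way to get ballpark sequential numbers like these is a plain `dd` run. This isn't a rigorous benchmark, and the paths below are placeholders:

# Rough sequential write test against the ZFS mount point (path is a placeholder)
dd if=/dev/zero of=/test/bigfile bs=1m count=4096

# Rough sequential read test of the same file
dd if=/test/bigfile of=/dev/null bs=1m

# Note: with ZFS compression enabled, /dev/zero writes compress to almost nothing,
# so use real data (or /dev/random) if compression is turned on.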

 

It's looking like the source of the problem may be the 3Ware RAID controller. Apparently it is known for really awful write speeds if the write cache isn't in "performance mode," which is not a safe way to run if you don't have the battery module for the controller (I don't). I'm going to put the OS drive from the test setup in a second Rackable Systems box I've got for testing later tonight.



#4 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 11 May 2013 - 10:22 PM

I moved the test setup into another Rackable Systems box I have. This one is the same base system, but has two single-core Opteron 250s and only 2 GB RAM. It uses the same 3Ware 9500 SATA RAID card that originally came with the other system. The project has expanded to nearly all of my desk, requiring a cleanup:

 

Attached File: DSC03056.JPG (135.72KB)

 

I reinstalled FreeBSD since apparently the 3Ware controller does something to disks it exports as JBOD and the original partition got wrecked. I noticed it took forever to do a base install (I let the other box run while doing other things at work, so I didn't observe the entire install). Once I got it up and going, ZFS produced the exact same miserable transfer rates (less than 5 MB/sec) with the new 2 TB disks as well as an older 160 GB Seagate Barracuda. Definitely not the disks then.

 

Next I tried a suggestion I'd read while trying to diagnose the problem: configuring the 3Ware RAID controller to export the disks as "Single Unit" RAID devices with the write cache turned on. Now, without a battery backup, this is a fairly dangerous thing to do anyway, but it's even worse with ZFS since the RAID controller lies to the kernel driver, reporting that data has been committed when it may actually still be sitting in the cache. I gave it a go since this was just a test, and sure enough the write rate jumped up to 80 MB/sec, a completely reasonable number for a WD Green drive on a SATA 1 controller.
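
If you'd rather reconfigure the units from a running system than from the card BIOS, 3ware's tw_cli utility can do it. The sketch below is from memory, so treat the controller/unit/port numbers and exact syntax as assumptions and check the tw_cli manual before running it:

# List controllers, then the units and ports on controller 0 (numbers are assumptions)
tw_cli show
tw_cli /c0 show

# Export the disk on port 2 as a single-disk ("Single Unit") unit
tw_cli /c0 add type=single disk=2

# Turn the new unit's write cache on -- unsafe without the battery module, as noted above
tw_cli /c0/u1 set cache=on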

 

The takeaway from this misadventure seems to be that, for ZFS, you want the most minimal HBA you can find. Motherboard ports > expensive RAID controller. I'm looking for a cheap PCI-X based HBA as a replacement at the moment.



#5 Treewizard420

    The phorce is with me!

  • Members
  • 70 posts
  • Location:New York

Posted 12 May 2013 - 01:52 PM

very sexy!!!!



#6 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 23 May 2013 - 04:34 PM

The replacement SATA adapter came in this week. I chose a Silicon Image 3124-based PCI-X adapter:

 

Attached File: sil3124.jpg (75.22KB)

 

I've had good experience with Silicon Image adapters in the past, but I always run them in HBA mode with software RAID. This particular board seems to be their "reference implementation" for the 3124 chipset. As it came, it would lock up on boot when a drive > 500 GB was connected to it. I had to flash the latest firmware (version 6.6.0) available from their website.

 

Apparently hardware manufacturers still use MS-DOS for BIOS updates. This is a problem because most people don't have a DOS system running anymore, and the machine I was installing on has no floppy/CD drive. Fortunately, you can use FreeDOS to do it. I used a prebuilt 256 MB image from here:

 

http://chtaube.eu/te...s/bootable-usb/

 

Copy it to your flash drive with `dd` and then mount the drive to copy over the BIOS image and flash loader. Silicon Image provides two images for the 3124 in a standalone adapter: one is a "base" image, which is SATA HBA mode only; the other is a SoftRAID image. I used the base image as I have no interest in software RAID.
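
On FreeBSD that works out to something like the following. The device name (da1), mount point, and filenames are all placeholders for whatever your USB stick and downloads actually are, so double-check the device before writing to it:

# Write the FreeDOS image to the USB stick -- this destroys its contents, check the device name!
dd if=freedos-usb.img of=/dev/da1 bs=1m

# Mount the FAT slice and copy over the BIOS image and flash utility
mount_msdosfs /dev/da1s1 /mnt
cp 3124_base.bin flashutil.exe /mnt/
umount /mnt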

 

Here's a pic of the card installed:

 

Attached File: DSC03074.JPG (143.1KB)  Attached File: DSC03075.JPG (136.92KB)

 

The card fit fine even though the previous card had the SATA connectors on the rear of the card (opposite the card bracket). The new BIOS fixed the system hang and FreeBSD booted from the old root drive -- I had to mount the root partition manually since the device name changed from da0 to ada0.
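
For anyone who hits the same rename, the fix is answering the mountroot prompt with the new device and then updating /etc/fstab. The partition numbers below assume the default GPT layout (root on p2), so adjust to match your install:

# At the mountroot> prompt, tell the kernel where root lives now:
mountroot> ufs:/dev/ada0p2

# Once booted, edit /etc/fstab and change the old da0 entries, e.g.
#   /dev/da0p2    /       ufs     rw      1       1
# becomes
#   /dev/ada0p2   /       ufs     rw      1       1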

 

As soon as I tried to create a new raidz volume, I started getting hardware disconnect errors from the SATA controller. I'd experienced these errors before, so I grabbed the hot swap tray out of the second Rackable Systems chassis I have. Same errors on the same slots. I was mostly convinced that either both hot swap backplanes were bad or they wouldn't support SATA II data rates, but decided to try the Silicon Image card in the second system:

 

Attached File: DSC03077.JPG (161.83KB)

 

Good thing I did, because this system is currently up and running with zero errors! I haven't yet determined whether the problem is in the motherboard or the PCI-X right-angle riser from the other chassis. In any case, I've been copying data back and forth to the box all afternoon in an effort to give the disks a workout. Here's a `df -h`:

$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2    897G     28G    797G     3%    /
devfs          1.0k    1.0k      0B   100%    /dev
storage        3.6T    120G    3.5T     3%    /storage

Excellent, 3.6 TB of storage! Disk-to-raidz transfers are > 150 MB/sec on the same controller, and I've achieved > 290 MB/sec from ramdisk to the raidz array with compression turned on.
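
If you want to reproduce the ramdisk-to-pool test, a memory-backed md(4) device works fine. A rough sketch with placeholder sizes and paths (keep the ramdisk well under physical RAM; compression=on gets you lzjb on 9.1):

# Enable compression on the pool
zfs set compression=on storage

# Create a 1 GB swap-backed ramdisk, put UFS on it, and mount it
mdconfig -a -t swap -s 1g -u 1
newfs /dev/md1
mount /dev/md1 /mnt

# Stage a test file on the ramdisk (random data, so it doesn't just compress away),
# then time the copy onto the raidz pool
dd if=/dev/random of=/mnt/testfile bs=1m count=512
dd if=/mnt/testfile of=/storage/testfile bs=1m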

 

I also gave link aggregation a try using the FreeBSD lagg(4) driver. My switch is a layer 2 managed device and supports LACP, so I tried it. It works in the sense that I get an aggregated link with fault tolerance, but the bandwidth with the two onboard Broadcom PCI-X GbE interfaces is less than 45 MB/sec. So Broadcom cards still suck.
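
For reference, the lagg(4) setup itself is only a few lines of rc.conf. This sketch assumes the onboard Broadcoms attach as bge0 and bge1 (they may well be bce on this board) and uses a made-up address, so substitute your own interface names and IP:

# /etc/rc.conf entries for an LACP aggregate of the two onboard NICs
ifconfig_bge0="up"
ifconfig_bge1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport bge0 laggport bge1 192.168.1.10 netmask 255.255.255.0"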



#7 systems_glitch

    Dangerous free thinker

  • Moderating Team
  • 1,623 posts
  • Gender:Male

Posted 10 June 2013 - 12:30 PM

Just an update: this box is now in "production" use with 2x Opteron 250s installed. The 1-minute load average during NIC saturation is over 5, with 150% CPU utilization, so apparently more and/or faster cores are required. I'm going to swap in a pair of dual-core Opteron 270s for testing.





