Craig's Linux Notes: Disks

Last modified: 05/11/2008

Contents

SMART
What is SMART?
Locating bad sectors in an LVM2 partition
smartctl commands
smartd
I/O Speed
SATA
SATA and hdparm
SATA and NCQ
Partitioning and Formatting
Choice of file systems
Repartitioning
Spin-down
SCSI Emulation
Moving to a new drive checklist

SMART

hdparm was very slow. Then I found a lot of these errors in /var/log/messages:

hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=357843, high=0, low=357843, sector=357839
ide: failed opcode was: unknown
end_request: I/O error, dev hdb, sector 357839

Some on the web had seen these messages before and advised they spelled disaster, that the disk was close to death and/or the power supply was bad. Simply incorrect. The disk's internal drive diagnostics were telling me that the problem was a bad sector the drive could not recover on its own.

What is SMART?

Internal drive diagnostics? Recover on its own? How's that work?

Most modern disks have built-in monitoring software, called SMART, that predicts future failure based on current and past operation. SMART monitors both the disk as a whole and individual disk sectors (usually 512 bytes). When a disk sector is nearing failure, the drive automatically moves the data to a spare sector. How does it know that failure is imminent? When you store 512 bytes, the drive actually uses more than 512 bytes. The extra data are redundant and are used to verify the integrity of your data (like a checksum), and there is enough redundancy that the drive can actually correct some number of read errors. Although your main processor never sees the errors, the drive knows whether corrections were necessary and how many. More corrections mean more problems with a sector, and a sign that sector failure is coming. CDs and DVDs do this, too, except they can't move data out of bad sectors.

My problem arose because a sector suffered more damage than could be corrected before it was next read. Because the drive hadn't visited the sector in a while, suddenly there were too many errors to correct. The data was gone, and the drive couldn't deal with it on its own. The solution was to manually locate the damaged disk sector and write new data into it. Replacing the sector contents gave the drive the opportunity to relocate the sector. The drive will ensure the bad sector is never used again.

To locate the bad sector, I used smartmontools' smartctl. It verified the bad sector location, and verified that I corrected the problem. I now use smartd to periodically test each drive and notify me of health changes. smartctl also told me that the disk had been on for 7,839 hours (nearly a year), had been powered on 535 times, temperatures had been well within normal operating limits, and that this disk was relatively young.

To locate the bad sector, use the Bad Block How-To on the smartmontools site. The instructions apply to an ext2/ext3 filesystem formatted directly on a disk partition. My problem was made slightly more complex because my partitions are managed by LVM2.

Locating bad sectors in an LVM2 partition

First off, read the Bad Block How-To and understand how it works without LVM. Then do this:

  1. Follow the How-To's First Step to locate the disk partition containing the bad sector. Record the name of the partition containing the bad sector and the partition's starting sector, S, and the sector size, T (usually 512 bytes). If the partition type is "Linux LVM", don't bother looking in /etc/fstab or calculating the sector offset.

    In my case, my disk partition is /dev/hdb1, S = 63 sectors, T = 512 bytes, and my bad sector is L = 357843.

  2. Locate the physical volume associated with the disk partition. Refer to /etc/lvm/backup/volume-group-name and record the extent_size and pe_start (both are given in sectors). pe_start is the offset from S where the space for logical volumes starts.

    In my case, extent_size = 8192 and pe_start = 384.

  3. Locate the logical volume containing the bad sector. This is the difficult part. Logical volumes may be non-contiguous, striped across multiple physical volumes, and/or not "linear". That makes things complicated. Fortunately (for me), the bad sector was located in a contiguous logical volume which was not striped. It helps to convert the logical volume's start_extent and extent_count from LVM extents into sectors:

    sector-offset = extents * extent_size (1)

    Convert the logical volume start_extent into sector offset, SLV.

    In my case, start_extent = 0 and extent_count = 12800, corresponding to SLV = 0 and a logical volume size of 104857600 sectors. Sanity check: is this contained in the disk partition?

  4. If your logical volume is contiguous, the bad sector's offset from the beginning of the filesystem can be calculated with:

    SFS = L - SLV - pe_start - S (2)

    In my case, this works out to SFS = 357843 - 0 - 384 - 63 = 357396.

    If your logical volume is non-contiguous, equation (2) is a good start, but not complete. It produces an offset further into the file system than the sector actually is. You must subtract the amount of space between contiguous regions (because these are gaps not used by your filesystem). Use equation (1) to figure out how many sectors to subtract from the result of equation (2); the conversion into file system blocks then follows with equation (3). If your non-contiguous regions do not follow each other (that is, a later portion of the file system precedes an earlier portion on physical disk), your math will be more complicated. I suggest drawing a diagram of the physical disk, and working out where your logical volumes are located.

  5. Find your file system's block size, B. See the How-To's Second Step. Specify your LVM logical volume as the device.

    In my case, this means running tune2fs -l /dev/back/backup | grep Block, resulting in B = 4096.

  6. Calculate the File System Block containing the bad sector. This step is similar to the How-To's Third Step, but in our case the sector offset calculation has already been done.

    b = int(SFS * T / B) (3)

    In my case, this works out to b = int(357396 * 512 / 4096) = int(44674.5) = 44674.

  7. That's it! Proceed from here with the How-To's Fourth Step.

    Caution: Read through the entire How-To before modifying your disk. There's a nice part at the end which shows how to test the 70 sectors on either side of the detected bad sector:

    [root] # export bad=357396
    [root] # export i=$((bad-70))
    [root] # while [ $i -lt $((bad+70)) ]; do
    > echo $i
    > dd if=/dev/back/backup of=/dev/null bs=512 count=1 skip=$i
    > let i+=1
    > done

    It is also useful to verify that the file you found is actually the one containing the bad sector. Don't modify a sector which you cannot confirm as bad! The How-To suggests using md5sum filename. If the file does contain the bad sector, simply reading it with md5sum will result in error messages appended to /var/log/messages.

    After completing your repair, run smartctl -t long /dev/hd? to verify that you have no more trouble waiting. It could take several hours, but you can use the disk while it runs.
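The arithmetic in equations (1) to (3) is easy to get wrong by hand, so here is a minimal shell sketch of it. It assumes a contiguous, linear logical volume, as in my case. The function name lvm_fs_block and its argument order are made up for illustration; they are not part of the How-To or of the LVM tools.

```shell
# Sketch: file system block containing a bad sector, for a contiguous,
# linear logical volume. Function name and argument order are illustrative.
# Arguments:
#   L             bad sector reported in the kernel log (LBA on the disk)
#   S             starting sector of the disk partition
#   pe_start      offset of the first physical extent, in sectors
#   start_extent  first extent of the logical volume
#   extent_size   extent size, in sectors
#   T             sector size, in bytes
#   B             file system block size, in bytes
lvm_fs_block() {
    L=$1 S=$2 pe_start=$3 start_extent=$4 extent_size=$5 T=$6 B=$7
    SLV=$(( start_extent * extent_size ))   # equation (1)
    SFS=$(( L - SLV - pe_start - S ))       # equation (2)
    echo $(( SFS * T / B ))                 # equation (3), integer division
}

# Values from the worked example above
lvm_fs_block 357843 63 384 0 8192 512 4096
```

With the numbers from my disk this prints 44674, the same block number as the hand calculation. It does not handle non-contiguous or striped volumes.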

smartctl commands

smartctl -i /dev/hda            Prints a bunch of drive info, including whether SMART is currently enabled.
smartctl -d ata -i /dev/sda     "-d ata" is needed for a SATA drive.
smartctl -H /dev/hda            Gives a simple overall health indication. Bad health means imminent failure, take immediate action.
smartctl -t short /dev/hda      Start short test. Check back for results later.
smartctl -l selftest /dev/hda   Display self-test log.
smartctl -l error /dev/hda      Display error log. Five most recent non-trivial errors are shown. These are never cleared.
smartctl -A /dev/hda            Return vendor-specific SMART attributes. VALUE is the current normalized attribute value. WORST is the lowest recorded VALUE. THRESH is the service limit -- pay attention when VALUE or WORST approach THRESH.
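
Watching for WORST approaching THRESH can be automated. The following is a rough sketch that filters the attribute table printed by smartctl -A. It assumes the usual column order (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...); the function name smart_near_thresh and the default margin of 10 are arbitrary choices for illustration.

```shell
# Sketch: list SMART attributes whose WORST value is within a margin of THRESH.
# Reads "smartctl -A" output on stdin. Name and default margin are illustrative.
smart_near_thresh() {
    margin=${1:-10}
    awk -v margin="$margin" '
        # Attribute rows start with a numeric ID; skip rows with THRESH of 0,
        # which have no meaningful service limit.
        $1 ~ /^[0-9]+$/ && $6 + 0 > 0 && ($5 - $6) <= margin {
            printf "%s VALUE=%d WORST=%d THRESH=%d\n", $2, $4, $5, $6
        }'
}

# Typical use (needs root and a real drive, so it is left commented out):
# smartctl -A /dev/hda | smart_near_thresh 10
```

No output means no attribute is within the margin. Treat this as a convenience only; smartctl -H remains the drive's own verdict.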

smartd

smartd is a daemon which periodically checks each drive's SMART status, can run scheduled self-tests, and logs any errors found. See /etc/smartd.conf.
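
As a starting point, here is a sketch that prints one candidate /etc/smartd.conf entry per device. The directives follow the example in the smartd.conf documentation; the schedule, the mail recipient, and the helper name smartd_conf_line are assumptions to adapt, not my actual configuration.

```shell
# Sketch: print a candidate /etc/smartd.conf entry for one device.
# Helper name, schedule and mail recipient are illustrative.
smartd_conf_line() {
    dev=$1
    # -a       monitor all SMART properties
    # -o on    enable automatic offline testing
    # -S on    enable attribute autosave
    # -s ...   short self-test daily at 02:00, long self-test Saturdays at 03:00
    # -m root  mail warnings to root
    printf '%s -a -o on -S on -s (S/../.././02|L/../../6/03) -m root\n' "$dev"
}

smartd_conf_line /dev/hda
```

Review the printed line before adding it to /etc/smartd.conf, then restart smartd.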

I/O Speed

Check disk I/O with hdparm -tT /dev/hda

Drive                                 Device     Cached reads   Buffered disk reads
Seagate Barracuda 300 GB (IDE)        /dev/hda   1.8 GB/sec     65.6 MB/sec
Western Digital Caviar 250 GB (IDE)   /dev/hdb   1.8 GB/sec     57.9 MB/sec
Seagate Barracuda 500 GB (SATA)       /dev/sda   1.8 GB/sec     60.6 MB/sec

Enabling DMA (-d1) is the only option that makes a difference. Other options (e.g. UDMA) are set to optimal values by the kernel driver. Gentoo configuration file: /etc/conf.d/hdparm. Playing DVDs requires DMA enabled.
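
To collect figures like those in the table, the two timing lines can be pulled out of the hdparm -tT output. This is a small sketch which assumes each timing line ends in the form "= N MB/sec"; the wording of the labels differs between hdparm versions, and the function name hdparm_rates is made up.

```shell
# Sketch: reduce "hdparm -tT" output to "label: rate" lines.
# Assumes timing lines end with "= <number> MB/sec". Name is illustrative.
hdparm_rates() {
    awk '/Timing/ && /=/ {
        label = $0
        sub(/^ *Timing /, "", label)   # drop the leading "Timing "
        sub(/:.*/, "", label)          # keep the text before the colon
        print label ": " $(NF-1) " " $NF
    }'
}

# Typical use (needs root and a real drive, so it is left commented out):
# hdparm -tT /dev/hda | hdparm_rates
```

Run the benchmark a few times on an otherwise idle system; single runs vary.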

SATA

I upgraded my older 250 GB IDE drive to a 500 GB SATA drive. The kernel did not recognize it. To get it to work, I did the following:

  1. BIOS: Enable SATA Combined, Enhanced, or AHCI (if you have it). Ensure that BIOS reports the drive.
  2. Enable SCSI_SATA and SCSI_ATA_PIIX (see Kernel, SCSI devices) and rebuild kernel. This will be my boot device, so I built these drivers into the kernel.
  3. The SATA disk device will be the next available /dev/sd?. On my system, this was /dev/sda. If unsure, look in /proc/partitions.
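
Before rebooting, it is worth confirming that the drivers really are built in rather than modules. A minimal sketch, assuming the options appear in the kernel .config as CONFIG_SCSI_SATA and CONFIG_SCSI_ATA_PIIX (option names change between kernel versions); the function name kconfig_builtin and the .config path are illustrative.

```shell
# Sketch: report whether kernel options are built in (=y) in a .config file.
# Returns non-zero if any option is missing or built as a module.
# Function name is illustrative.
kconfig_builtin() {
    config=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$config"; then
            echo "$opt: built in"
        else
            echo "$opt: not built in"
            missing=1
        fi
    done
    return $missing
}

# Typical use (the path is an assumption; use your own kernel source tree):
# kconfig_builtin /usr/src/linux/.config SCSI_SATA SCSI_ATA_PIIX
```

After booting the new kernel, grep sd /proc/partitions shows whether the drive was detected.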

SATA and hdparm

I encountered a lot of trouble with my system after installing a SATA drive. I traced the problem to hdparm. Use of hdparm for this drive caused an IRQ storm (a rapid, unending stream of interrupts from the device with no interrupt handler in the kernel). The kernel disables the IRQ after 100,000 of them. This caused trouble for my USB printer because it was sharing the same IRQ. Solution: don't use hdparm on SATA drives! I removed my hdparm boot-up directives, and replaced them with the kernel boot parameters "ide0=ata66 ide1=ata66" (because I am using 80 conductor cables) -- this results in equivalent performance.

Note: Use sdparm for SCSI/SATA drives. However, hdparm -tT is OK for SATA.

SATA and NCQ

NCQ = Native Command Queueing. This feature allows the kernel to send multiple outstanding requests to the disk drive rather than waiting for each one to complete before sending the next. This allows the kernel in some cases to return control to the application sooner, resulting in better performance. If your hardware supports it, this is a good thing to enable.

My SATA disk claims to support NCQ, but my motherboard does not. A disk controller supporting full AHCI is required to utilize NCQ. My MB has an Intel ICH5 controller, and this does not support full AHCI.
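
One way to see whether NCQ is actually in use is the queue depth the kernel reports for the device. This sketch assumes a libata-based kernel that exposes /sys/block/sda/device/queue_depth; a depth of 1 means commands are not being queued. The function name ncq_status is made up.

```shell
# Sketch: report whether a disk appears to be using NCQ, based on the
# queue depth in sysfs. Path and function name are assumptions.
ncq_status() {
    depth_file=${1:-/sys/block/sda/device/queue_depth}
    if [ ! -r "$depth_file" ]; then
        echo "cannot read $depth_file"
        return 2
    fi
    depth=$(cat "$depth_file")
    if [ "$depth" -gt 1 ]; then
        echo "queue depth $depth: NCQ appears to be in use"
    else
        echo "queue depth $depth: NCQ not in use"
    fi
}

# Typical use on a system with a SATA disk at /dev/sda:
# ncq_status
```

On a controller without full AHCI, such as mine, I would expect this to report a depth of 1.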

Partitioning and Formatting

Choice of file systems

Everyone wants a high performance file system. Linux's default file system, ext3, is not new technology, and several others (reiserfs, JFS2, XFS) are faster. How much faster? Various benchmarks are available, and they all indicate "somewhat faster", but not extraordinarily so. Here's a recent benchmark.

Which one to choose? I use ext3 for these reasons:

  1. Reliability. It works and has never let me down. ext3's journalling means fast and reliable recovery if my system crashes. I came across some e-mails complaining that JFS2 and XFS did not recover as well after a crash.
  2. Actively supported. Not clear who's working on JFS2 or XFS.
  3. Many tools available. If my system won't boot, all recovery disks support ext2/3.
  4. My system is a personal computer, not a server. I do many different things with it, and I need an all around good performer.
  5. Fast enough. Not the fastest, but fast enough for my needs. But speed doesn't help much if files are lost or damaged in a crash. Reliability is #1.
  6. Hardware improvements: I now have a SATA drive that can stream at 3 Gb/sec and can queue up to 31 commands, but I can't use these features with my current motherboard. A hardware upgrade could improve my effective disk speed much more than switching file systems.

Repartitioning

Spin-down

With my new, larger SATA drive installed, both of my IDE drives are now used for nightly backups. I don't need them spinning otherwise: they create extra noise and heat.
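
One common way to do this for IDE drives is hdparm's -S option, which sets a standby (spin-down) timeout. Values from 1 to 240 are multiples of 5 seconds. The sketch below only prints the command for a timeout given in minutes so it can be reviewed first; the function name spindown_cmd and the 10 minute timeout are illustrative choices.

```shell
# Sketch: print the hdparm command that sets a spin-down timeout.
# hdparm -S takes multiples of 5 seconds for values 1..240 (up to 20 minutes).
# Function name is illustrative.
spindown_cmd() {
    dev=$1
    minutes=$2
    if [ "$minutes" -lt 1 ] || [ "$minutes" -gt 20 ]; then
        echo "minutes must be between 1 and 20" >&2
        return 1
    fi
    echo "hdparm -S $(( minutes * 12 )) $dev"
}

spindown_cmd /dev/hda 10
spindown_cmd /dev/hdb 10
```

For 10 minutes this prints hdparm -S 120 for each drive. Run the printed commands as root, for example from a boot script.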

SCSI Emulation

Moving to a new drive checklist

New drives on the market are larger, faster, and cheaper than what you've got now. How to move your entire system to a new drive?

    1. Research: Do your kernel version and motherboard support this drive?
    2. Recovery: Get a recent recovery boot CD for your system. Just in case.
    3. Review space requirements. Assuming you keep the same partition structure, how much larger do you want them on the new drive? Especially important for real partitions, not so important for LVM partitions. Review the output of fdisk -l for your current drives, and use the raw partition size rather than formatted size.
    4. Shut down hardware, install new drive, and reboot.
    5. Configure BIOS so drive is recognized.
    6. Configure kernel to recognize drive. Build necessary drivers into kernel, not as modules.
    7. Reboot to new kernel.
    8. Partition new drive.
    9. Format real partitions and configure and format LVM partitions.
    10. Review fsck intervals with tune2fs.
    11. Switch to single user mode.
    12. Transfer data from old drive to new. This script will help: system-copy
    13. If new drive will have a new device name:
      1. Replace device name in /etc/fstab
      2. Replace device name in /boot/grub/grub.conf
      3. Replace suspend2 swap device in grub.conf, kernel, and /etc/hibernate config file.
      4. Install grub boot loader on new drive: grub-install --root-directory=/mnt/new /dev/...
    14. Modules now compiled into kernel should be removed from /etc/modules.autoload.d/kernel-2.6
    15. Reboot.
    16. Configure BIOS to boot from new drive.
    17. Boot to new system.
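
For the device-name step, it helps to list every place the old name still appears before editing anything. A minimal sketch; the function name find_device_refs is made up, and the files to search are whichever ones apply to your system.

```shell
# Sketch: list lines (with file name and line number) that still mention
# the old device name. Function name is illustrative.
find_device_refs() {
    old=$1
    shift
    # -F treats the device name as a fixed string; /dev/null forces grep
    # to print the file name even when only one file is given.
    grep -nF -- "$old" "$@" /dev/null
}

# Typical use, with the copied system mounted under /mnt/new:
# find_device_refs /dev/hdb /mnt/new/etc/fstab /mnt/new/boot/grub/grub.conf
```

An empty result (and a non-zero exit status) means no references were found in the files given.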
Copyright © 2003-2007 Craig Lawson