Linux Killed My Laptop's Hard Drive - Tothwolf
October 21st, 2008
07:15 am

Linux Killed My Laptop's Hard Drive
...yes, really...although it wasn't directly caused by the Linux kernel itself. It has more to do with the fact that the Linux distribution is responsible for configuring ACPI power management settings and many Linux distributions don't change the defaults. [...and the software packages that can adjust the default settings were not installed by default on my machine.]

If you run Linux on a laptop or otherwise use Linux with a laptop hard drive (maybe in an external enclosure?) you really should read this post instead of just skimming it.

[This is a follow up to my earlier post about my laptop's hard drive troubles.]


I'm never really satisfied until I know the cause of a hardware failure so I spent quite a bit of time Sunday night and Monday afternoon trying to figure out exactly what caused the drive to suddenly start having problems...

I've had Linux on this laptop since I put it into service a few years back but I found out Sunday that the smartmontools package wasn't installed by default. After I finished copying all my files over to the old P166 I set about installing it.

Here is the output from 'smartctl -A /dev/hda':
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   063   063   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       879
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   053   053   000    Old_age   Always       -       20827
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       171
191 G-Sense_Error_Rate      0x000a   099   099   000    Old_age   Always       -       131077
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       2131639
194 Temperature_Celsius     0x0002   114   114   000    Old_age   Always       -       47 (Lifetime Min/Max 17/62)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       61
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
Most of these numbers are well within what would be expected for this drive. This particular laptop is used as a surrogate desktop system and stays powered up pretty much all the time, so the high Power_On_Hours value is to be expected.


I disabled the power management for the drive (or so I thought) via the laptop's BIOS ages ago to keep the drive from constantly spinning down. It turns out that didn't completely disable all the power management functions and the hard drive's power management has been operating in "Low Power Idle" mode and not "Active Idle" or "Disabled".

This line from smartctl's output tells the full story:
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       2131639
Yes, that really is over 2.1 million head load/unload cycles...
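
To check this counter on another machine, the raw value can be pulled out of saved 'smartctl -A' output with a few lines of Python. This is only a minimal sketch: it assumes the ten-column table layout shown above, and the function name smart_raw_value is made up for illustration (it is not part of smartmontools).

```python
def smart_raw_value(smartctl_output, attribute_name):
    """Return the integer RAW_VALUE for a named SMART attribute, or None.

    Assumes the ten-column attribute table printed by 'smartctl -A'.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute_name:
            # RAW_VALUE is the tenth column; some rows (Temperature_Celsius,
            # for example) carry extra text after the number, which is ignored.
            return int(fields[9])
    return None


# Two rows copied from the smartctl output above.
sample = """\
  9 Power_On_Hours          0x0012   053   053   000    Old_age   Always       -       20827
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       2131639
"""

print(smart_raw_value(sample, "Load_Cycle_Count"))
```

Feeding it the full table should work the same way, since header and banner lines don't start with a numeric ID and are skipped.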


Google turned up something about this problem that was submitted to Slashdot last October:
Ubuntu May Be Killing Your Laptop's Hard Drive

Note that I'm using Debian on this laptop, not Ubuntu, but this seems to be a common problem across many if not all Linux distributions.

Google also turned up these two lengthy discussions about the problem:
High frequency of load/unload cycles on some hard disks may shorten lifetime
laptop harddrive Load_Cycle_Count issue


Digging into the internals of the system, I find via 'hdparm -I /dev/hda' that even though I've disabled power management in the BIOS, the drive is still running in "Low Power Idle" mode:
Capabilities:
LBA, IORDY(can be disabled)
Standby timer values: spec'd by Vendor, no device specific minimum
R/W multiple sector transfer: Max = 16  Current = 0
Advanced power management level: 128
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=240ns  IORDY flow control=120ns
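
The one line that matters in that listing is the APM level. As a rough sketch, assuming the 'hdparm -I' output has been saved to a string and uses the same wording as above, it can be extracted like this (apm_level is a hypothetical helper name, not an hdparm feature):

```python
import re


def apm_level(hdparm_output):
    """Return the 'Advanced power management level' as an int, or None.

    Returns None when the line is missing or does not carry a numeric level.
    """
    match = re.search(r"Advanced power management level:\s*(\d+)", hdparm_output)
    return int(match.group(1)) if match else None


# A few lines copied from the 'hdparm -I /dev/hda' output above.
sample = """\
Capabilities:
R/W multiple sector transfer: Max = 16  Current = 0
Advanced power management level: 128
Recommended acoustic management value: 128, current value: 254
"""

level = apm_level(sample)
print(level, hex(level))  # prints: 128 0x80
```

Level 128 (0x80) is the value that corresponds to "Low Power Idle" on this drive.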

I managed to find a copy of the Travelstar 80GN OEM Specification v2.0 (986k PDF) datasheet after following a link from this page.

Section 6.3.6 Load/unload:
"The product supports a minimum of 300,000 normal load/unloads."

Section 11.7.2 Active Idle Mode:
"In this mode, power consumption is 45-55% less than that of Performance Idle mode. Additional electronics are powered off and the head is parked near the mid-diameter of the disk without servoing. Recovery time to Active mode is about 20 ms."

Section 11.7.3 Low Power Idle Mode:
"Power consumption is 60-65% less than that of Performance Idle mode. The heads are unloaded on the ramp but the spindle is still rotated at the full speed. Recovery time to Active mode is about 300 ms."

Section 13.33 Set Features (EFh), Note 3 (numbered page 154):
"Note 3. When the Feature register is 85h (=Disable Advanced Power Management), the deepest Power Saving mode becomes Active Idle."


Section 11.7.3 certainly explains a few things. Section 6.3.6 still doesn't tell me what the maximum rated number of load/unload cycles is for my 80GN series drive but later models of Hitachi drives tend to be rated at 600,000 cycles. It's probably safe to assume my drive is similarly designed and rated.
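
To put the counter in perspective against those ratings, here is a small worked calculation in Python. The 300,000 figure is the minimum from Section 6.3.6; the 600,000 figure is the rating of later models, and applying it to this drive is an assumption. The names are illustrative only.

```python
LOAD_CYCLE_COUNT = 2131639      # raw value of SMART attribute 193 above
RATED_MINIMUM_80GN = 300_000    # Section 6.3.6 of the 80GN specification
ASSUMED_LATER_RATING = 600_000  # later Hitachi models; assumed to apply here


def times_rating(cycles, rating):
    """Return how many times over a rated cycle count the drive has gone."""
    return cycles / rating


print(f"{times_rating(LOAD_CYCLE_COUNT, RATED_MINIMUM_80GN):.1f}x the 300,000 minimum")
print(f"{times_rating(LOAD_CYCLE_COUNT, ASSUMED_LATER_RATING):.1f}x the 600,000 figure")
```

Either way, the drive is several times past the number of load/unload cycles it was designed for.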

The drive really should have been running in Active Idle mode (0xFE / '-B 254') while on AC power but nothing in the system changed the default from Low Power Idle mode (0x80 / '-B 128'). Unlike with Windows, Linux is completely modular and the utilities that can change the default setting are separate from the kernel and base system installation.

Section 13.33 also explains why some of the people involved in the Ubuntu discussions linked above had luck using '-B 255' instead of '-B 254'. The '-B 255' hdparm option causes hdparm to issue the Set Features command with 0x85 (Disable Advanced Power Management) in the Feature register, which in the case of the 80GN series drives has the same effect as setting level 0xFE ('-B 254'): the deepest power saving mode becomes Active Idle.

A quick-fix command is: 'hdparm -B 254 /dev/hda' [for 0xFE (Active Idle)]

...of course that's a temporary fix and it's a little late now, but at least I know why the drive is starting to have problems.


Further research Tuesday morning shows that neither the acpi-support nor the laptop-mode-tools package was installed on my laptop by default.

Debian's acpi-support package as of version 0.103-5 has support for changing the default power management mode for a laptop hard drive via 'hdparm -B'. The 90-hdparm.sh script found in /etc/acpi under the ac.d, battery.d, resume.d, and start.d directories contains this comment:
# This script adjusts hard drive APM settings using hdparm. The hardware
# defaults (usually hdparm -B 128) cause excessive head load/unload cycles
# on many modern hard drives. We therefore set hdparm -B 254 while on AC
# power. On battery we set hdparm -B 128, because the head parking is
# very useful for shock protection.
The laptop-mode-tools package can also control the hard drive's power management if CONTROL_HD_POWERMGMT is enabled in the /etc/laptop-mode/laptop-mode.conf config file. It has 3 other settings that allow fine tuning of the power management levels. These are the defaults:
BATT_HD_POWERMGMT=1
LM_AC_HD_POWERMGMT=254
NOLM_AC_HD_POWERMGMT=254
So...if the acpi-support package is installed and/or if the laptop-mode-tools package is installed and CONTROL_HD_POWERMGMT enabled, the power management level for the hard drive will be adjusted.
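
The policy described in the acpi-support comment above (254 on AC power, 128 on battery) boils down to a single decision. The Python sketch below only builds the hdparm command line and does not run it. It is not the actual 90-hdparm.sh script, and the function name and default device are assumptions for illustration.

```python
AC_APM_LEVEL = 254       # Active Idle on this drive: heads stay loaded
BATTERY_APM_LEVEL = 128  # Low Power Idle: heads unload for shock protection


def hdparm_command(on_ac_power, device="/dev/hda"):
    """Build the hdparm argument list for the current power source."""
    level = AC_APM_LEVEL if on_ac_power else BATTERY_APM_LEVEL
    return ["hdparm", "-B", str(level), device]


print(" ".join(hdparm_command(True)))   # prints: hdparm -B 254 /dev/hda
print(" ".join(hdparm_command(False)))  # prints: hdparm -B 128 /dev/hda
```

Actually applying the setting would mean running that command as root each time the power source changes or the machine resumes, which is the job the acpi-support and laptop-mode-tools hooks take care of.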


So who exactly is at fault for this problem?

The answer seems to be both everybody (Dell, Hitachi, and Debian) and at the same time, nobody.

It makes sense that the drive manufacturers default the drive to very aggressive power management settings that unload the heads and spin down the platters. This helps the drive survive minor shocks and bumps if used in a mobile system. It will help prevent head impacts on the platters and the resulting platter/head damage, data loss, and warranty returns. Of course, this is at the expense of a shorter drive life since spinning the drive down and unloading the heads both contribute to extra wear and tear, especially if the drive is going to be accessed regularly or spun right back up again.

From what I've read, Dell has changed the BIOS screens in newer laptops so that this setting can be better adjusted. Supposedly enabling "Performance Mode" in their newer BIOS will take care of it. That still doesn't help those of us who are still using older laptops or people who don't know about that BIOS setting. It would be nice if Dell were to issue a BIOS update for these older laptops and make this problem better known, but they probably see doing so as admitting at least some sort of liability.

Debian and other Linux distributions really should be better at detecting the power management settings by default and changing the defaults as necessary. Windows does this to varying degrees but Linux offers much more control over these settings if the required utilities are installed. I haven't checked but hopefully Debian has since changed the priority level of the acpi-support package so it will be installed by default on ALL systems. The acpi-support package's default settings (at least in the version in the debian-unstable branch) would have saved a lot of wear and tear on my laptop's hard drive.

I guess it's a little late for my laptop's drive but maybe this information will at least save someone else's hard drive...


Comments
 
From:shockwave77598
Date:October 21st, 2008 05:48 pm (UTC)
I don't see why that would be worse for a laptop drive than a regular drive in a desktop. Are you telling me that Ubuntu is parking the head way more than it should be doing by default, thus aging the drive prematurely?
From:tothwolf
Date:October 21st, 2008 06:21 pm (UTC)
Are you telling me that Ubuntu is parking the head way more than it should be doing by default, thus aging the drive prematurely?

That's pretty much it.

It's not specific to Ubuntu though; it will happen with any Linux distribution, or really any OS, that fails to change the default power management settings for the hard drive.

Unlike desktop drives, most modern laptop drives can retract the head without ever spinning down, which appears to be the default power management mode on this drive and many of the others out there. Most desktop drives if told to go into low power mode will retract the head and then spin down. Because this drive wasn't spinning down after I changed the power management settings in the computer's BIOS I had no way to know it was still retracting the heads so often.

The manufacturer specs for the 80GN series basically state that the drive is good for a minimum of 300,000 head load/unload cycles but their later series state 600,000 as the upper limit in the MTBF ratings. This drive has 2,131,639 load/unload cycles due to the drive operating in "Low Power Idle" mode and not "Active Idle" mode.

This breaks down as follows:

2131639 load cycles / 20827 power on hours = 102.35 cycles per hour
102.35 cycles per hour / 60 minutes = 1.71 cycles per minute

That is very excessive and has definitely shortened the lifespan of the drive.
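
The same breakdown can be reproduced in Python, along with two derived figures that are not in the arithmetic above: the average gap between head unloads, and how many power-on hours it takes at this rate to reach the 300,000-cycle minimum from the spec. This is a sketch that assumes the load cycles were spread evenly over the power-on hours, which is a simplification.

```python
LOAD_CYCLES = 2131639     # SMART attribute 193 raw value
POWER_ON_HOURS = 20827    # SMART attribute 9 raw value
RATED_MINIMUM = 300_000   # minimum load/unload cycles from the 80GN spec

cycles_per_hour = LOAD_CYCLES / POWER_ON_HOURS
cycles_per_minute = cycles_per_hour / 60
# Average time between head unloads, assuming an even spread.
seconds_between_unloads = 3600 / cycles_per_hour
# Power-on hours needed to reach the rated minimum at this rate.
hours_to_rated_minimum = RATED_MINIMUM / cycles_per_hour

print(f"{cycles_per_hour:.2f} cycles per hour")
print(f"{cycles_per_minute:.2f} cycles per minute")
print(f"one unload roughly every {seconds_between_unloads:.0f} seconds")
print(f"rated minimum reached after about {hours_to_rated_minimum / 24:.0f} days powered on")
```

At this rate the heads were unloading roughly every 35 seconds, and the 300,000-cycle minimum would have been passed after about four months of continuous uptime.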

Note that since I changed the power management mode to 0xFE (Active Idle) with 'hdparm -B 254 /dev/hda', the SMART Load_Cycle_Count counter has remained at 2131639. The machine has also become much more responsive, which isn't that surprising considering it takes the drive about 300 ms to load the heads and get from Low Power Idle mode back to Active mode.

Oddly enough, the drive still hasn't up and died, and it quit making noises yesterday. I obviously can't trust it now, but at least I got some advance warning.