Drive Error - Can't save mail

MichaelConnors · March 12, 2007, 8:51pm

Last week I updated Virtualmin. After that I couldn’t receive emails, even though my logs said the mails were delivered. I checked the Maildir and no mail had been delivered to the new folder.

I figured that since there were SMART errors, my HDD had failed.
I installed a new HDD and rebuilt the virtual server.

Now the same thing is happening all over again. It worked fine for a day, then I got the same SMART error (see below) and no mail is being processed.

Any suggestions would be nice.

/var/log/messages

Mar 12 12:32:43 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 13:02:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 13:32:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 14:02:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 14:09:02 server saslauthd[5420]: do_request : NULL password received
Mar 12 14:32:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 15:02:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 15:29:05 server saslauthd[5420]: do_request : NULL password received
Mar 12 15:32:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 16:02:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors
Mar 12 16:32:42 server smartd[1947]: Device: /dev/hda, 23 Currently unreadable (pending) sectors

The mail does come in… but it never gets to the mail dir it was sent to… perhaps SpamAssassin?

/var/log/maillog

Mar 12 16:31:02 server postfix/smtpd[25314]: connect from catv-50637efb.catv.broadband.hu[80.99.126.251]
Mar 12 16:31:03 server postfix/smtpd[25314]: 515276F82E0: client=catv-50637efb.catv.broadband.hu[80.99.126.251]
Mar 12 16:31:03 server postfix/cleanup[25318]: 515276F82E0: message-id=000001c764e4$c55ee480$0100007f@localhost
Mar 12 16:31:03 server postfix/qmgr[5377]: 515276F82E0: from=eurocontroll.com@webvacancy.com, size=2722, nrcpt=1 (queue active)
Mar 12 16:31:04 server postfix/smtpd[25314]: disconnect from catv-50637efb.catv.broadband.hu[80.99.126.251]
Mar 12 16:31:04 server postfix/local[25319]: 515276F82E0: to=michael-webenergy.ca@server.webenergy.ca, orig_to=michael@webenergy.ca, relay=local, delay=0.96, delays=0.76/0.03/0/0.17, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
Mar 12 16:31:04 server postfix/qmgr[5377]: 515276F82E0: removed
Mar 12 16:34:24 server postfix/anvil[25316]: statistics: max connection rate 1/60s for (smtp:80.99.126.251) at Mar 12 16:31:02
Mar 12 16:34:24 server postfix/anvil[25316]: statistics: max connection count 1 for (smtp:80.99.126.251) at Mar 12 16:31:02
Mar 12 16:34:24 server postfix/anvil[25316]: statistics: max cache size 1 at Mar 12 16:31:02

Procmail is screwed up I think also…

/etc/procmailrc

:0wi
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
:0
:0

* ^X-Spam-Status: Yes
$HOME/monkeypharmacy/homes/ken/Maildir/.spam/
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.spam/
:0
* ^X-Spam-Status: Yes
$HOME/monkeypharmacy/homes/ken/Maildir/.spam
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.spam/
:0
* ^X-Spam-Status: Yes
$HOME/Maildir/.spam/
* ?/usr/bin/test "$VIRTUALMIN" != ""
{
INCLUDERC=/etc/webmin/virtual-server/procmail/$VIRTUALMIN
}
DEFAULT=$HOME/Maildir/
ORGMAIL=$HOME/Maildir/
DROPPRIVS=yes
:0
$DEFAULT

Joe · March 12, 2007, 9:15pm

Hey Michael,

If you’re getting drive errors, then it’s one of three things:

1. Actual problem with the drive. Since you’ve replaced it, we can probably (but not certainly) rule this out. (When I ran a hardware company, I once got in a bad batch of drives, in which 100% of six or seven drives failed within a week of being put into service.)
2. Motherboard or controller problem. Whatever hardware is driving the disk could be failing.
3. Driver bug, or incompatibility.

Procmail can’t possibly cause drive errors. Nothing in userspace can. Only the kernel or the hardware itself can trigger this kind of error.

You can test disks with the badblocks command, but that won’t necessarily narrow down the source of the trouble to kernel, disk, or controller. You’ll need to swap things around (e.g. change disks and change controllers), and run badblocks in each configuration, to isolate the specific problem.
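The swap-and-retest idea can be written down as a small decision table. This is only a sketch of the reasoning, not a diagnostic tool: the disk and controller labels are hypothetical, and each result is simply whether badblocks reported errors in that combination.

```python
from typing import Dict, Tuple


def isolate(results: Dict[Tuple[str, str], bool]) -> str:
    """results maps (disk, controller) -> True if badblocks reported errors."""
    if not any(results.values()):
        return "no errors reproduced"
    if all(results.values()):
        # Nothing you swapped made a difference.
        return "every combination fails: suspect driver/kernel or shared hardware"

    disks = {d for d, _ in results}
    controllers = {c for _, c in results}
    # A component is suspect if every combination that includes it fails.
    bad_disks = sorted(
        d for d in disks
        if all(bad for (dd, _), bad in results.items() if dd == d)
    )
    bad_ctrls = sorted(
        c for c in controllers
        if all(bad for (_, cc), bad in results.items() if cc == c)
    )

    if bad_disks and not bad_ctrls:
        return "errors follow the disk: " + ", ".join(bad_disks)
    if bad_ctrls and not bad_disks:
        return "errors follow the controller: " + ", ".join(bad_ctrls)
    return "inconclusive: test more combinations"


# Hypothetical labels and results, for illustration only.
example = {
    ("disk1", "ctrlA"): True,
    ("disk1", "ctrlB"): True,
    ("disk2", "ctrlA"): False,
    ("disk2", "ctrlB"): False,
}
verdict = isolate(example)
print(verdict)
```

If every combination fails, the table cannot separate the kernel driver from hardware that all the tests shared (cables, power supply, motherboard).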

Joe · March 12, 2007, 9:16pm

Oh, yeah, I’m still looking over your configuration to see if I can spot the trouble. Just because you’re getting SMART errors doesn’t mean that is why you aren’t getting mail delivered.
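One way to see this in the maillog: the postfix/local line says status=sent, and the text in parentheses says the message was handed to a command (the procmail wrapper), not written to a mailbox. A minimal Python sketch that pulls those two fields out of the line (the line is trimmed from the maillog posted above; the regex and function name are just illustrative):

```python
import re

# Trimmed from the postfix/local entry in the maillog posted above.
LINE = (
    "515276F82E0: to=michael-webenergy.ca@server.webenergy.ca, "
    "orig_to=michael@webenergy.ca, relay=local, delay=0.96, "
    "delays=0.76/0.03/0/0.17, dsn=2.0.0, status=sent "
    "(delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)"
)

# status=<word> followed by a parenthesised explanation at the end of the line.
PATTERN = re.compile(r"status=(?P<status>\w+) \((?P<detail>.*)\)\s*$")


def delivery_result(line):
    """Return (status, detail) from a Postfix delivery log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("status"), match.group("detail")


status, detail = delivery_result(LINE)
handed_to_command = detail.startswith("delivered to command:")
print(status, handed_to_command)
```

When the delivery is to a command, "sent" only tells you Postfix finished its part; where the message ends up after that is decided by procmail and its recipes.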

MichaelConnors · March 12, 2007, 10:21pm

Thanks Joe,

I am running badblocks now. It’s really strange because I was getting emails for about a day and then it stopped.

I’m trying to look deeper into my logs. Thanks for helping me figure out this problem.

Joe · March 12, 2007, 10:33pm

Hey Michael,

You’ll have to narrow it down to either hardware or a kernel bug. In the case of a kernel bug, you’ll still probably have to replace hardware, since not many folks are capable of fixing the kernel (I haven’t debugged a kernel driver in about five years, and the couple of times that I have, it took days of diving through the code).

In other words: You’ve gotta figure out what hardware needs to be replaced, and replace it.

I’m afraid that’s all that can be done, assuming you’re already running the latest kernel for your Linux distribution.

MichaelConnors · June 7, 2009, 12:02pm

badblocks results. Is there a way to fix this?

/var/log/messages

Mar 12 18:26:37 server kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=2109472, sector=2109472
Mar 12 18:26:37 server kernel: end_request: I/O error, dev hda, sector 2109472
Mar 12 18:26:37 server kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Mar 12 18:26:37 server kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=2109472, sector=2109472
Mar 12 18:26:37 server kernel: end_request: I/O error, dev hda, sector 2109472
Mar 12 18:26:37 server kernel: Buffer I/O error on device hda, logical block 263684
Mar 12 18:26:37 server kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Mar 12 18:26:37 server kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=2109472, sector=2109472
Mar 12 18:26:37 server kernel: end_request: I/O error, dev hda, sector 2109472
Mar 12 18:26:39 server kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Mar 12 18:26:39 server kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=2109757, sector=2109757
Mar 12 18:26:39 server kernel: end_request: I/O error, dev hda, sector 2109757
Mar 12 18:26:41 server kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Mar 12 18:26:41 server kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=2109757, sector=2109757
Mar 12 18:26:41 server kernel: end_request: I/O error, dev hda, sector 2109757
Mar 12 18:26:41 server kernel: Buffer I/O error on device hda, logical block 263719
Mar 12 18:26:43 server kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
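
For reading the output above: the sector numbers and the "logical block" numbers describe the same places on the disk in different units. A short Python sketch, assuming 512-byte sectors and 4096-byte blocks (the block size is an assumption inferred from the ratio between the two numbers in the log, not something the log states), lists the distinct failing sectors and converts them:

```python
import re

# A few lines copied from the kernel log above.
LOG = """\
Mar 12 18:26:37 server kernel: end_request: I/O error, dev hda, sector 2109472
Mar 12 18:26:37 server kernel: Buffer I/O error on device hda, logical block 263684
Mar 12 18:26:39 server kernel: end_request: I/O error, dev hda, sector 2109757
Mar 12 18:26:41 server kernel: end_request: I/O error, dev hda, sector 2109757
Mar 12 18:26:41 server kernel: Buffer I/O error on device hda, logical block 263719
"""

SECTOR_SIZE = 512   # bytes; assumption
BLOCK_SIZE = 4096   # bytes; assumption inferred from the log

SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def sector_to_block(sector, sector_size=SECTOR_SIZE, block_size=BLOCK_SIZE):
    """Convert a sector number to the block that contains it."""
    return sector * sector_size // block_size


# Distinct failing sectors, in order of first appearance.
sectors = []
for _dev, sector in SECTOR_RE.findall(LOG):
    if int(sector) not in sectors:
        sectors.append(int(sector))

blocks = {s: sector_to_block(s) for s in sectors}
print(blocks)
```

Under those assumptions the two sectors map to blocks 263684 and 263719, which matches the "Buffer I/O error" lines, so this excerpt shows repeated failures at two locations rather than errors spread across the disk.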