I've just found this in the daily security output of my workstation (running 2.6.1) : Checking setuid files and devices: find: /usr/pkgsrc/sysutils/sysinfo/CVS/Entries: Input/output error find: /usr/pkgsrc/devel/avltree/CVS/Entries: Input/output error find: /usr/pkgsrc/security/p5-Crypt-RSA/CVS/Entries: Input/output error find: /usr/pkgsrc/ham/dpbox/CVS/Entries.Log: Input/output error find: /usr/pkg/openoffice.org: Input/output error find: /usr/pkg/openoffice.org3: Input/output error These directories can no longer be used: # cd /usr/pkg # ls -ld o* ls: openoffice.org: Input/output error ls: openoffice.org3: Input/output error I'm not sure if this is due to a bug in HAMMER or to some sort of hardware failure. The hard disk is a 500GB Western Digital Green model with a single hammer version 4 volume. The directories belong to the root fs. Filesystem Size Used Avail Capacity Mounted on PFS500GP 460G 400G 60G 87% / There's no relevant error message in dmesg. Is there any tool I can use to diagnose and fix this problem ? -- Francois Tigeot
Check with sysutils/smartmontools. I'd love to have a simplistic SMART monitoring daemon integrated into DragonFly which is run by default. This would be so cool and useful! Regards, Michael
I'd be all for that too. Actually, I'd be all for going one step further and write a program(or if one like that already exists, put it in base) that checks/monitors the health of the hardware. Bad hardware (drives, memory etc) are sneaky fuckers that tend to bite when one doesn't expect it. I dont know how one would go about making such software but going forward, a cluster of ie. 32 DF servers would be difficult to monitor for health and such software would just make it much easier. In my opinion. Petr
Thanks, this could be a very nice utility but I'm afraid the output is bit too cryptic for my tastes. I have included the most relevant part of the output of "smartctl -d sat -a /dev/da0" below: SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 157 148 021 Pre-fail Always - 3116 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 135 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2218 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 133 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 131 193 Load_Cycle_Count 0x0032 147 147 000 Old_age Always - 160820 194 Temperature_Celsius 0x0022 109 093 000 Old_age Always - 34 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 -- Francois Tigeot
That's all normal except the Load_Cycle_Count is excessive. But
the Load_Cycle_Count is somewhat dependent on the manufacturer
and I don't see any errors reported in the output.
Another thing you can do is run 'hammer -f <device> show | less -B'
and search for lines beginning with the character 'B'. You would
type slash-carrot-B into LESS. The hammer show command checks
for problems.
-Matt
Matthew Dillon
<dillon@backplane.com>
There are many such lines. I have copied the first few ones:
BM NODE 800000268ea27000 cnt=33 p=800000268ea12000 type=L depth=4 mirror
000000013a77e4d0 fill=z8:19741=60% {
BM NODE 800000268ea31000 cnt=33 p=800000268ea12000 type=L depth=4 mirror
0000000139b6e140 fill=z8:19741=60% {
BM NODE 800000268eb0e000 cnt=32 p=800000268eb02000 type=L depth=4 mirror
0000000148774380 fill=z8:19741=60% {
BM NODE 800000268eb19000 cnt=32 p=800000268eb02000 type=L depth=4 mirror
0000000139b6e840 fill=z8:19741=60% {
--
Francois Tigeot
If the only flag after B is 'M' then it is ok, it's the mirror-tid
bug which was fixed in 2.6 but only effects incremental mirroring.
This is self-correcting so it isn't a concern.
If other flags are present then we have a real problem.
If the broken inodes are localized it could be stuff left dangling
from a crash recovery. That's my hope. Did you have any crashes
while work was occuring on any of the related directories? If this
is the case it should be possible to remove them with 'rm'.
-Matt
Matthew Dillon
<dillon@backplane.com>
Ok. I'm creating a complete log of the hammer show output. I'll grep for the Nope. The errors appeared out of the blue. At one time I suddenly found garbled contents in mails I was reading with This is not possible : # rm iconv.dir rm: iconv.dir: Input/output error -- Francois Tigeot
Francois, did the filesystem you started getting the I/O errors on
fill up at some point. I noticed that it looks very full.
That might be the common thread here between you and Jan.
-Matt
It never did. I think I had only about 20 GB free at some point, but then the 60-day snapshot limit kicked in and free space began to grow. I never did get this error before, even on test filesystems I managed to fill. -- Francois Tigeot
I have pushed a new hammer directive to HEAD called 'checkmap'.
It isn't complete yet but it should be good enough to verify that
all record data references are accounted for in the blockmap. I still
need to have it check B-Tree node allocations and other stuff.
hammer -f <device> checkmap
If it does not output any blockmap info then everything that it checks
(so far) is ok.
If you specify verbose (-v) it will output the blockmap info it
collects whether good or bad.
The directive scans the B-Tree so it will take a little while, but
nowhere near as long as 'show' because it doesn't dive the record data.
Also in the same commit I pushed some changes to the 'show' directive
to hopefully detect and report zone mismatches instead of assert.
--
Hopefully this will narrow down the problem space. I'd like you to
run both the hammer show and hammer checkmap operations with this
latest hammer utility and report back. bzip2'ing the output and
making it available would also be helpful.
-Matt
At around the time when you noticed the problem, and/or a few days before
or after, did you run a manual hammer cleanup and is it possible (if you
did) that it might have been running at the same time as another hammer
cleanup, such as the one run by cron?
-Matt
I think I found the smoking gun but I can't be sure until I see
the show output from Francois.
I noticed that all the bad CRC'd records were typically around element
30-32 in the B-Tree node (out of ~64 elements). That is, the middle
of the node.
This implies a race between the reblocker/rebalancer and a node split
during an insertion, or a race between the reblocker and the rebalancer.
I am testing a fix now and I am not 100% sure that this was the issue,
but there are a lot of things pointing to it:
* In both Jan's and Francois's cases the inodes that got corrupted
were in areas of the filesystem under a heavy write/create/delete
load.
* The corrupted records appear to nearly always be in the middle of
the B-Tree node, which implies a race against an insertion or a
rebalancing operation while the reblocker is running.
* And I found a bug in the reblocker that was exposed by recent work
(the work itself was not buggy, it just exposed the bug that already
existed) whereby the reblocker may reblock an element after relocking
the node but without properly checking that the element is still valid.
Jan, I think you can test this with your psql test, after you reformat
that volume for real and start fresh. You should be able to test this
by running a continuous hammer reblock operation on the data while you
are running the database test and see if corruption ultimately occurs.
I have included my proposed patch/fix below but please do not apply
it yet. I want to try to reproduce the corruption here to actually
test whether this fixes the issue or not.
Once we fix the issue I'll have to work up a procedure to fix any
broken filesystems. Locating breakage is really easy, the hammer
show and hammer checkmap commands can be used. Fixing it, short
of copying off the filesystem, may be more difficult.
Jan, I am convinced that it is ...It never occured. The emachine was left to run at night; most of the time in the last 15 days the cleanup ware run unattended at 3am. I did a manual cleanup once but it was after a complete 24h shutdown and I'm sure no periodic cron ran at the same time. I would have noticed the performance degradation. -- Francois Tigeot
This is the first time I've seen this sort of problem reported.
There are two possibilities. First is it is a real I/O error, but
in that case I would have expected messages in the dmesg output.
The second is the HAMMER volume somehow got corrupted... HAMMER
will generate EIO if records in the B-Tree specify offsets which
are out of bounds. If it were real corruption the CRCs would
fail and you would get dmesg output. If something else happened
that caused it to lay down good records that point to bad data
then it could report EIO without generating console output.
One thing you can do is see if the disk is completely readable
by using dd to read the raw drive. It should be able to read the
whole disk until it hits the end of the disk:
dd if=/dev/blahblah of=/dev/null bs=32k
But beyond that, think back about things you might have done
in the past that is outside the norm for HAMMER operations
that might have caused the problem, such as add or remove
a volume or adjust the disklabel or something like that.
-Matt
Matthew Dillon
<dillon@backplane.com>
It took almost 2 hours to run but found no errors. Nothing so extreme. I have updated the system to the latest 2.6 a few days ago. I think the previous version didn't have the new REDO code. I also briefly ran a DragonFly-2.6/x86-64 test installation from another disk and used the existing 500GB drive /home PFS. The error messages appeared the following day. -- Francois Tigeot
Hmm. Maybe there is an issue with 64-bit vs 32-bit use of the same
HAMMER filesystem. That hasn't been tested well at all, though
theoretically it should work just fine.
I think what I need is a full 'hammer -f <device> show' dump of the
filesystem. It will be large, but it should be able to bzip2
reasonably well. If you can upload that to leaf I can take a look
at it. I also need a list of (a few of the) broken filenames that
I can track down, including their inode numbers (ls -lia).
-Matt
Matthew Dillon
<dillon@backplane.com>
I'm not able to create a complete dump: # hammer -f /dev/da0s1d show > hammer-show.dump Assertion failed: (layer2->zone == zone), function blockmap_lookup, file /usr/src/sbin/hammer/blockmap.c, line 104. I can't get their inode numbers either: # ls -lia iconv.dir ls: iconv.dir: Input/output error -- Francois Tigeot
