Dear LKML,

Apologies in advance for potential misuse of LKML, but I don't know where else to ask.

An ongoing study on datasets of several petabytes has shown that 'silent data corruption' can occur at rates much larger than one might naively expect from the expected error rates in RAID arrays and the expected probability of single-bit uncorrected errors in hard disks. The origin of this data corruption is still unknown. See for example:

http://cern.ch/Peter.Kelemen/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf

In thinking about this, I began to wonder about the following. Suppose that a (possibly RAID) disk controller correctly reads data from disk and has correct data in the controller memory and buffers. However, when that data is DMA'd into system memory, some errors occur (cosmic rays, electrical noise, etc.).

Am I correct that these errors would NOT be detected, even on a 'reliable' server with ECC memory? In other words, the ECC bits would be calculated in server memory based on incorrect data from the disk.

The alternative is that disk controllers (or at least ones that are meant to be reliable) DMA both the data AND the ECC byte into system memory, so that if an error occurs in this transfer, it would most likely be picked up and corrected by the ECC mechanism. But I don't think that 'this is how it works'.

Could someone knowledgeable please confirm or contradict?

Cheers,
Bruce
