> have EDAC turned on, or something ... I'm investigating now.
That gets you into arguments with the people who care about performance,
but it's really a distribution-level debate, and I suspect the answer is
itself distro-specific, depending on usage.

On a decent system ECC will do something. A modern server PC actually has
pretty good coverage on CPU L1, L2 and optionally RAM. I/O controllers
and disk internal caches seem to be a bit more variable, which is one
reason big HPC cluster projects often checksum end to end - when you
produce terabytes of data, all those one-in-a-hundred-billion error stats
start to look less than reassuring.
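
To make "checksum end to end" concrete, here is a rough sketch, not what
any particular HPC project actually runs. The producer attaches a digest
before the data enters the I/O path and the consumer verifies it after
the data comes back out, so corruption in a controller or disk cache gets
caught even when every individual layer reported success. The names
seal, unseal and flip_bit are made up for illustration, and SHA-256 is
just one possible choice of checksum. For scale: if the one in a hundred
billion figure is read as a per-bit rate (an assumption), a terabyte of
8e12 bits would expect roughly 80 bad bits.

```python
import hashlib

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(payload: bytes) -> bytes:
    """Producer side: append a SHA-256 digest to the payload (hypothetical helper)."""
    return payload + hashlib.sha256(payload).digest()


def unseal(blob: bytes) -> bytes:
    """Consumer side: verify the trailing digest and return the payload."""
    if len(blob) < DIGEST_LEN:
        raise ValueError("blob too short to contain a digest")
    payload, digest = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch: data damaged somewhere in the path")
    return payload


def flip_bit(blob: bytes, bit: int) -> bytes:
    """Simulate a single-bit error introduced by some layer in between."""
    damaged = bytearray(blob)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)
```

Usage is just unseal(seal(data)) around whatever storage or transport sits
in the middle; a flipped bit anywhere in the blob makes unseal raise
ValueError instead of quietly handing back bad data.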
Alan
--