On Tuesday 11 March 2008 13:53, Willy Tarreau wrote:

So we have a flock of people arguing that you can't trust Linux. Well, maybe there are situations where you can't, but what can you trust? Disk firmware? BIOS? Big maybes everywhere.

In my experience, Linux is very reliable. I think Linus, Andrew and others care an awful lot about that and go to considerable lengths to make it true. Got a list of Linux kernel flaws that bring down a system? Tell me and I will not use that version to run a transaction processing system, or I will fix them or get them fixed. But please do not tell me that Linux is too unreliable to run a transaction processing system. If Linux can't do it, then what can?

By the way, the huge ramdisk that Violin ships runs Linux inside, to manage the RAIDed, hot-swappable memory modules. (Even cooler: they run Linux on a soft processor implemented on a big FPGA.) Does anybody think that they did not test to make sure Linux does not compromise their MTBF in any way? In practice, for the week I was able to test the box remotely and the 10 days I had it in my hands, the thing was solid as a rock. Good hardware engineering and a nice kernel, I say.

Sure. Leaving out dodgy stuff like hald, and other bits I could mention, is probably a good idea. The scary thing is that things like hald are actually being run on servers, but that is another issue entirely.

It wasn't too long ago that the NFS client was in the dodgy category, with oopses, lockups, what have you. It is pretty solid now, but it takes a while for the bad experiences to fade from memory. On the other hand, knfsd has never been the slightest bit of a problem. Helpful suggestion: don't run the NFS client on your transaction processing unit. It may well be solid, but who needs to find out experimentally?

Might as well toss gamin, dbus and udev while you are at it, for a further marginal reliability increase. Oh, and alsa, no offense to the great work there, but it just does not belong on a server.
Definitely do not boot into X (I know I should not have to say that, but...)

I guess I am actually going to run evaluations on some mission critical systems using the arrangement described. I wish I could be more specific about it, but I know of critical systems pushing massive data that in fact rely on batteries just as I have described. For completeness, I will verify that pulling the UPS plug actually corrupts the data and report my findings. Not by pulling the plug of course, but by asking the vendors.

I consider 1/year way too high a failure rate for anything that gets onto a server I own, and then there must necessarily be systems in place to limit the damage. For me, that means replication, or perhaps synchronously mirroring the whole stack, which is technology I do not trust yet on Linux, so we don't do that. Yet.

So here is the tradeoff: do you take the huge performance boost you get by plugging in the battery and put the necessary backup systems in place, or do you accept a lower performing system that offers higher theoretical reliability? It depends on your application.

My immediate application happens to be hacking kernels and taking diffs, which tends to suck rather badly on Linux. Ramback will fix that, and it will be in place on my workstation right here. I will give my report. (Bummer, I need to reboot because I don't feel like backporting to 2.6.20. Too bad about that 205 day uptime, but I have to close the vmsplice hole anyway.) So I will have, say, a 3 GB source code partition running under ramback, and it will act just like spinning media because of my UPS, except 25 times faster.

Of course, the reason I feel brave about this is that everything useful on that partition gets uploaded to the internet sooner rather than later. Nonetheless, having to reinstall everything would cost me hours, so I will certainly not do it if I think there is any reasonable likelihood I might have to.

Right. See ddraid. It is in the pipeline, but everything takes time.
We also need to reroll NBD complete with deadlock fixes before I feel good about that.

Regards,

Daniel
