On Wed, 18 Jun 2008, Bron Gondwana wrote:

Hmm.. I'm pretty sure that using MAP_SHARED for writing is _more_ portable than mixing mmap() and "write()" - or at least more _consistent_.

That said, it's probably six one way, and half a dozen the other. The shared writable mmap() doesn't work well on unix-lookalikes (ie "not real unix"). That does include really really old Linux versions (ie 1.x series), but more relevantly probably includes things like QNX etc.

On the other hand, the mmap()+write(), as mentioned, doesn't work well on various hardware platforms where there can be cache aliases, and that includes HP-UX (as you apparently have noticed), but I'm pretty certain there are other cases too.

The cache alias issue can actually be really thorny, because it's going to be very hard to see and essentially random: if your working set is big enough (or the cache is small enough) that the cache basically gets flushed between the write and the access through the mmap (and vice versa), you'll never see any problems.

But then, _occasionally_, you'll have really hard-to-replicate corruption due to cache aliases (ie you read something from the mmap() after the write, but you don't actually see the newly written data, because it's cached at a different virtual address).

Linux tries really hard to be coherent between mmap and read/write even on those kinds of platforms, but I would definitely not call it "portable". It really is a fundamentally nasty thing, and depends deeply on the CPU architecture, not just the OS.

Yeah, I can certainly see that working. That said, I can also see it failing, partly because of the CPU virtual indexing cache issues, but partly because it's such an unusual thing to do (partly because it simply is known not to work on some systems, ie HP-UX). And that will mean that it is probably not a well-tested path.. As you found out.
(Side note: I mention HP-UX just because it is known to historically have totally and utterly brain-damaged and useless mmap support. It _may_ be that they've fixed it in more modern versions. It literally used to be a mix of horrible hardware problems - the virtual cache issue - _and_ a VM system that was based on some really old BSD code).

So the more traditional way would be to do an all-mmap thing, and extend the file with ftruncate(), not write. That's something that programs like nntpd have been doing for decades, so it's a very "traditional" model and thus much more likely to be safe. It also avoids all the aliasing issues, if all accesses are done the same way.

That said, you _would_ need to have alternate strategies to access things, but apparently Cyrus already has such strategies at least for HP-UX.

One of the issues here is that in order to give coherency for mmap + read/write access, the OS may need to map the area uncached or at least flush caches when writing. So from a pure performance standpoint, it can also cause problems.

Of course, even an uncached mmap() _can_ certainly be faster than using just read()/write(), depending on the access patterns. So maybe Cyrus is doing the right thing, it just sounds rather fragile and prone to unexpected and hard-to-debug problems.

		Linus
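
The "all-mmap" model mentioned above (grow the file with ftruncate(), then touch the data only through a MAP_SHARED mapping) might look roughly like the sketch below. This is only an illustration under stated assumptions: the struct mapped_file type and the mf_open/mf_resize/mf_append/mf_close helpers are hypothetical names invented here, not code from Cyrus or nntpd, and the file is simply truncated to empty on open to keep the example short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: a file accessed ONLY through a shared mapping. */
struct mapped_file {
    int fd;
    char *base;   /* start of the mapping, or NULL while the file is empty */
    size_t size;  /* current file length == mapping length */
};

/* Open (creating if needed) and start from an empty file (assumption). */
static int mf_open(struct mapped_file *m, const char *path)
{
    assert(m != NULL);
    m->base = NULL;
    m->size = 0;
    m->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    return m->fd < 0 ? -1 : 0;
}

/* Change the file length with ftruncate(), never write(), then remap. */
static int mf_resize(struct mapped_file *m, size_t new_size)
{
    /* Drop the old mapping first; a zero-length mmap() is not allowed. */
    if (m->base != NULL && munmap(m->base, m->size) != 0)
        return -1;
    m->base = NULL;
    m->size = 0;

    if (ftruncate(m->fd, (off_t)new_size) != 0)
        return -1;
    if (new_size == 0)
        return 0;

    void *p = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED)
        return -1;
    m->base = p;
    m->size = new_size;
    return 0;
}

/* Append by extending the file and storing through the mapping. */
static int mf_append(struct mapped_file *m, const void *data, size_t len)
{
    size_t old = m->size;
    if (mf_resize(m, old + len) != 0)
        return -1;
    memcpy(m->base + old, data, len);
    return 0;
}

/* Flush dirty pages, unmap, and close. */
static int mf_close(struct mapped_file *m)
{
    int rc = 0;
    if (m->base != NULL) {
        if (msync(m->base, m->size, MS_SYNC) != 0)
            rc = -1;
        if (munmap(m->base, m->size) != 0)
            rc = -1;
    }
    if (close(m->fd) != 0)
        rc = -1;
    return rc;
}
```

A caller would mf_open() a path, mf_append() records, read them back through m.base, and finish with mf_close(). Because neither read() nor write() ever touches the data, every access goes through the same virtual mapping, which is the property that sidesteps the alias problem. Remapping on every append is wasteful; a real program would more likely grow the file in larger chunks.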
