On Sat, 29 Sep 2007 22:10:42 +0300 Artem Bityutskiy <dedekind@yandex.ru> wrote:

ok.. writepage under i_mutex is commonly done on the sys_write->alloc_pages->direct-reclaim path. It absolutely has to work, and you'll be fine relying upon that.

However ->prepare_write() is called with the page locked, so you are vulnerable to deadlocks there. I suspect you got lucky because the page which you're holding the lock on is not dirty in your testing. But in other applications (eg: 1k blocksize ext2/3/4) the page _can_ be dirty while we're trying to allocate more blocks for it, in which case the lock_page() deadlock can happen.

One approach might be to add another flag to writeback_control telling write_cache_pages() to skip locked pages. Or even put a page* into writeback_control and change it to skip *this* page.

yup. Or another CPU can do the same. Perhaps a heavier workload is needed.

There is code in the VFS which tries to prevent lots of CPUs from getting in and fighting with each other (see writeback_acquire()) which will have the effect of serialising things to some extent. But writeback_acquire() is causing scalability problems on monster IO systems and might be removed, and it is only a partial thing - there are other ways in which concurrent writeout can occur (fsync, sync, page reclaim, ...)

err, it's basically an open-coded mutex via which one thread can get exclusive access to some parts of an inode's internals. Perhaps it could literally be replaced with a mutex. Exactly what I_LOCK protects has not been documented afaik. That would need to be reverse engineered :(

On a regular file i_mutex is used mainly for protection of the data part of the file, although it gets borrowed for other things, like protecting f_pos of all the inode's file*'s. I_LOCK is used to serialise access to a few parts of the inode itself.
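
To make the writeback_control suggestion concrete, here is a rough userspace sketch of the two variants (skip any locked page, or skip one specific page). It is not kernel code: toy_page, toy_wbc, the skip_locked and skip_page fields and toy_write_cache_pages() are all hypothetical names made up for illustration, and a "would block" return value stands in for the lock_page() sleep that causes the deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the two bits that matter here. */
struct toy_page {
    bool locked;
    bool dirty;
};

/* Toy stand-in for writeback_control; both fields are hypothetical. */
struct toy_wbc {
    bool skip_locked;                 /* skip any page that is already locked */
    const struct toy_page *skip_page; /* skip exactly this page, if non-NULL */
};

/*
 * Walk the pages and "write" each dirty one.
 * Returns the number of pages written, or -1 if the walk reached a
 * locked page it was not allowed to skip (where a real lock_page()
 * would sleep, possibly forever if the caller holds that lock).
 */
int toy_write_cache_pages(struct toy_page *pages, size_t n,
                          const struct toy_wbc *wbc)
{
    int written = 0;

    assert(wbc != NULL);

    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = &pages[i];

        if (!p->dirty)
            continue;               /* nothing to write back */
        if (p == wbc->skip_page)
            continue;               /* the caller's own page: leave it alone */
        if (p->locked) {
            if (wbc->skip_locked)
                continue;           /* someone holds it: move on */
            return -1;              /* would block here */
        }

        p->locked = true;           /* lock, write, unlock */
        p->dirty = false;
        p->locked = false;
        written++;
    }
    return written;
}
```

With neither option set, a dirty page locked by the caller stops the walk. With either option set, the walk completes and that page simply stays dirty for a later pass.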
