On Sat, 29 Sep 2007 22:10:42 +0300 Artem Bityutskiy <dedekind@yandex.ru> wrote:

ok.. writepage under i_mutex is commonly done on the sys_write->alloc_pages->direct-reclaim path. It absolutely has to work, and you'll be fine relying upon that.

However ->prepare_write() is called with the page locked, so you are vulnerable to deadlocks there. I suspect you got lucky because the page which you're holding the lock on is not dirty in your testing. But in other applications (eg: 1k blocksize ext2/3/4) the page _can_ be dirty while we're trying to allocate more blocks for it, in which case the lock_page() deadlock can happen.

One approach might be to add another flag to writeback_control telling write_cache_pages() to skip locked pages. Or even put a page* into writeback_control and change it to skip *this* page.

yup. Or another CPU can do the same. Perhaps a heavier workload is needed.

There is code in the VFS which tries to prevent lots of CPUs from getting in and fighting with each other (see writeback_acquire()) which will have the effect of serialising things to some extent. But writeback_acquire() is causing scalability problems on monster IO systems and might be removed, and it is only a partial thing - there are other ways in which concurrent writeout can occur (fsync, sync, page reclaim, ...)

err, it's basically an open-coded mutex via which one thread can get exclusive access to some parts of an inode's internals. Perhaps it could literally be replaced with a mutex. Exactly what I_LOCK protects has not been documented afaik. That would need to be reverse engineered :(

On a regular file i_mutex is used mainly for protection of the data part of the file, although it gets borrowed for other things, like protecting f_pos of all the inode's file*'s. I_LOCK is used to serialise access to a few parts of the inode itself.
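
To make the writeback_control idea above concrete, here is a rough userspace model of it, not kernel code. Every name in it (model_page, model_wbc, skip_locked, skip_page, model_write_pages) is made up for illustration; none of them is an existing kernel field or function. The page lock is modelled as a plain flag, and "would sleep forever in lock_page()" is modelled by returning -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page: only the two bits we care about. */
struct model_page {
    bool locked;  /* models the page lock being held by someone (maybe us) */
    bool dirty;   /* models the page needing writeout */
};

/* Hypothetical stand-in for writeback_control with the two proposed knobs. */
struct model_wbc {
    bool skip_locked;              /* assumed flag: skip pages that are locked */
    struct model_page *skip_page;  /* assumed field: skip exactly this page */
};

/*
 * Walk the pages and "write" the dirty ones.
 * Returns the number of pages cleaned, or -1 if the walk reached a locked
 * page it was not allowed to skip (the case where lock_page() would block).
 */
int model_write_pages(struct model_page *pages, size_t n,
                      const struct model_wbc *wbc)
{
    int written = 0;

    assert(wbc != NULL);
    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct model_page *p = &pages[i];

        /* Second proposal: never touch the page the caller already holds. */
        if (p == wbc->skip_page)
            continue;

        if (p->locked) {
            /* First proposal: pass over locked pages instead of waiting. */
            if (wbc->skip_locked)
                continue;
            return -1;  /* would block on the page lock */
        }

        if (p->dirty) {
            p->dirty = false;  /* pretend the page was written out */
            written++;
        }
    }
    return written;
}
```

With neither knob set, a caller that holds the lock on a dirty page and then ends up in this walk gets -1, which is the lock_page() deadlock case. With either knob set, the walk passes over that page and still cleans the rest. The skipped page stays dirty, so something has to write it later.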
