Andrew Morton <akpm@linux-foundation.org> writes:

We broke coherence between the fs and /dev/hda1 when we introduced the page cache years ago, and weird hacky cases like unmap_underlying_metadata don't change that. Currently only metadata is more or less in sync with the contents of /dev/hda1.

Well, I took a look at ext3. For online resize, all of the writes are done by the fs, not by the user space tool. For e2fsck of a read-only filesystem, we currently do cache the buffers for the super block and reexamine those blocks when we mount read-only, which makes my patch by itself unsafe. If, however, ext3 and anyone else who does things like that were to reread the data and not merely reexamine it, we should be fine.

Fundamentally, doing anything like this requires some form of synchronization, and if that synchronization does not exist today there will be bugs. Further decoupling things only makes that requirement clearer.

Unfortunately, because of things like the ext3 handling of remounting from ro to rw, this doesn't fall into the quick trivial fix category :(

The buffer_head itself seems to be a reasonable entity.

The buffer cache is a monster. It does not follow the ordinary rules of the page cache, making it extremely hard to reason about.

Currently in the buffer cache there are buffer_heads that hold dirty data but that we are not allowed to make dirty. Some filesystems panic the kernel when they notice this. Others, like ext3, use a different bit to remember that the buffer is dirty.

Because of ordering considerations, the buffer cache does not hold a consistent view of what has been scheduled for being written to disk. It instead holds partially complete pages.

The only place we should ever clear the dirty bit is just before calling write_page, but try_to_free_buffers clears the dirty bit!

We have buffers on pages without a mapping!

In general the buffer cache violates a primary rule for comprehensible programming: having a clear definition.
The buffer cache does not have a clear enough definition that it is clear what things are bugs and what things are features. 99% of the weird strange behavior in rd.c is because of the buffer cache not following the normal rules.

This presumes I want to use a filesystem on my block device. Where I would care most is when I am doing things like fsck or mkfs on an unmounted filesystem, where having buffer_heads is just extra memory pressure slowing things down, and similarly for highmem. We have to sync the filesystem before mounting, but we have to do that anyway for all of the non-metadata, so that isn't new.

Anyway, my main objective was to get a good grasp on the buffer cache and the mm layer again, which I now more or less have. While I think the buffer cache needs a bunch of tender loving care before it becomes sane, I have other projects that I intend to complete before I try anything in this area.

Eric
