Nick Piggin <nickpiggin@yahoo.com.au> writes:

Well, what brought this up for me was old user space code using an initial ramdisk. The actual failure I saw occurred on the read path, and fixing init_page_buffers was the real-world fix.

At the moment I'm messing with it because it has become the itch I've decided to scratch. So I'm having fun, learning the block layer, refreshing my VM knowledge and getting my head around this wreck that we call buffer_heads. The high-level concept of buffer_heads may be sane, but the implementation seems to export a lot of nasty state.

At this point my concern is what makes a clean code change in the kernel. User space can currently play with buffer_heads by way of the block device and cause lots of havoc (see the recent reiserfs bug in this thread), which is why I increasingly think metadata buffer_heads should not share storage with the block device page cache.

If that change is made, then the current ramdisk would not need to worry about buffer_heads and all of that nastiness, and could just lock pages in the page cache. It would not be quite as good for testing filesystems, but retaining the existing characteristics would be simple.

After looking a bit deeper, the buffer_heads and the block devices don't look as intricately tied up as I had first thought. We still have the nasty case of:

	if (buffer_new(bh))
		unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);

I don't know how that got merged. But otherwise the caches are fully separate.

So currently it looks to me like there are two big things that will clean up that part of the code a lot:

- moving the metadata buffer_heads to a magic filesystem inode.
- using a simpler, non-buffer_head-returning version of get_block, so we can write simple generic code for generating BIOs.

As a meta_data cache manager perhaps, for a translation cache we need 8 bytes per page max.
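To make the second point a little more concrete, a minimal userspace sketch of the kind of generic BIO-building logic that gets easy once get_block only hands back sector numbers instead of buffer_heads. It assumes some hypothetical mapping step has already filled an array with the starting 512-byte sector of each block in a page (or UNMAPPED_SECTOR for a hole), and it models sector_t as a 64-bit integer. count_bio_runs() is an illustrative name, not an existing kernel function; it only counts runs of physically contiguous blocks, each of which could be submitted as one BIO.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: userspace model of the kernel's sector_t. */
typedef uint64_t sector_t;

/* Marks a block with no on-disk mapping (a hole). */
#define UNMAPPED_SECTOR ((sector_t)-1)

/*
 * Hypothetical helper: count the runs of mapped, physically contiguous
 * blocks in one page. blocks[i] holds the starting 512-byte sector of
 * block i; sectors_per_block is the block size divided by 512. Each run
 * could be submitted as a single BIO; holes and discontinuities end a run.
 */
int count_bio_runs(const sector_t *blocks, int nr_blocks,
		   sector_t sectors_per_block)
{
	int runs = 0;

	assert(sectors_per_block > 0);
	for (int i = 0; i < nr_blocks; i++) {
		if (blocks[i] == UNMAPPED_SECTOR)
			continue;	/* holes need no I/O */
		/* Start a new run unless this block directly follows the previous one on disk. */
		if (i == 0 || blocks[i - 1] == UNMAPPED_SECTOR ||
		    blocks[i] != blocks[i - 1] + sectors_per_block)
			runs++;
	}
	return runs;
}
```

With 1 KiB blocks in a 4 KiB page, a page mapped to sectors 100, 102, 104, 106 forms one run, while 100, 102, hole, 200 forms two.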
However, all we need for a generic translation cache (assuming we still want one) is an array of sector_t per page. So what we would want is:

	int blkbits_per_page = PAGE_CACHE_SHIFT - inode->i_blkbits;
	if (blkbits_per_page <= 0)
		blkbits_per_page = 0;
	/* GFP_KERNEL is an assumed allocation context */
	sector_t *blocks = kmalloc(sizeof(sector_t) << blkbits_per_page, GFP_KERNEL);

And to remember whether we have stored the translation:

	#define UNMAPPED_SECTOR ((sector_t)-1)

...

The core of all of this being something like:

	#define MAX_BLOCKS_PER_PAGE (1 << (PAGE_CACHE_SHIFT - 9))
	typedef int (page_blocks_t)(struct page *page, sector_t blocks[MAX_BLOCKS_PER_PAGE], int create);

Which I can agree with. By definition!

Eric
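A rough userspace model of the per-page translation array described above, assuming 4 KiB pages (PAGE_CACHE_SHIFT of 12) and a 64-bit sector_t. blocks_per_page(), alloc_page_translation() and linear_page_blocks() are hypothetical names. The last one stands in for a page_blocks_t callback: it takes a page index because userspace has no struct page, drops the create argument, and pretends the file is laid out contiguously on disk. Under these assumptions the array costs 64 bytes per page with 512-byte blocks and 8 bytes once the block size reaches the page size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumptions for a userspace model: 4 KiB pages, 64-bit sector_t. */
typedef uint64_t sector_t;
#define PAGE_CACHE_SHIFT 12
#define UNMAPPED_SECTOR ((sector_t)-1)
#define MAX_BLOCKS_PER_PAGE (1 << (PAGE_CACHE_SHIFT - 9))

/* Number of translation entries needed for one page (at least 1). */
int blocks_per_page(int i_blkbits)
{
	int blkbits_per_page = PAGE_CACHE_SHIFT - i_blkbits;

	if (blkbits_per_page < 0)
		blkbits_per_page = 0;	/* block >= page: a single entry */
	return 1 << blkbits_per_page;
}

/* Allocate one page's translation array with every entry marked unmapped. */
sector_t *alloc_page_translation(int i_blkbits)
{
	int n = blocks_per_page(i_blkbits);
	sector_t *blocks = malloc(sizeof(sector_t) * (size_t)n);

	if (!blocks)
		return NULL;
	for (int i = 0; i < n; i++)
		blocks[i] = UNMAPPED_SECTOR;
	return blocks;
}

/*
 * Hypothetical page_blocks_t-style callback for a toy file stored
 * contiguously on disk starting at sector 'start'. Only handles block
 * sizes from 512 bytes up to the page size. Returns the number of
 * entries filled in.
 */
int linear_page_blocks(unsigned long index, int i_blkbits, sector_t start,
		       sector_t blocks[MAX_BLOCKS_PER_PAGE])
{
	assert(i_blkbits >= 9 && i_blkbits <= PAGE_CACHE_SHIFT);

	int n = blocks_per_page(i_blkbits);
	sector_t sectors_per_block = (sector_t)1 << (i_blkbits - 9);
	/* First filesystem block covered by this page. */
	sector_t first = ((sector_t)index << PAGE_CACHE_SHIFT) >> i_blkbits;

	for (int i = 0; i < n; i++)
		blocks[i] = start + (first + (sector_t)i) * sectors_per_block;
	return n;
}
```

For example, with 1 KiB blocks the second page (index 1) of a file starting at sector 1000 maps to sectors 1008, 1010, 1012 and 1014.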