(cc'ing linux-kernel) We use kobjsize() for pointers returned from do_mmap() which is why I kept the page->index case. Are we using PageCompound for those as well? Hmm... --
What does do_mmap have to do with page->index? --
I don't know how the nommu code is supposed to work. I simply assumed it has a reason for being there and that compound_order() is not enough to handle that. --
Could the nommu people either tell us what this is all about? Or junk the code. This looks very wrong. --
ELF-FDPIC is currently using kobjsize() so that it can expand the heap/stack segment to fill up the entirety of its allocation. It's probably worth dropping that, though. NOMMU mmap() is using kobjsize()/ksize() to keep track of the number of bytes allocated and the amount of dead space. We can probably ditch that too. However, fs/proc/task_nommu.c uses kobjsize() quite a bit to determine how much metadata space a process is carrying around. We could just use sizeof(), I suppose, and not bother calculating the slack. David --
For short term, we should merge my two nommu patches as-is, no? --
Not until the page->index bits are killed, otherwise you aren't fixing anything. SLOB on nommu with those page->index tests will automatically oops today, before or after your patches. Until that's resolved, there's no point in pretending like kobjsize() has been "fixed". As no one has come up with a valid reason for those tests existing in the first place, simply having your patches and killing the BUG_ON()'s seems ok. If we're not going to kill the BUG_ON()'s, then your patches are purely cosmetic fixups with no behavioural change -- (ie, nommu is still hosed on SLOB with current git). --
Hi Paul, Sorry if I'm starting to sound like a broken record, but can you explain why removing the ->index bits are safe? I mean, if removing them is really okay, that means we don't hit that code path with SLAB at all? It fixes nommu with SLUB, doesn't it? Pekka --
Paul, so with something like this, the WARN_ON never triggers? diff --git a/mm/nommu.c b/mm/nommu.c index dca93fc..38eec2e 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -109,16 +109,23 @@ unsigned int kobjsize(const void *objp) * If the object we have should not have ksize performed on it, * return size of 0 */ - if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp)))) + if (!objp) + return 0; + + if ((unsigned long)objp >= memory_end) + return 0; + + page = virt_to_page(objp); + if (!page) return 0; if (PageSlab(page)) return ksize(objp); - BUG_ON(page->index < 0); - BUG_ON(page->index >= MAX_ORDER); + if (WARN_ON(!PageCompound(page))) + return 0; - return (PAGE_SIZE << page->index); + return PAGE_SIZE << compound_order(page); } /* --
This still needs to be virt_to_head_page() I think. I don't have my nommu boards at home, so I'll test at the office tomorow morning and let you know. --
Yes, I messed that up. Thanks a lot for your help, Paul! Pekka [PATCH] nommu: kobjsize fix From: Christoph Lameter <clameter@sgi.com> The kobjsize() function is broken with SLOB and SLUB. As summarized by Paul Mundt: The page->index bits look like they are being used for determining compound order, which is _completely_ bogus, and only happens to "work" in a few cases. Christoph and I have repeatedly asked for someone to explain what the hell those tests are there for, as right now they not only look completely bogus, but they also stop us from booting on SLOB. So far no one has provided any input on why those page->index BUG_ON()'s have any right to exist. So while having 2 out of 3 SLAB allocators in a bootable state might seem like progress, I'd rather see kobjsize() fixed correctly. Even my initial patches worked for all 3. If no one can speak up to defend those bits, they should be killed off before 2.6.26. Whether this is done in combination with your patch or Christoph's patch or whatever else doesn't matter. You can find the discussion here: http://lkml.org/lkml/2008/5/22/223 Reported-by: Paul Mundt <lethal@linux-sh.org> Cc: David Howells <dhowells@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> --- diff --git a/mm/nommu.c b/mm/nommu.c index dca93fc..935887b 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -109,16 +109,23 @@ unsigned int kobjsize(const void *objp) * If the object we have should not have ksize performed on it, * return size of 0 */ - if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp)))) + if (!objp) + return 0; + + if ((unsigned long)objp >= memory_end) + return 0; + + page = virt_to_head_page(objp); + if (!page) return 0; if (PageSlab(page)) return ksize(objp); - BUG_ON(page->index < 0); - BUG_ON(page->index >= MAX_ORDER); + if ...
Hi Paul,
I was about to send a patch that fixes some of the kobjsize() abuses to
use ksize() but then I realized that what we probably should do is
something like this instead.
I mean, assuming the BUG_ON bits are bogus, then we should always pass the
pointer to the allocator. I audited most of the callers and they all seem
to be really just using kmalloc() for allocation anyway.
What do you think?
Pekka
diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..fc1ff52 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -103,22 +103,20 @@ EXPORT_SYMBOL(vmtruncate);
*/
unsigned int kobjsize(const void *objp)
{
- struct page *page;
-
/*
* If the object we have should not have ksize performed on it,
* return size of 0
*/
- if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+ if (!objp)
return 0;
- if (PageSlab(page))
- return ksize(objp);
+ if ((unsigned long)objp >= memory_end)
+ return 0;
- BUG_ON(page->index < 0);
- BUG_ON(page->index >= MAX_ORDER);
+ if (!virt_to_page(objp))
+ return 0;
- return (PAGE_SIZE << page->index);
+ return ksize(objp);
}
/*
--
Oh, almost, you had this bit in ksize() of SLAB: + page = virt_to_head_page(objp); + if (unlikely(!PageSlab(page))) + return PAGE_SIZE << compound_order(page); Did you actually need it for something? Pekka --
Not that I recall, it was just for consistency with SLUB. I'll have to re-test though, as I'm not sure if it was necessary or not. --
OK. If we do need it, then something like this should work. But I don't see how we could have these "arbitrary pointers" (meaning not returned by kmalloc()) to compound pages; otherwise the BUG_ON would trigger with SLAB as well. I also don't see any call-sites that do this (but I'm not an expert on nommu). Pekka diff --git a/mm/nommu.c b/mm/nommu.c index dca93fc..604e7de 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -109,16 +109,39 @@ unsigned int kobjsize(const void *objp) * If the object we have should not have ksize performed on it, * return size of 0 */ - if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp)))) + if (!objp) return 0; + if ((unsigned long)objp >= memory_end) + return 0; + + page = virt_to_head_page(objp); + if (!page) + return 0; + + /* + * If the allocator sets PageSlab, we know the pointer came from + * kmalloc(). The allocator, however, is not guaranteed to set PageSlab + * so the pointer might still have come from kmalloc() (see the comments + * below). + */ if (PageSlab(page)) return ksize(objp); - BUG_ON(page->index < 0); - BUG_ON(page->index >= MAX_ORDER); + /* + * The ksize() function is only guaranteed to work for pointers + * returned by kmalloc(). So handle arbitrary pointers, that we expect + * always to be compound pages, here. Note: we also handle pointers to + * compound pages that came from kmalloc() here. + */ + if (PageCompound(page)) + return PAGE_SIZE << compound_order(page); - return (PAGE_SIZE << page->index); + /* + * Finally, handle pointers that came from kmalloc() that don't have + * PageSlab set. + */ + return ksize(objp); } /* --
In the kmem_cache_alloc() case calling ksize() there is bogus, the
previous semantics for kobjsize() just defaulted to returning PAGE_SIZE
for these, since page->index was typically 0. Presently if we ksize()
those objects, we get bogus size results that are smaller than the
minimum alignment size. So we still need a way to handle that, even if
it's not frightfully accurate.
If we go back and apply your PG_slab patch for SLUB + SLOB, then
kobjsize() can just become:
diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..3abd084 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -104,21 +104,43 @@ EXPORT_SYMBOL(vmtruncate);
unsigned int kobjsize(const void *objp)
{
struct page *page;
+ int order = 0;
/*
* If the object we have should not have ksize performed on it,
* return size of 0
*/
- if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+ if (!objp)
return 0;
+ if ((unsigned long)objp >= memory_end)
+ return 0;
+
+ page = virt_to_head_page(objp);
+ if (!page)
+ return 0;
+
+ /*
+ * If the allocator sets PageSlab, we know the pointer came from
+ * kmalloc().
+ */
if (PageSlab(page))
return ksize(objp);
- BUG_ON(page->index < 0);
- BUG_ON(page->index >= MAX_ORDER);
+ /*
+ * The ksize() function is only guaranteed to work for pointers
+ * returned by kmalloc(). So handle arbitrary pointers, that we expect
+ * always to be compound pages, here.
+ */
+ if (PageCompound(page))
+ order = compound_order(page);
- return (PAGE_SIZE << page->index);
+ /*
+ * Finally, handle arbitrary pointers that don't set PageSlab.
+ * Default to 0-order in the case when we're unable to ksize()
+ * the object.
+ */
+ return PAGE_SIZE << order;
}
/*
--
Hi Paul, What call-sites are using kmem_cache_alloc()? Can we convert them to kmalloc() or page_alloc()? IIRC both Christoph and Matt opposed my PG_slab patches. --
Well, with that modified version of your patch I posted, even if your previous PG_slab patches aren't applied, kobjsize() doesn't behave any worse than it presently does in terms of object size accuracy. In short: PG_slab doesn't get set and ksize() is never called, so we get the same degree of accuracy as the existing implementation, and the oopses get fixed (and the comments are still accurate, too!). So I think it's worth applying. Verified on all of SLUB/SLOB/SLAB. As for the call-site question, perhaps I'm misunderstanding your question. alloc_vfsmnt() is the first to call kmem_cache_zalloc() during boot-up on my system, but I'm not sure what relevance this has to anything? Accurately measuring kmem_cache_alloc() and static allocations is going to need quite a bit more of a re-think, but that's out of scope for 2.6.26. Presently I'd rather have my system booting first :-) --
Hi Paul, Agreed. Can you send this to Andrew? Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> David already mentioned some (most?) of the kobjsize() calls can go away and I think we should pursue that for 2.6.27. --
Ah, the kobjsize() calls is what you were getting at. Yes, getting rid of those would be a good plan. I'll start looking at that more in-depth once I've got some of my other 2.6.27 stuff out of the way. That's definitely one interface that needs to be killed off with extreme prejudice.. --
I thought I already had several times, but I'll attempt to summarize again.. I would expect the only reason those BUG_ON()'s haven't been triggered in the non-SLOB case is due to the fact PageSlab() takes a different path out. The page->index bits look like they are being used for determining compound order, which is _completely_ bogus, and only happens to "work" in a few cases. Christoph and I have repeatedly asked for someone to explain what the hell those tests are there for, as right now they not only look completely bogus, but they also stop us from booting on SLOB. So far no one has provided any input on why those page->index BUG_ON()'s have any right to exist. So while having 2 out of 3 SLAB allocators in a bootable state might seem like progress, I'd rather see kobjsize() fixed correctly. Even my initial patches worked for all 3. If no one can speak up to defend those bits, they should be killed off before 2.6.26. Whether this is done in combination with your patch or Christoph's patch or whatever else doesn't matter. --
The nagging question here is: Why is page->private used in kobjsize()? A special heap/stack setup? --
