Re: [PATCH] nommu: fix kobjsize() for SLOB and SLUB

Previous thread: optimizing out inline functions by Steve French on Wednesday, May 28, 2008 - 12:51 pm. (7 messages)

Next thread: Re: Linux 2.6.26-rc4 by Linus Torvalds on Wednesday, May 28, 2008 - 1:10 pm. (3 messages)
From: Pekka Enberg
Date: Wednesday, May 28, 2008 - 1:03 pm

(cc'ing linux-kernel)


We use kobjsize() for pointers returned from do_mmap() which is why I 
kept the page->index case. Are we using PageCompound for those as well?


Hmm...
--

From: Christoph Lameter
Date: Wednesday, May 28, 2008 - 1:25 pm

What does do_mmap have to do with page->index?

--

From: Pekka Enberg
Date: Wednesday, May 28, 2008 - 1:25 pm

I don't know how the nommu code is supposed to work. I simply assumed it 
has a reason for being there and that compound_order() is not enough to 
handle that.
--

From: Christoph Lameter
Date: Wednesday, May 28, 2008 - 1:29 pm

Could the nommu people either tell us what this is all about? Or junk the 
code. This looks very wrong.

--

From: David Howells
Date: Thursday, May 29, 2008 - 6:08 am

ELF-FDPIC is currently using kobjsize() so that it can expand the heap/stack
segment to fill up the entirety of its allocation.  It's probably worth
dropping that, though.

NOMMU mmap() is using kobjsize()/ksize() to keep track of the number of bytes
allocated and the amount of dead space.  We can probably ditch that too.

However, fs/proc/task_nommu.c uses kobjsize() quite a bit to determine how
much metadata space a process is carrying around.  We could just use sizeof(),
I suppose, and not bother calculating the slack.

David
--

From: Pekka J Enberg
Date: Thursday, May 29, 2008 - 6:21 am

For short term, we should merge my two nommu patches as-is, no?
--

From: Paul Mundt
Date: Thursday, May 29, 2008 - 2:12 pm

Not until the page->index bits are killed, otherwise you aren't fixing
anything. SLOB on nommu with those page->index tests will automatically
oops today, before or after your patches. Until that's resolved, there's
no point in pretending like kobjsize() has been "fixed". As no one has
come up with a valid reason for those tests existing in the first place,
simply having your patches and killing the BUG_ON()'s seems ok.

If we're not going to kill the BUG_ON()'s, then your patches are purely
cosmetic fixups with no behavioural change -- (ie, nommu is still hosed
on SLOB with current git).
--

From: Pekka Enberg
Date: Sunday, June 1, 2008 - 12:58 am

Hi Paul,


Sorry if I'm starting to sound like a broken record, but can you explain 
why removing the ->index bits are safe? I mean, if removing them is 
really okay, that means we don't hit that code path with SLAB at all?


It fixes nommu with SLUB, doesn't it?

		Pekka
--

From: Pekka J Enberg
Date: Sunday, June 1, 2008 - 1:22 am

Paul, so with something like this, the WARN_ON never triggers?

diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..38eec2e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -109,16 +109,23 @@ unsigned int kobjsize(const void *objp)
 	 * If the object we have should not have ksize performed on it,
 	 * return size of 0
 	 */
-	if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+	if (!objp)
+		return 0;
+
+	if ((unsigned long)objp >= memory_end)
+		return 0;
+
+	page = virt_to_page(objp);
+	if (!page)
 		return 0;
 
 	if (PageSlab(page))
 		return ksize(objp);
 
-	BUG_ON(page->index < 0);
-	BUG_ON(page->index >= MAX_ORDER);
+	if (WARN_ON(!PageCompound(page)))
+		return 0;
 
-	return (PAGE_SIZE << page->index);
+	return PAGE_SIZE << compound_order(page);
 }
 
 /*
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 1:24 am

This still needs to be virt_to_head_page() I think.

I don't have my nommu boards at home, so I'll test at the office tomorow
morning and let you know.
--

From: Pekka J Enberg
Date: Sunday, June 1, 2008 - 1:36 am

Yes, I messed that up. Thanks a lot for your help, Paul!

		Pekka

[PATCH] nommu: kobjsize fix
From: Christoph Lameter <clameter@sgi.com>

The kobjsize() function is broken with SLOB and SLUB. As summarized by Paul
Mundt:

  The page->index bits look like they are being used for determining compound
  order, which is _completely_ bogus, and only happens to "work" in a few
  cases.  Christoph and I have repeatedly asked for someone to explain what the
  hell those tests are there for, as right now they not only look completely
  bogus, but they also stop us from booting on SLOB.  So far no one has
  provided any input on why those page->index BUG_ON()'s have any right to
  exist.

  So while having 2 out of 3 SLAB allocators in a bootable state might seem
  like progress, I'd rather see kobjsize() fixed correctly. Even my initial
  patches worked for all 3.

  If no one can speak up to defend those bits, they should be killed off before
  2.6.26. Whether this is done in combination with your patch or Christoph's
  patch or whatever else doesn't matter.

You can find the discussion here:

  http://lkml.org/lkml/2008/5/22/223

Reported-by: Paul Mundt <lethal@linux-sh.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---

diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..935887b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -109,16 +109,23 @@ unsigned int kobjsize(const void *objp)
 	 * If the object we have should not have ksize performed on it,
 	 * return size of 0
 	 */
-	if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+	if (!objp)
+		return 0;
+
+	if ((unsigned long)objp >= memory_end)
+		return 0;
+
+	page = virt_to_head_page(objp);
+	if (!page)
 		return 0;
 
 	if (PageSlab(page))
 		return ksize(objp);
 
-	BUG_ON(page->index < 0);
-	BUG_ON(page->index >= MAX_ORDER);
+	if ...
From: Pekka J Enberg
Date: Sunday, June 1, 2008 - 2:13 am

Hi Paul,


I was about to send a patch that fixes some of the kobjsize() abuses to 
use ksize() but then I realized that what we probably should do is 
something like this instead.

I mean, assuming the BUG_ON bits are bogus, then we should always pass the 
pointer to the allocator. I audited most of the callers and they all seem 
to be really just using kmalloc() for allocation anyway.

What do you think?

		Pekka

diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..fc1ff52 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -103,22 +103,20 @@ EXPORT_SYMBOL(vmtruncate);
  */
 unsigned int kobjsize(const void *objp)
 {
-	struct page *page;
-
 	/*
 	 * If the object we have should not have ksize performed on it,
 	 * return size of 0
 	 */
-	if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+	if (!objp)
 		return 0;
 
-	if (PageSlab(page))
-		return ksize(objp);
+	if ((unsigned long)objp >= memory_end)
+		return 0;
 
-	BUG_ON(page->index < 0);
-	BUG_ON(page->index >= MAX_ORDER);
+	if (!virt_to_page(objp))
+		return 0;
 
-	return (PAGE_SIZE << page->index);
+	return ksize(objp);
 }
 
 /*
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 3:24 am

Isn't this what my original patch did? ;-)
--

From: Pekka Enberg
Date: Sunday, June 1, 2008 - 3:29 am

Oh, almost, you had this bit in ksize() of SLAB:

+	page = virt_to_head_page(objp);
+	if (unlikely(!PageSlab(page)))
+		return PAGE_SIZE << compound_order(page);

Did you actually need it for something?

		Pekka
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 4:21 am

Not that I recall, it was just for consistency with SLUB. I'll have to
re-test though, as I'm not sure if it was necessary or not.
--

From: Pekka J Enberg
Date: Sunday, June 1, 2008 - 4:30 am

OK. If we do need it, then something like this should work. But I don't 
see how we could have these "arbitrary pointers" (meaning not returned 
by kmalloc()) to compound pages; otherwise the BUG_ON would trigger with 
SLAB as well. I also don't see any call-sites that do this (but I'm not an 
expert on nommu).

		Pekka

diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..604e7de 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -109,16 +109,39 @@ unsigned int kobjsize(const void *objp)
 	 * If the object we have should not have ksize performed on it,
 	 * return size of 0
 	 */
-	if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+	if (!objp)
 		return 0;
 
+	if ((unsigned long)objp >= memory_end)
+		return 0;
+
+	page = virt_to_head_page(objp);
+	if (!page)
+		return 0;
+
+	/*
+	 * If the allocator sets PageSlab, we know the pointer came from
+	 * kmalloc(). The allocator, however, is not guaranteed to set PageSlab
+	 * so the pointer might still have come from kmalloc() (see the comments
+	 * below).
+	 */
 	if (PageSlab(page))
 		return ksize(objp);
 
-	BUG_ON(page->index < 0);
-	BUG_ON(page->index >= MAX_ORDER);
+	/*
+	 * The ksize() function is only guaranteed to work for pointers
+	 * returned by kmalloc(). So handle arbitrary pointers, that we expect
+	 * always to be compound pages, here. Note: we also handle pointers to
+	 * compound pages that came from kmalloc() here.
+	 */
+	if (PageCompound(page))
+		return PAGE_SIZE << compound_order(page);
 
-	return (PAGE_SIZE << page->index);
+	/*
+	 * Finally, handle pointers that came from kmalloc() that don't have
+	 * PageSlab set.
+	 */
+	return ksize(objp);
 }
 
 /*
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 10:59 pm

In the kmem_cache_alloc() case calling ksize() there is bogus, the
previous semantics for kobjsize() just defaulted to returning PAGE_SIZE
for these, since page->index was typically 0. Presently if we ksize()
those objects, we get bogus size results that are smaller than the
minimum alignment size. So we still need a way to handle that, even if
it's not frightfully accurate.

If we go back and apply your PG_slab patch for SLUB + SLOB, then
kobjsize() can just become:

diff --git a/mm/nommu.c b/mm/nommu.c
index dca93fc..3abd084 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -104,21 +104,43 @@ EXPORT_SYMBOL(vmtruncate);
 unsigned int kobjsize(const void *objp)
 {
 	struct page *page;
+	int order = 0;
 
 	/*
 	 * If the object we have should not have ksize performed on it,
 	 * return size of 0
 	 */
-	if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp))))
+	if (!objp)
 		return 0;
 
+	if ((unsigned long)objp >= memory_end)
+		return 0;
+
+	page = virt_to_head_page(objp);
+	if (!page)
+		return 0;
+
+	/*
+	 * If the allocator sets PageSlab, we know the pointer came from
+	 * kmalloc().
+	 */
 	if (PageSlab(page))
 		return ksize(objp);
 
-	BUG_ON(page->index < 0);
-	BUG_ON(page->index >= MAX_ORDER);
+	/*
+	 * The ksize() function is only guaranteed to work for pointers
+	 * returned by kmalloc(). So handle arbitrary pointers, that we expect
+	 * always to be compound pages, here.
+	 */
+	if (PageCompound(page))
+		order = compound_order(page);
 
-	return (PAGE_SIZE << page->index);
+	/*
+	 * Finally, handle arbitrary pointers that don't set PageSlab.
+	 * Default to 0-order in the case when we're unable to ksize()
+	 * the object.
+	 */
+	return PAGE_SIZE << order;
 }
 
 /*
--

From: Pekka Enberg
Date: Sunday, June 1, 2008 - 11:32 pm

Hi Paul,


What call-sites are using kmem_cache_alloc()? Can we convert them to
kmalloc() or page_alloc()? IIRC both Christoph and Matt opposed my
PG_slab patches.
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 11:50 pm

Well, with that modified version of your patch I posted, even if your
previous PG_slab patches aren't applied, kobjsize() doesn't behave any
worse than it presently does in terms of object size accuracy.

In short: PG_slab doesn't get set and ksize() is never called, so we get
the same degree of accuracy as the existing implementation, and the
oopses get fixed (and the comments are still accurate, too!). So I think
it's worth applying. Verified on all of SLUB/SLOB/SLAB.

As for the call-site question, perhaps I'm misunderstanding your
question. alloc_vfsmnt() is the first to call kmem_cache_zalloc() during
boot-up on my system, but I'm not sure what relevance this has to
anything? Accurately measuring kmem_cache_alloc() and static allocations
is going to need quite a bit more of a re-think, but that's out of scope
for 2.6.26. Presently I'd rather have my system booting first :-)
--

From: Pekka Enberg
Date: Sunday, June 1, 2008 - 11:58 pm

Hi Paul,


Agreed. Can you send this to Andrew?

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>


David already mentioned some (most?) of the kobjsize() calls can go
away and I think we should pursue that for 2.6.27.
--

From: Paul Mundt
Date: Monday, June 2, 2008 - 12:01 am

Ah, the kobjsize() calls is what you were getting at. Yes, getting rid of
those would be a good plan. I'll start looking at that more in-depth once
I've got some of my other 2.6.27 stuff out of the way. That's definitely
one interface that needs to be killed off with extreme prejudice..
--

From: Paul Mundt
Date: Sunday, June 1, 2008 - 1:22 am

I thought I already had several times, but I'll attempt to summarize again..

I would expect the only reason those BUG_ON()'s haven't been triggered in
the non-SLOB case is due to the fact PageSlab() takes a different path
out.

The page->index bits look like they are being used for determining
compound order, which is _completely_ bogus, and only happens to "work"
in a few cases. Christoph and I have repeatedly asked for someone to
explain what the hell those tests are there for, as right now they not
only look completely bogus, but they also stop us from booting on SLOB.
So far no one has provided any input on why those page->index BUG_ON()'s
have any right to exist.

So while having 2 out of 3 SLAB allocators in a bootable state might seem
like progress, I'd rather see kobjsize() fixed correctly. Even my initial
patches worked for all 3.

If no one can speak up to defend those bits, they should be killed off
before 2.6.26. Whether this is done in combination with your patch or
Christoph's patch or whatever else doesn't matter.
--

From: Christoph Lameter
Date: Thursday, May 29, 2008 - 8:00 am

The nagging question here is: Why is page->private used in kobjsize()?  A 
special heap/stack setup?


--

Previous thread: optimizing out inline functions by Steve French on Wednesday, May 28, 2008 - 12:51 pm. (7 messages)

Next thread: Re: Linux 2.6.26-rc4 by Linus Torvalds on Wednesday, May 28, 2008 - 1:10 pm. (3 messages)