Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Previous thread: [PATCH] x86: fix typo(?) in step.c by Jan Beulich on Wednesday, March 5, 2008 - 1:36 am. (11 messages)

Next thread: swapper OOPS in linux 2.6.24 on dell D620 laptop by l.genoni on Wednesday, March 5, 2008 - 2:22 am. (1 message)
From: Pavel Machek
Date: Wednesday, March 5, 2008 - 2:06 am

Hi!

leet:~ # uname -a
Linux leet 2.6.25-rc3 #189 SMP Mon Mar 3 13:16:59 CET 2008 x86_64 x86_64 x86_64 GNU/Linux
leet:~ #

32-bit distro on 64-bit kernel.

leet:~ # cat /proc/meminfo
MemTotal:      4055780 kB
MemFree:       3972400 kB
Buffers:          4892 kB
Cached:          29844 kB
SwapCached:          0 kB
Active:          23140 kB
Inactive:        20800 kB
SwapTotal:     2104472 kB
SwapFree:      2104472 kB
Dirty:            1300 kB
Writeback:           0 kB
AnonPages:        9152 kB
Mapped:           8684 kB
Slab:            18336 kB
SReclaimable:     7448 kB
SUnreclaim:      10888 kB
PageTables:        676 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4132360 kB
Committed_AS:    27684 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     18112 kB
VmallocChunk: 34359720115 kB
leet:~ # /etc/init.d/gpm start

Linux version 2.6.25-rc3 (pavel@amd) (gcc version 4.1.3 20071209 (prerelease) (Debian 4.1.2-18)) #189 SMP Mon Mar 3 13:16:59 CET 2008
Command line: root=/dev/sda2 vga=6    resume=/dev/sda1 splash=silent nosmp no_console_suspend 3
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
 BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfed0000 (usable)
 BIOS-e820: 00000000bfed0000 - 00000000bfee2000 (ACPI data)
 BIOS-e820: 00000000bfee2000 - 00000000bfeee000 (ACPI NVS)
 BIOS-e820: 00000000bfeee000 - 00000000c0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec03000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Entering add_active_range(0, 256, 786128) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 3200 used
end_pfn_map = 1310720
DMI present.
ACPI: ...
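
[Editor's sketch] The add_active_range() lines in the log above follow from the "usable" BIOS-e820 entries once byte addresses are converted to page frame numbers. The snippet below is only an illustration of that arithmetic, assuming 4 KiB pages; the helper name to_pfn_range is made up for this example and is not a kernel function.

```python
# Illustrative only: convert the "usable" e820 ranges from the log above
# into page-frame-number (PFN) ranges, assuming 4 KiB pages.
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# (start, end) byte addresses of the "usable" BIOS-e820 entries
usable = [
    (0x0000000000000000, 0x000000000009d800),
    (0x0000000000100000, 0x00000000bfed0000),
    (0x0000000100000000, 0x0000000140000000),
]

def to_pfn_range(start, end):
    """Hypothetical helper: round start up and end down to whole pages."""
    page_size = 1 << PAGE_SHIFT
    start_pfn = (start + page_size - 1) >> PAGE_SHIFT
    end_pfn = end >> PAGE_SHIFT
    return start_pfn, end_pfn

pfn_ranges = [to_pfn_range(s, e) for s, e in usable]
total_pages = sum(e - s for s, e in pfn_ranges)
total_kb = total_pages << (PAGE_SHIFT - 10)  # pages -> kB

for s, e in pfn_ranges:
    print(f"add_active_range(0, {s}, {e})")
print(f"usable: {total_pages} pages = {total_kb} kB")
```

The PFN pairs match the three add_active_range() calls in the log. The total comes out somewhat above the MemTotal value shown earlier, presumably because MemTotal excludes memory the kernel reserves for itself at boot.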
From: Pavel Machek
Date: Wednesday, March 5, 2008 - 2:36 am

The nasty oops is in swsusp_save+0x298.

rdx=0xffff810000001008 
rdi=0xffff81000c001008
rdi=0x0000000131613000

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Pavel Machek
Date: Wednesday, March 5, 2008 - 2:39 am

Rafael, this seems to be similar to some problem you were trying to
solve... something with numa... I could not find it in
bugzilla.kernel.org... do you remember details by chance?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Rafael J. Wysocki
Date: Wednesday, March 5, 2008 - 3:12 pm

http://bugzilla.kernel.org/show_bug.cgi?id=9966

[Just have a look at the list of regressions from 2.6.24. ;-)]

In fact, I didn't even try to solve it myself, but asked some knowledgeable
people (CCed) for advice.  No one responded, unfortunately ...

Thanks,
Rafael
--

From: Andi Kleen
Date: Wednesday, March 5, 2008 - 3:31 pm

Just commenting on the subject. The 34TB are not an over/underflow. x86-64
simply has so much address space reserved for vmalloc. It doesn't mean of course 
that that much could be actually allocated in real memory.

-Andi
--

From: Ingo Molnar
Date: Thursday, March 6, 2008 - 4:14 am

btw., the exact amount of available vmalloc space on 64-bit x86 is 32 TB 
(32768 GB), or 0x0000200000000000 hexa. (this is still only 0.0002% of 
the complete 64-bit address space [25% of the 128 TB 64-bit kernel 
address space] so we've got plenty of room)
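
[Editor's sketch] The figures above can be checked against the VmallocTotal line from the original report. This is plain arithmetic rather than anything taken from the kernel source; the variable names are made up, and the "+ 1" assumes the reported value is one kB short of the round figure because the range is computed from an inclusive end address.

```python
# Illustrative arithmetic only; variable names are made up for this sketch.
KB = 1024
TIB = 2**40

vmalloc_total_kb = 34359738367  # VmallocTotal from the /proc/meminfo output

# Assumption: the reported value is one kB short of the round figure.
vmalloc_bytes = (vmalloc_total_kb + 1) * KB

tib = vmalloc_bytes // TIB                              # size in TiB
share_of_64bit = vmalloc_bytes / 2**64 * 100            # % of the full 64-bit space
share_of_kernel_half = vmalloc_bytes / (128 * TIB) * 100  # % of the 128 TB kernel space

print(hex(vmalloc_bytes), tib, "TiB")
print(f"{share_of_64bit:.4f}% of 2**64, {share_of_kernel_half:.0f}% of 128 TiB")
```

Read as a count of kB, 34359738367 starts with "34", which is where the "34TB" in the subject line comes from; in binary units the same quantity is 32 TB.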

but the first fundamental limit we'll hit on 64-bit is the 32-bit offset 
limit of binaries - this affects kernel modules, the kernel image, etc. 
We won't hit that anytime soon, but we'll eventually hit it. (user-space 
will be the first i guess)

	Ingo
--

From: Andi Kleen
Date: Thursday, March 6, 2008 - 4:30 am

If that ever happens just -fPIC mode would need to be supported
and a proper PLT for the references between modules and kernel. It would complicate 

I recently submitted a patch to fix the 2GB limit for user space
binaries (missing O_LARGEFILE). I think it made it into .25.

Newer gcc/binutils support the large code model so you could actually
try to generate binaries that big :-) e.g. some of the rtl-to-C compilers
seem to generate huge code so it might be actually useful.

Also of course you can always split the executable into ~2GB shared libraries.
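
[Editor's sketch] The 2GB figure comes from the signed 32-bit displacement used by direct calls, jumps and RIP-relative references on x86-64: the target has to lie within roughly +/-2 GB of the referencing instruction. The function name fits_rel32 and the sample addresses below are invented for illustration.

```python
# Illustrative only: does a target fit in a signed 32-bit displacement?
# fits_rel32 and the sample addresses are hypothetical.
MB = 2**20
GB = 2**30

def fits_rel32(next_insn_addr, target_addr):
    """True if target is reachable with a signed 32-bit relative offset."""
    disp = target_addr - next_insn_addr
    return -2**31 <= disp < 2**31

base = 0x400000  # made-up load address

print(fits_rel32(base, base + 40 * MB))  # a ~40MB image stays well in range
print(fits_rel32(base, base + 3 * GB))   # a 3GB span does not
```

Anything that fails this check needs an indirect reference (for example through a PLT or GOT entry) or the large code model mentioned above.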

-Andi

--

From: Ingo Molnar
Date: Thursday, March 6, 2008 - 2:06 pm

The largest kernel image i've had so far was slightly above 40MB so at 
least in the kernel we are not there yet ;-)

Do you have any experience with how much of a size difference there is 
when binaries are built for big address mode? I'd expect something in 
the neighborhood of 5% for an image with a structure similar to the 

... which brings back happy memories of DOS extenders ;-)

	Ingo
--

From: Pavel Machek
Date: Thursday, March 6, 2008 - 3:43 am

I searched that one, that's why I discovered those two

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Christian Kujau
Date: Wednesday, March 5, 2008 - 12:49 pm

out of curiosity: yesterday I saw a box[0] with ~4 TB Committed_AS:

CommitLimit:   3085152 kB
Committed_AS: 4281048084 kB
VmallocTotal:   118776 kB
VmallocUsed:     13772 kB
VmallocChunk:   103880 kB

Since it's a rather old kernel (2.6.19.2), I just want to know: could this 
be related to what you've seen or is this completely different (and 
Committed_AS is just this high because some st00pid app has allocated this 
much memory but not freed again)?

Thanks,
Christian.

[0] amd64, 32bit kernel, 32bit userland, 4GB RAM
-- 
BOFH excuse #101:

Collapsed Backbone
--

From: Hugh Dickins
Date: Wednesday, March 5, 2008 - 2:11 pm

I don't see what Pavel's issue is with this: it's simply a fact that
with a 64-bit kernel, we've lots of virtual address space to spare
for vmalloc.  What would be surprising is for VmallocUsed to get up


Unlikely.  Offhand I'm not quite sure that's impossible, but it's far
more likely that we've a kernel bug and vm_committed_space has wrapped
negative.
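
[Editor's sketch] To see how a counter that has gone negative could show up as roughly 4 TB on a 32-bit kernel, the snippet below reinterprets the reported Committed_AS value as a wrapped 32-bit quantity. This is arithmetic only, not a diagnosis: it assumes 4 kB pages and that the page count is converted to kB with a left shift on a 32-bit unsigned long. The helper names as_u32 and meminfo_kb are made up.

```python
# Illustrative only: helper names are hypothetical, and the 32-bit
# conversion below is an assumption about how the value is printed.
PAGE_KB = 4  # assumption: 4 kB pages

def as_u32(value):
    """Reinterpret an integer as an unsigned 32-bit quantity."""
    return value & 0xFFFFFFFF

def meminfo_kb(pages):
    """Pages -> kB via a shift, truncated to 32 bits as on a 32-bit kernel."""
    return as_u32(as_u32(pages) << 2)

committed_as_kb = 4281048084  # Committed_AS from the report above

# Negative values that would print as the number above after wrapping.
implied_kb = committed_as_kb - 2**32
implied_pages = implied_kb // PAGE_KB

print(implied_kb, "kB =", implied_pages, "pages")
print(meminfo_kb(implied_pages))  # reproduces the reported value
```

Under these assumptions the reported figure sits just below the 32-bit wrap point, which is what a counter that went negative would look like.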

Ancient as your kernel is, I don't notice anything in the ChangeLogs
since then to say we've fixed a bug of that kind since 2.6.19.
Any idea how to reproduce this?  Are you using HugePages at all?
(It's particularly easy for us to get into a muddle over them,
though historically I think mremap has proved most difficult for
Committed_AS accounting).

Thanks,
--

From: Pavel Machek
Date: Wednesday, March 5, 2008 - 2:21 pm

Hmm... ok, I see, I thought "clearly this overflowed somewhere", and I
was wrong, it is expected result.

Still.... what is 34TB of vmalloc space good for when we can only ever
allocate 4GB (because that is how much physical memory we have?)? To
prevent fragmentation? 

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Hugh Dickins
Date: Wednesday, March 5, 2008 - 2:36 pm

The (mis)alignment does make it look that way,

Well, what else would you want to use that space for?  If there were
a compelling reason to tune it according to how much physical memory
you have (and you're right, that we want a good surplus of address
space so as to avoid silly limitations by fragmentation), I guess
that could have been done.  But why bother if there's no reason?

It's a hard life, there's just too much room to spare in 64-bit ;)

Hugh
--

From: Christian Kujau
Date: Wednesday, March 5, 2008 - 3:11 pm

Well, if it's "interesting"...here are some more details from the box:


Huh. When I first saw this I thought "kernel bug" too, but then read the

Well, the box is running fine and since it's a production machine I don't 
intend to reboot the box very often. And since it's really an old kernel 
(for lkml discussion, that is) I don't intend to debug this one further. 
I really was only curious if this was userspace related (some app 

I have:

# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

...was this what you meant?

Thanks,
Christian.
-- 
BOFH excuse #340:

Well fix that in the next (upgrade, update, patch release, service pack).
--

From: Hugh Dickins
Date: Wednesday, March 5, 2008 - 4:21 pm

It's pretty sure to be userspace related i.e. kernel bug triggered by
particular userspace usage; and understandably, the bits you put there
don't tell me much about what userspace has been up to.  (And don't
worry, I'm not expecting you to tell me more!  I can't think of
anything useful to ask about it - it's not a question of what mix
of apps you have running there, it's a matter of what system calls,

Absolutely.  You mentioned it because it's useful for us to know there's
such an issue about, to keep our eyes open: thank you for doing so.

Right, I was forgetting that the various "HugePage" lines of /proc/meminfo
don't even appear when those are configured off, so my question was more
obscure than I'd intended.  You're not using HugePages at all, so we
can rule out that line of inquiry - that's helpful, thanks.

I'll keep my eyes open.

Hugh
--
