Re: Problem: Out of memory after 2days with 2GB RAM

Previous thread: Confusions with reserve_early, reserve_bootmem, e820, efi, ... on x86_64 by Paul Jackson on Thursday, June 12, 2008 - 3:06 am. (12 messages)

Next thread: [patch 0/6] AMD C1E aware idle support by Thomas Gleixner on Thursday, June 12, 2008 - 3:28 am. (12 messages)
From: Zdenek Kabelac
Date: Thursday, June 12, 2008 - 3:07 am

Hello

I'm attaching a trace where my machine got into big trouble after
2 days of usage and several successful suspend/resumes (which finally
seem to be getting better now :))

It looks like, while there was a huge amount of buffers and caches,
the system was unable to allocate a few pages for kmalloc in the
iwl3945 driver after resume.

I've even tried echo 3 > /proc/sys/vm/drop_caches and reinserting the
iwl driver, but this had fatal results: the machine died completely
with a blinking caps lock, and there was no oops in the log for this case:

I was running commit aab2545fdd6641b76af0ae96456c4ca9d1e50dad of
2.6.26-rc5 in this case.
The machine is a T61/2GB/C2D.

I'm also attaching slabinfo in case it is useful for something.

NetworkManager: <info>  (eth0): device state change: 1 -> 2
NetworkManager: <info>  (eth0): preparing device.
NetworkManager: <info>  (eth0): deactivating device.
<6>[53906.520363] ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 17
NetworkManager: <info>  (wlan0): device state change: 1 -> 2
NetworkManager: <info>  (wlan0): bringing up device.
<4>[53906.578855] NetworkManager: page allocation failure. order:5, mode:0x1024
<4>[53906.578855] Pid: 2645, comm: NetworkManager Tainted: G        W 2.6.26-rc5 #33
<4>[53906.578855]
<4>[53906.578855] Call Trace:
<4>[53906.578855]  [<ffffffff81092c70>] __alloc_pages_internal+0x460/0x590
<4>[53906.578855]  [<ffffffff81092dbb>] __alloc_pages+0xb/0x10
<4>[53906.578855]  [<ffffffff81011c26>] dma_alloc_pages+0x26/0x30
<4>[53906.578855]  [<ffffffff81011cf3>] dma_alloc_coherent+0xc3/0x2a0
<4>[53906.578855]  [<ffffffffa01aa06e>] :iwl3945:iwl3945_hw_nic_init+0x8de/0x940
<4>[53906.578855]  [<ffffffffa019de01>] :iwl3945:__iwl3945_up+0x91/0x640
<4>[53906.578855]  [<ffffffffa019e968>] :iwl3945:iwl3945_mac_start+0x568/0x790
<4>[53906.578855]  [<ffffffff8128a67d>] ? __nla_put+0x2d/0x40
<4>[53906.578855]  [<ffffffff8128a633>] ? __nla_reserve+0x53/0x70
<4>[53906.578855]  [<ffffffff8128a67d>] ? __nla_put+0x2d/0x40
<4>[53906.578855]  ...
From: Rik van Riel
Date: Thursday, June 12, 2008 - 6:38 am

On Thu, 12 Jun 2008 12:07:34 +0200

It looks like this is because it wants to allocate 2**5 contiguous
pages (128kB).

As you can see, the 128kB free areas have been pretty much exhausted,
while there is still a good amount of free memory.

I am not sure why this last 128kB area was not allocated, but let's
face it: it would have blown up on the next allocation anyway.

Doing such a large allocation from a driver is probably not the best
idea.

-- 
All rights reversed.
--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 6:54 am

A 64-bit system, I assume?
The allocation should be 256 * 20 * sizeof(struct sk_buff *).

Try the patch below. It should improve code generation too.

I discussed this with Tomas previously and he says the hw is capable of
doing 20 fragments per frame (wonder why, Broadcom can do an unlimited
number...) but he complained about the networking stack not being able
to use them. Well, the hardware needs to support IP checksumming for
that, hence, afaik, only two fragments can ever be used (one for the hw
header, one for the frame).

This cuts the allocation to 10%, or (under) a page in all cases.
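To put numbers on the sizes argued about in this thread, here is a small sketch of the arithmetic. It assumes 256 TFD entries per queue (the 256 in the formula above) and pointer sizes of 4 bytes on 32-bit and 8 bytes on 64-bit. The helper names are illustrative, and next_pow2() is only a rough, assumed model of how a generic kmalloc size class rounds a request of this size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of TFD entries per queue (the 256 in the formula). */
#define TFD_ENTRIES 256u

/* Bytes for one queue's sk_buff pointer table:
 * TFD_ENTRIES * num_tbs * ptr_size, ptr_size being 4 (32-bit) or 8 (64-bit). */
size_t txb_table_bytes(size_t num_tbs, size_t ptr_size)
{
	assert(ptr_size == 4 || ptr_size == 8);
	return TFD_ENTRIES * num_tbs * ptr_size;
}

/* Smallest power of two >= n: a rough, assumed model of the generic
 * kmalloc size class that a large request lands in. */
size_t next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

Under these assumptions, 20 slots take 40960 bytes per queue on 64-bit (which would round up to a 64k kmalloc cache), 2 slots take exactly 4096 bytes, and the difference is 36 KiB on 64-bit and 18 KiB on 32-bit.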

johannes

--- everything.orig/drivers/net/wireless/iwlwifi/iwl-3945.h	2008-06-12 15:50:29.000000000 +0200
+++ everything/drivers/net/wireless/iwlwifi/iwl-3945.h	2008-06-12 15:50:31.000000000 +0200
@@ -120,7 +120,7 @@ struct iwl3945_queue {
 int iwl3945_queue_space(const struct iwl3945_queue *q);
 int iwl3945_x2_queue_used(const struct iwl3945_queue *q, int i);
 
-#define MAX_NUM_OF_TBS          (20)
+#define MAX_NUM_OF_TBS          (2)
 
 /* One for each TFD */
 struct iwl3945_tx_info {
--- everything.orig/drivers/net/wireless/iwlwifi/iwl-dev.h	2008-06-12 15:50:18.000000000 +0200
+++ everything/drivers/net/wireless/iwlwifi/iwl-dev.h	2008-06-12 15:50:24.000000000 +0200
@@ -115,7 +115,7 @@ struct iwl_queue {
 				* space less than this */
 } __attribute__ ((packed));
 
-#define MAX_NUM_OF_TBS          (20)
+#define MAX_NUM_OF_TBS          (2)
 
 /* One for each TFD */
 struct iwl_tx_info {


--

From: Zdenek Kabelac
Date: Thursday, June 12, 2008 - 7:12 am

I'll surely try your patch, but is iwl the only driver which needs a
128kB contiguous memory chunk?

If the 128kB memory chunks are exhausted within 2 days while there is
over 1GB of free RAM in buffers & caches, then I think some
defragmentation is needed, isn't it?

btw: Does it really mean that within those buffers the kernel could not
find any 32 adjacent pages which could be freed?

Zdenek
--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 7:19 am

I don't know. But I think it'll probably fail right after that, trying
to allocate a 32kb buffer with pci_alloc_something....

johannes
From: Tomas Winkler
Date: Thursday, June 12, 2008 - 9:38 am

On Thu, Jun 12, 2008 at 5:12 PM, Zdenek Kabelac

We do some stupid free-alloc sequence on restart; this is where it
fails. I'm still polishing a patch that eliminates it.
Tomas
--

From: Tomas Winkler
Date: Thursday, June 12, 2008 - 8:43 am

On Thu, Jun 12, 2008 at 4:54 PM, Johannes Berg

These are scatter-gather buffers that can be kicked off in one DMA transaction.

This I still don't understand why but everybody is already tired to

Probably. It would be safe to use vmalloc for allocating txb anyway.
I'll give it a try.

There was already a discussion on LKML about memory allocation problems
on x86_64, which might explain this regression. This didn't happen
before.

Tomas
--

From: Tomas Winkler
Date: Thursday, June 12, 2008 - 9:35 am

So vmalloc didn't break anything on the 32-bit machine. I'm just about
to install a 64-bit one, so it will take an hour or two.. I will issue some

This is the thread title, if you are interested:
'x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY'
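The "mode:" field in the page allocation failure messages is the gfp mask of the failed request. A small decoder helps read it; the __GFP_* bit values below are a subset assumed from a 2.6.26-era include/linux/gfp.h, so check them against the tree you are debugging. With those values, the rc5 failure at the top of the thread (mode:0x1024) includes __GFP_NORETRY, while the rc8 failure later in the thread (mode:0x24) does not.

```c
#include <assert.h>
#include <string.h>

/* Subset of __GFP_* bits; values assumed from a 2.6.26-era
 * include/linux/gfp.h. Verify against your tree. */
struct gfp_bit {
	unsigned int mask;
	const char *name;
};

static const struct gfp_bit gfp_bits[] = {
	{ 0x01u, "__GFP_DMA" },
	{ 0x02u, "__GFP_HIGHMEM" },
	{ 0x04u, "__GFP_DMA32" },
	{ 0x10u, "__GFP_WAIT" },
	{ 0x20u, "__GFP_HIGH" },     /* GFP_ATOMIC is just this bit */
	{ 0x40u, "__GFP_IO" },
	{ 0x80u, "__GFP_FS" },
	{ 0x200u, "__GFP_NOWARN" },
	{ 0x1000u, "__GFP_NORETRY" },
};

/* Writes the names of the known bits set in mode, separated by '|',
 * into buf (len bytes, always NUL-terminated). Unknown bits are skipped. */
void decode_gfp(unsigned int mode, char *buf, size_t len)
{
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mode & gfp_bits[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
	}
}
```

Decoded this way, both requests are atomic (__GFP_HIGH, no __GFP_WAIT) and restricted to 32-bit DMA-able memory, which matches the "128kb GFP_ATOMIC allocation" described further down.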

Tomas
--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 10:05 am

Well, I disagree, and I'll push my patch as soon as somebody confirms

Like I said, it doesn't matter, there's no need to _waste_
18*256*sizeof(void *) bytes of memory.

johannes
From: Tomas Winkler
Date: Thursday, June 12, 2008 - 10:39 am

On Thu, Jun 12, 2008 at 8:05 PM, Johannes Berg

Remember, you are not a maintainer of this driver, and second, we are
open to all suggestions; you don't have to use this kind of

--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 10:46 am

Yeah, you're right, I can't really do that. But I can submit the patch
to akpm, and I'm sure he'll take it after you provide your counter
argument about hope never dying again ;)

Frankly, I don't see why you're so opposed to this patch; even if it
doesn't solve anything, it probably leads to better code generation and
uses a lot less memory.

Also, I know you cannot actually need those descriptors since mac80211
will never ever pass such frames, and _that_ is an area I do have at

Well, thing is, my patch saves 18 KiB of memory on 32-bit and 36 on
64-bit, so I think we should merge it regardless. Yes, the pci allocation
is icky, and yes, it would be good to just do it once instead of over and
over again, but even if you change it to do _all_ those allocations just
once, we should not be wasting those 18/36 KiB of memory for nothing.

johannes
From: Tomas Winkler
Date: Thursday, June 12, 2008 - 11:03 am

On Thu, Jun 12, 2008 at 8:46 PM, Johannes Berg


I'm not against it. You've decided that I'm fighting you because I gave
another solution.
Frankly, we probably don't need this allocation at all; maybe one skb
is just enough, even with my never-dying hope, since all fragments are
in the skb fragment list.
This still probably won't solve the pci memory allocation problem.

--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 11:15 am

Ok, no, I'm not saying you shouldn't rewrite all the code to get rid of
it, but I think you can use a patch like mine in the interim, as such a
rewrite

Yeah, true, that one needs to be done, but it could probably be done
only once when the hw is probed rather than every time it is brought up.
Most likely not something you'll get to fix in 2.6.26 either, though.

johannes
From: Zdenek Kabelac
Date: Thursday, June 12, 2008 - 1:11 pm

Well, it's great that a few kB will be saved by not allocating
never-used pointers in the iwl driver, but does this really solve the
problem that the kernel relatively quickly runs out of memory for
allocations of this size? I guess iwl isn't the only driver
requesting 32 contiguous pages.

Is it possible to track how this memory gets fragmented/lost: who owns
the blocks, and why are they not back in the pool?

btw, with 8 hours of uptime, at this moment I can see this:

DMA: 26*4kB 37*8kB 72*16kB 65*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 7920kB
DMA32: 203*4kB 79*8kB 26*16kB 11*32kB 6*64kB 9*128kB 3*256kB 2*512kB 2*1024kB 0*2048kB 0*4096kB = 7588kB

so at this moment I can see quite a lot of free DMA memory, but in my
trace at the beginning of the thread, after several suspend/resumes,
this memory was gone....
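Each N*SIZE entry in those lines is a count of free blocks at one buddy order. Assuming 4kB pages and 11 orders (4kB up to 4096kB), a quick check like the one below sums a zone's free memory and tests whether any free block at or above a given order exists. The function names are illustrative, not kernel APIs.

```c
#include <assert.h>

#define NR_ORDERS 11  /* assumed: orders 0..10, i.e. 4kB .. 4096kB */

/* Total free memory in kB for one zone line; nr_free[o] is the count
 * printed in front of the (4kB << o) entry. */
unsigned long zone_free_kb(const unsigned long nr_free[NR_ORDERS])
{
	unsigned long kb = 0;
	int o;

	for (o = 0; o < NR_ORDERS; o++)
		kb += nr_free[o] * (4UL << o);
	return kb;
}

/* 1 if some free block of order >= order exists: the buddy allocator
 * can split such a block down to the requested size. */
int has_block_for_order(const unsigned long nr_free[NR_ORDERS], int order)
{
	int o;

	assert(order >= 0 && order < NR_ORDERS);
	for (o = order; o < NR_ORDERS; o++)
		if (nr_free[o] > 0)
			return 1;
	return 0;
}
```

Fed the two lines above, this reproduces the 7920kB and 7588kB totals, and both zones have a block that could be split down to an order-5 (128kB) request; in the DMA zone that is the single 4096kB block, even though it shows 0*128kB.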

Zdenek
--

From: Tomas Winkler
Date: Thursday, June 12, 2008 - 3:17 pm

On Thu, Jun 12, 2008 at 11:11 PM, Zdenek Kabelac

Currently the driver frees the memory on down and allocates it back on
up. This is done even on the initial reset, for the sake of flow
simplicity.
I'm not sure yet why the memory is actually accumulating; whether the
bug is in the driver or in the memory system is not clear to me yet. I
haven't seen this with older kernels or drivers.
As I wrote, I'm also polishing a patch that doesn't do this free-alloc
loop; hopefully this will remedy the problem somewhat. It has a
drawback: it will hold on to the memory even if the device is down.
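The allocate-once lifecycle described here (and suggested earlier as doing the allocation "only once when hw is probed") can be sketched in user space as follows. Everything in it is hypothetical: struct txq and the txq_*() functions are illustrative names, and malloc() stands in for the DMA-coherent allocation. It only shows where the memory is obtained and released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue; not the iwl3945 structures. */
struct txq {
	void *ring;         /* stands in for the descriptor memory */
	size_t ring_bytes;
	int up;
};

/* Probe: allocate once, early, while large contiguous memory is most
 * likely to be available. */
int txq_probe(struct txq *q, size_t bytes)
{
	q->ring = malloc(bytes);
	if (!q->ring)
		return -1;
	q->ring_bytes = bytes;
	q->up = 0;
	return 0;
}

/* Up: no allocation, just reinitialize the memory we already hold. */
void txq_up(struct txq *q)
{
	assert(q->ring != NULL);
	memset(q->ring, 0, q->ring_bytes);
	q->up = 1;
}

/* Down: keep the memory; this is the drawback mentioned above. */
void txq_down(struct txq *q)
{
	q->up = 0;
}

/* Remove: the only place the memory is returned. */
void txq_remove(struct txq *q)
{
	free(q->ring);
	q->ring = NULL;
	q->ring_bytes = 0;
	q->up = 0;
}
```

The trade-off is the one stated above: a down/up cycle (including suspend/resume) can no longer fail on allocation, at the cost of pinning the memory while the interface is down.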

Tomas
--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 5:43 pm

On Thu, 12 Jun 2008 22:11:32 +0200

I hope it is the only one.  Doing a 128kb GFP_ATOMIC allocation is
hopelessly unreliable.  Even a 128k GFP_KERNEL allocation will fail all
over the place.

Please convert the driver to allocate no more than 4k at a time.
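One generic way to follow that advice for tables touched only by the CPU (such as the sk_buff pointer array) is to split them into page-sized pieces and translate indices. This is a sketch assuming a 4096-byte page, with calloc() in place of the kernel allocator; struct chunked_table and the ct_*() functions are hypothetical names. It does not help for descriptor memory that the hardware needs to be contiguous.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096u  /* assumed page size */

/* Hypothetical table of n fixed-size entries, stored in separately
 * allocated CHUNK_BYTES pieces so no single allocation exceeds a page. */
struct chunked_table {
	void **chunks;
	size_t nchunks;
	size_t entry_size;
	size_t per_chunk;
};

void ct_free(struct chunked_table *t)
{
	size_t i;

	if (t->chunks)
		for (i = 0; i < t->nchunks; i++)
			free(t->chunks[i]);  /* free(NULL) is a no-op */
	free(t->chunks);
	t->chunks = NULL;
	t->nchunks = 0;
}

int ct_init(struct chunked_table *t, size_t n, size_t entry_size)
{
	size_t i;

	assert(n > 0 && entry_size > 0 && entry_size <= CHUNK_BYTES);
	t->entry_size = entry_size;
	t->per_chunk = CHUNK_BYTES / entry_size;
	t->nchunks = (n + t->per_chunk - 1) / t->per_chunk;
	/* The index array is small: one pointer per 4k chunk. */
	t->chunks = calloc(t->nchunks, sizeof(*t->chunks));
	if (!t->chunks)
		return -1;
	for (i = 0; i < t->nchunks; i++) {
		t->chunks[i] = calloc(1, CHUNK_BYTES);
		if (!t->chunks[i]) {
			ct_free(t);
			return -1;
		}
	}
	return 0;
}

/* Address of entry i: pick the chunk, then the offset inside it. */
void *ct_entry(const struct chunked_table *t, size_t i)
{
	assert(i / t->per_chunk < t->nchunks);
	return (char *)t->chunks[i / t->per_chunk] +
	       (i % t->per_chunk) * t->entry_size;
}
```

For the 256 * 20 pointer table discussed above, this becomes 10 separate 4k allocations on 64-bit (5 on 32-bit) instead of one large one.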
--

From: John W. Linville
Date: Thursday, June 12, 2008 - 11:41 am

Is there some reason you think I would need to be cut out of the loop??
akpm isn't the only one who likes solutions...

John
-- 
John W. Linville
linville@tuxdriver.com
--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 10:03 am

And you can safely decrease the allocation to 10% as I do in my patch
because once you understand you'll see that you cannot possibly use

Yeah, but why bother if we can just allocate 10% of the size, waste a
lot less memory etc. mac80211 isn't going to pass in a scatter/gather

Doesn't really matter, iwlwifi is _wasting_ this allocation, it cannot
possibly use all those buffers anyway.

The more interesting thing is the pci_alloc_consistent allocation right
below that is also _huge_, but that's because of the stupid hardware
design, or can the hardware cope with having the descriptors non-linear
in memory?

johannes
From: Tomas Winkler
Date: Thursday, June 12, 2008 - 10:35 am

On Thu, Jun 12, 2008 at 8:03 PM, Johannes Berg

Hope never dies. I actually have seen this speed up the throughput so


We'll talk after your next HW design. How would you configure 265 * 16
descriptors separately?

Tomas
--

From: Johannes Berg
Date: Thursday, June 12, 2008 - 10:39 am

Well, you can always add it back later if you make the networking stack

Well, considering that other hardware does manage to do things
differently (say Broadcom because I know their DMA engine), I don't know
why your hw designers went wild with this. All you need is an
"end-of-frame" flag. But that's not really interesting to discuss,
unless this is actually controlled by the microcode and you can change
it.
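To illustrate the end-of-frame idea: if each descriptor carries a flag marking the last fragment of a frame, the DMA engine just walks the ring until it sees that flag, so no fixed number of slots per frame has to be reserved. The descriptor layout and the DESC_EOF flag below are hypothetical; this is not the layout of Broadcom's, Intel's, or any other real DMA engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_EOF 0x1u  /* hypothetical "last fragment of this frame" flag */

/* Hypothetical scatter-gather descriptor: one buffer fragment each. */
struct sg_desc {
	uint64_t addr;   /* bus address of the fragment */
	uint16_t len;    /* fragment length in bytes */
	uint16_t flags;  /* DESC_EOF on the frame's last fragment */
};

/* Models the engine consuming one frame from a ring of ndesc descriptors,
 * starting at *head. Returns the frame length and moves *head past the
 * EOF descriptor; returns 0 and leaves *head alone if no EOF is found. */
size_t consume_frame(const struct sg_desc *ring, size_t ndesc, size_t *head)
{
	size_t total = 0;
	size_t i = *head;
	size_t seen;

	assert(ndesc > 0 && *head < ndesc);
	for (seen = 0; seen < ndesc; seen++) {
		const struct sg_desc *d = &ring[i];

		total += d->len;
		i = (i + 1) % ndesc;
		if (d->flags & DESC_EOF) {
			*head = i;
			return total;
		}
	}
	return 0;
}
```

A two-fragment frame (hw header plus payload) uses two ring slots and a one-fragment frame uses one, so ring memory scales with the fragments actually queued rather than with a per-frame maximum.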

johannes
From: Tomas Winkler
Date: Thursday, June 12, 2008 - 10:50 am

On Thu, Jun 12, 2008 at 8:39 PM, Johannes Berg

--

From: Rik van Riel
Date: Thursday, June 12, 2008 - 10:10 am

On Thu, 12 Jun 2008 18:43:37 +0300

The only thing that makes no sense to me is why your
driver "needs" to allocate 10x as much memory in that
buffer than it will ever use.

What is the problem with the simpler solution, which
just reduces the size of the buffer to an amount of
memory that might actually get used?

-- 
All Rights Reversed
--

From: Jiri Slaby
Date: Thursday, June 12, 2008 - 2:30 pm

Why wouldn't it be "safe"? I suggested it to you already, since allocating
64k by kmalloc for descriptors accessed only in the kernel is crud.
Moreover, are you mixing up the buffer with its descriptors here? Or what
are you considering to vmalloc?
--

From: Tomas Winkler
Date: Thursday, June 12, 2008 - 3:26 pm

Not that. When I dropped the line, I just wasn't sure that I'm not
doing it under some spinlock or something like that.
Tomas
--

From: Rafael J. Wysocki
Date: Friday, June 13, 2008 - 7:08 am

Is this a regression from 2.6.25, BTW?

Rafael
--

From: Zdenek Kabelac
Date: Friday, June 13, 2008 - 7:15 am

Well, I've never seen this with the 2.6.25 kernel; on the other hand,
I usually wasn't running the machine for long periods of time, because
suspend was failing too often, I guess. Now it's more stable, so this
bug has shown up.

It might be related to this issue as well: http://lkml.org/lkml/2008/5/22/308

Zdenek
--

From: Zdenek Kabelac
Date: Monday, June 30, 2008 - 4:30 am

I'd like to point out that the -rc8 kernel without the iwl patch from
this thread is still failing (even though the OOM patch for memory
allocation on x86_64 is in the /mm directory).

Also, as far as I can see, there actually is a DMA memory chunk in the
log that could satisfy the order-5 allocation, so why is it failing?
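A free block of the right order is necessary but not sufficient. As far as I understand the 2.6.2x allocator, zone_watermark_ok() in mm/page_alloc.c also requires that, after discounting the free pages held in blocks smaller than the request, enough pages remain above a minimum that is halved at each order. The sketch below is a simplified model of that check, not the kernel function itself: it leaves out lowmem_reserve (which keeps allocations that could be served from a higher zone, such as __GFP_DMA32 ones, out of the DMA zone's reserve) and the relaxed minimum that GFP_ATOMIC requests get. The mark values in the example are hypothetical.

```c
#include <assert.h>

#define NR_ORDERS 11  /* assumed: orders 0..10 */

/* Simplified model of the per-order watermark test (units: pages).
 * nr_free[o] is the number of free blocks of order o in the zone. */
int watermark_ok(const unsigned long nr_free[NR_ORDERS], int order, long mark)
{
	long free_pages = 0;
	long min = mark;
	int o;

	assert(order >= 0 && order < NR_ORDERS);
	for (o = 0; o < NR_ORDERS; o++)
		free_pages += (long)(nr_free[o] << o);
	/* Account for the pages this request would take. */
	free_pages -= (1L << order) - 1;

	if (free_pages <= min)
		return 0;
	for (o = 0; o < order; o++) {
		/* Blocks smaller than the request cannot help satisfy it. */
		free_pages -= (long)(nr_free[o] << o);
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

For example, a zone with 1000 free single pages and one free 128kB block passes an order-0 check against a mark of 128 pages but fails the order-5 check, because only one page is left once the order-0 pages are discounted, even though a suitable block exists.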

Zdenek

----

ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 17
NetworkManager: page allocation failure. order:5, mode:0x24
Pid: 2656, comm: NetworkManager Tainted: G        W 2.6.26-rc8 #37

Call Trace:
 [<ffffffff81092de0>] __alloc_pages_internal+0x460/0x5a0
 [<ffffffffa0228818>] ? :iwl3945:iwl3945_hw_tx_queue_init+0x38/0x1a0
 [<ffffffff81092f3b>] __alloc_pages+0xb/0x10
 [<ffffffff81011c86>] dma_alloc_pages+0x26/0x30
 [<ffffffff81011d74>] dma_alloc_coherent+0xe4/0x2d0
 [<ffffffffa02273d3>] :iwl3945:iwl3945_tx_queue_init+0x63/0x1e0
 [<ffffffffa022a08e>] :iwl3945:iwl3945_hw_nic_init+0x8de/0x940
 [<ffffffffa021de01>] :iwl3945:__iwl3945_up+0x91/0x640
 [<ffffffffa021e968>] :iwl3945:iwl3945_mac_start+0x568/0x790
 [<ffffffff8128b30d>] ? __nla_put+0x2d/0x40
 [<ffffffff8128b2c3>] ? __nla_reserve+0x53/0x70
 [<ffffffff810b3714>] ? deactivate_slab+0x194/0x1c0
 [<ffffffffa0184dff>] :mac80211:ieee80211_open+0x13f/0x590
 [<ffffffff81274738>] ? dev_set_rx_mode+0x48/0x60
 [<ffffffff81276809>] dev_open+0x89/0xf0
 [<ffffffff81276031>] dev_change_flags+0xa1/0x1e0
 [<ffffffff81273ca9>] ? dev_get_by_index+0x19/0x80
 [<ffffffff8127f214>] do_setlink+0x214/0x3a0
 [<ffffffff812f6c20>] ? _read_unlock+0x30/0x60
 [<ffffffff8127f4ad>] rtnl_setlink+0x10d/0x150
 [<ffffffff8128069d>] rtnetlink_rcv_msg+0x18d/0x240
 [<ffffffff81280510>] ? rtnetlink_rcv_msg+0x0/0x240
 [<ffffffff8128b079>] netlink_rcv_skb+0x89/0xb0
 [<ffffffff812804f9>] rtnetlink_rcv+0x29/0x40
 [<ffffffff8128aa95>] netlink_unicast+0x2d5/0x2f0
 [<ffffffff8126ef7e>] ? __alloc_skb+0x6e/0x150
 [<ffffffff8128acb4>] netlink_sendmsg+0x204/0x300
 [<ffffffff812f6c20>] ? _read_unlock+0x30/0x60
 [<ffffffff81266887>] ...