Hello,

I'm attaching a trace from my machine, which got into big trouble after 2 days of usage and several successful suspend/resume cycles (this finally seems to be getting better now :)).

It looks like, although there was a huge amount of memory in buffers and caches, the system was unable to allocate a few pages for kmalloc in the iwl3945 driver after resume. I even tried writing 3 to drop_caches and reinserting the iwl driver, but this had fatal results: the machine died completely with a blinking caps lock, and there was no oops in the log for this case.

The kernel I was running in this case is 2.6.26-rc5 at commit aab2545fdd6641b76af0ae96456c4ca9d1e50dad.
The machine is a T61/2GB/C2D.
I'm also attaching slabinfo in case it is useful for something.

NetworkManager: <info> (eth0): device state change: 1 -> 2
NetworkManager: <info> (eth0): preparing device.
NetworkManager: <info> (eth0): deactivating device.
<6>[53906.520363] ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 17
NetworkManager: <info> (wlan0): device state change: 1 -> 2
NetworkManager: <info> (wlan0): bringing up device.
<4>[53906.578855] NetworkManager: page allocation failure. order:5, mode:0x1024
<4>[53906.578855] Pid: 2645, comm: NetworkManager Tainted: G W 2.6.26-rc5 #33
<4>[53906.578855]
<4>[53906.578855] Call Trace:
<4>[53906.578855] [<ffffffff81092c70>] __alloc_pages_internal+0x460/0x590
<4>[53906.578855] [<ffffffff81092dbb>] __alloc_pages+0xb/0x10
<4>[53906.578855] [<ffffffff81011c26>] dma_alloc_pages+0x26/0x30
<4>[53906.578855] [<ffffffff81011cf3>] dma_alloc_coherent+0xc3/0x2a0
<4>[53906.578855] [<ffffffffa01aa06e>] :iwl3945:iwl3945_hw_nic_init+0x8de/0x940
<4>[53906.578855] [<ffffffffa019de01>] :iwl3945:__iwl3945_up+0x91/0x640
<4>[53906.578855] [<ffffffffa019e968>] :iwl3945:iwl3945_mac_start+0x568/0x790
<4>[53906.578855] [<ffffffff8128a67d>] ? __nla_put+0x2d/0x40
<4>[53906.578855] [<ffffffff8128a633>] ? __nla_reserve+0x53/0x70
<4>[53906.578855] [<ffffffff8128a67d>] ? __nla_put+0x2d/0x40
<4>[53906.578855] ...
On Thu, 12 Jun 2008 12:07:34 +0200

It looks like this is because it wants to allocate 2**5 contiguous

As you can see, the 128kB free areas have been pretty much exhausted, even though there is still a good amount of free memory. I am not sure why this last 128kB area was not allocated, but let's face it - it would have blown up the next allocation anyway.

Doing such a large allocation from a driver is probably not the best idea.

--
All rights reversed.
--
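The link between "order:5" in the trace and the 128kB areas mentioned above can be checked with a small userspace sketch. It assumes 4 KiB pages (as on x86); `order_to_bytes` and `min_order_for` are hypothetical helper names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Bytes covered by one buddy-allocator block of the given order. */
static size_t order_to_bytes(unsigned int order)
{
    return (size_t)PAGE_SIZE_BYTES << order;
}

/* Smallest order whose block is at least `bytes` long. */
static unsigned int min_order_for(size_t bytes)
{
    unsigned int order = 0;

    assert(bytes > 0);
    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}
```

Under that assumption an order 5 request is 2**5 = 32 contiguous pages, i.e. 131072 bytes or 128kB, which is why the 128kB column of the free lists is the one that matters here.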
64-bit system I assume?
The allocation should be 256 * 20 * sizeof(struct sk_buff *).
Try the patch below. It should improve code generation too.
I discussed this with Tomas previously and he says the hw is capable of
doing 20 fragments per frame (wonder why, Broadcom can do an unlimited
number...) but he complained about the networking stack not being able
to. Well, the hardware needs to support IP checksumming for that, hence,
afaik, only two fragments can ever be used (one for hw header, one for
frame).
This cuts the allocation to 10%, or (under) a page in all cases.
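A rough check of the numbers, as a userspace sketch: it assumes 4-byte pointers on 32-bit and 8-byte pointers on 64-bit, takes the 256 from the formula above, and uses a hypothetical helper name `txb_array_bytes` that does not exist in the driver.

```c
#include <stddef.h>

#define TFD_QUEUE_SLOTS 256  /* the "256" from the formula above */

/* Bytes needed for TFD_QUEUE_SLOTS * tbs_per_slot pointers of ptr_size bytes. */
static size_t txb_array_bytes(size_t tbs_per_slot, size_t ptr_size)
{
    return (size_t)TFD_QUEUE_SLOTS * tbs_per_slot * ptr_size;
}
```

With MAX_NUM_OF_TBS at 20 this gives 40960 bytes on 64-bit and 20480 bytes on 32-bit; with 2 it gives 4096 and 2048 bytes. That is 10% of the original size and at most one 4 KiB page. The difference, 18 * 256 pointers, is 36 KiB on 64-bit and 18 KiB on 32-bit.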
johannes
--- everything.orig/drivers/net/wireless/iwlwifi/iwl-3945.h 2008-06-12 15:50:29.000000000 +0200
+++ everything/drivers/net/wireless/iwlwifi/iwl-3945.h 2008-06-12 15:50:31.000000000 +0200
@@ -120,7 +120,7 @@ struct iwl3945_queue {
int iwl3945_queue_space(const struct iwl3945_queue *q);
int iwl3945_x2_queue_used(const struct iwl3945_queue *q, int i);
-#define MAX_NUM_OF_TBS (20)
+#define MAX_NUM_OF_TBS (2)
/* One for each TFD */
struct iwl3945_tx_info {
--- everything.orig/drivers/net/wireless/iwlwifi/iwl-dev.h 2008-06-12 15:50:18.000000000 +0200
+++ everything/drivers/net/wireless/iwlwifi/iwl-dev.h 2008-06-12 15:50:24.000000000 +0200
@@ -115,7 +115,7 @@ struct iwl_queue {
* space less than this */
} __attribute__ ((packed));
-#define MAX_NUM_OF_TBS (20)
+#define MAX_NUM_OF_TBS (2)
/* One for each TFD */
struct iwl_tx_info {
--
I'll surely try your patch - but is iwl the only driver which needs a 128kB contiguous memory chunk? If the 128kB chunks are exhausted in 2 days while there is over 1GB of free RAM in buffers & caches, then I think some defragmentation is needed?

btw: Does it really mean that within those buffers the kernel could not find any 32 adjacent pages which could be freed?

Zdenek
--
I don't know. But I think it'll probably fail right after, trying to allocate a 32kb buffer with pci_alloc_something.... johannes
On Thu, Jun 12, 2008 at 5:12 PM, Zdenek Kabelac

We do some stupid free-alloc sequence on restart; this is where it fails. I'm still polishing a patch that eliminates it.

Tomas
--
On Thu, Jun 12, 2008 at 4:54 PM, Johannes Berg

This is scatter gather buffers that can be kicked in one DMA transaction.

This I still don't understand why but everybody is already tired to

Probably.

It would be safe to use vmalloc for allocating txb anyway. I'll give it a try.

There was already discussion on LKML about memory allocation problems on X86_64, which might explain this regression. This didn't happen before.

Tomas
--
So vmalloc didn't break anything on the 32-bit machine. I'm just about to install a 64-bit one, so it will take an hour or two.. I will issue some

This is the thread title if you are interested: 'x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY'

Tomas
--
Well, I disagree, and I'll push my patch as soon as somebody confirms

Like I said, it doesn't matter, there's no need to _waste_ 18*256*sizeof(void *) bytes of memory.

johannes
On Thu, Jun 12, 2008 at 8:05 PM, Johannes Berg

Remember you are not a maintainer of this driver, and second, we are open to all suggestions; you don't have to use this kind of
--
Yeah, you're right, I can't really do that. But I can submit the patch to akpm, and I'm sure he'll take it after you provide your counter argument about hope never dying again ;)

Frankly, I don't see why you're so opposed to this patch; even if it doesn't solve anything, it probably leads to better code generation and uses a lot less memory. Also, I know you cannot actually need those descriptors since mac80211 will never ever pass such frames, and _that_ is an area I do have at

Well, thing is, my patch saves 18 KiB memory on 32-bit and 36 on 64-bit, so I think we should merge it regardless. Yes, the pci allocation is icky, and yes, it would be good to just do it once instead of over and over again, but even if you change it to do _all_ those allocations just once we should not be wasting those 18/36 KiB memory for nothing.

johannes
On Thu, Jun 12, 2008 at 8:46 PM, Johannes Berg

I'm not against it. You've decided that I'm fighting you because I gave another solution. Frankly, we probably don't need this allocation at all. Maybe one skb is just enough: even with my never-dying hope, all fragments are in the skb fragment list.

This still probably won't solve the pci memory allocation problem
--
Ok, no, I'm not saying you shouldn't rewrite all the code to get rid of it, but I think you can use a patch like mine in the interim as such a rewrite

Yeah, true, that one needs to be done, but it could probably be done only once when the hw is probed rather than every time it is brought up. Most likely not something you'll get to fix in 2.6.26 either though.

johannes
Well - it's great that a few kB will be saved in the allocation of never-used pointers in the iwl driver - but does this really solve the problem that the kernel relatively quickly runs out of memory for allocations of this size? I guess iwl isn't the only driver requesting 32 sequential pages.

Is it possible to track how this memory gets fragmented/lost - who owns the blocks and why they are not back in the pool?

btw with 8 hours of uptime at this moment I can see this:

DMA: 26*4kB 37*8kB 72*16kB 65*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 7920kB
DMA32: 203*4kB 79*8kB 26*16kB 11*32kB 6*64kB 9*128kB 3*256kB 2*512kB 2*1024kB 0*2048kB 0*4096kB = 7588kB

so at this moment I can see quite a lot of free DMA memory - but in my trace at the beginning of the thread, after several suspend/resumes, this memory was gone....

Zdenek
--
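The two free-list lines above can be read mechanically. The following userspace sketch copies the per-order block counts from them and computes the total free memory and how many blocks could serve an order 5 (128kB) request. `free_kb` and `blocks_at_least` are hypothetical helper names, and the sketch deliberately ignores watermarks and zone restrictions, so a nonzero count does not by itself mean a real allocation would succeed.

```c
#include <stddef.h>

#define NR_ORDERS 11  /* orders 0..10: 4kB up to 4096kB */

/* Total free memory in kB: counts[o] blocks of (4 << o) kB each. */
static unsigned long free_kb(const unsigned int counts[NR_ORDERS])
{
    unsigned long total = 0;
    size_t o;

    for (o = 0; o < NR_ORDERS; o++)
        total += (unsigned long)counts[o] * (4ul << o);
    return total;
}

/* Number of free blocks of at least the requested order. */
static unsigned int blocks_at_least(const unsigned int counts[NR_ORDERS],
                                    unsigned int order)
{
    unsigned int n = 0;
    size_t o;

    for (o = order; o < NR_ORDERS; o++)
        n += counts[o];
    return n;
}

/* Block counts per order, copied from the DMA and DMA32 lines above. */
static const unsigned int dma_counts[NR_ORDERS] =
    { 26, 37, 72, 65, 3, 0, 0, 0, 0, 0, 1 };
static const unsigned int dma32_counts[NR_ORDERS] =
    { 203, 79, 26, 11, 6, 9, 3, 2, 2, 0, 0 };
```

The totals match the printed 7920kB and 7588kB. For order 5 and above, the DMA zone has a single 4096kB block and DMA32 has 16 blocks, so at this point in the uptime a 128kB request still has candidates in both zones.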
On Thu, Jun 12, 2008 at 11:11 PM, Zdenek Kabelac

Currently the driver frees the memory on down and allocates it back on up. This is done even in the initial reset, for the sake of flow simplicity. I'm not sure yet why the memory is actually accumulating; whether the bug is in the driver or in the memory system is not clear to me yet. I haven't seen this on older kernels or drivers.

As I wrote, I'm also polishing a patch that doesn't do this free-alloc loop; I hope this will remedy the problem somehow. It has a drawback, as it will hold on to the memory even if the device is down.

Tomas
--
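The idea of dropping the free-alloc loop can be sketched in plain userspace C. This is not the patch Tomas refers to: `struct fake_dev` and the `fake_dev_*` functions are hypothetical stand-ins, and `calloc` stands in for whatever allocator the driver actually uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_dev {
    void *txq_mem;          /* queue memory, kept across down/up */
    size_t txq_bytes;       /* size of the queue memory */
    unsigned int nr_allocs; /* counts real allocations, for illustration */
};

/* Bring the device up: allocate only if nothing is held yet. */
static int fake_dev_up(struct fake_dev *dev)
{
    if (!dev->txq_mem) {
        dev->txq_mem = calloc(1, dev->txq_bytes);
        if (!dev->txq_mem)
            return -1;
        dev->nr_allocs++;
    }
    return 0;
}

/* Bring the device down: stop using the memory but keep holding it. */
static void fake_dev_down(struct fake_dev *dev)
{
    (void)dev; /* nothing is freed here; this is the stated drawback */
}

/* Final teardown: the only place the memory is released. */
static void fake_dev_remove(struct fake_dev *dev)
{
    free(dev->txq_mem);
    dev->txq_mem = NULL;
}
```

However many up/down cycles follow (for example across suspend/resume), the large allocation happens once, while memory is presumably least fragmented. The cost is that the buffer stays pinned while the device is down.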
On Thu, 12 Jun 2008 22:11:32 +0200

I hope it is the only one. Doing a 128kb GFP_ATOMIC allocation is hopelessly unreliable. Even a 128k GFP_KERNEL allocation will fail all over the place.

Please convert the driver to allocate no more than 4k at a time.
--
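One way to stay within 4k per allocation is to split a large pointer array into page-sized chunks behind a small index table. The sketch below is userspace C using `calloc`, not driver code; `struct chunked_ptrs` and the `chunked_*` functions are hypothetical names. It only applies to memory the CPU alone touches, such as the array of sk_buff pointers, and not to descriptors the hardware needs to see as one linear region.

```c
#include <stdlib.h>

#define CHUNK_BYTES 4096u                             /* cap per allocation */
#define PTRS_PER_CHUNK (CHUNK_BYTES / sizeof(void *))

struct chunked_ptrs {
    void ***chunks;   /* chunks[i] points to one CHUNK_BYTES block */
    size_t nr_chunks;
};

/* Release every chunk and the index table itself. */
static void chunked_free(struct chunked_ptrs *t)
{
    size_t i;

    if (!t->chunks)
        return;
    for (i = 0; i < t->nr_chunks; i++)
        free(t->chunks[i]);   /* free(NULL) is a no-op */
    free(t->chunks);
    t->chunks = NULL;
    t->nr_chunks = 0;
}

/* Allocate room for nr_entries (> 0) pointers, CHUNK_BYTES at a time. */
static int chunked_init(struct chunked_ptrs *t, size_t nr_entries)
{
    size_t i;

    t->nr_chunks = (nr_entries + PTRS_PER_CHUNK - 1) / PTRS_PER_CHUNK;
    t->chunks = calloc(t->nr_chunks, sizeof(*t->chunks));
    if (!t->chunks)
        return -1;
    for (i = 0; i < t->nr_chunks; i++) {
        t->chunks[i] = calloc(PTRS_PER_CHUNK, sizeof(void *));
        if (!t->chunks[i]) {
            chunked_free(t);
            return -1;
        }
    }
    return 0;
}

/* Address of entry idx, as if the table were one flat array. */
static void **chunked_slot(struct chunked_ptrs *t, size_t idx)
{
    return &t->chunks[idx / PTRS_PER_CHUNK][idx % PTRS_PER_CHUNK];
}
```

With 8-byte pointers, the 256 * 20 entries discussed earlier in the thread would need 10 chunks of 4096 bytes plus an 80-byte index table, so no single request exceeds one page.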
Is there some reason you think I would need to be cut out of the loop?? akpm isn't the only one who likes solutions...

John
--
John W. Linville
linville@tuxdriver.com
--
And you can safely decrease the allocation to 10% as I do in my patch because once you understand you'll see that you cannot possibly use

Yeah, but why bother if we can just allocate 10% of the size, waste a lot less memory etc. mac80211 isn't going to pass in a scatter/gather

Doesn't really matter, iwlwifi is _wasting_ this allocation, it cannot possibly use all those buffers anyway.

The more interesting thing is the pci_alloc_consistent allocation right below that is also _huge_, but that's because of the stupid hardware design, or can the hardware cope with having the descriptors non-linear in memory?

johannes
On Thu, Jun 12, 2008 at 8:03 PM, Johannes Berg

Hope never dies. I actually have seen this speed up the throughput so

We talk after your next HW design. How will you configure 265 * 16 descriptors separately?

Tomas
--
Well, you can always add it back later if you make the networking stack

Well, considering that other hardware does manage to do things differently (say Broadcom because I know their DMA engine), I don't know why your hw designers went wild with this. All you need is an "end-of-frame" flag. But that's not really interesting to discuss, unless this is actually controlled by the microcode and you can change it.

johannes
On Thu, 12 Jun 2008 18:43:37 +0300

The only thing that makes no sense to me is why your driver "needs" to allocate 10x as much memory in that buffer as it will ever use.

What is the problem with the simpler solution, which just reduces the size of the buffer to an amount of memory that might actually get used?

--
All Rights Reversed
--
Why wouldn't it be "safe"? I suggested it to you already, since allocating 64k by kmalloc for descriptors accessed only in the kernel is crud. Moreover, are you mixing up the buffer with its descriptors here? Or what are you considering to vmalloc?
--
Not that. I just wasn't sure, when I dropped the line, that I'm not doing it under some spinlock or something like that.

Tomas
--
Is this a regression from 2.6.25, BTW? Rafael --
Well, I've never seen this with a 2.6.25 kernel - on the other hand, I usually haven't been running the machine for a longer period of time, because suspend was failing too often, I guess. Now it's more stable, so this bug has shown up.

It might be related to this issue as well: http://lkml.org/lkml/2008/5/22/308

Zdenek
--
I'd like to point out that the -rc8 kernel without the iwl patch from this thread is still failing (even though the OOM patch for memory allocation on x86_64 is in the /mm directory).

Also, as far as I can see, there is actually a DMA memory chunk that could satisfy an order 5 allocation in the log - so why is it failing?

Zdenek

----
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 17 (level, low) -> IRQ 17
NetworkManager: page allocation failure. order:5, mode:0x24
Pid: 2656, comm: NetworkManager Tainted: G W 2.6.26-rc8 #37
Call Trace:
[<ffffffff81092de0>] __alloc_pages_internal+0x460/0x5a0
[<ffffffffa0228818>] ? :iwl3945:iwl3945_hw_tx_queue_init+0x38/0x1a0
[<ffffffff81092f3b>] __alloc_pages+0xb/0x10
[<ffffffff81011c86>] dma_alloc_pages+0x26/0x30
[<ffffffff81011d74>] dma_alloc_coherent+0xe4/0x2d0
[<ffffffffa02273d3>] :iwl3945:iwl3945_tx_queue_init+0x63/0x1e0
[<ffffffffa022a08e>] :iwl3945:iwl3945_hw_nic_init+0x8de/0x940
[<ffffffffa021de01>] :iwl3945:__iwl3945_up+0x91/0x640
[<ffffffffa021e968>] :iwl3945:iwl3945_mac_start+0x568/0x790
[<ffffffff8128b30d>] ? __nla_put+0x2d/0x40
[<ffffffff8128b2c3>] ? __nla_reserve+0x53/0x70
[<ffffffff810b3714>] ? deactivate_slab+0x194/0x1c0
[<ffffffffa0184dff>] :mac80211:ieee80211_open+0x13f/0x590
[<ffffffff81274738>] ? dev_set_rx_mode+0x48/0x60
[<ffffffff81276809>] dev_open+0x89/0xf0
[<ffffffff81276031>] dev_change_flags+0xa1/0x1e0
[<ffffffff81273ca9>] ? dev_get_by_index+0x19/0x80
[<ffffffff8127f214>] do_setlink+0x214/0x3a0
[<ffffffff812f6c20>] ? _read_unlock+0x30/0x60
[<ffffffff8127f4ad>] rtnl_setlink+0x10d/0x150
[<ffffffff8128069d>] rtnetlink_rcv_msg+0x18d/0x240
[<ffffffff81280510>] ? rtnetlink_rcv_msg+0x0/0x240
[<ffffffff8128b079>] netlink_rcv_skb+0x89/0xb0
[<ffffffff812804f9>] rtnetlink_rcv+0x29/0x40
[<ffffffff8128aa95>] netlink_unicast+0x2d5/0x2f0
[<ffffffff8126ef7e>] ? __alloc_skb+0x6e/0x150
[<ffffffff8128acb4>] netlink_sendmsg+0x204/0x300
[<ffffffff812f6c20>] ? _read_unlock+0x30/0x60
[<ffffffff81266887>] ...
