Re: [2.6.35-rc1] page alloc failure order:1, mode:0x4020

Previous thread: [PATCH]Device tree update for the 460ex DWC SATA<resubmission> by Rupjyoti Sarmah on Friday, June 4, 2010 - 1:04 am. (1 message)

Next thread: [GIT PULL] cleanups and fixes for oprofile by Robert Richter on Friday, June 4, 2010 - 3:01 am. (1 message)
From: Michael Guntsche
Date: Friday, June 4, 2010 - 2:20 am

Hi list,

Testing 2.6.35-rc1 on my powerpc based routerboard I saw the following page allocation
error happening during an apt-get update with a semi loaded wlan
interface

[309611.189267] __alloc_pages_slowpath: 52 callbacks suppressed
[309611.194959] gzip: page allocation failure. order:1, mode:0x4020
[309611.200981] Call Trace:
[309611.203547] [c399bc50] [c0008144] show_stack+0x48/0x15c (unreliable)
[309611.210041] [c399bc80] [c006268c] __alloc_pages_nodemask+0x3d4/0x52c
[309611.216512] [c399bd20] [c008619c] __slab_alloc+0x560/0x570
[309611.222111] [c399bd60] [c0086a98] __kmalloc_track_caller+0xd4/0x104
[309611.228505] [c399bd80] [c01dd220] __alloc_skb+0x64/0x124
[309611.233944] [c399bda0] [c994e034] ath_rxbuf_alloc+0x34/0xbc [ath]
[309611.240178] [c399bdc0] [c9a1ec9c] ath_rx_tasklet+0x480/0x7c4 [ath9k]
[309611.246658] [c399be80] [c9a1dae0] ath9k_tasklet+0x114/0x13c [ath9k]
[309611.253055] [c399bea0] [c002532c] tasklet_action+0x88/0x104
[309611.258746] [c399bec0] [c0025e30] __do_softirq+0xb4/0x134
[309611.264261] [c399bf00] [c0005ec4] do_softirq+0x58/0x5c
[309611.269514] [c399bf10] [c0025c20] irq_exit+0x7c/0x9c
[309611.274591] [c399bf20] [c0005f64] do_IRQ+0x9c/0xb4
[309611.279509] [c399bf40] [c00117d8] ret_from_except+0x0/0x14
[309611.285112] --- Exception: 501 at 0xff31f0c
[309611.285121]     LR = 0xff32548
[309611.292536] Mem-Info:
[309611.294899] DMA per-cpu:
[309611.297528] CPU    0: hi:   42, btch:   7 usd:  18
[309611.302444] active_anon:1040 inactive_anon:1160 isolated_anon:0
[309611.302455]  active_file:14871 inactive_file:9440 isolated_file:0
[309611.302467]  unevictable:491 dirty:1258 writeback:0 unstable:0
[309611.302478]  free:628 slab_reclaimable:832 slab_unreclaimable:2312
[309611.302490]  mapped:2254 shmem:36 pagetables:202 bounce:0
[309611.332409] DMA free:2512kB min:1440kB low:1800kB high:2160kB active_anon:4160kB inactive_anon:4640kB active_file:59484kB inactive_file:37760kB unevictable:1964kB isolated(anon):0kB isolated(file):0kB present:130048kB ...
From: Eric Dumazet
Date: Friday, June 4, 2010 - 5:53 am

order-1 allocations are unfortunate, since this hardware should use
order-0 ones if possible, and it seems it was its goal.

3872 (0xF20) comes from 

#define IEEE80211_MAX_MPDU_LEN     (3840 + FCS_LEN +
	(IEEE80211_WEP_IVLEN +  \
	IEEE80211_WEP_KIDLEN + \
	IEEE80211_WEP_CRCLEN))

common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
	ah-&gt;caps.rx_status_len,
	min(common-&gt;cachelsz, (u16)64));

Then __dev_alloc_skb() adds two more blocs :

NET_SKB_PAD  (64 bytes on your platform ?)

sizeof(struct skb_shared_info) 
(on 32bit : 0x104 ... oh well that might be the problem : it is rounded
to 0x140)


And ath driver adds common-&gt;cachelsz  (I dont know its value)

-&gt; more than 4096 bytes

1) Maybe rx_bufsize should not include the roundup() since 
ath_rxbuf_alloc() also do an alignment adjustment ?

2) We should try to reduce skb_shared_info by four bytes.

Could you try this patch ?


We make sure rx_bufsize + various overhead &lt;= PAGE_SIZE
But I am not sure its legal for the hardware...

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index ca6065b..0a0dc3a 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -226,10 +226,10 @@ static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
 	u32 size;
 
 
-	common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
-				     ah-&gt;caps.rx_status_len,
-				     min(common-&gt;cachelsz, (u16)64));
-
+	size = roundup(IEEE80211_MAX_MPDU_LEN + ah-&gt;caps.rx_status_len,
+		       min(common-&gt;cachelsz, (u16)64));
+	common-&gt;rx_bufsize = max_t(u32, size,
+				   SKB_MAX_ORDER(NET_SKB_PAD + common-&gt;cachelsz, 0));
 	ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
 				    ah-&gt;caps.rx_status_len);
 


--

From: Michael Guntsche
Date: Friday, June 4, 2010 - 9:16 am

I applied the patch recompiled and run it on the routerboard, trying
to trigger the bug again.

Kind regards,
Michael
--

From: Michael Guntsche
Date: Sunday, June 6, 2010 - 2:56 am

Hi Eric,

Up to now I was not able to reproduce the bug, do you think this patch
can be pushed to mainline or is there a &quot;better&quot;/other  fix for it?

Kind regards,
Michael


--

From: Eric Dumazet
Date: Sunday, June 6, 2010 - 3:42 am

Thanks Michael for testing.

I'll submit ASAP an official patch, sent to all people involved in this
driver to get their Ack (or Nack).

IEEE80211_MAX_MPDU_LEN being 3840 + somebits is suspect, since it doesnt
match 802.11 specs.

It should be more close of 2304 + MAC header (32bytes) + FCS (4 bytes) ?



--

From: Nikhil Sethi (निखिल सेठी)
Date: Wednesday, August 25, 2010 - 1:25 pm

Hello.

I am running 2.5.35.3 with the above patch, and I still get these
failures. Though they are much less often than without the patch. A
snippet from dmesg below.

Please let me know what other details I should provide. Thanks

skbuff alloc of size 3872 failed
java: page allocation failure. order:1, mode:0x4020
Pid: 11464, comm: java Not tainted 2.6.35.3 #3
Call Trace:
 [&lt;c0243d26&gt;] ? __alloc_pages_nodemask+0x3e6/0x513
 [&lt;c025c293&gt;] ? __slab_alloc+0x2d7/0x2eb
 [&lt;c025c946&gt;] ? __kmalloc_track_caller+0x74/0x95
 [&lt;d09e801a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;d09e801a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;c0334097&gt;] ? __alloc_skb+0x57/0x100
 [&lt;d09e801a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;d0af4100&gt;] ? ath_rx_tasklet+0x2fb/0x808 [ath9k]
 [&lt;d0cbc89f&gt;] ? br_handle_frame+0x1b3/0x1c3 [bridge]
 [&lt;c033d18a&gt;] ? __netif_receive_skb+0x141/0x25f
 [&lt;d0af23c1&gt;] ? ath9k_tasklet+0xcc/0x107 [ath9k]
 [&lt;c02195bf&gt;] ? tasklet_action+0x5f/0x65
 [&lt;c0219873&gt;] ? __do_softirq+0x60/0xc6
 [&lt;c0219907&gt;] ? do_softirq+0x2e/0x30
 [&lt;c02199f9&gt;] ? irq_exit+0x53/0x55
 [&lt;c020392c&gt;] ? do_IRQ+0x3a/0x72
 [&lt;c0202be9&gt;] ? common_interrupt+0x29/0x30
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  32
active_anon:6158 inactive_anon:15748 isolated_anon:0
 active_file:11477 inactive_file:22926 isolated_file:0
 unevictable:468 dirty:5788 writeback:0 unstable:0
 free:990 slab_reclaimable:3091 slab_unreclaimable:2349
 mapped:1410 shmem:5 pagetables:227 bounce:0
DMA free:1000kB min:124kB low:152kB high:184kB active_anon:1304kB
inactive_anon:2304kB active_file:2584kB inactive_file:4900kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB
mlocked:0kB dirty:0kB writeback:0kB mapped:284kB shmem:0kB
slab_reclaimable:648kB slab_unreclaimable:752kB kernel_stack:152kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:2960kB min:1908kB low:2384kB ...
From: Eric Dumazet
Date: Wednesday, August 25, 2010 - 1:44 pm

Le mercredi 25 août 2010 à 13:25 -0700, Nikhil Sethi (निखिल सेठी) a


reading my former patch, I believe I mistakenly used a max_t() instead
of a min_() macro :

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index ca6065b..0a0dc3a 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -226,10 +226,10 @@ static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
        u32 size;
 
 
-       common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
-                                    ah-&gt;caps.rx_status_len,
-                                    min(common-&gt;cachelsz, (u16)64));
-
+       size = roundup(IEEE80211_MAX_MPDU_LEN + ah-&gt;caps.rx_status_len,
+                      min(common-&gt;cachelsz, (u16)64));
+       common-&gt;rx_bufsize = max_t(u32, size,
+                                  SKB_MAX_ORDER(NET_SKB_PAD + common-&gt;cachelsz, 0));
        ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
                                    ah-&gt;caps.rx_status_len);
 


Please use instead this one :

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index ca6065b..0a0dc3a 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -226,10 +226,10 @@ static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
        u32 size;
 
 
-       common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
-                                    ah-&gt;caps.rx_status_len,
-                                    min(common-&gt;cachelsz, (u16)64));
-
+       size = roundup(IEEE80211_MAX_MPDU_LEN + ah-&gt;caps.rx_status_len,
+                      min(common-&gt;cachelsz, (u16)64));
+       common-&gt;rx_bufsize = min_t(u32, size,
+                                  SKB_MAX_ORDER(NET_SKB_PAD + common-&gt;cachelsz, 0));
        ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
                                    ah-&gt;caps.rx_status_len);
 



--

From: Nikhil Sethi (निखिल सेठी)
Date: Thursday, August 26, 2010 - 10:53 am

Thanks Eric. I have not been able to reproduce this problem after
applying your new patch in about 12 hours of operation. Will let you
know if the bug re-appears.

Thanks
Nikhil

--

From: Nikhil Sethi (निखिल सेठी)
Date: Sunday, September 5, 2010 - 8:57 am

This came back. Here is the dmesg output. It might be the problem only
happens if I have wmm_enabled=1 in my hostapd.conf. That is the only
thing that I changed yesterday, and today I saw these messages in my
logs. I will try to conform this.

I am running 2.6.35.4 with this patch:

*** 225,234 ****
  	int error = 0, i;
  	u32 size;

!
! 	common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
! 				     ah-&gt;caps.rx_status_len,
! 				     min(common-&gt;cachelsz, (u16)64));

  	ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
  				    ah-&gt;caps.rx_status_len);
--- 225,234 ----
  	int error = 0, i;
  	u32 size;

! 	size = roundup(IEEE80211_MAX_MPDU_LEN + ah-&gt;caps.rx_status_len,
!                        min(common-&gt;cachelsz, (u16)64));
!         common-&gt;rx_bufsize = min_t(u32, size,
!                                    SKB_MAX_ORDER(NET_SKB_PAD +
common-&gt;cachelsz, 0));

  	ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
  				    ah-&gt;caps.rx_status_len);

dmesg output:

skbuff alloc of size 3872 failed
ps: page allocation failure. order:1, mode:0x4020
Pid: 16022, comm: ps Tainted: G        W   2.6.35.4 #1
Call Trace:
 [&lt;c0243bfe&gt;] ? __alloc_pages_nodemask+0x3e6/0x513
 [&lt;c025be97&gt;] ? slob_new_pages+0xc/0x23
 [&lt;c025c0a1&gt;] ? __kmalloc_node+0x39/0x72
 [&lt;c03338d8&gt;] ? __alloc_skb+0x5c/0x105
 [&lt;d09fa01a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;d0ab7109&gt;] ? ath_rx_tasklet+0x2fb/0x808 [ath9k]
 [&lt;c030fd49&gt;] ? scsi_next_command+0x27/0x34
 [&lt;d0ab53c5&gt;] ? ath9k_tasklet+0xcc/0x107 [ath9k]
 [&lt;c02193c3&gt;] ? tasklet_action+0x5f/0x65
 [&lt;c0219677&gt;] ? __do_softirq+0x60/0xc6
 [&lt;c021970b&gt;] ? do_softirq+0x2e/0x30
 [&lt;c02197fd&gt;] ? irq_exit+0x53/0x55
 [&lt;c020392c&gt;] ? do_IRQ+0x3a/0x72
 [&lt;c0202be9&gt;] ? common_interrupt+0x29/0x30
 [&lt;c0247874&gt;] ? __remove_mapping+0x4b/0x8d
 [&lt;c023ea9f&gt;] ? try_to_release_page+0x20/0x30
 [&lt;c0247d36&gt;] ? shrink_page_list+0x2b8/0x46c
 [&lt;c03158ea&gt;] ? sd_prep_fn+0x1a5/0x92b
 [&lt;c024816b&gt;] ? shrink_inactive_list+0x281/0x552
 [&lt;c02445d9&gt;] ? ...
From: Eric Dumazet
Date: Sunday, September 5, 2010 - 1:25 pm

Le dimanche 05 septembre 2010 à 08:51 -0700, Nikhil Sethi (निखिल सेठी) a

Unfortunately you use SLOB, and it seems to ask order-1 pages, while
we want less than a page.




--

From: Nikhil Sethi (निखिल सेठी)
Date: Monday, November 8, 2010 - 9:25 am

Thanks Eric, for all the help.
I get very occasional errors with vanilla 2.6.36 kernel with your
patch. Error messages are slightly different, and very rare compared
to without your patch. 2.6.35.7 seems fine, and does not show this
error.

rsync: page allocation failure. order:1, mode:0x4020
Pid: 31479, comm: rsync Not tainted 2.6.36-rc1 #1
Call Trace:
 [&lt;c0246b2c&gt;] ? __alloc_pages_nodemask+0x3f1/0x52e
 [&lt;c0261330&gt;] ? __slab_alloc+0x40f/0x45a
 [&lt;c02617ca&gt;] ? __kmalloc_track_caller+0x91/0x97
 [&lt;d09db01a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;d09db01a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;c033a8d0&gt;] ? __alloc_skb+0x57/0x100
 [&lt;d09db01a&gt;] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [&lt;d0b01f2e&gt;] ? ath_rx_tasklet+0x680/0xcfe [ath9k]
 [&lt;c02234ae&gt;] ? __queue_work+0xe4/0x1a2
 [&lt;d0b00244&gt;] ? ath9k_tasklet+0xd0/0x134 [ath9k]
 [&lt;c0219a43&gt;] ? tasklet_action+0x5f/0x65
 [&lt;c0219cf4&gt;] ? __do_softirq+0x60/0xc6
 [&lt;c0219d88&gt;] ? do_softirq+0x2e/0x30
 [&lt;c0219e7b&gt;] ? irq_exit+0x54/0x56
 [&lt;c02038e0&gt;] ? do_IRQ+0x3a/0x72
 [&lt;c0202bc9&gt;] ? common_interrupt+0x29/0x30
 [&lt;c02d89a0&gt;] ? cfq_may_queue+0x0/0x92
 [&lt;c02d1567&gt;] ? get_request+0x121/0x23e
 [&lt;c02d16a4&gt;] ? get_request_wait+0x20/0xfd
 [&lt;c02d8f16&gt;] ? cfq_merge+0x0/0x69
 [&lt;c02cfd40&gt;] ? elv_merge+0x14b/0x170
 [&lt;c02d2375&gt;] ? __make_request+0x71/0x364
 [&lt;c02d1fe2&gt;] ? generic_make_request+0x14f/0x318
 [&lt;c0243fb1&gt;] ? mempool_alloc+0x35/0xb9
 [&lt;c02d21e9&gt;] ? submit_bio+0x3e/0x90
 [&lt;c0281c14&gt;] ? bio_alloc_bioset+0x33/0x9e
 [&lt;c027e780&gt;] ? submit_bh+0xbc/0xec
 [&lt;c027f4e2&gt;] ? ll_rw_block+0x72/0x74
 [&lt;c027f504&gt;] ? __breadahead+0x20/0x30
 [&lt;c02a3a8f&gt;] ? __ext4_get_inode_loc+0x308/0x416
 [&lt;c02a8154&gt;] ? ext4_iget+0x52/0x652
 [&lt;c02aa327&gt;] ? ext4_lookup+0x65/0xbf
 [&lt;c026a6bb&gt;] ? d_alloc_and_lookup+0x30/0x58
 [&lt;c026b84a&gt;] ? do_lookup+0xc5/0xfe
 [&lt;c026c239&gt;] ? link_path_walk+0x4d1/0x8d6
 [&lt;c026c722&gt;] ? path_walk+0x48/0xa9
 [&lt;c026c7be&gt;] ? do_path_lookup+0x3b/0x3f
 [&lt;c026d06d&gt;] ? user_path_at+0x30/0x64
 [&lt;c0266446&gt;] ? cp_new_stat64+0xee/0x100
 [&lt;c026664a&gt;] ? ...
From: Eric Dumazet
Date: Monday, November 8, 2010 - 10:10 am

Le lundi 08 novembre 2010 à 08:25 -0800, Nikhil Sethi (निखिल सेठी) a


Its very hard to follow this thread, with a message every two months.


As I said, this patch got a min_t() instead of max_t()

http://kerneltrap.org/mailarchive/linux-kernel/2010/8/25/4611322

If you still have kmalloc-8192 allocations, I suspect you dont use right
patch.



--

From: Nikhil Sethi (निखिल सेठी)
Date: Monday, November 8, 2010 - 10:22 am

As I said, this seems like a regression in 2.6.36 from 2.6.35 (which
is why I emailed after 2 months :) ). You mentioned earlier in the
thread that you were going to submit the patch to head, if there is a
separate bug/thread tracking that, I can send followups to that.

I believe I have the right patch:

*** 225,234 ****
  	int error = 0, i;
  	u32 size;

!
! 	common-&gt;rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
! 				     ah-&gt;caps.rx_status_len,
! 				     min(common-&gt;cachelsz, (u16)64));

  	ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
  				    ah-&gt;caps.rx_status_len);
--- 225,234 ----
  	int error = 0, i;
  	u32 size;

! 	size = roundup(IEEE80211_MAX_MPDU_LEN + ah-&gt;caps.rx_status_len,
!                        min(common-&gt;cachelsz, (u16)64));
!         common-&gt;rx_bufsize = min_t(u32, size,
!                                    SKB_MAX_ORDER(NET_SKB_PAD +
common-&gt;cachelsz, 0));

  	ath9k_hw_set_rx_bufsize(ah, common-&gt;rx_bufsize -
  				    ah-&gt;caps.rx_status_len);


--

Previous thread: [PATCH]Device tree update for the 460ex DWC SATA<resubmission> by Rupjyoti Sarmah on Friday, June 4, 2010 - 1:04 am. (1 message)

Next thread: [GIT PULL] cleanups and fixes for oprofile by Robert Richter on Friday, June 4, 2010 - 3:01 am. (1 message)