netconsole: tulip: possible remote DoS? due to kernel freeze on heavy RX traffic after Order-1 allocation failure

From: Tobias Diedrich
Date: Thursday, November 5, 2009 - 4:31 am

On one of my rootservers, which is using the tulip driver for the
onboard network interface, I am seeing Order-1 allocation failures
on heavy RX traffic, which usually hang the machine.
That is, I am unable to ping it, and after forcing a reboot through
the management interface I don't see the allocation failure message
in /var/log/kern.log, even though I saw parts of it over the
netconsole.

Unfortunately the netconsole target is not on the LAN, but a
different rootserver on the internet a few hops away, which means
bursts of UDP packets are lossy and can get reordered...

I first thought this was introduced in 2.6.31, but it is only easier
to trigger there.  Reducing vm.min_free_kbytes made it easy enough to
trigger on 2.6.30 as well.

Example from netconsole log:
|perl: page allocation failure. order:1, mode:0x20
|Pid: 3541, comm: perl Tainted: G        W  2.6.30.9-tomodachi #16
|Call Trace:
| [<c013e56d>] ? __alloc_pages_internal+0x353/0x36f
| [<c0154f2c>] ? cache_alloc_refill+0x2ab/0x544
| [<c0355479>] ? dev_alloc_skb+0x11/0x25
| [<c015526f>] ? __kmalloc_track_caller+0xaa/0xf9
| [<c0354ae5>] ? __alloc_skb+0x48/0xff
| [<c0355479>] ? dev_alloc_skb+0x11/0x25
| [<c02d4ba9>] ? tulip_refill_rx+0x3c/0x115
| [<c02d4fff>] ? tulip_poll+0x37d/0x416
| [<c0359763>] ? net_rx_action+0x6b/0x12f
| [<c0121ad7>] ? __do_softirq+0x4e/0xbf
| [<c0121a89>] ? __do_softirq+0x0/0xbf
| <IRQ>  [<c0107700>] ? do_IRQ+0x53/0x63
| [<c0106610>] ? common_interrupt+0x30/0x38
|Mem-Info:
|DMA per-cpu:
|CPU    0: hi:    0, btch:   1 usd:   0
|Normal per-cpu:
|CPU    0: hi:   90, btch:  15 usd:  85
|Active_anon:6380 active_file:1186 inactive_anon:6426
| inactive_file:2729 unevictable:40962 dirty:0 writeback:324 unstable:0
| free:300 slab:2083 mapped:2310 pagetables:684 bounce:0
|DMA free:932kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB act
|lowmem_reserve[]: 0 230 230
[after this the machine no longer responds to pings and has to be rebooted]

Since I can trigger this bug by heavy RX ...

I don't see anything in this trace to implicate netconsole? This is the
normal network receive path running out of input buffers then running
into memory fragmentation.
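
For reference, the "order:1, mode:0x20" in the failure message can be
decoded roughly as below. This is only a sketch: the 4 KiB page size is
an assumption (the trace looks like 32-bit x86), the flag values are
assumed to match include/linux/gfp.h from around 2.6.30, and the helper
names are made up for illustration.

```python
# Sketch: decode "order:1, mode:0x20" from a page allocation failure.
# Assumption: 4 KiB pages, as on 32-bit x86.
PAGE_SIZE = 4096

# Assumption: bit values as in 2.6.30-era include/linux/gfp.h.
# They differ between kernel versions, so check your own tree.
GFP_FLAGS = {
    0x10: "__GFP_WAIT",  # allocation may sleep and reclaim
    0x20: "__GFP_HIGH",  # may dip into emergency reserves
    0x40: "__GFP_IO",    # may start physical I/O
    0x80: "__GFP_FS",    # may call into the filesystem
}


def order_to_bytes(order, page_size=PAGE_SIZE):
    """An order-n request asks for 2**n physically contiguous pages."""
    return page_size << order


def decode_gfp(mode):
    """Return the names of the known flag bits set in mode."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]


def can_sleep(mode):
    """Without __GFP_WAIT the allocator cannot wait for reclaim."""
    return bool(mode & 0x10)


if __name__ == "__main__":
    print(order_to_bytes(1))         # bytes needed in one contiguous chunk
    print(decode_gfp(0x20))          # flags behind mode:0x20
    print(can_sleep(0x20))           # atomic context, so no sleeping
    print(300 * PAGE_SIZE // 1024)   # "free:300" pages expressed in kB
```

Under these assumptions the request is for 8 KiB of contiguous memory
from a context that cannot sleep, so it can fail even while the
Mem-Info dump still shows free pages, if no two of them are adjacent.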

-- 
http://selenic.com : development and support for Mercurial and Linux
