I have recently attempted to boot an 8641D based board from an NFS root. The boot process grinds to a halt not long after the first access of the NFS root and I receive multiple "nfs: server 192.168.0.1 not responding, still trying" messages. Wireshark suggests that there is no further traffic from this board at this point on. The NFS server seems to eventually try sending duplicate packets it's already sent, which results in "nfs: server 192.168.0.1 OK" messages, but the "not responding" messages resume with no further traffic from the board. I am able to boot to a ramdisk fine and the network seems to work - though I haven't really pushed the interface from it. I have attempted to git bisect, though I wasn't able to get much further than discovering the problem was introduced in the 2.6.33 merge window - at which point the gianfar network driver fails to compile (I have tried to git bisect skip many, many times to no avail). NFS booting fails for this board on todays linux-next, the master branch of Kumar's PPC tree and the head of the main tree. I have also been able to NFS boot from a random x86 based board that I have, using the head of the main tree and the linux-next tree. Copying the gianfar drivers from 2.6.32 into the head of the main tree restores the correct behaviour and I'm able to NFS boot. I have heard from others that the latest drivers work on 83xx and 85xx based boards, but it seems to be broken on at least the 8641D. I can see there has been a fair amount of work done on the gianfar driver, I assume that this is a bug introduced by the multiple queue support, but I'm way out of my depth on this. I'm also off for the next week - so if I'm quiet, it'll be because of that. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com ...
I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board
and am having still experiencing the problems outlined in my previous
email, though I have noticed that I tend to be able to boot from cold,
but my boot fails on reboot. Hitting the reset button doesn't help, I
need to actually power the machine on and off again for it to work.
As before, I'm way out of my depth in this, any one have any ideas?
Below is a dump of the failed boot process:
U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31)
Freescale PowerPC
CPU:
Core: E600 Core 0, Version: 0.2, (0x80040202)
System: Unknown, Version: 2.0, (0x80900120)
Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC: 25 MHz
L2: Enabled
Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version:
0x22
I2C: ready
DRAM: DDR: 1 GB
FLASH: 8 MB
Invalid ID (ff ff ff ff)
Scanning PCI bus 01
PCI-EXPRESS 1 on bus 00 - 02
PCI-EXPRESS 2 on bus 03 - 03
Video: No radeon video card found!
In: serial
Out: serial
Err: serial
SCSI: AHCI 0001.0000 32 slots 4 ports 3 Gbps 0xf impl IDE mode
flags: ncq ilck pm led clo pmp pio slum part
scanning bus for devices...
Net: eTSEC1, eTSEC2, eTSEC3, eTSEC4
=> tftp 4000000 hpcn/uImage-torvalds-linux-2.6
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/uImage-torvalds-linux-2.6'.
Load address: 0x4000000
Loading: #################################################################
#################################################################
#######################################################
done
Bytes transferred = 2709050 (29563a hex)
=> tftp 5000000 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'.
Load address: 0x5000000
Loading: #
done
Bytes transferred = 11523 ...Further testing has shown that this isn't restricted to warm reboots, it happens from cold as well. In addition, the exact timing of the failure seems to vary, some boots have got further before failing. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote: Unfortunately I don't have any 8641 boards near me, so I can't debug this myself. Though, I tested gianfar on MPC8568E-MDS with 2.6.33 kernel, and it seems to work just fine. I see you use SMP. Can you try to turn it off? If that will fix the issue, then it'll be a good data point. Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you know the results. Thanks, -- Anton Vorontsov email: cbouatmailru@gmail.com irc://irc.freenode.net/bd2 --
Nope, no luck. Can't trigger the issue. :-/ Tested with NFS boot, TCP and UDP netperf tests. -- Anton Vorontsov email: cbouatmailru@gmail.com irc://irc.freenode.net/bd2 --
On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov
I was able to reproduce it on an 8641D and bisected it down to this:
-----------
commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
Author: Anton Vorontsov <avorontsov@ru.mvista.com>
Date: Tue Nov 10 14:11:10 2009 +0000
gianfar: Revive SKB recycling
Before calling gfar_clean_tx_ring() the driver grabs an irqsave
spinlock, and then tries to recycle skbs. But since
skb_recycle_check() returns 0 with IRQs disabled, we'll never
recycle any skbs.
It appears that gfar_clean_tx_ring() and gfar_start_xmit() are
mostly idependent and can work in parallel, except when they
modify num_txbdfree.
So we can drop the lock from most sections and thus fix the skb
recycling.
-----------
...which probably explains why you weren't seeing it on non-SMP.
I'd imagine it would show up on any of the e500mc boards too.
I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then
cherry-picked those onto a 32 baseline to reduce the scale of
the bisection, but I don't think that should impact the final
result I got in any meaningful way.
Paul.
--
On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? Yeah.. Pity, I don't have SMP boards anymore. I'll try to get one though. diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..3ff3bd0 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget) tx_queue = priv->tx_queue[rx_queue->qindex]; tx_cleaned += gfar_clean_tx_ring(tx_queue); + if (!tx_cleaned && !tx_queue->num_txbdfree) + tx_cleaned += 1; /* don't complete napi */ rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue, budget_per_queue); rx_cleaned += rx_cleaned_per_queue; --
Anton, There is also one more issue that I have been observing with the patch "gianfar: Revive SKB recycling". The issue is when I do a IPV4 forwarding test scenario with bidirectional flows (SMP environment). I am using Spirent smart bits (smartflow) for automation testing and I frequently observe smart flow reporting "Rx packet counte greater than Tx packet count. Duplicate packets might have been received". To just get over the issue I have removed this patch and I didn't see the issue. To a certain extent I could get over the problem by using atomic_t for num_txbdfree (atomic_add and atomic_dec instructions for updating the num_txbdfree) and completely removing the spin_locks in the tx routines. Also, I feel we might want to make some more changes to the gfar_clean_tx_ring( ) and gfar_start_xmit() routines so that they can operate parallely. I am really sorry for not posting it a bit earlier as I am caught up with some urgent issues. -- Thanks Sandeep --
-- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
Hm.. I found a p2020 board and I was able to reproduce the issue. The patch down below fixed it completely for me... hm. -- Anton Vorontsov email: cbouatmailru@gmail.com irc://irc.freenode.net/bd2 --
Interesting. I just tested the patch on the sbc8641d, and it
still has the issue with your patch applied. I'm using NFSroot
just like Martyn was and it still appears bound up on that
gianfar tx lock. I'll see if I can get a SysRq backtrace in
case that will help you see how it manages to get there...
Paul.
----
nfs: server not responding, still trying
[repeated ~15 times, then...]
INFO: task rc.sysinit:837 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rc.sysinit D 0fef73f4 0 837 836 0x00000000
Call Trace:
[dfb7d9b0] [c000a144] __switch_to+0x8c/0xf8
[dfb7d9d0] [c03443dc] schedule+0x380/0x954
[dfb7da50] [c0344a0c] io_schedule+0x5c/0x90
[dfb7da70] [c0074b0c] sync_page+0x4c/0x74
[dfb7da80] [c0344f44] __wait_on_bit_lock+0xb0/0x148
[dfb7dab0] [c0074a8c] __lock_page+0x94/0xa4
[dfb7dae0] [c0074d5c] find_lock_page+0x8c/0xa4
[dfb7db00] [c0075674] filemap_fault+0x1ec/0x4fc
[dfb7db40] [c008d548] __do_fault+0x98/0x53c
[dfb7dba0] [c0018478] do_page_fault+0x2d0/0x500
[dfb7dc50] [c00149d4] handle_page_fault+0xc/0x80
--- Exception: 301 at __clear_user+0x14/0x7c
LR = load_elf_binary+0x670/0x1270
[dfb7dd10] [c00f6ca0] load_elf_binary+0x620/0x1270 (unreliable)
[dfb7dd90] [c00b1f78] search_binary_handler+0x17c/0x394
[dfb7dde0] [c00f4f50] load_script+0x274/0x288 ...I've got a p2020ds here as well, so I'll give NFSroot on that a try with your patch. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
Out of 10 boot attempts, 7 failed. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote: OK, I see why. With ip=on (dhcp boot) it's much harder to trigger it. With static ip config can I see the same. --
I'd kind of expected to see us stuck in gianfar on that lock, but the SysRQ-T doesn't show us hung up anywhere in gianfar itself. [This was on a base 2.6.33, with just a small sysrq fix patch] Paul. ---------- SysRq : Changing Loglevel Loglevel set to 9 nfs: server not responding, still trying SysRq : Show State task PC stack pid father init D 0ff1c380 0 1 0 0x00000000 Call Trace: [df841a30] [c0009fc4] __switch_to+0x8c/0xf8 [df841a50] [c0350160] schedule+0x354/0x92c [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54 [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108 [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4 [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398 [df841b90] [c0329abc] rpc_run_task+0x48/0x9c [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88 [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8 [df841c20] [c014eb60] nfs_lookup+0x12c/0x230 [df841d50] [c00b9680] do_lookup+0x118/0x288 [df841d80] [c00bb904] link_path_walk+0x194/0x1118 [df841df0] [c00bcb08] path_walk+0x8c/0x168 [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c [df841e40] [c00be148] do_filp_open+0x5d4/0xba4 [df841f10] [c00abe94] ...
Yeah, I don't think this is gianfar-related. It must be something else triggered by the fact that gianfar no longer sends stuff. OK, I think I found what's happening in gianfar. Some background... start_xmit() prepares new skb for transmitting, generally it does three things: 1. sets up all BDs (marks them ready to send), except the first one. 2. stores skb into tx_queue->tx_skbuff so that clean_tx_ring() would cleanup it later. 3. sets up the first BD, i.e. marks it ready. Here is what clean_tx_ring() does: 1. reads skbs from tx_queue->tx_skbuff 2. Checks if the *last* BD is ready. If it's still ready [to send] then it it isn't transmitted, so clean_tx_ring() returns. Otherwise it actually cleanups BDs. All is OK. Now, if there is just one BD, code flow: - start_xmit(): stores skb into tx_skbuff. Note that the first BD (which is also the last one) isn't marked as ready, yet. - clean_tx_ring(): sees that skb is not null, *and* its lstatus says that it is NOT ready (like if BD was sent), so it cleans it up (bad!) - start_xmit(): marks BD as ready [to send], but it's too late. We can fix this simply by reordering lstatus/tx_skbuff writes. It works flawlessly on my p2020, please try it. Thanks! diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..cccb409 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) } /* setup the TxBD length and buffer pointer for the first BD */ - tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb; txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data, skb_headlen(skb), DMA_TO_DEVICE); @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txbdp_start->lstatus = lstatus; + eieio(); /* force lstatus write before tx_skbuff */ + + tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb; + /* Update the current skb ...
I've skipped right to the test part (I'll think about the description more later) and it passed 5 out of 5 boot tests on NFSroot sbc8641d. Looks like you've got a solution. --
Anton, Understood, and thanks for the explanation. Am I correct in saying that this is due to the out-of-order execution capability on powerpc ? I have one more question, why don't we use use atomic_t for num_txbdfree and completely do away with spin_locks in gfar_clean_tx_ring() and gfar_start_xmit(). In an non-SMP, scenario I would feel there is absolutely no requirement of spin_locks and in case of SMP atomic operation would be much more safer on powerpc rather than spin_locks. What is your suggestion ? -- Thanks --
Hi! On Sat, Feb 27, 2010 at 11:05:32AM +0530, Kumar Gopalpet-B05799 wrote: Nope, that was just a logic issue in the driver. Though, with the patch, the eieio() is needed so that compiler (or CPU) I think that's a good idea. However, in start_xmit() we'll have to keep the spinlock anyway since it also protects from gfar_error(), which can modify regs->tstat. Thanks! -- Anton Vorontsov email: cbouatmailru@gmail.com irc://irc.freenode.net/bd2 --
I can confirm 10/10 successful boots on p2020ds and mpc8641_hpcn. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
I removed the second core from the dts file rather than truly disabling SMP in the kernel config. Doing this allowed the board to boot reliably. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748 | Barbirolli Square, Manchester, E martyn.welch@ge.com | M2 3AB VAT:GB 927559189 --
what mechanism do you use for warm resets? - k --
