[PATCH] ucc_geth: Rework the TX logic.

Previous thread: [PATCH] smsc911x: enforce read-after-write timing restriction on eeprom access by Steve Glendinning on Thursday, March 26, 2009 - 10:14 am. (2 messages)

Next thread: netfilter 00/12: Netfilter fixes/2.6.30 update part II by Patrick McHardy on Thursday, March 26, 2009 - 12:02 pm. (14 messages)
From: Joakim Tjernlund
Date: Thursday, March 26, 2009 - 10:44 am

The line:
 if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
       break;
in ucc_geth_tx() didn not make sense to me. Rework & cleanup
this logic to something understandable.
---

Reworked the patch according to Antons comments.

 drivers/net/ucc_geth.c |   66 +++++++++++++++++++++++++++--------------------
 1 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 7fc91aa..465de3a 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -161,6 +161,17 @@ static struct ucc_geth_info ugeth_primary_info = {
 
 static struct ucc_geth_info ugeth_info[8];
 
+
+static inline u32 __iomem *bd2buf(u8 __iomem *bd)
+{
+	return (u32 __iomem *)(bd+4);
+}
+
+static inline u32 __iomem *bd2status(u8 __iomem *bd)
+{
+	return (u32 __iomem *)bd;
+}
+
 #ifdef DEBUG
 static void mem_disp(u8 *addr, int size)
 {
@@ -3048,6 +3059,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	u8 __iomem *bd;			/* BD pointer */
 	u32 bd_status;
 	u8 txQ = 0;
+	int tx_ind;
 
 	ugeth_vdbg("%s: IN", __func__);
 
@@ -3057,21 +3069,19 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Start from the next BD that should be filled */
 	bd = ugeth->txBd[txQ];
-	bd_status = in_be32((u32 __iomem *)bd);
+	bd_status = in_be32(bd2status(bd));
 	/* Save the skb pointer so we can free it later */
-	ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]] = skb;
+	tx_ind = ugeth->skb_curtx[txQ];
+	ugeth->tx_skbuff[txQ][tx_ind] = skb;
 
 	/* Update the current skb pointer (wrapping if this was the last) */
-	ugeth->skb_curtx[txQ] =
-	    (ugeth->skb_curtx[txQ] +
-	     1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
+	tx_ind = (tx_ind + 1) & TX_RING_MOD_MASK(ugeth->ug_info->bdRingLenTx[txQ]);
+	ugeth->skb_curtx[txQ] = tx_ind;
 
 	/* set up the buffer descriptor */
-	out_be32(&((struct qe_bd __iomem *)bd)->buf,
-		      dma_map_single(&ugeth->dev->dev, ...
From: Anton Vorontsov
Date: Thursday, March 26, 2009 - 11:03 am

Spaces around "+"...

Also, now it's obvious that we're just reading status or buf.

So instead of these inlines we could use struct qe_bd as described
in asm/qe.h.

I.e.

struct qe_bd *bd = ...;

in_be32(&bd->buf);
in_be16(&bd->status);

Oh, wait. We can't, ucc_geth assumes status is u32, which includes
the length field, i.e. ucc_fast.h defines:


The cleanup work surely desires a separate patch, so bd2buf and
bd2status are OK, for now.

I'll test the patch as soon as I'll get some QE board back on
my table (actually I have one QE board handy, but it's a 1GHz
one, while I'd like to test the patch on a slow machine, where
we'll actually see performance regressions).


Line over 80 columns.


Ditto.

Please make sure your patches pass scripts/checkpatch.pl.

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2
--

From: Joakim Tjernlund
Date: Thursday, March 26, 2009 - 11:26 am

Exactly, I did look at using struct qe_bd *bd, but that will
affect lots of stuff. Reading the status and len in one go is

Good, I have more coming but it touches the same code so I really want
to sort out this one first.

 Jocke
--

From: Li Yang
Date: Friday, March 27, 2009 - 2:45 am

On Fri, Mar 27, 2009 at 1:44 AM, Joakim Tjernlund


When the driver was reviewed for upstream merge, I was told to remove
this kind of thing in favor of direct struct qe_bd access.  So do we
really have a guideline on this?  Or it's just a matter of personal

Why does this make more sense?  The BD with a NULL buffer means the

Now I see why you depend on the buffer to judge if the BD ring is
full.  If you want to do this, make sure it's well documented in code,
or others will be confused.  And as Anton already commented, in/out
access is slow.  I think the original way can be fixed if you think
--

From: Joakim Tjernlund
Date: Friday, March 27, 2009 - 3:23 am

This is a bit better than the current type casts. Moving over to qe_bd
accesses is a bigger cleanup that will have to be a seperate patch.
I am not sure it is an all win as you probably loose the ability

The code could use a better comment, but I really think this method
is superior and more robust. The problem is that in/out functions is
much slower than it has to be for QE IO memory.

The original way is broken and I tired to fix it but it always broke
as soon one applied some load.

 Jocke 

--

From: Li Yang
Date: Friday, March 27, 2009 - 3:39 am

On Fri, Mar 27, 2009 at 6:23 PM, Joakim Tjernlund

The only difference I see between your method and the original code is
that you update the cleanup progress on every BD.  The original code
updates progress only after all sent BDs are processed.  You can try
move ugeth->confBd[txQ] = bd; into the loop then it will be the same
as your proposed code.

- Leo
--

From: Joakim Tjernlund
Date: Friday, March 27, 2009 - 4:39 am

Already tried that+lots of other combinatios. Doesn't work

--

From: Scott Wood
Date: Friday, March 27, 2009 - 6:26 am

Not if you define the struct to have one status_len field (or a union of 
that with separate fields), as gianfar does.

-Scott
--

From: Joakim Tjernlund
Date: Monday, March 30, 2009 - 9:38 am

gianfar does not seem to use in_/out_ functions for the BDs. Works just
fine that too it seems. I always felt that the in_/out_ functions
was extra baggage that the Freescale CPUs don't need. There is
something in between that is missing.

 Jocke
--

From: Scott Wood
Date: Monday, March 30, 2009 - 10:22 am

It does now that it has explicit barriers in a few places.  Before they 
were added, it would sometimes fail under load.  That was due to a 
compiler reordering, but CPU reordering was possible as well.

-Scott
--

From: Joakim Tjernlund
Date: Monday, March 30, 2009 - 10:34 am

Does not the CPU skip reordering if the guarded bit is set?

Could we not use the same in ucc_geth as well? Then people would
not need to worry about "performance issues".

 Jocke
--

From: Scott Wood
Date: Monday, March 30, 2009 - 10:45 am

The guarded bit is typically not set for DMA buffers.  ucc_geth is a bit 
different since descriptors are in MURAM which is ioremap()ed -- though 
switching to a cacheable mapping with barriers should be a performance 
improvement.

-Scott
--

From: Joakim Tjernlund
Date: Monday, March 30, 2009 - 11:42 am

I always thought that MURAM was very fast. The whole reason to have BDs in
MURAM is that it is faster than normal RAM, at least that is what I 
thought.

 Jocke

--

From: Scott Wood
Date: Monday, March 30, 2009 - 12:32 pm

Yeah, on second thought it probably wouldn't be worth it.  There's also 
the question of under what circumstances the QE's MURAM accesses will be 
cache-coherent.

As for the CPU not reordering guarded+cache inhibited accesses, that 
initially seemed to be true for the new arch stuff (book3e/book3s, but 
not really, see below), but the classic arch documentation only 
guarantees stores to such regions to be in-order (and the 
explicitly-specified operation of eieio on I+G accesses wouldn't make 
much sense if they were already guaranteed to be in-order).

Then I looked at the EREF to see what older book E documents had to say 
on the issue, and it suggests that when the architecture document says 
"out of order", it really means "speculative" (and reading the arch 
doc's definition of "out of order" seems to confirm this -- redefining 
terms is bad, m'kay?).  So it seems that the simple answer is no, 
guarded storage is not guaranteed to be in order, unless the only thing 
that can cause an out-of-order access is speculative execution.

-Scott
--

From: Joakim Tjernlund
Date: Tuesday, March 31, 2009 - 2:07 am

I am a bit confused, what isn't worth it? Currently MURAM isn't used by 
ucc_geth, but
it is easy to change. Swap MEM_PART_SYSTEM to MEM_PART_MURAM, however, 
just
tried that and the driver stopped working. I known this worked earlier 
because
I tried it and I even think I sent a patch to Leo.

What choices do we have, I see three:

1) MEM_PART_SYSTEM, as today.
2) MEM_PART_MURAM. I guess this should be uncacheable memory?
3) as gianfar, dma_alloc_coherent(). I presume this is uncacheable memory?

My guess would be 2 or 3. Do they have the same synchronization

Very informative, thanks.

--

From: Li Yang
Date: Tuesday, March 31, 2009 - 3:58 am

On Tue, Mar 31, 2009 at 5:07 PM, Joakim Tjernlund

1 and 3 are the same.  All of them use cacheable memory as we have a
hardware coherency module to take care of the cache coherency problem.
 However it might be better to use dma_alloc_coherent() for the code
to be more readible.  Thanks.

- Leo
--

From: Scott Wood
Date: Tuesday, March 31, 2009 - 7:37 am

Enabling cacheing on MURAM, at least when used for buffer descriptors. 
The cache line ping-pong would probably outweigh the cost of the 

Hmm.  I looked in the driver and saw numerous muram allocations, but I 
didn't try to follow the driver enough to ensure that they were for the 

It would be uncacheable on systems without coherent DMA, but I don't 

No, unfortunately.  PowerPC sync instructions are a bit complicated. 
For example, you can use eieio to sync between reading the interrupt 
status register and checking the ring buffer, if they're both mapped 
I+G, but not if the former is I+G and the latter is cacheable (you need 
a full sync in that case).

-Scott
--

From: Li Yang
Date: Tuesday, March 31, 2009 - 1:16 am

I noticed that in gianfar these memory access is not protected by
"volatile".  Can this be the reason why the compiler did some unwanted
optimization?

- Leo
--

Previous thread: [PATCH] smsc911x: enforce read-after-write timing restriction on eeprom access by Steve Glendinning on Thursday, March 26, 2009 - 10:14 am. (2 messages)

Next thread: netfilter 00/12: Netfilter fixes/2.6.30 update part II by Patrick McHardy on Thursday, March 26, 2009 - 12:02 pm. (14 messages)