Re: [PATCH] hvc_dcc: Simplify assembly for v6 and v7 ARM

Previous thread: [thisops uV3 00/18] Upgrade of this_cpu_ops V3 by Christoph Lameter on Tuesday, November 30, 2010 - 12:07 pm. (35 messages)

Next thread: [RFC v2 0/8] TI DMM-TILER driver by David Sin on Tuesday, November 30, 2010 - 12:58 pm. (22 messages)
From: Daniel Walker
Date: Tuesday, November 30, 2010 - 12:25 pm

This driver adds a basic console that uses the arm JTAG
DCC to transfer data back and forth. It has support for
ARMv6 and ARMv7.

This console is created under the HVC driver, and should be named
/dev/hvcX (or /dev/hvc0 for example).

Cc: Tony Lindgren <tony@atomide.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Daniel Walker <dwalker@codeaurora.org>
---
 drivers/char/Kconfig   |    9 +++
 drivers/char/Makefile  |    1 +
 drivers/char/hvc_dcc.c |  133 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/hvc_dcc.c

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 43d3395..d4a7776 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -682,6 +682,15 @@ config HVC_UDBG
        select HVC_DRIVER
        default n
 
+config HVC_DCC
+       bool "ARM JTAG DCC console"
+       depends on ARM
+       select HVC_DRIVER
+       help
+         This console uses the JTAG DCC on ARM to create a console under the HVC
+	 driver. This console is used through a JTAG only on ARM. If you don't have
+	 a JTAG then you probably don't want this option.
+
 config VIRTIO_CONSOLE
 	tristate "Virtio console"
 	depends on VIRTIO
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index ba53ec9..fa0b824 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_HVC_CONSOLE)	+= hvc_vio.o hvsi.o
 obj-$(CONFIG_HVC_ISERIES)	+= hvc_iseries.o
 obj-$(CONFIG_HVC_RTAS)		+= hvc_rtas.o
 obj-$(CONFIG_HVC_TILE)		+= hvc_tile.o
+obj-$(CONFIG_HVC_DCC)		+= hvc_dcc.o
 obj-$(CONFIG_HVC_BEAT)		+= hvc_beat.o
 obj-$(CONFIG_HVC_DRIVER)	+= hvc_console.o
 obj-$(CONFIG_HVC_IRQ)		+= hvc_irq.o
diff --git a/drivers/char/hvc_dcc.c b/drivers/char/hvc_dcc.c
new file mode 100644
index 0000000..6470f63
--- /dev/null
+++ b/drivers/char/hvc_dcc.c
@@ ...
From: Nicolas Pitre
Date: Tuesday, November 30, 2010 - 12:57 pm

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>

This doesn't support both ARMv6 and ARMv7 at run time, but this can 
trivially be added later when needed.


Nicolas
--

From: Arnd Bergmann
Date: Tuesday, November 30, 2010 - 2:17 pm

I was about to make a similar comment when I saw yours ;-)
--

From: Stephen Boyd
Date: Tuesday, November 30, 2010 - 10:30 pm

Without marking this asm volatile my compiler decides it can cache the
value of __ret in a register and then check the value of it continually
in hvc_dcc_put_chars() (I had to replace get_wait/put_wait with 1 and
fixup the branch otherwise my disassembler barfed on __dcc_(get|put)char).


00000000 <hvc_dcc_put_chars>:
   0:   ee103e11        mrc     14, 0, r3, cr0, cr1, {0}
   4:   e3a0c000        mov     ip, #0  ; 0x0
   8:   e2033202        and     r3, r3, #536870912      ; 0x20000000
   c:   ea000006        b       2c <hvc_dcc_put_chars+0x2c>
  10:   e3530000        cmp     r3, #0  ; 0x0
  14:   1afffffd        bne     10 <hvc_dcc_put_chars+0x10>
  18:   e7d1000c        ldrb    r0, [r1, ip]
  1c:   ee10fe11        mrc     14, 0, pc, cr0, cr1, {0}
  20:   2afffffd        bcs     1c <hvc_dcc_put_chars+0x1c>
  24:   ee000e15        mcr     14, 0, r0, cr0, cr5, {0}
  28:   e28cc001        add     ip, ip, #1      ; 0x1
  2c:   e15c0002        cmp     ip, r2
  30:   bafffff6        blt     10 <hvc_dcc_put_chars+0x10>
  34:   e1a00002        mov     r0, r2
  38:   e12fff1e        bx      lr

As you can see, the value of the mrc is checked against DCC_STATUS_TX
(bit 29) and then stored in r3 for later use. Marking this volatile
produces the following:

00000000 <hvc_dcc_put_chars>:
   0:   e3a03000        mov     r3, #0  ; 0x0
   4:   ea000007        b       28 <hvc_dcc_put_chars+0x28>
   8:   ee100e11        mrc     14, 0, r0, cr0, cr1, {0}
   c:   e3100202        tst     r0, #536870912  ; 0x20000000
  10:   1afffffc        bne     8 <hvc_dcc_put_chars+0x8>
  14:   e7d10003        ldrb    r0, [r1, r3]
  18:   ee10fe11        mrc     14, 0, pc, cr0, cr1, {0}
  1c:   2afffffd        bcs     18 <hvc_dcc_put_chars+0x18>
  20:   ee000e15        mcr     14, 0, r0, cr0, cr5, {0}
  24:   e2833001        add     r3, r3, #1      ; 0x1
  28:   e1530002        cmp     r3, r2
  2c:   bafffff5        blt     8 <hvc_dcc_put_chars+0x8>
  30:   e1a00002        mov     r0, r2
  34:   ...
From: Daniel Walker
Date: Wednesday, December 1, 2010 - 11:54 am

Are you talking about __dcc_getstatus only? I don't think adding

We could maybe drop the looping for TX, but RX has no C based looping
even tho for v7 it's recommended that we loop (presumably for v6 it's

This is primarily why these function look they way they do. I'd like to
hear from Tony, because he apparent didn't think they did the same


Like this?

	for (i = 0; i < count; ++i) {

		if (__dcc_getstatus() & DCC_STATUS_RX)
			buf[i] = __dcc_getchar();
		else
			break;
	}

It's a micro clean up I guess ..

Daniel
-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Greg KH
Date: Wednesday, December 1, 2010 - 12:28 pm

For this, or any other changes you want, I'll gladly take a follow-on
patch as this one is already in my tty-next tree.

thanks,

greg k-h
--

From: Stephen Boyd
Date: Friday, December 17, 2010 - 10:16 pm

The inline assembly differences for v6 vs. v7 in the hvc_dcc
driver are purely optimizations. On a v7 processor, an mrc with
the pc sets the condition codes to the 28-31 bits of the register
being read. It just so happens that the TX/RX full bits the DCC
driver is testing for are high enough in the register to be put
into the condition codes. On a v6 processor, this "feature" isn't
implemented and thus we have to do the usual read, mask, test
operations to check for TX/RX full.

Since we already test the RX/TX full bits before calling
__dcc_getchar() and __dcc_putchar() we don't actually need to do
anything special for v7 over v6. The only difference is in
hvc_dcc_get_chars(). We would test RX full, poll RX full, and
then read a character from the buffer, whereas now we will test
RX full, read a character from the buffer, and then test RX full
again for the second iteration of the loop. It doesn't seem
possible for the buffer to go from full to empty between testing
the RX full and reading a character. Therefore, replace the v7
versions with the v6 versions and everything works the same.

While we're here, cleanup the for loops a bit and mark the inline
assembly as volatile. Not marking it volatile causes GCC to cache
the results of the status and RX buffer registers causing
lockups.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Daniel Walker <dwalker@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/char/hvc_dcc.c |   43 +++++++------------------------------------
 1 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/char/hvc_dcc.c b/drivers/char/hvc_dcc.c
index 6470f63..435f6fa 100644
--- a/drivers/char/hvc_dcc.c
+++ b/drivers/char/hvc_dcc.c
@@ -33,54 +33,29 @@
 static inline u32 __dcc_getstatus(void)
 {
 	u32 __ret;
-
-	asm("mrc p14, 0, %0, c0, c1, 0	@ read comms ctrl reg"
+	asm volatile("mrc p14, 0, %0, c0, c1, 0	@ read comms ctrl reg"
 ...
From: Daniel Walker
Date: Monday, December 20, 2010 - 10:51 am

I would expect to see three patches. One that adds volatile, which
appears to be a good fix. Another patch that changes the assembly lines,
and another that does the clean up. The last two are more controversial
ones.

Daniel

-- 
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.


--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 11:39 am

Ok. I'll send a series later today to give some more time for feedback.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Nicolas Pitre
Date: Monday, December 20, 2010 - 11:46 am

I think you can do that right away.  Splitting the patch will make that 
feedback easier to provide.


Nicolas
--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 1:08 pm

Casting and anding with 0xff is unnecessary in
hvc_dcc_put_chars() since buf is already a char[].
__dcc_get_char() can't return an int less than 0 since it only
returns a char. Simplify the if statement in hvc_dcc_get_chars()
to take this into account.

Cc: Daniel Walker <dwalker@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/char/hvc_dcc.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/char/hvc_dcc.c b/drivers/char/hvc_dcc.c
index 155ec10..ad23cc8 100644
--- a/drivers/char/hvc_dcc.c
+++ b/drivers/char/hvc_dcc.c
@@ -89,7 +89,7 @@ static int hvc_dcc_put_chars(uint32_t vt, const char *buf, int count)
 		while (__dcc_getstatus() & DCC_STATUS_TX)
 			cpu_relax();
 
-		__dcc_putchar((char)(buf[i] & 0xFF));
+		__dcc_putchar(buf[i]);
 	}
 
 	return count;
@@ -99,15 +99,11 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
 {
 	int i;
 
-	for (i = 0; i < count; ++i) {
-		int c = -1;
-
+	for (i = 0; i < count; ++i)
 		if (__dcc_getstatus() & DCC_STATUS_RX)
-			c = __dcc_getchar();
-		if (c < 0)
+			buf[i] = __dcc_getchar();
+		else
 			break;
-		buf[i] = c;
-	}
 
 	return i;
 }
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 1:08 pm

Here are the split patches.

The first two patches cleanup and fix the hvc_dcc driver for my
compiler. The final patch is more controversial, it removes the
v6 and v7 differences in this driver.

Stephen Boyd (3):
  hvc_dcc: Fix bad code generation by marking assembly volatile
  hvc_dcc: Simplify put_chars()/get_chars() loops
  hvc_dcc: Simplify assembly for v6 and v7 ARM

 drivers/char/hvc_dcc.c |   43 +++++++------------------------------------
 1 files changed, 7 insertions(+), 36 deletions(-)

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 1:08 pm

Without marking the asm __dcc_getstatus() volatile my compiler
decides it can cache the value of __ret in a register and then
check the value of it continually in hvc_dcc_put_chars() (I had
to replace get_wait/put_wait with 1 and fixup the branch
otherwise my disassembler barfed on __dcc_(get|put)char).

00000000 <hvc_dcc_put_chars>:
   0:   ee103e11        mrc     14, 0, r3, cr0, cr1, {0}
   4:   e3a0c000        mov     ip, #0  ; 0x0
   8:   e2033202        and     r3, r3, #536870912      ; 0x20000000
   c:   ea000006        b       2c <hvc_dcc_put_chars+0x2c>
  10:   e3530000        cmp     r3, #0  ; 0x0
  14:   1afffffd        bne     10 <hvc_dcc_put_chars+0x10>
  18:   e7d1000c        ldrb    r0, [r1, ip]
  1c:   ee10fe11        mrc     14, 0, pc, cr0, cr1, {0}
  20:   2afffffd        bcs     1c <hvc_dcc_put_chars+0x1c>
  24:   ee000e15        mcr     14, 0, r0, cr0, cr5, {0}
  28:   e28cc001        add     ip, ip, #1      ; 0x1
  2c:   e15c0002        cmp     ip, r2
  30:   bafffff6        blt     10 <hvc_dcc_put_chars+0x10>
  34:   e1a00002        mov     r0, r2
  38:   e12fff1e        bx      lr

As you can see, the value of the mrc is checked against
DCC_STATUS_TX (bit 29) and then stored in r3 for later use.
Marking the asm volatile produces the following:

00000000 <hvc_dcc_put_chars>:
   0:   e3a03000        mov     r3, #0  ; 0x0
   4:   ea000007        b       28 <hvc_dcc_put_chars+0x28>
   8:   ee100e11        mrc     14, 0, r0, cr0, cr1, {0}
   c:   e3100202        tst     r0, #536870912  ; 0x20000000
  10:   1afffffc        bne     8 <hvc_dcc_put_chars+0x8>
  14:   e7d10003        ldrb    r0, [r1, r3]
  18:   ee10fe11        mrc     14, 0, pc, cr0, cr1, {0}
  1c:   2afffffd        bcs     18 <hvc_dcc_put_chars+0x18>
  20:   ee000e15        mcr     14, 0, r0, cr0, cr5, {0}
  24:   e2833001        add     r3, r3, #1      ; 0x1
  28:   e1530002        cmp     r3, r2
  2c:   bafffff5        blt     8 <hvc_dcc_put_chars+0x8>
  30:   e1a00002        mov     r0, ...
From: Nicolas Pitre
Date: Monday, December 20, 2010 - 2:39 pm

From: Pavel Machek
Date: Sunday, January 2, 2011 - 2:00 am

Is volatile needed here? If __dcc_getstatus() return value is

Same here?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: David Brown
Date: Sunday, January 2, 2011 - 11:49 am

That's not really the issue being fixed.  Without the volatile, the
compiler is free to cache and reuse a previously loaded status value.
It is important that the status be read each time.

I don't think there is a way of indicating that assembly needs to happen
for each use, but that it is OK to discard if the value isn't used.
'volatile' is a bit overloaded.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
--

From: Pavel Machek
Date: Sunday, January 2, 2011 - 10:50 pm

From: Tony Lindgren
Date: Tuesday, January 4, 2011 - 11:49 am

Acked-by: Tony Lindgren <tony@atomide.com>
--

From: Arnaud Lacombe
Date: Monday, December 20, 2010 - 2:49 pm

Hi,

What compiler ? That might be a usefull information to know,
espectially 5 years from now when tracing code history. There has been
similar issue with gcc 4.5 recently. AFAIK, in the same idea, the
final change has been to mark the asm volatile.

Thanks,
 - Arnaud
--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 2:52 pm

Sure, we can replace "my compiler" with "my compiler (arm-eabi-gcc (GCC)
4.4.0)".

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Nicolas Pitre
Date: Monday, December 20, 2010 - 3:10 pm

Compiler version doesn't matter -- this is a simple correctness issue.

If an inline asm statement provides an output value then the compiler is 
free to cache that value in a register, unless the inline asm is marked 
so that the value may change from one invocation to another.  Also, the 
compiler is free to eliminate the inline asm statement entirely as well 
if the output value provided by that inline asm is never used... unless 
if the inline asm is marked as having side effects.  In both cases the 
volatile qualifier does indicate that the returned value may change 
(new status flag state) and 
that the asm code therein has side effects (e.g. pop a character off a 
FIFO).

Sometimes the desired effect can be indicated by clever usage of 
parameter constraints, but in this case the volatile keyword is most 
appropriate.


Nicolas
--

From: Stephen Boyd
Date: Monday, December 20, 2010 - 1:08 pm

The inline assembly differences for v6 vs. v7 in the hvc_dcc
driver are purely optimizations. On a v7 processor, an mrc with
the pc sets the condition codes to the 28-31 bits of the register
being read. It just so happens that the TX/RX full bits the DCC
driver is testing for are high enough in the register to be put
into the condition codes. On a v6 processor, this "feature" isn't
implemented and thus we have to do the usual read, mask, test
operations to check for TX/RX full.

Since we already test the RX/TX full bits before calling
__dcc_getchar() and __dcc_putchar() we don't actually need to do
anything special for v7 over v6. The only difference is in
hvc_dcc_get_chars(). We would test RX full, poll RX full, and
then read a character from the buffer, whereas now we will test
RX full, read a character from the buffer, and then test RX full
again for the second iteration of the loop. It doesn't seem
possible for the buffer to go from full to empty between testing
the RX full and reading a character. Therefore, replace the v7
versions with the v6 versions and everything works the same.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Daniel Walker <dwalker@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/char/hvc_dcc.c |   24 ------------------------
 1 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/char/hvc_dcc.c b/drivers/char/hvc_dcc.c
index ad23cc8..435f6fa 100644
--- a/drivers/char/hvc_dcc.c
+++ b/drivers/char/hvc_dcc.c
@@ -40,19 +40,6 @@ static inline u32 __dcc_getstatus(void)
 }
 
 
-#if defined(CONFIG_CPU_V7)
-static inline char __dcc_getchar(void)
-{
-	char __c;
-
-	asm volatile("get_wait:	mrc p14, 0, pc, c0, c1, 0                  \n\
-			bne get_wait                                       \n\
-			mrc p14, 0, %0, c0, c5, 0	@ read comms data reg"
-		: "=r" (__c) : : "cc");
-
-	return __c;
-}
-#else
 static inline char ...
From: Nicolas Pitre
Date: Monday, December 20, 2010 - 2:44 pm

From: Tony Lindgren
Date: Tuesday, January 4, 2011 - 11:52 am

Acked-by: Tony Lindgren <tony@atomide.com>
--

From: Stephen Boyd
Date: Wednesday, December 1, 2010 - 1:20 pm

Just marking __dcc_getstatus volatile gives me

00000038 <hvc_dcc_get_chars>:
  38:   ee10fe11        mrc     14, 0, pc, cr0, cr1, {0}
  3c:   1afffffd        bne     38 <hvc_dcc_get_chars>
  40:   ee103e15        mrc     14, 0, r3, cr0, cr5, {0}
  44:   e3a00000        mov     r0, #0  ; 0x0
  48:   e6ef3073        uxtb    r3, r3
  4c:   ea000004        b       64 <hvc_dcc_get_chars+0x2c>
  50:   ee10ce11        mrc     14, 0, ip, cr0, cr1, {0}
  54:   e31c0101        tst     ip, #1073741824 ; 0x40000000
  58:   012fff1e        bxeq    lr
  5c:   e7c13000        strb    r3, [r1, r0]
  60:   e2800001        add     r0, r0, #1      ; 0x1
  64:   e1500002        cmp     r0, r2
  68:   bafffff8        blt     50 <hvc_dcc_get_chars+0x18>
  6c:   e12fff1e        bx      lr

Seems the compiler keeps the value of __dcc_getchar() in r3 for the
duration of the loop. So we need to mark that one volatile too. I don't
think __dcc_putchar() needs to be marked volatile but it probably

Definitely for TX since it seems like a redundant loop, but I agree RX
code has changed. Instead of

If RX buffer full
Poll for RX buffer full
Read character from RX buffer

we would have

If RX buffer full
Read character from RX buffer

which doesn't seem all that different assuming the RX buffer doesn't go

Yes, it's much clearer that way.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

From: Stephen Boyd
Date: Tuesday, December 7, 2010 - 12:10 pm

Tony, any thoughts?

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

--

Previous thread: [thisops uV3 00/18] Upgrade of this_cpu_ops V3 by Christoph Lameter on Tuesday, November 30, 2010 - 12:07 pm. (35 messages)

Next thread: [RFC v2 0/8] TI DMM-TILER driver by David Sin on Tuesday, November 30, 2010 - 12:58 pm. (22 messages)