Re: [PATCH] padata: section cleanup

Previous thread: [PATCH 2/7] fbdev: section cleanup in efifb by Henrik Kretzschmar on Friday, March 26, 2010 - 11:29 pm. (7 messages)

Next thread: Status of a patch to export built-in kernel module info to sysfs by Yangkook Kim on Saturday, March 27, 2010 - 12:20 am. (1 message)
From: Henrik Kretzschmar
Date: Friday, March 26, 2010 - 11:53 pm

This patch removes the __cupinit from padata_cpu_callback(),
which is refered by the exportet function padata_alloc().

This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
which should happen very often.

WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
The function padata_alloc() references
the function __cpuinit padata_cpu_callback().
This is often because padata_alloc lacks a __cpuinit
annotation or the annotation of padata_cpu_callback is wrong.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
---
 kernel/padata.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 93caf65..32fdd5f 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -569,7 +569,7 @@ void padata_stop(struct padata_instance *pinst)
 }
 EXPORT_SYMBOL(padata_stop);
 
-static int __cpuinit padata_cpu_callback(struct notifier_block *nfb,
+static int padata_cpu_callback(struct notifier_block *nfb,
 					 unsigned long action, void *hcpu)
 {
 	int err;
-- 
1.7.0

--

From: Steffen Klassert
Date: Monday, March 29, 2010 - 12:42 am

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>

Thanks,

Steffen
--

From: Herbert Xu
Date: Monday, March 29, 2010 - 1:15 am

Patch applied.  Thanks!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Andrew Morton
Date: Wednesday, March 31, 2010 - 2:58 pm

On Mon, 29 Mar 2010 16:15:50 +0800

(wtf?)

OK, on behalf of thousands I ask: what the heck is kernel/padata.c? 

Seems to have popped up in 2.6.34, positioned as generic kernel-wide
code only it has been secreted away on the linux-crypto list.  Please
don't do this.

What is it for, how does it work, where might it otherwise be used,
what are the dynamics, why does it use softirqs rather than (say)
kernel threads and how do we stop it from using yield()?

--

From: Herbert Xu
Date: Wednesday, March 31, 2010 - 5:24 pm

It was posted to linux-kernel multiple times.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Andrew Morton
Date: Wednesday, March 31, 2010 - 3:04 pm

I can find it once, in December 2009, in the middle of a massive thread
called "workqueue thing".  It had no replies.

--

From: Herbert Xu
Date: Wednesday, March 31, 2010 - 6:38 pm

OK you're right.  It's only been posted to lkml once.  However,
this has been discussed on netdev since 2008 at least.  The patch
in its current form has also been acked by David Miller.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Andrew Morton
Date: Wednesday, March 31, 2010 - 5:29 pm

Thing is, this isn't net code and it isn't crypto code - it's a
kernel-wide utility.  Everyone needs to know at least a bit about it
and code-reviewers such as myself need to know a lot about it, so we
can tell people "hey dummy, shouldn't you use padata".

So please, can we belatedly go through the process of educating the
developers about this thing you guys have written for us?

--

From: David Miller
Date: Thursday, April 1, 2010 - 12:15 am

From: Andrew Morton <akpm@linux-foundation.org>

Thanks goodness, because if we had put a private copy in the
networking or the crypto code someone would have given us a hard time.

Wait a second... we're being given a hard time anyways. :-)

Kidding aside, why didn't anyone show any interest in the patch when
it was posted to lkml?  You can say it was smoothered together with
some crypto stuff, but that was absolutely the logical thing to do
because the follow-on crypto patches showed what the thing was going
to be used for.
--

From: Andrew Morton
Date: Thursday, April 1, 2010 - 6:59 am

Because the target audience for this work weren't informed about it,

I dunno - these things happen.  The padata patch appears to have been
sent once to netdev as an RFC back in 2008 and didn't get any replies
there either.  Personally I pay less attention to patches which appear
in the middle of some discussion because I assume they're for
discussion purposes and will come back later.

If I'd known this code was coming I would have reviewed it at an
appropriate time.  Probably others are interested also.
--

From: Steffen Klassert
Date: Thursday, April 1, 2010 - 1:11 am

padata can be used to spread cpu intensive work over multiple cpus. It came
into being with the idea to parallelize IPsec. We separated the generic
parts out to padata in the hope it can be useful to others too.
So padata is used to parallelize cpu intensive codepaths, like crypto
operations. As soon as the cpu intensive work is done, it is possible to
serialize again without getting reorder of the parallelized objects.
This is in particular important for IPsec to keep the network packets
in the right order.

To achieve a parallelization of the cpu intensive parts of IPsec, we
created the pcrypt parallel crypto template (crypto/pcrypt.c). This
wraps around an existing crypto algorithm and does the
parallelization/serialization by using padata. So pcrypt is an example on
how to use padata.

I'm about to write some documentation for padata/pcrypt. The big picture
is as follows. A user can allocate a padata instance by using padata_alloc().
The user can specify the cpumask of the cpus and the workqueue he wants to
use for the parallel codepath on allocation time. After allocation it is
possible to start/stop the padata instance with padata_start() padata_stop().
The actual parallel codepath is entered by calling padata_do_parallel().
The parallelized objects, in case of pcrypt the crypto requests, need to
embed padata's control structure of type struct padata_priv. This structure
is the anchor for padata to handle the object. padata_do_parallel() takes
three parameters, the padata instance on which to parallelize, the control
structure that is embedded in the object and a callback cpu on which the
object will appear after serialization. Once the cpu intensive work is done,
it is possible to serialize by calling padata_do_serial(). This function
takes the padata control structure as parameter. It brings the parallelized
objects back to the order they were before the parallelization and sends
them to the cpu which was specified as the callback cpu.

Internally padata uses a ...
From: Andrew Morton
Date: Thursday, April 1, 2010 - 2:59 pm

On Thu, 1 Apr 2010 10:11:26 +0200


OK.  Doing it in threads is better because it lets the CPU scheduler
regulate things and the load becomes visible and correctly attributed

yield() is a bit problematic - it can sometimes take enormous amounts
of time.  It wasn't always that way - it changed vastly in 2002 and has
since got a bit better (I think).  But generally a yield-based busywait
is a concern and it'd be better to use some more well-defined primitive
such as a lock or wait_for_completion(), etc.

I'd suggest at least loading the system up with 100 busywait processes
and verify that the padata code still behaves appropriately.


Some other stuff:

- The code does

	might_sleep()
	mutex_lock()

  in a lot of places.  But mutex_lock() does might_sleep() too, so
  it's a waste of space and will cause a double-warning if it triggers.

- The code does local_bh_disable() and spin_trylock_bh().  I assume
  that this is to support this code being used from networking
  softirqs.  So the code is usable frmo softirq context and from
  process context but not from hard IRQ context?

  It'd be useful if these designed decisions were described somewhere:
  what's the thinking behind it and what are the implications.

- padata_reorder() does a trylock.  It's quite unobvious to the
  reader why it didn't just do spin_lock().  Needs a code comment.

- the try-again loop in that function would benefit from a comment
  too.  Why is it there, and in what circumstances will the goto be
  taken?

  Once that's understood, we can see under which conditions the code
  will livelock ;)

- did __padata_add_cpu() need to test cpu_active_mask?  Wouldn't it
  be a bug for this to be called against an inactive CPU?

- similarly, does __padata_remove_cpu() need to test cpu_online_mask?

- It appears that the cpu-hotplug support in this code will be
  compiled-in even if the kernel doesn't support CPU hotplug.  It's a
  sneaky feature of the hotcpu_notifier()->cpu_notifier() macros ...
From: Herbert Xu
Date: Thursday, April 1, 2010 - 6:09 pm

Padata must guarantee that even though the work may occur in
parallel, anyone submitting work to it in serial must observe
the result strictly in the order that they were submitted.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Steffen Klassert
Date: Friday, April 2, 2010 - 4:23 am

Yes, wait_for_completion() is probably the better way to do the





The try-again loop it to handle a corner case that appears with the trylock.

Well, the idea behind that was to maintain a cpumask the user wishes to
use for parallelization. The cpumask that is actually used, is the logical and
between user's cpumask and the cpu_active_mask. The reason for maintaining
such a cpumask was to keep the user supplied cpumask untouched on hotplug
events. If a cpu goes down and then up again padata can simply reuse this cpu
if it is in the user's cpumask. So this made it possible to add even offline
cpus to the user's cpumask. Once the cpu comes online, it will be used.

Yes, I noticed that already. A patch that ifdef the cpu hotplug code is

Hm, good point. For the moment I think it would be even sufficient to touch
just the logical and between the supplied and the active mask. I'll look


The sequence numbers are in fact the key to maintain the order of the
parallelized objects. padata_priv.seq_nr is the sequence number that
identifies a object unique. The object is equipped with this number before
it is queued for the parallel codepath. On exit of the parallel codepath
this number is used to bring the objects back to the right order.
parallel_data.seq_nr maintains the 'next free sequence number' of the
padata instance. The next object that appears will be equipped with this
number as it's private sequence number (padata_priv.seq_nr).
parallel_data.seq_nr is incremented by one and then again contains the


It is possible to start/stop the padata instance. PADATA_INIT maintains
whether the instance is running. padata_do_parallel() returns 0 if the
instance is not initialized or stopped. It's up to the caller what to
do in this case. Maybe 0 is not the right thing to return in this case,

It returns -EINPROGRESS because the object is queued to a workqueue and
it returns asynchronous. It's at least in the crypto layer quite common
to do this if a crypto request is queued to ...
From: Jonathan Corbet
Date: Tuesday, April 6, 2010 - 11:00 am

On Thu, 1 Apr 2010 10:11:26 +0200

I hope that nobody minds if I went ahead and bashed something out?

	http://lwn.net/SubscriberLink/382257/fe415ce242a12c15/

Comments welcome.  I'd be happy to rework it into a Documentation/ file
if that would be useful.

jon
--

From: Steffen Klassert
Date: Wednesday, April 7, 2010 - 6:34 am

I think your article reflects patada and it's API as it

That would be great! I'd be glad to provide you with further
information if you need it.

Thanks a lot,

Steffen
--

From: Steffen Klassert
Date: Thursday, April 29, 2010 - 5:14 am

I wrote an article about parallelizing IPsec for the Linuxtag Conference
proceedings. This article covers also a section that describes the ideas of
padata on a formal level. I can send it if anybody is interested.

I'll send out the patches based on Andrew's review later today.

Steffen

PS: I'll be off for a week starting tomorrow evening, so I might not
respond during this time.
--

Previous thread: [PATCH 2/7] fbdev: section cleanup in efifb by Henrik Kretzschmar on Friday, March 26, 2010 - 11:29 pm. (7 messages)

Next thread: Status of a patch to export built-in kernel module info to sysfs by Yangkook Kim on Saturday, March 27, 2010 - 12:20 am. (1 message)