Re: Initcall ordering problem (TTY vs modprobe vs MD5) and cryptomgr problem

From: David Howells
Date: Thursday, August 5, 2010 - 6:01 pm

My test box is imploding during boot due to an init ordering problem between
md5 and /dev/console and modprobe failure.  Furthermore, the md5 module isn't
finding cryptomgr, despite it being compiled in and initialised.


I've made __request_module() print some stuff and I've set initcall_debug=1 on
the command line, and what I see is this:

  (1) The compiled-in MD5 crypto module tries to load the cryptomgr module
      during the initcall phase of boot up:

	calling  md5_mod_init+0x0/0x12 @ 1
	============================================
	__request_module(cryptomgr)
	Pid: 1, comm: swapper Not tainted 2.6.35-cachefs+ #308
	Call Trace:
	 [<ffffffff8104227f>] __request_module+0xb1/0x18e
	 [<ffffffff81049638>] ? up_read+0x1e/0x36
	 [<ffffffff8104a166>] ? __blocking_notifier_call_chain+0x56/0x62
	 [<ffffffff811cec26>] crypto_probing_notify+0x34/0x4b
	 [<ffffffff811d0414>] crypto_wait_for_test+0x1d/0x66
	 [<ffffffff811d0565>] crypto_register_alg+0x4e/0x55
	 [<ffffffff816a68b7>] ? md5_mod_init+0x0/0x12
	 [<ffffffff811d4504>] crypto_register_shash+0x92/0x94
	 [<ffffffff816a68c7>] md5_mod_init+0x10/0x12
	 [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
	 [<ffffffff8168b6ca>] kernel_init+0x184/0x20d
	 [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
	 [<ffffffff813de3fc>] ? restore_args+0x0/0x30
	 [<ffffffff8168b546>] ? kernel_init+0x0/0x20d
	 [<ffffffff81002cd0>] ? kernel_thread_helper+0x0/0x10
	============================================

  (2) modprobe can't find the cryptomgr module (because it's not installed
      where modprobe wants to look).  modprobe then attempts to open
      /dev/console, presumably to write an error message to it:

	============================================
	__request_module(char-major-5-1)
	Pid: 345, comm: modprobe Not tainted 2.6.35-cachefs+ #308
	Call Trace:
	 [<ffffffff8104227f>] __request_module+0xb1/0x18e
	 [<ffffffff810537ac>] ? mark_held_locks+0x52/0x70
	 [<ffffffff813dcad6>] ? mutex_lock_nested+0x274/0x28f
	 ...
From: Herbert Xu
Date: Thursday, August 5, 2010 - 6:17 pm

Indeed this changeset is buggy.  It makes cryptomgr return a value
as if it was not loaded.  This triggers the module load.

I'll send you a patch.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
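
The failure mode described above (a built-in cryptomgr answering "as if it was not loaded", which then triggers a module load) can be modelled in a few lines of userspace C. This is only a sketch under assumptions: it supposes the probing code treats a NOTIFY_DONE result from the notifier chain as "no manager present" and reacts by requesting the module, which matches the call trace earlier in the thread. The names probing_notify, fake_request_module, handler_present and handler_compiled_out are hypothetical stand-ins, not kernel functions.

```c
/* Notifier return codes, mirroring the values in include/linux/notifier.h. */
enum { NOTIFY_DONE = 0, NOTIFY_OK = 1 };

/* Hypothetical stand-in for a handler registered on the crypto notifier chain. */
typedef int (*notify_fn)(unsigned long msg);

/* Counts simulated request_module("cryptomgr") calls. */
static int module_load_requests;

/* Simulated module request; during early boot the real one would spawn modprobe. */
static void fake_request_module(void)
{
	module_load_requests++;
}

/*
 * Model of the probing logic (an assumption, simplified): NOTIFY_DONE is
 * read as "nobody handled this", so a module load is requested and the
 * chain is tried once more.
 */
static int probing_notify(notify_fn handler, unsigned long msg)
{
	int ok = handler(msg);

	if (ok == NOTIFY_DONE) {
		fake_request_module();
		ok = handler(msg);
	}
	return ok;
}

/* A built-in manager that handles the message. */
static int handler_present(unsigned long msg)
{
	(void)msg;
	return NOTIFY_OK;
}

/* A built-in manager whose test hook was compiled out: it reports NOTIFY_DONE. */
static int handler_compiled_out(unsigned long msg)
{
	(void)msg;
	return NOTIFY_DONE;
}
```

Calling probing_notify() with handler_compiled_out increments module_load_requests even though the "manager" is linked in, which is the spurious load; with handler_present no request is made.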

From: Herbert Xu
Date: Thursday, August 5, 2010 - 6:40 pm

This patch should do the trick:

commit 326a6346ffb5b19eb593530d9d3096d409e46f62
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Fri Aug 6 09:40:28 2010 +0800

    crypto: testmgr - Fix test disabling option
    
    This patch fixes a serious bug in the test disabling patch where
    it can cause an spurious load of the cryptomgr module even when
    it's compiled in.
    
    It also negates the test disabling option so that its absence
    causes tests to be enabled.
    
    The Kconfig option is also now behind EMBEDDED.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 1cd497d..6f5c50f 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -101,13 +101,12 @@ config CRYPTO_MANAGER2
 	select CRYPTO_BLKCIPHER2
 	select CRYPTO_PCOMP2
 
-config CRYPTO_MANAGER_TESTS
-	bool "Run algolithms' self-tests"
-	default y
-	depends on CRYPTO_MANAGER2
+config CRYPTO_MANAGER_DISABLE_TESTS
+	bool "Disable run-time self tests"
+	depends on CRYPTO_MANAGER2 && EMBEDDED
 	help
-	  Run cryptomanager's tests for the new crypto algorithms being
-	  registered.
+	  Disable run-time self tests that normally take place at
+	  algorithm registration.
 
 config CRYPTO_GF128MUL
 	tristate "GF(2^128) multiplication functions (EXPERIMENTAL)"
diff --git a/crypto/algboss.c b/crypto/algboss.c
index 40bd391..791d194 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -206,13 +206,16 @@ err:
 	return NOTIFY_OK;
 }
 
-#ifdef CONFIG_CRYPTO_MANAGER_TESTS
 static int cryptomgr_test(void *data)
 {
 	struct crypto_test_param *param = data;
 	u32 type = param->type;
 	int err = 0;
 
+#ifdef CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
+	goto skiptest;
+#endif
+
 	if (type & CRYPTO_ALG_TESTED)
 		goto skiptest;
 
@@ -267,7 +270,6 @@ err_put_module:
 err:
 	return NOTIFY_OK;
 }
-#endif /* CONFIG_CRYPTO_MANAGER_TESTS */
 
 static int cryptomgr_notify(struct notifier_block *this, unsigned long msg,
 			    void ...
From: Linus Torvalds
Date: Thursday, August 5, 2010 - 7:01 pm

Why do you still want to force-enable those tests? I was going to
complain about the "default y" anyway, now I'm _really_ complaining,
because you've now made it impossible to disable those tests. Why?

People always think that their magical code is so important. I tell
you up-front that it absolutely is not. Just remove the crap entirely,
please.

            Linus
--

From: Herbert Xu
Date: Thursday, August 5, 2010 - 7:35 pm

Because it can save data.  Each cryptographic algorithm (such as
AES) may have multiple implementations, some of which are
hardware-based.

The purpose of these tests is to make a particular driver or
implementation available only if it passes them.  So your encrypted
disk/file system will not be subject to a hardware/software combo
without at least some semblance of testing.

The last thing you want is to upgrade your kernel with a new
hardware crypto driver that detects that you have a piece of
rarely-used crypto hardware, decides to use it and ends up making
your data toast.
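
The gating idea in the paragraphs above (an implementation becomes available only if it passes its test) can be sketched as a known-answer check in front of priority-based selection. This is a toy userspace model, not the crypto API: struct impl, the one-byte XOR "cipher", the test vector and the priority values are all invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical implementation descriptor; not the kernel's struct crypto_alg. */
struct impl {
	const char *name;
	int priority;				/* higher priority wins */
	uint8_t (*transform)(uint8_t in);	/* toy one-byte "cipher" */
};

/* Known-answer vector for the toy transform (in ^ 0x5a). */
static const uint8_t kat_in = 0x3c;
static const uint8_t kat_out = 0x66;

/* Correct software implementation. */
static uint8_t soft_xor(uint8_t in)
{
	return (uint8_t)(in ^ 0x5a);
}

/* Simulated faulty hardware driver: one key bit is wrong. */
static uint8_t broken_hw(uint8_t in)
{
	return (uint8_t)(in ^ 0x5b);
}

/* Returns nonzero if the implementation reproduces the known answer. */
static int passes_selftest(const struct impl *i)
{
	return i->transform(kat_in) == kat_out;
}

/* Pick the highest-priority implementation among those that pass. */
static const struct impl *select_impl(const struct impl *impls, size_t n)
{
	const struct impl *best = NULL;
	size_t k;

	for (k = 0; k < n; k++) {
		if (!passes_selftest(&impls[k]))
			continue;
		if (!best || impls[k].priority > best->priority)
			best = &impls[k];
	}
	return best;
}
```

Given a table holding soft_xor at priority 100 and broken_hw at priority 300, select_impl() returns the software entry: the faulty driver would otherwise have won on priority alone.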

But whatever, if you want the default to be no tests, that's fine.
Here's the patch to do just that.

commit 00ca28a507b215dcd121735f16764ea4173c4ff9
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Fri Aug 6 10:34:00 2010 +0800

    crypto: testmgr - Default to no tests
    
    On Thu, Aug 05, 2010 at 07:01:21PM -0700, Linus Torvalds wrote:
    > On Thu, Aug 5, 2010 at 6:40 PM, Herbert Xu <herbert@gondor.hengli.com.au> wrote:
    > >
    > > -config CRYPTO_MANAGER_TESTS
    > > -       bool "Run algolithms' self-tests"
    > > -       default y
    > > -       depends on CRYPTO_MANAGER2
    > > +config CRYPTO_MANAGER_DISABLE_TESTS
    > > +       bool "Disable run-time self tests"
    > > +       depends on CRYPTO_MANAGER2 && EMBEDDED
    >
    > Why do you still want to force-enable those tests? I was going to
    > complain about the "default y" anyway, now I'm _really_ complaining,
    > because you've now made it impossible to disable those tests. Why?
    
    As requested, this patch sets the default to y and removes the
    EMBEDDED dependency.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 6f5c50f..e573077 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -103,7 +103,8 @@ config CRYPTO_MANAGER2
 
 config CRYPTO_MANAGER_DISABLE_TESTS
 	bool "Disable run-time self tests"
-	depends on ...
From: Linus Torvalds
Date: Thursday, August 5, 2010 - 9:20 pm

Umm. The _developer_ had better test the thing. That is absolutely
_zero_ excuse for then forcing every boot for every poor user to re-do
the test over and over again.

Guys, this comes up every single time: you as a developer may think
that your code is really important, but get over yourself already.
It's not so important that everybody must be forced to do it.

                      Linus
--

From: Kyle Moffett
Date: Thursday, August 5, 2010 - 9:50 pm

On Fri, Aug 6, 2010 at 00:20, Linus Torvalds

Speaking as a user who's been bitten several times by bad crypto
implementations, I'd personally rather have this testing on by default
(if the crypto API it depends on is on).  It's pretty damn inexpensive
to do a few brief crypto operations during initialization as a quick
smoke test.  We already do something somewhat similar when loading the
RAID5/RAID6 driver, although admittedly that's a speed-test for
picking an optimized algorithm.

You should also realize that crypto drivers are very much *NOT* in the
same situation as most other drivers.  Without this test, adding a new
crypto hardware driver to the kernel is a completely unsafe operation,
because it could completely break users' setups.  You have previously
said you're fine accepting new drivers even after the initial merge
window because they can't break anything, but in crypto that's not
true.

I've actually had it trigger in exactly the described situation.  I
had a box with an encrypted filesystem that I downloaded a new distro
kernel on with new drivers.  The new kernel included a bunch of new
"EXPERIMENTAL" drivers for hardware, none of which I thought I cared
about until I noticed in "dmesg" that one of them was getting enabled
and then failing tests.

So there are unique and compelling reasons for default-enabled basic
smoke tests of cryptographic support during boot.  To be honest, the
test and integration engineer in me would like it if there were more
intensive in-kernel POST tests that could be enabled by a kernel
parameter or something for high-reliability embedded devices.

Cheers,
Kyle Moffett
--

From: Olivier Galibert
Date: Friday, August 6, 2010 - 1:06 am

Maybe Linus would be happier if the self-tests were limited (by
default) to the hardware accelerators?  Having a software backup and
the risk of data loss indeed makes things different.

Of course in practice without the tests your boot would probably just
have failed.  Badly-decrypted root partitions tend to be noticed as
such long before trying to write to them.  Then you would have bitched
on the list and the driver would have been fixed or removed faster
than having to wait for you (or other people with the hardware issue)
to notice the spew in dmesg.

  OG.

--

From: Linus Torvalds
Date: Friday, August 6, 2010 - 8:22 am

No. I'd be happier if it was an OPTION.

And it damn well defaults to "off". Like all other options.

Then, for people who use it, and worry (and distro test kernels etc),
turn it on. But don't make it a forced feature, and don't make it
something that people think they _should_ turn on.

I have crypto enabled, but I don't _use_ it. The upside for me is
zero. Nil. Nada. And I bet that's the common case.

And dammit, if you don't trust the hardware, don't send the driver
upstream. And if you worry about alpha-particles, you should run a
RAM test on every boot. But don't ask _me_ to run one.

It's that simple.

                    Linus
--

From: David Howells
Date: Thursday, August 5, 2010 - 7:23 pm

Even if he does remove it, that still leaves the problem that modprobe can be
invoked and fail before tty_init() gets called:-/

I wonder if tty_init() should be moved up, perhaps to immediately after
chrdev_init().

David
--

From: Linus Torvalds
Date: Thursday, August 5, 2010 - 7:27 pm

I do think that sounds sane. The tty layer is kind of special.

I wouldn't call it _after_ chrdev_init(), though, I'd call it _from_
chrdev_init(). Doesn't that make more sense (and keep it out of
fs/dcache.c, which is an odd place to have some tty init)?

But maybe there's some reason why it's an initcall. Unlikely.

             Linus
--
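
The ordering argument in this sub-thread can be illustrated with a small userspace model. It assumes, as the traces at the top of the thread suggest, that an initcall which ends up running modprobe before the tty layer is registered causes a request for char-major-5-1. The initcall order in the table and every *_model name below are hypothetical; only the idea of running tty init from early setup rather than as an initcall comes from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*initcall_t)(void);

static bool tty_registered;		/* true once the simulated tty layer is up */
static int console_module_requests;	/* simulated requests for char-major-5-1 */

static void tty_init_model(void)
{
	tty_registered = true;
}

/* Simulated modprobe opening /dev/console to report an error. */
static void modprobe_model(void)
{
	if (!tty_registered)
		console_module_requests++;
}

/* An initcall that ends up invoking modprobe, as md5_mod_init did. */
static void md5_mod_init_model(void)
{
	modprobe_model();
}

/*
 * Run a simulated boot and return how many console module requests occurred.
 * If tty_from_chrdev_init is true, tty init runs from early setup and is
 * skipped in the initcall table.
 */
static int run_boot(bool tty_from_chrdev_init)
{
	/* Hypothetical order: the crypto initcall precedes tty init. */
	const initcall_t initcalls[] = { md5_mod_init_model, tty_init_model };
	size_t i;

	tty_registered = false;
	console_module_requests = 0;

	if (tty_from_chrdev_init)
		tty_init_model();	/* stands in for chrdev_init() calling tty_init() */

	for (i = 0; i < sizeof(initcalls) / sizeof(initcalls[0]); i++) {
		if (tty_from_chrdev_init && initcalls[i] == tty_init_model)
			continue;
		initcalls[i]();
	}
	return console_module_requests;
}
```

In this model run_boot(false) records one console module request, while run_boot(true) records none, because the console is usable before any initcall has a chance to invoke modprobe.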

From: David Howells
Date: Thursday, August 5, 2010 - 7:27 pm

It does work.

David
--
