The OpenBSD random number subsystem uses an in-kernel entropy pool. This
data isn't used directly. When entropy is requested, the contents of the
pool are hashed with MD5, and the massaged output used to seed an RC4 PRNG.
In looking at the code, however, I notice we actually fold the MD5 output in
half. From extract_entropy():

	MD5Final(buffer, &tmp);

	/*
	 * In case the hash function has some recognizable
	 * output pattern, we fold it in half.
	 */
	buffer[0] ^= buffer[15];
	buffer[1] ^= buffer[14];
	buffer[2] ^= buffer[13];
	buffer[3] ^= buffer[12];
	buffer[4] ^= buffer[11];
	buffer[5] ^= buffer[10];
	buffer[6] ^= buffer[ 9];
	buffer[7] ^= buffer[ 8];

	/* Copy data to destination buffer */
	bcopy(buffer, buf, i);
	nbytes -= i;
	buf += i;

My question: Why? What exactly are we protecting against, and is this really
protection? (The comment mentions "some recognizable output pattern", but
that means little to me as is.) Can we really be sure it doesn't make things
worse?
Is this done elsewhere, or is it our particular brand of voodoo?
Happy ho ho,
-kj
My first thought was that it might "help" in the event that there's a bias in MD5 (say, bit 12 is set 75% of the time). No, it doesn't. Maybe if output bit 12 is always the same as input bit 12 and we want to avoid revealing the input? That would work, assuming the bit it is XORed with is random. Despite its flaws, MD5 doesn't have any biases I'm aware of and should have an even distribution of bits, so the fold neither adds anything nor takes anything away (other than the obvious halving of the output).
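To make the bias question concrete, here is a small C sketch of the arithmetic for XORing two bits. It assumes the two bits are independent, which is an assumption made for illustration and not something established about MD5 output; the function name `xor_prob_one` is hypothetical.

```c
/*
 * Probability that (a ^ b) == 1 when a and b are independent bits with
 * P(a = 1) = p and P(b = 1) = q. Independence is an assumption here.
 */
static double
xor_prob_one(double p, double q)
{
	/* a ^ b is 1 exactly when one of the two bits is 1 and the other 0 */
	return p * (1.0 - q) + q * (1.0 - p);
}
```

Under that assumption, `xor_prob_one(0.75, 0.5)` is 0.5 and `xor_prob_one(0.75, 0.75)` is 0.375. If the two bits are correlated, this formula does not apply.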
I'm not aware of it being done elsewhere. Usually the recommendation is to truncate, rather than fold hash output. IMO we should reassess the output hash. Something like Whirlpool might be significantly faster given its large block size.

-d
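For comparison, the two ways of reducing a 16-byte digest to 8 bytes discussed above can be sketched side by side. This is an illustrative sketch only: `fold_digest` mirrors the XOR pattern in the quoted extract_entropy() snippet, while `truncate_digest` and both function names are hypothetical and not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN	16			/* MD5 output size in bytes */
#define HALF_LEN	(DIGEST_LEN / 2)

/* Fold: XOR byte i with byte (15 - i), as in the quoted snippet. */
static void
fold_digest(const uint8_t digest[DIGEST_LEN], uint8_t out[HALF_LEN])
{
	size_t i;

	for (i = 0; i < HALF_LEN; i++)
		out[i] = digest[i] ^ digest[DIGEST_LEN - 1 - i];
}

/* Truncate: keep the first 8 bytes and discard the rest. */
static void
truncate_digest(const uint8_t digest[DIGEST_LEN], uint8_t out[HALF_LEN])
{
	memcpy(out, digest, HALF_LEN);
}
```

Both produce 8 bytes from 16. The fold makes every output byte depend on two digest bytes, whereas truncation simply ignores the second half.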
How would a preimage attack matter in this case? Even if I could pull one off, (i.e. guess the contents of the entropy pool based on the output of the hash), we perturb it again right afterwards. Furthermore, how would this be any different than choosing just the upper or lower half? And again, is this *ever* used directly, or is it simply the input to the RC4 generator? In which case, pulling off a preimage attack would be essentially miraculous.

In any case, if there is not a good reason to keep it (since it's not standard practice), why not eliminate it? I just don't see what it accomplishes.

Sure, it may be worth investigating another hash function. Maybe even another stream cipher for use with the PRNG, but we should be clear about what the requirements for such a hash / cipher should be. Certainly, speed is a factor for both. RC4 is hella fast, but since we need to dump k*256 bytes of keystream every time, something else may end up being faster. Lack of a distinguisher in the stream cipher would also seem to be important, of course, given the application. Set-up time matters for the stream cipher, since we routinely set up new instances for large random requests, and so on.

Anyway, I'm muttering aloud now. In the meantime, is there any reason to keep the fold?
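The set-up cost mentioned above (keying RC4 and then discarding k*256 bytes of keystream) can be sketched as follows. This is a minimal textbook RC4, not the kernel's arc4random code; the names `rc4_state`, `rc4_init`, `rc4_byte` and `rc4_init_drop` are hypothetical, and the value of k is left as a parameter because the thread does not state it.

```c
#include <stddef.h>
#include <stdint.h>

struct rc4_state {
	uint8_t s[256];
	uint8_t i, j;
};

/* Key schedule (KSA). keylen must be between 1 and 256. */
static void
rc4_init(struct rc4_state *st, const uint8_t *key, size_t keylen)
{
	uint8_t j = 0, tmp;
	size_t n;

	for (n = 0; n < 256; n++)
		st->s[n] = (uint8_t)n;
	for (n = 0; n < 256; n++) {
		j = (uint8_t)(j + st->s[n] + key[n % keylen]);
		tmp = st->s[n];
		st->s[n] = st->s[j];
		st->s[j] = tmp;
	}
	st->i = st->j = 0;
}

/* Produce one keystream byte (PRGA). */
static uint8_t
rc4_byte(struct rc4_state *st)
{
	uint8_t tmp;

	st->i = (uint8_t)(st->i + 1);
	st->j = (uint8_t)(st->j + st->s[st->i]);
	tmp = st->s[st->i];
	st->s[st->i] = st->s[st->j];
	st->s[st->j] = tmp;
	return st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
}

/* Key the cipher, then throw away the first k * 256 keystream bytes. */
static void
rc4_init_drop(struct rc4_state *st, const uint8_t *key, size_t keylen,
    unsigned int k)
{
	size_t n;

	rc4_init(st, key, keylen);
	for (n = 0; n < (size_t)k * 256; n++)
		(void)rc4_byte(st);
}
```

Each new instance pays for 256 swaps in the key schedule plus k*256 discarded output bytes before producing anything useful, which is the cost being weighed against RC4's per-byte speed.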
It gives you knowledge of the collection pool, which is what the very

I don't know; like I said, most cryptographers that I have spoken to prefer

It is used to seed RC4. In the past it used to be used for other things,

This matters for userspace, but not for the kernel. We only start up one RC4 instance, so RC4's low key agility doesn't really bother us.

-d
Hi Damien.

But again, we perturb it immediately afterwards, so what good is such

So one would have to guess the MD5 output from the RC4 output in order to even pull off an attack like this. This kind of complete break would seem... unlikely...

That was what I was getting at with my first query (how would a

Yeah, so without any good reason to truncate it, let's just use the whole hash, and hence, use all the entropy

There are arc4random_buf() calls in the kernel. Those can use the arc4random_buf_large() mechanism, can they not? Or are the requests typically too small?

-kj
