If it happens, it just won't work on 32-bit.
I ran the numbers, and they show that you need > 1.5GB of lowmem
for a somewhat realistic scenario (32K per thread) at 50k threads.
Subtracting 4K from that 32K figure won't make any significant
difference (still 1.3GB).
If you claim that works on a 32-bit system with typically 300-600MB
of lowmem available (which is also shared by other subsystems), I know
who sounds foolish.
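
For anyone who wants to redo the arithmetic, here is a rough
back-of-the-envelope sketch. It assumes "K" means KiB and "GB" means
GiB, and it only multiplies out the per-thread figures quoted above;
the helper name is made up for illustration and nothing here is
measured on a real kernel.

```python
# Back-of-the-envelope check of the lowmem figures quoted in this mail.
# Assumptions: "K" is KiB, "GB" is GiB, and the per-thread cost is the
# 32K (or 32K - 4K = 28K) estimate from the mail, not a measured value.

KIB = 1024
GIB = 1024 ** 3


def lowmem_needed_gib(threads, per_thread_kib):
    """Total lowmem consumed by `threads` threads, in GiB (hypothetical helper)."""
    return threads * per_thread_kib * KIB / GIB


if __name__ == "__main__":
    for per_thread_kib in (32, 28):
        total = lowmem_needed_gib(50_000, per_thread_kib)
        print(f"{per_thread_kib}K x 50000 threads = {total:.2f} GiB")
```

Either way the total is more than double the 600MB upper end of what
is typically available as lowmem.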
-Andi
--