> Hi,
>
> I've noticed that vmalloc seems to be rather slow. I wrote a test kernel
> module to track down what was going wrong. The module performs one
> million vmalloc/touch-memory/vfree cycles in a loop and prints out how
> long they take.
>
> The source of the test kernel module can be found as an attachment to
> this bz:
> https://bugzilla.redhat.com/show_bug.cgi?id=581459
>
> When this module is run on my x86_64, 8-core, 12 GB machine on an
> otherwise idle system, I get the following results:
>
> vmalloc took 148798983 us
> vmalloc took 151664529 us
> vmalloc took 152416398 us
> vmalloc took 151837733 us
>
> After applying the two-line patch (see the same bz), which disables the
> delayed removal of the structures (a delay that appears intended to
> improve SMP performance by reducing TLB flushes across CPUs), I get the
> following results:
>
> vmalloc took 15363634 us
> vmalloc took 15358026 us
> vmalloc took 15240955 us
> vmalloc took 15402302 us
>
> So that's a speedup of around 10x, which isn't too bad. The question is
> whether we can reach a compromise that retains the benefits of the
> delayed TLB flushing code while reducing the overhead for other users.
> My two-line patch simply disables the delay by forcing a removal on
> every vfree.
>
> What is the correct way to fix this, I wonder?
>
> Steve.
>