Inexplicable I/O latency using worker threads

Submitted by mcatos
on December 12, 2008 - 5:45am

Hello all.

I am facing a weird problem with a virtual block driver I made concerning excessive I/O latency.

My block driver intercepts requests and redirects them to a real block device,
but not just be setting the bio->bi_bdev field, I create new bios.

Anyway, my problem is that for load balancing reasons I need per-CPU worker threads
where I enqueue requests and let them do all the work. If I use 2 threads in a round
robin manner (request 1 served by CPU 0, 2 by CPU1, 3 by CPU0 and so on), performance
is inexplicably low.

If I choose only one CPU to act as a worker the problem is gone. The difference of measured
I/O latency is more than 30 times.

What could be happening?

I'm using a vanilla 2.6.18.8.

Thanx in advance.