Re: FCNTL Performance problem

Previous thread: Re: Attempted summary of suspend-blockers LKML thread by Alan Stern on Sunday, August 8, 2010 - 12:07 pm. (1 message)

Next thread: [PATCHv2 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full() by Michal Nazarewicz on Sunday, August 8, 2010 - 12:29 pm. (10 messages)
From: Rob Donovan
Date: Sunday, August 8, 2010 - 12:26 pm

Hi,

We use CISAM files a lot in our application, which uses the FCNTL system
call for record locking.

I've noticed a possible problem with FCNTL, though, after a lot of work
using the systemtap tracing program.

The problem is, when you have lots of F_RDLCK locks being created and
released, then it slows down any F_WRLCK with F_SETLKW locks massively.

It's because the F_RDLCK seems to 'drown out' the write locks. Because our
system (it's a large system with 700-800 users, so lots of activity) does
lots more reads than writes, it causes the writes to be very slow. 

This is because (I think), if I have say 15 processes doing read locks, and
1 process doing write wait locks, then when the write tries to get a lock,
it can't, because process 1 has a read lock, so it sleeps. Then I think how
it works is that when the read lock gets released it then wakes up any other
locks waiting (i.e. the write), so that it can then try to lock. The problem
is that, if process 1 creates a read lock, then the write process tries to
get its lock and can't, so it sleeps, then process 2 gets a read lock (which
it can at this point) and then process 1 releases its lock, wakes up the
write process, but because process 2 got its read lock, the write process
still can't get its lock, so it sleeps again. This goes on for quite some
time, until eventually, the write process gets lucky and actually grabs a
lock.

(I think the write lock actually sits in the 'for' loop in
do_lock_file_wait() in fs/locks.c, waiting for the lock to be freed)

Obviously, this slows down the write locks a lot.

I can show this by running some code (not the actual application code, just
a test example to show it happening a lot).

If you touch a file 'control.dat' in your current dir, and run test_read
(code example below) in the background with 15 sessions, and then run
test_write once, test_write will hardly ever get a write lock (seen by
systemtap or strace) and will just wait. It's not that bad in our
application, ...
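
For readers who want to reproduce the effect, here is a minimal sketch of
what such a test pair could look like. It is not the original test_read /
test_write code; the function names (whole_file_lock, reader_loop,
writer_once), the whole-file lock range, and the hold time are all
assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Apply a whole-file lock. cmd is F_SETLK (fail at once) or F_SETLKW
 * (sleep until granted); type is F_RDLCK, F_WRLCK or F_UNLCK. */
int whole_file_lock(int fd, int cmd, short type)
{
    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, cmd, &fl);
}

/* Hypothetical stand-in for test_read: repeatedly take a read lock,
 * hold it for hold_us microseconds, and release it. */
int reader_loop(const char *path, long iterations, long hold_us)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct timespec hold = { hold_us / 1000000, (hold_us % 1000000) * 1000 };
    for (long i = 0; i < iterations; i++) {
        if (whole_file_lock(fd, F_SETLKW, F_RDLCK) < 0) {
            close(fd);
            return -1;
        }
        nanosleep(&hold, NULL);
        if (whole_file_lock(fd, F_SETLK, F_UNLCK) < 0) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}

/* Hypothetical stand-in for test_write: one blocking write lock, then
 * release. With many overlapping readers this call may sleep a long time. */
int writer_once(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = whole_file_lock(fd, F_SETLKW, F_WRLCK);
    if (rc == 0)
        rc = whole_file_lock(fd, F_SETLK, F_UNLCK);
    close(fd);
    return rc;
}
```

To approximate the scenario described, one would run reader_loop("control.dat", ...)
in about 15 separate processes and then call writer_once("control.dat") in
another, timing how long the F_SETLKW call takes under strace or systemtap.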
From: Chris Friesen
Date: Monday, August 9, 2010 - 2:41 pm

What you're seeing is classical "reader priority" behaviour.  The
alternative is "writer priority".  I don't think POSIX specifies which
behaviour to use, so it's up to the various implementations.

If you really need writer priority, how about building your own lock
object in userspace on top of fcntl locks?
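
One way such a userspace lock object might look is a "turnstile" built from
two bytes of a lock file: writers hold a gate byte while they wait, so new
readers queue behind them. This is only a sketch under assumptions: the byte
layout (GATE_BYTE, DATA_BYTE) and all function names are made up here, and
it reduces rather than strictly eliminates writer starvation, since a
writer still has to win the gate byte once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical layout: two independent one-byte lock ranges in one file. */
enum { GATE_BYTE = 0, DATA_BYTE = 1 };

/* Blocking lock/unlock of a single byte (F_UNLCK never actually blocks). */
int lock_byte(int fd, off_t byte, short type)
{
    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = byte;
    fl.l_len = 1;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Reader: pass through the gate, take the data lock, leave the gate.
 * While a writer holds the gate, new readers sleep here. */
int shared_acquire(int fd)
{
    if (lock_byte(fd, GATE_BYTE, F_RDLCK) < 0)
        return -1;
    int rc = lock_byte(fd, DATA_BYTE, F_RDLCK);
    if (lock_byte(fd, GATE_BYTE, F_UNLCK) < 0)
        rc = -1;
    return rc;
}

int shared_release(int fd)
{
    return lock_byte(fd, DATA_BYTE, F_UNLCK);
}

/* Writer: close the gate first so no new readers get in, then wait for
 * the readers already inside to drain. */
int exclusive_acquire(int fd)
{
    if (lock_byte(fd, GATE_BYTE, F_WRLCK) < 0)
        return -1;
    if (lock_byte(fd, DATA_BYTE, F_WRLCK) < 0) {
        lock_byte(fd, GATE_BYTE, F_UNLCK);
        return -1;
    }
    return 0;
}

int exclusive_release(int fd)
{
    int rc = lock_byte(fd, DATA_BYTE, F_UNLCK);
    if (lock_byte(fd, GATE_BYTE, F_UNLCK) < 0)
        rc = -1;
    return rc;
}
```

The fd must be opened O_RDWR, and because fcntl locks belong to the process,
closing any descriptor for the lock file drops every lock taken here. A
separate lock file is probably safer than reusing byte ranges of the data
file that the application already locks.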

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com
--

From: Rob Donovan
Date: Wednesday, August 11, 2010 - 9:19 am

Hi,

Not sure it's about read or write 'priority' so much, is it?

I wouldn't want to particularly favour writes over reads either, or it would
just make the problem happen for reads, wouldn't it?

And to do this, and make it favour writes, I presume it would have to be
coded into the kernel to do this, there isn't any 'switch' for me to try?

Could we not have it 'fairly' process locks? So that if a read lock comes
along, and there is a write lock waiting for another read lock to unlock,
then the 2nd read has to wait for the write lock. Not particularly because
the write lock has priority, but because it was requested after the write
lock was.

In my example, if you run 15 of the read process, the write process never
gets the chance to lock, ever, as it's continually blocked by 1 or more of
the reads.

Running 15 of the read processes is much more load than our real system
gets, so we don't get writes blocked totally like that, but they can block
for 10 or more seconds sometimes. Which is quite excessive for 1 write.

To me, it seems like there needs to be something in the fcntl() routines so
that when a lock is called with F_SETLKW, if it gets blocked then it needs
to put its 'request' in some kind of queue, so that if any more reads come
along, they know there is already a lock waiting to get the lock before it,
so they queue up behind it. 

Or is that kind of checking / queuing going to slow down the calls too much,
maybe?

Example of what is happening in my test:

Process 1, creates a read lock
Process 2, tries to create a write wait lock, but can't because of process 1,
so it sleeps.
Process 3, creates a read lock (since nothing is blocking this) 
Process 1, unlocks and wakes up any waiting locks, i.e. the write lock
process 2.
Process 2, gets woken up, and tries to lock, but can't because of process 3
read lock, so sleeps again.
Process 4, creates a read lock (since nothing is blocking this)
Process 3, unlocks and wakes up any waiting locks, i.e. the write ...
From: Chris Friesen
Date: Wednesday, August 11, 2010 - 10:00 am

No, because readers can always share the lock with other readers if
there is no writer waiting.

If you have one or more readers already holding the lock, with a writer
waiting, you have two choices:  1) let the new reader in under the
assumption that they'll be quick and won't extend the current "read"
usage by much, or 2) block the new reader until after any waiting
writers get a chance to get in.  The first is called reader priority,

The locks are written by glibc and the kernel.  I haven't looked at
fcntl locking so I'm not sure where the bulk of the code is.  I'd


Again, this would be implementing writer priority.

POSIX doesn't guarantee either form, so if you need a writer-priority
lock then fcntl() isn't a good choice.  In fact in most cases I suspect
you'll find that read/write locks are implemented as reader priority
since the expectation is that writes are infrequent.
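
The difference between the two policies comes down to a single check when a
new reader arrives. The following is purely an illustrative model, not
kernel or glibc code; the struct, enum, and function names are assumptions
invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum rw_policy { READER_PRIORITY, WRITER_PRIORITY };

/* Hypothetical snapshot of a read/write lock's state. */
struct rw_state {
    int  active_readers;   /* readers currently holding the lock */
    bool writer_active;    /* a writer currently holds the lock */
    int  waiting_writers;  /* writers asleep waiting for the lock */
};

/* May a newly arriving reader take the lock right now? */
bool may_admit_reader(const struct rw_state *s, enum rw_policy policy)
{
    assert(s->active_readers >= 0 && s->waiting_writers >= 0);
    if (s->writer_active)
        return false;                 /* never share with a writer */
    if (policy == WRITER_PRIORITY && s->waiting_writers > 0)
        return false;                 /* queue behind the waiting writer */
    return true;                      /* reader priority: always join in */
}

/* A writer needs the lock entirely to itself under either policy. */
bool may_admit_writer(const struct rw_state *s)
{
    return !s->writer_active && s->active_readers == 0;
}
```

With one reader inside and one writer waiting, may_admit_reader() returns
true under READER_PRIORITY and false under WRITER_PRIORITY, which is the
case being discussed in this thread.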

Chris

