I'm getting tons of this, and X fails to start
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_DEBUG_PREEMPT=yBUG: sleeping function called from invalid context at kernel/rwsem.c:47
in_atomic():1, irqs_disabled():0
no locks held by X/5879.
[<c012bcb1>] down_write+0x15/0x50
[<c01b53af>] do_shmat+0x235/0x3a0
[<c0106be2>] sys_ipc+0x146/0x263
[<c0102892>] sysenter_past_esp+0xa7/0xb5
[<c0102856>] sysenter_past_esp+0x6b/0xb5
=======================
BUG: scheduling while atomic: X/5879/0x00000005
no locks held by X/5879.
irq event stamp: 401318
hardirqs last enabled at (401317): [<c02d96c8>] __mutex_unlock_slowpath+0xb4/0x170
hardirqs last disabled at (401318): [<c010286f>] sysenter_past_esp+0x84/0xb5
softirqs last enabled at (397628): [<c01051ce>] do_softirq+0xa6/0xf9
softirqs last disabled at (397585): [<c01051ce>] do_softirq+0xa6/0xf9
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0106bf7>] sys_ipc+0x15b/0x263
[<c01029ae>] work_resched+0x5/0x31
=======================
note: X[5879] exited with preempt_count 7
BUG: scheduling while atomic: X/5879/0x10000008
no locks held by X/5879.
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0114ad5>] __cond_resched+0x21/0x3b
[<c02d8c21>] cond_resched+0x2a/0x31
[<c014c877>] unmap_vmas+0x518/0x560
[<c014f5d6>] exit_mmap+0x6f/0x104
[<c0115d41>] mmput+0x3b/0xbe
[<c011ae16>] do_exit+0x142/0x870
[<c01114cb>] do_page_fault+0x0/0x675
[<c011b569>] do_group_exit+0x25/0x6b
[<c01114cb>] do_page_fault+0x0/0x675
[<c01221c5>] get_signal_to_deliver+0x286/0x439
[<c0120e00>] force_sig_info+0x80/0x8a
[<c01114cb>] do_page_fault+0x0/0x675
[<c0101e0e>] do_notify_resume+0x79/0x6bc
[<c015025a>] vma_link+0x97/0xd5
[<c015025a>] vma_link+0x97/0xd5
[<c01...
OK, this fixes the locking here:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, strucspin_lock_init(&new->lock);
new->deleted = 0;
- rcu_read_lock();
spin_lock(&new->lock);
return id;
}
@@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}_
But I'm not very confident in what's happening in there. The patches take
away a _lot_ of the RCU locking, and it's unlear what's protecting the idr
tree. Is it rcu, or is it the mutex? The code seems confused and the
comments are all wrong.
-
Well, reviewing the code I found another place where the
rcu_read_unlock() was missing.
I'm so sorry for the inconvenience. It's true that I should have tested
with CONFIG_PREEMPT=y :-(
Now, the ltp tests pass even with this option set...In attachment you'll find a patch thhat
1) adds the missing rcu_read_unlock()
2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
exactly as it was done in the ref code.Regards,
Nadia
On 18-09-2007 16:55, Nadia Derbey wrote:
BTW, probably I miss something, but I wonder, how this RCU is working
here. E.g. in msg.c do_msgsnd() there is:msq = msg_lock_check(ns, msqid);
...msg_unlock(msq);
schedule();ipc_lock_by_ptr(&msq->q_perm);
Since msq_lock_check() gets msq with ipc_lock_check() under
rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
with rcu_read_unlock(), is it valid to use this with
ipc_lock_by_ptr() yet?Regards,
Jarek P.
-
Before Calling msg_unlock() they call ipc_rcu_getref() that increments a
refcount in the rcu header for the msg structure. This guarantees that
the the structure won't be freed before they relock it. Once the
structure is relocked by ipc_lock_by_ptr(), they do the symmetric
operation i.e. ipc_rcu_putref().Regards,
Nadia
-
Yes, I've found this later too - sorry for bothering. I was mislead
by the code like this:struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
struct kern_ipc_perm *out;
int lid = ipcid_to_idx(id);rcu_read_lock();
out = idr_find(&ids->ipcs_idr, lid);
if (out == NULL) {
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}which seems to suggest "out" is an RCU protected pointer, so, I
thought these refcounts were for something else. But, after looking
at how it's used it turns out to be ~90% wrong: probably 9 out of 10
places use refcouning around this, so, these rcu_read_locks() don't
work here at all. So, probably I miss something again, but IMHO,
these rcu_read_locks/unlocks could be removed here or in
ipc_lock_by_ptr() and it should be enough to use them directly, where
really needed, e.g., in msg.c do_msgrcv().BTW, I've found this comment, which, at least for me, explains very
good, what's going on here:/* Lockless receive, part 3:
* Acquire the queue spinlock.
*/
ipc_lock_by_ptr(&msq->q_perm);Thanks,
Jarek P.
-
Actually, ipc_lock() is called most of the time without the
ipc_ids.mutex held and without refcounting (maybe you didn't look for
the msg_lock() sem_lock() and shm_lock() too).I have to check for the ipc_lock_by_ptr(): may be you're right!
Regards,
Nadia-
Yes, you are 100% right and I'm 90% lier, 10% blind (maybe backward
Since this whole locking scheme is really quite a puzzle, and needs
more than one or two looks to figure it out, I'd better try stop to
discredit myself more.Anyway it looks to me like the most sophisticated way of achieving
locklessness I've seen so far. I hope, there is still some gain after
this RCU + refcounting vs. simple locking. (It seems somebody had
similar doubts writing the "Lockless receive, part 1:" comment in
do_msgrcv(); and probably again I'm very wrong, but this checking of
validity of RCU protected structure with an r_msg value, which is
done to avoid refcounting, looks like not very different and has
some cost too).Regards,
Jarek P.
-
So, here is the ipc_lock_by_ptr() status:
1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
call it inside a refcounting.
==> no rcu read section needed.2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
ipc_ids mutex lock.
==> no rcu read section needed.3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
under refcounting
==> rcu read section + some more checks needed once the spnlock is
taken.So I completely agree with you: we might remove the rcu_read_lock() from
the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
is already explicitly called in do_msgrcv()).Regards,
Nadia-
Yes, IMHO, it should be at least more readable when we can see where
this RCU is really needed.But, after 3-rd look, I have a few more doubts (btw., 3 looks are
still not enough for me with this code, so I cerainly can miss many
things here, and, alas, I manged to see util and msg code only):1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
but it's probably wrong: they call idr_find() with ipc_ids pointer
which needs this mutex, just like in similar code in: ipc_findkey(),
ipc_get_maxid() or sysvipc_find_ipc().2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
safe (memory barriers): it's not atomic, so locking is needed, but
e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
freeque() calls ipc_rcu_putref() with ipc_ids mutex only.3. Probably similar problem is possible with msr_d.r_msg which is
read in do_msgrcv() under rcu_read_lock() only.Regards,
Jarek P.
-
Jarek,
I'm realizing I did'nt give you an answer to issues # 1 and 3. Sorry for
I think you're completely right: the rcu_read_lock() is not enough in
this case.
I have to solve this issue, but keeping the original way the ipc
developers have done it: I think they didn't want to take the mutex lock
for every single operation. E.g. sending a message to a given message
queue shouldn't avoid creating new message queues.In think here they have avoided refcoutning by using r_msg:
r_msg is initialzed to -EAGAIN before releasing the msq lock. if
freequeue() is called it sets r_msg to EIDRM (see expunge_all(-EIDRM)).
Setting r_msg is always done under the msq lock (expunge_all() /
pipelined_Sned()).
Since rcu_read_lock is called right after schedule, they are sure the
msq pointer is still valid when they re-lock it once a msg is present in
the receive queue.Please tell me if I'm not clear ;-)
Regards,
Nadia-
All clear!
Except... this r_msg is still unclear to me. Since it's read without
msq lock I doubt this first check after schedule() is of any value. A
comment: "Lockless receive, part 2" tells about some safety against a
race with pipeline_send() and expunge_all() when in wake_up_process(),
but how can we be sure this r_msg is not just to be changed and this
wake_up_process() is running while "while" check still sees
ERR_PTR(-EAGAIN) instead of NULL?Thanks & sorry for delay,
Jarek P.
-
OOPS!
At last I've found enough time to look at this more exactly plus the
right comment in sem.c, and it looks like it's all right and really
thought up, with all variants foreseen. So, I withdraw this problem
nr 3 too.Best regards,
Jarek P.
-
OK, but freeque() freeary() and shm_destroy() are special cases:
we have the following path:mutex_lock(ipc_ids.mutex)
...
ipc_lock(ipcp)
... do whatever cleaning is needed ...
ipc_rmid(ipcp)
ipc_unlock(ipcp)
....
ipc_rcu_putref(ipcp)Once the rmid has been done the ipc structure is considered as not
visible anymore from the user side ==> any syscall called with the
corresponding id will return invalid.
The only thing that could happen is that this structure be reused for a
newly allocated ipc structure. But this too cannot happen since we are
under the ipc_ids mutex lock.Am I wrong?
Answers to the other questions in separate e-mails
Regards,
Nadia-
I hope not! But, then it would be probably another logical trick:
ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
if it's used in do_msgsnd() there should be a risk something can do
this kfree at this moment, and it seems freeque() is the only one,
which both: can do this and cares for this refcount. Then, e.g., if
any of them does ipc_rcu_getref() a bit later and sees old (cached)
value - kfree can be skipped forever. On the other hand, if there is
no such possibility because of other reasons, this all refcounting
looks like a code beautifier only...Thanks,
Jarek P.-
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
After rethinking, this scenario seems to be wrong or very unprobable
(I'm not sure of all ways "if (--container...)" could be compiled),
so there should be no such risk - double kfree/vfree is more probable,
so no danger. More likely is such refcount abuse: ipc_rcu_getref() in
do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
pointer acquired by do_msgsend() before freeque() started); then,
after schedule(), do_msgsnd() can work with kfreed msq_queue structure
(at least considering classic RCU).Regards,
Jarek P.
-
If msgsnd() acquires the pointer first, it does it under lock +
rcu_getref(). ==> refcount = 2
When schedule() is called if freeque() takes the pointer it will call
msg_rmid() that sets the deleted field in the msg queue. When the lock
is released by freeque(), we have either 1) or 2):
1) freeque()'s putref called 1st ==> refocunt = 1
Then msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then msgsnd()'s putref is called ==> refcount = 0
But this is done under RCU lock, so should be no problem
Then the deleted field is checked ==> return
2) msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then we don't mind in which order are done the other operations
since we under rcu_lock: the structure won't disappear till we test
the deleted field.Regards,
Nadia-
I see this scenario is even more impossible, so you were right,
it's all right at this point.Jarek P.
-
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
Should be:
"any of them does ipc_rcu_putref() a bit later and sees old (cached)"Sorry,
Jarek P.
-
On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
Impossible!
I've to look at this once more in the evening (and try to recall
what I've seen yesterday and forgotten today...).Cheers,
Jarek P.
-
...
...So I missed it again: after all this RCU protection works before
and after refcounting.Sorry,
Jarek P.
-
Thanks. Could you please check the code comments in ipc/util.c for
accuracy and completeness sometime?
-
Done - see attachment.
Hope I didn't forget / add some wrong stuff ;-)
BTW this patch also fixes a missing lock in shm_get_stat().Regards,
Nadia
Ditto. Thanks!
-
Here's a bug:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}_
but it still doesn't fix it.
Nadia, please review Documentation/SubmitChecklist. It contains stuff
which would have prevented this.-
Andrew,
Actually the rcu_read_lock is released in ipc_unlock(). So I think we
shouldn't add an rcu_read_unlock() before leaving ipc_lock().
This is a part that has not changed since the ref code.I'm looking also on my side.
Regards,
Nadia-
Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
a bit dirty and we might choose to not do that.Would be interested in knowing the locking rules in there. For example,
this:/**
* ipc_findkey - find a key in an ipc identifier set
* @ids: Identifier set
* @key: The key to find
*
* Requires ipc_ids.mutex locked.
* Returns the LOCKED pointer to the ipc structure if found or NULL
* if not.
* If key is found ipc contains its ipc structure
*/appears to be hopelessly out of date?
-
Someone got their locking imbalanced. Seems that I lost the suitable config
settings to catch that.Hang about while I do bisection search #800.
-
| Andrew Morton | -mm merge plans for 2.6.23 |
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Gabriel C | Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS] |
git: | |
| Gerrit Renker | [PATCH 03/37] dccp: List management for new feature negotiation |
| David Miller | [GIT]: Networking |
| Thomas Jarosch | Re: TCP connection stalls under 2.6.24.7 |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
