Hi,
On Thu, 2010-02-18 at 16:04 -0500, David Teigland wrote:
No, I didn't hit it. I'm not sure how to reproduce whatever situation
led to this in the first place.
There was a clue though in the patch prior to the one you pointed out in
the git tree, the comment in this patch doesn't make a lot of sense
until without the context from that patch. I noticed that where the
sysfs function does this:
it isn't primarily a ref count operation. Yes, it does get a ref count
on the object if it is successful, but the main purpose is testing to
see if the shutdown process has started (i.e. is the lockspace still on
the ls_list). If the list removal used a list_del_init rather than a
list del, the dlm_find_lockspace_local() call could be replaced with:
spin_lock(&lslist_lock);
ret = list_empty(&ls->ls_list);
if (!ret)
ls->ls_count++;
spin_unlock(&lslist_lock);
if (ret)
return -EINVAL;
which might be a bit less confusing, and also saves traversing the list
of lockspaces. This is basically a "hold" operation, rather than a
find/get type operation.
My confusion has arisen from the fact that there are three ref counters
for the lockspace object. One is ls_count, one is ls_create_count and
the other the is kobject ref count.
ls_create_count seems to deal with user references, ls_count seems to be
used for internal references and the kobject ref count only seems to be
incremented/decremented on initial object creation/removal.
Probably the correct long term solution is to at least merge the
ls_count into kobject ref count system, and maybe the ls_create_count
too. I'll have to do some more investigation before I can see whether
there are any reasons why that isn't possible.
Either way, we are getting away from what was originally a small and
simple patch, so I'll suggest to ignore this one for now, and just apply
the first one of the two which I sent. I'll have another look at this in
the mean time,
Steve.
--