Re: [PATCH 2/2] module: fix bne2 "gave up waiting for init of module libcrc32c"

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Tuesday, June 1, 2010 - 7:58 am

On Tue, 1 Jun 2010, Rusty Russell wrote:

Umm. Did you read the first few emails I sent out on this thread 
originally? They were actually _politer_ than your suggestion above (no 
crap mentioned), and did exactly what you ask for. Let me quote the one 
where I suggested tightening the lock, for example (along with saying that 
I don't want to see pointless semantic changes):

   I'm not re-applying it with the pointless semantic changes that are 
   visible to modules. It doesn't matter if they were informed, if it means 
   that they'll then just have some odd version dependency and add crazy 
   "#if LINUX_VERSION" tests that aren't even exact.
   
   I also wonder exactly what that module_mutex() actually ends up 
   protecting. 99% of load_module() seems to be stuff that is purely about 
   local issues. Maybe we could tighten the actual lock section to just the
   parts that actually expose the vmalloc'ed area to others?
   
   It's generally pointless releasing a lock in the middle - that just makes 
   the lock even less valid. If it's valid to just release the lock (without 
   some retry logic starting everything from the beginning), it likely the 
   lock shouldn't have been held there in the first place.

See? How bad was that? 


See above. That wasn't what I said. That "you're wrong" was what I got to 
when I got frustrated with you for complaining about me actually trying to 
fix it.

And "crap locking" only came up in the long explanation to Andrew about 
what the original problem was, and why the EBUSY error wasn't a problem. 
And we all know the module locking was/is crap. You must have known it 
too, since you apparently had a "fix up locking" patch from before.

So what was so bad about calling the locking crap? Really?

So yes, we both get frustrated here. But read the thread again, Rusty, and 
look who it is that actually starts complaining about other peoples code. 
I called the locking crap, you're the one who called something "wrong, 
complex and fundamentally broken".

Kettle. Pot. Black.


I agree with "less ambitious". But I'd also call it "papering over bad 
code", or "making bad locking much worse" or "likely introduce new and 
even more subtle problems".

In fact, I might even go so far as using "crap".

Because it really is fundamentally wrong to drop a lock in the middle of 
an operation. It may _work_, but it certainly doesn't make the code 
better.

The thing is, when we have a problem in some code, we should try to make 
sure that the fixed code is _better_ than the old code. Not just fix the 
symptoms. Fix the underlying _reason_ for why the bug happened in the 
first place.

And the underlying reason the bug happened is (I think we both agree) 
simply that the locking was broken. Your patch fixed the symptoms, yes, 
but it did so by making the locking _worse_. That's why I hated it. I 
really don't think it was a good approach. 

And I do think you agree in the end, and know that's true.


I agree with 1-2, and even had a comment about #2 pointing that out, and 
thinking that it would be good for other reasons (ie /proc/modules).

And as to #4, the "wait for everything after the fact" is actually the 
faster approach, although in practice it probably doesn't matter (you'd 
hope that modules depending on other modules that are still busy loading 
is such a rare case that it does _not_ matter whether you wait for them to 
load serially or batched up).

And as to #5, I would not actually suggest that we do "wait later _or_ fix 
the locking", I'd do both. It really isn't a either-or situation, Rusty.

So I don't think those ones matter.

HOWEVER. 

But as to #3, I think you bring up an interesting issue:


I agree. And you're right, we'd have to move even more code from the 
MODULE_UNLOAD case into !MODULE_UNLOAD to fix it. At which point I do 
agree that if we want to keep !MODULE_UNLOD (and I do think we do), my 
approach really doesn't work. Or we'd basically make !MODULE_UNLOAD a 
pointless exercise where unloading is not allowed, but we still do 
everything else the same way.

So I do agree that your later two-patch series is the way to go.

I would like to note that your original "fix things by dropping the lock" 
patch that I hated so much had this exact bug too. Making this a good 
example of _why_ it's basically always wrong to drop locks in the middle - 
even if you claimed you knew and understood the locking.

So I do hope we can agree to call the module locking "total and utter 
crap", ok? 

And it really wasn't a personal complaint about your prowess. Crap locking 
(or code in general) is crap, but calling the code crap shouldn't be 
something you take personally. Especially not when there are various valid 
historical reasons _why_ it is all crap. 

And yes, my code was crap too. I wrote it for the MODULE_UNLOAD case, and 
only added a comment saying the !unload case was broken, but didn't look 
at just _how_ broken it was. My bad.

			Linus
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[Regression] Crash in load_module() while freeing args, Rafael J. Wysocki, (Tue May 25, 2:00 pm)
Re: [Regression] Crash in load_module() while freeing args, Rafael J. Wysocki, (Tue May 25, 3:54 pm)
Re: [Regression] Crash in load_module() while freeing args, Linus Torvalds, (Tue May 25, 4:47 pm)
Re: [Regression] Crash in load_module() while freeing args, Linus Torvalds, (Wed May 26, 8:41 am)
Re: [Regression] Crash in load_module() while freeing args, Rafael J. Wysocki, (Wed May 26, 3:56 pm)
Re: [Regression] Crash in load_module() while freeing args, Linus Torvalds, (Wed May 26, 4:07 pm)
Re: [Regression] Crash in load_module() while freeing args, Rusty Russell, (Wed May 26, 10:26 pm)
Re: [Regression] Crash in load_module() while freeing args, Brandon Philips, (Thu May 27, 11:46 am)
Re: [Regression] Crash in load_module() while freeing args, Rafael J. Wysocki, (Thu May 27, 2:57 pm)
Re: [Regression] Crash in load_module() while freeing args, Rusty Russell, (Mon May 31, 12:54 am)
[PATCH 0/2] kernel/module.c locking changes, Rusty Russell, (Mon May 31, 5:00 am)
[PATCH 1/2] module: make locking more fine-grained., Rusty Russell, (Mon May 31, 5:01 am)
[PATCH 1/2] Make the module 'usage' lists be two-way, Linus Torvalds, (Mon May 31, 1:16 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Rusty Russell, (Mon May 31, 6:37 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Américo Wang, (Mon May 31, 7:44 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Rusty Russell, (Mon May 31, 8:42 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Linus Torvalds, (Mon May 31, 8:51 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Linus Torvalds, (Mon May 31, 9:00 pm)
Re: [PATCH 1/2] Make the module 'usage' lists be two-way, Linus Torvalds, (Mon May 31, 9:05 pm)
Re: [PATCH 1/2] module: make locking more fine-grained., Américo Wang, (Mon May 31, 10:38 pm)
Re: [PATCH 1/2] module: make locking more fine-grained., Rusty Russell, (Mon May 31, 10:55 pm)
Re: [PATCH 2/2] module: fix bne2 "gave up waiting for init ..., Linus Torvalds, (Tue Jun 1, 7:58 am)