Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

From: Andi Kleen
Date: Monday, December 3, 2007 - 3:27 am

> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.

What should it do when the NFS server stops answering, or when the
network to a SAN RAID array located a few hundred km away develops
a hiccup? Or when the SCSI driver simply decides to do lengthy error
recovery? You could argue that is broken if it takes longer than
2 minutes, but in practice these things are hard to test and to fix.


The way to find that would be source auditing, not breaking
perfectly fine error handling paths. Especially since this has, at least
traditionally, not been considered a bug but a fundamental design
parameter of network/block/etc. file systems.


Actually that's not true. You can umount -f and then kill for at least
NFS and CIFS. Not sure it is true for the other network file systems
though.

You could in theory do TASK_KILLABLE for all block devices too (not 
a bad thing; I would love to have it). 

But it would be equivalent in work (has to patch all the same places with 
similar code) to Suparna's big old fs AIO retry
patchkit that never went forward because everyone was too worried
about excessive code impact. Maybe that has changed, maybe not ... 

And even then you would need to check all possible error handling
paths (scsi_error and low level drivers at least) to make sure they all
finish in less than two minutes.


Sorry, that's totally bogus. Throwing a stack trace is the kernel
equivalent of sending an S.O.S.: it worries the user significantly,
taxes reporting resources, etc. In the interest of saving
everybody trouble, the kernel should only do that when it is really
sure something is truly broken.


You are wrong on that.

-Andi

