Re: BUG in VFS or block layer

From: Andrew Morton
Date: Wednesday, August 6, 2008 - 2:28 pm

On Wed, 6 Aug 2008 16:40:02 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:


What the VFS will do is

- lock the page

- put the page into a BIO and send it down to the block layer

- later, wait for IO completion.  It does this by running
  lock_page[_killable](), which waits for the page to come unlocked.

  The page comes unlocked via the device driver, usually within the
  IO completion interrupt.
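
To make that sequence concrete, here is a toy userspace model of it.  It
is only a sketch: toy_page, toy_bio and the toy_* functions are made-up
names, not kernel APIs, and because nothing runs concurrently the "wait"
is reduced to asking whether a waiter would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct page and struct bio; NOT the kernel types. */
struct toy_page {
	bool locked;
	bool uptodate;
};

struct toy_bio {
	struct toy_page *page;
	void (*end_io)(struct toy_bio *bio, int error);
};

/* Completion callback: in this model, the only place the page is unlocked. */
static void toy_end_io(struct toy_bio *bio, int error)
{
	bio->page->uptodate = (error == 0);
	bio->page->locked = false;
}

/* Steps 1 and 2: lock the page, then put it into a bio for the "driver". */
static void toy_start_read(struct toy_page *page, struct toy_bio *bio)
{
	assert(!page->locked);	/* nobody else may hold the page lock */
	page->locked = true;
	bio->page = page;
	bio->end_io = toy_end_io;
}

/* What a well-behaved driver does from its IO completion interrupt. */
static void toy_driver_complete(struct toy_bio *bio, int error)
{
	bio->end_io(bio, error);
}

/* Step 3: a waiter makes progress only once the page has been unlocked. */
static bool toy_wait_would_block(const struct toy_page *page)
{
	return page->locked;
}
```

In this model, if toy_driver_complete() is never called then
toy_wait_would_block() stays true forever, which is the userspace-lockup
case.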


A common cause of userspace lockups during IO errors is that the driver
layer screwed up and didn't run the completion callback.

Now, according to the above trace, the above code sequence _did_ work
OK.  Or at least, it ran to completion.  It was later, when we tried to
truncate a file, that we stumbled across a permanently-locked page.

So it would appear that the VFS read() code successfully completed, but
left locked pages behind it, which caused the truncate to hang.


Aside: why does this code in do_generic_file_read() return -EIO when it
got a signal?


page_not_up_to_date:
                /* Get exclusive access to the page ... */
                if (lock_page_killable(page))
                        goto readpage_eio;



One possible problem is here:

readpage:
		/* Start the actual read. The read will unlock the page. */
		error = mapping->a_ops->readpage(filp, page);

		if (unlikely(error)) {
			if (error == AOP_TRUNCATED_PAGE) {
				page_cache_release(page);
				goto find_page;
			}
			goto readpage_error;
		}

The VFS layer assumes that if ->readpage() returned a synchronous error
then the page was already unlocked within ->readpage().  Usually this
means that the driver layer had to run the BIO completion callback to
do that unlocking.  It is possible that the USB code forgot to do this.
This would explain what you're seeing.
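
A toy illustration of that assumption, again in plain userspace C.  All
names here (toy_page, toy_submit_good, toy_submit_buggy, toy_read,
TOY_EIO) are hypothetical and exist only for this sketch; it is not the
real ->readpage() or USB code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EIO 5	/* stand-in for EIO */

struct toy_page {
	bool locked;
};

/* Hypothetical lower-layer hook: returns 0 or a negative error code. */
typedef int (*toy_submit_fn)(struct toy_page *page);

/* Correct lower layer: on failure it still completes the request,
 * and completion is what unlocks the page. */
static int toy_submit_good(struct toy_page *page)
{
	page->locked = false;	/* what the completion callback would do */
	return -TOY_EIO;
}

/* Buggy lower layer: reports the failure but never runs the completion. */
static int toy_submit_buggy(struct toy_page *page)
{
	(void)page;
	return -TOY_EIO;
}

/* Caller modelled on the readpage: path.  It locks the page, submits it,
 * and on a synchronous error trusts that the page is already unlocked. */
static int toy_read(struct toy_page *page, toy_submit_fn submit)
{
	assert(!page->locked);
	page->locked = true;
	return submit(page);	/* the caller never unlocks on error */
}
```

With toy_submit_good the page ends up unlocked after the error.  With
toy_submit_buggy it stays locked, and anything that later needs the page
lock (truncate, in the trace above) would wait forever.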

So...  would you be able to verify that the USB layer is correctly
calling bio->bi_end_io() for the offending requests?


Aside2: why does this code:

readpage:
		/* Start the actual read. The read will unlock the page. */
		error = mapping->a_ops->readpage(filp, page);

		if (unlikely(error)) {
			if (error == AOP_TRUNCATED_PAGE) {
				page_cache_release(page);
				goto find_page;
			}
			goto readpage_error;
		}

		if (!PageUptodate(page)) {
			if (lock_page_killable(page))
				goto readpage_eio;

return -EIO if lock_page_killable() saw a signal?