Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: J. Bruce Fields <bfields@...>
Cc: <linux-kernel@...>
Date: Monday, September 17, 2007 - 7:54 pm

On 17 Sep 2007, J. Bruce Fields stated:

Indeed, with more debugging, all the failures I see come from the call
to exp_find(), which is digging out exports...


Ah! I keep forgetting that mountd isn't just used at mount time: damned
misleading names, grumble.

Restarting mountd clears the problem up temporarily, so you are
definitely right.


Aha! I'm on 3b55934b9baefecee17aefc3ea139e261a4b03b8, over a month older.


Debian Sid recently upgraded to glibc 2.6.x, as did I... earlier
versions of glibc will have had this behaviour too, but it may have been
less frequent.


It is expected, judging from my reading of the
code. stdio-common/vfprintf.c emits single chars using the outchar()
macro, and strings using the outstring() macro, using functions in libio
to do the job. The string output routine then calls _IO_file_xsputn(),
which, tracing through libio's jump tables and symbol aliases, ends up
calling _IO_new_file_xsputn() in libio/fileops.c. (I've only just
started to understand libio. It's basically undocumented as far as I can
tell, but it's deeply nifty. Think of stdio, only made entirely out of
hookable components. :) )

(Actual writing then thunks down through _IO_new_do_write() and
new_do_write() in the same file, which finally calls __write().  If
there's any kind of error this returns EOF after some opaque messing
about with a _cur_column value, which is as far as I can tell never
used!)

The code which calls new_do_write() looks like this:

,----[ libio/fileops.c:_IO_new_file_xsputn() ]
|  if (do_write)
|    {
|      count = new_do_write (f, s, do_write);
|      to_do -= count;
|      if (count < do_write)
|        return n - to_do;
|    }
`----

This code handles partial writes followed by errors by returning a
suitable nonzero value, and immediate errors by returning -1.

In either case the buffer will have been filled as much as possible by
that point, and will still be filled when (vf)printf() is next called.


This behaviour is, IIRC, mandated by the C Standard: I can find no
reference in the Standard to streams being flushed on error, only
on fclose(), fflush(), or program termination.


I'm upgrading now: thank you!

-- 
`Some people don't think performance issues are "real bugs", and I think 
such people shouldn't be allowed to program.' --- Linus Torvalds
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots ..., Nix, (Mon Sep 17, 7:54 pm)