Re: [PATCH] fs: Correct SuS compliance for open of large file without options

Previous thread: [PATCH] sysfs: backport of sysfs_readdir fix from 2.6.22.y to 2.6.16.y (CVE-2007-3104) by Miloslav Semler on Thursday, September 27, 2007 - 5:25 am. (2 messages)

Next thread: [PATCH] JBD2: debug code cleanup. by Jose R. Santos on Thursday, September 27, 2007 - 6:39 am. (1 message)
From: Alan Cox
Date: Thursday, September 27, 2007 - 6:29 am

The early LFS work that Linux uses favours EFBIG in various places. SuSv3
specifically uses EOVERFLOW for this as noted by Michael (Bug 7253)

--

[EOVERFLOW]
    The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t. We should therefore
transition to the proper error return code

Signed-off-by: Alan Cox <alan@redhat.com>

diff -u --new-file --exclude-from /usr/src/exclude --recursive linux.vanilla-2.6.23rc8-mm1/fs/gfs2/ops_file.c linux-2.6.23rc8-mm1/fs/gfs2/ops_file.c
--- linux.vanilla-2.6.23rc8-mm1/fs/gfs2/ops_file.c	2007-09-26 16:46:54.000000000 +0100
+++ linux-2.6.23rc8-mm1/fs/gfs2/ops_file.c	2007-09-27 13:45:48.000000000 +0100
@@ -406,7 +406,7 @@
 
 		if (!(file->f_flags & O_LARGEFILE) &&
 		    ip->i_di.di_size > MAX_NON_LFS) {
-			error = -EFBIG;
+			error = -EOVERFLOW;
 			goto fail_gunlock;
 		}
 
diff -u --new-file --exclude-from /usr/src/exclude --recursive linux.vanilla-2.6.23rc8-mm1/fs/ntfs/file.c linux-2.6.23rc8-mm1/fs/ntfs/file.c
--- linux.vanilla-2.6.23rc8-mm1/fs/ntfs/file.c	2007-09-26 16:46:55.000000000 +0100
+++ linux-2.6.23rc8-mm1/fs/ntfs/file.c	2007-09-27 13:47:35.000000000 +0100
@@ -61,7 +61,7 @@
 {
 	if (sizeof(unsigned long) < 8) {
 		if (i_size_read(vi) > MAX_LFS_FILESIZE)
-			return -EFBIG;
+			return -EOVERFLOW;
 	}
 	return generic_file_open(vi, filp);
 }
diff -u --new-file --exclude-from /usr/src/exclude --recursive linux.vanilla-2.6.23rc8-mm1/fs/open.c linux-2.6.23rc8-mm1/fs/open.c
--- linux.vanilla-2.6.23rc8-mm1/fs/open.c	2007-09-26 16:46:55.000000000 +0100
+++ linux-2.6.23rc8-mm1/fs/open.c	2007-09-27 13:45:10.000000000 +0100
@@ -1210,7 +1210,7 @@
 int generic_file_open(struct inode * inode, struct file * filp)
 {
 	if (!(filp->f_flags & O_LARGEFILE) && i_size_read(inode) > MAX_NON_LFS)
-		return -EFBIG;
+		return -EOVERFLOW;
 	return 0;
 }
 

-

From: Arjan van de Ven
Date: Thursday, September 27, 2007 - 7:01 am

On Thu, 27 Sep 2007 14:29:19 +0100


isn't this an ABI change?
What's the gain for doing this ABI change?
-

From: Alan Cox
Date: Thursday, September 27, 2007 - 7:19 am

On Thu, 27 Sep 2007 07:01:18 -0700

Its a change of a specific error return from the wrong error to the right
one, nothing more. Fixing the returned error gives us correct behaviour
according to the standards and other systems.

Alan
-

From: Jens Axboe
Date: Thursday, September 27, 2007 - 7:35 am

It may still break applications. Waving some standard at them if they
complain is unlikely to impress them.

-- 
Jens Axboe

-

From: Alan Cox
Date: Thursday, September 27, 2007 - 7:44 am

And our existing behaviour may well break correctly written
portable applications, and is incorrect as well.

Testing so far says it doesn't break anything, which is no suprise if you
apply about ten braincells to the case under consideration.

Alan
-

From: Jens Axboe
Date: Thursday, September 27, 2007 - 8:08 am

It's been that way for ages, how likely do you think that is? Not very,

Well it's not my call, just seems like a really bad idea to change the
error value. You can't claim full coverage for such testing anyway, it's
one of those things that people will complain about two releases later
saying it broke app foo.

-- 
Jens Axboe

-

From: Alan Cox
Date: Thursday, September 27, 2007 - 8:19 am

Strange since we've spent years changing error values and getting them
right in the past. 

There are real things to worry about - sysfs, sysfs, sysfs, ... and all
the other crap which is continually breaking stuff, not spec compliance
corrections that don't break things but move us into compliance with the
standard

Alan
-

From: Theodore Tso
Date: Thursday, September 27, 2007 - 8:59 am

I doubt there any apps which are going to specifically check for EFBIG
and do soemthing different if they get EOVERFLOW instead.  If it was
something like EAGAIN or EPERM, I'd be more concerned, but EFBIG

I've got to agree with Alan, the sysfs/udev breakages that we've done
are far more significant, and the fact that we continue to expose
internal data structures via sysfs is a gaping open pit is far more
likely to cause any kind of problems than changing an error return.

       	  	    	    	     	  - Ted
-

From: Andrew Morton
Date: Thursday, September 27, 2007 - 10:23 am

Yeah.  There's no correct answer here (apart from "get it right the first

Funny you should mention that.  I was staring in astonishment at the
pending sysfs patch pile last night.  Forty syfs patches and twenty-odd
patches against driver core and the kobject layer.

That's a huge amount of churn for a core piece of kernel infrastructure
which has been there for four or five years.  Not a good sign.  I mean,
it's not as if, say, the CPU scheduler guys keep on rewriting all their
junk.

oh, wait..
-

From: Greg KH
Date: Thursday, September 27, 2007 - 10:59 am

I'm sorry, have I missed a breakage lately?  I don't know of one in over

Come on now, I'm _very_ tired of this kind of discussion.  Please go
read the documentation on how to _use_ sysfs from userspace in such a
way that you can properly access these data structures so that no
breakage occurs.

And if you want to propose some other kind of alternative to exporting
this kind of _needed_ information to userspace, in a simple and
easy-to-use manner, please do so.  Until then, stop complaining

And _none_ of them change any userspace interaction.  Well, ok, the
block one can, if the CONFIG_SYSFS_DEPRECATED is disabled, but that's

The sysfs changes are almost all due to the need for the
containers/vserver/whatever people to be able to see different views of
sysfs depending on the user/container.  That is a radical change that
was never designed for in the beginning.  The other changes that Tejun
has made have actually cleaned up the code and made it simpler to use
and more robust and fixed bugs.

Same thing goes for the driver core changes.  We are cleaning things up,
fixing bugs that have been found when the driver core has been used in
ways that we never originally anticipated.  We are also trying to make
it easier to use, and simpler overall, as the original design was quite
half-hazard at best in numerous places (kset/subsystem/ktype anyone?)

So these aren't being done just because we like to break things, we are
trying to make things better, and fix real bugs here.

thanks,

greg k-h
-

From: Theodore Tso
Date: Thursday, September 27, 2007 - 11:37 am

I've read it; the question is whether every single application
programmer or system shell script programmer who writes code my system
depends upon has read it this document buried in the kernel sources,
or whether things will break spectacularly --- one of those things
that leaves me in suspense each time I update the kernel.

I'm reminded of Rusty's 2003 OLS Keynote, where he points out that
what's important is not making an interface easy to use, but _hard_
_to_ _misuse_.  That fact that sysfs is all laid out in a directory,
but for which some directories/symlinks are OK to use, and some are
NOT OK to use --- is why I call the sysfs interface "an open pit".
Sure, if you have the map to the minefield, a minefield is perfectly
safe when you know what to avoid.  But is that the best way to
construct a path/interface for an application programmer to get from
point A to point B?  Maybe, maybe not.

					- Ted
-

From: Matthew Wilcox
Date: Thursday, September 27, 2007 - 11:45 am

I agree with your criticism of sysfs (indeed, I think I've made many of
them before in somewhat stronger language), but do you have a suggestion
as to an interface that would be harder to misuse?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Greg KH
Date: Thursday, September 27, 2007 - 2:34 pm

Ok, how then should I advertise this better?  What can we do better to

Me and Pat Mochel sat in that talk and instantly had an "oh shit" moment
when it came to the in-kernel usage of sysfs and the driver model.  Ever
since then, I have been working to change the code to make it better.
With the exception of the recent help from Kay, I am the only one doing

We (well, Kay mostly) have also been working on fixing this all up to
make it much harder to use sysfs incorrectly.  We will have a single
device tree (well, almost a single tree, it's getting there), so that
all of the information is only in one place, and you don't have to go
searching all over the place for it.  That is a direct improvement over
the old design where somethings were in one place, and others in
another.

And because of the original design mistakes, we have only been able to
change things for the better in a slow manner.  We have had userspace
programs fixed up for _years_ before we are able to make the
corresponding changes in the kernel, so as to not break the distros that
are slow to upgrade packages and kernels (like Debian.)

If I had my druthers, we could instantly put some patches into the tree
to fix up the sysfs "mess" once and for all, creating a unified, single
tree, with only a handful of needed symlinks to be able to categorize
certain things.  We have the patches (Kay wrote them over a year ago),
and userspace programs work just fine with them (udev and HAL), but
because we need to support 5 year old userspace programs running
tomorrows kernel, we must take very tiny, slow steps to get there.

And yes, sysfs has slowly changed over the years, and along the way we
have kept things working, with only very minor problems.  You have no
idea the crazy mismatch of kernels and userspace programs we have had to
deal with.  And it will continue to change, slowly, until we reach the
unified-tree goal, and all of those old crufty userspace programs are
dead and buried (I got a bug report about RHEL's ...
From: Kyle Moffett
Date: Thursday, September 27, 2007 - 3:27 pm

Hey!  No poking fingers at Debian here; it's been *MUCH* improved  
lately.  I far more frequently have problems with boxes still running  
some ancient release of RHEL-4 or something than I do with those  
running Debian stable (virtually always the latest Debian stable).

Cheers,
Kyle Moffett

-

From: Greg KH
Date: Thursday, September 27, 2007 - 4:11 pm

Heh, sorry, but Debian in the past had a lot of problems in this area.
It's good to know that this is no longer a issue :)

thanks,

greg k-h
-

From: Theodore Tso
Date: Thursday, September 27, 2007 - 4:19 pm

Would you accept a patch which causes the deprecated sysfs
files/directories to disappear, even if CONFIG_SYS_DEPRECATED is
defined, via a boot-time parameter?  Many people and distros are
likely to keep CONFIG_SYS_DEPRECATED defined just our of paranoia that
things might break.  Doing a quick google, I note that Fedora has been
going back and forth of turning it off, watching things break, and
then turning it back on.  The latest time, the changelog said:

* Fri Jan 26 23:00:00 2007 Bill Nottingham <notting{%}redhat{*}com>

    - turn on CONFIG_SYSFS_DEPRECATED so that things actually work. *sigh*

(and I've checked, Fedora's CVS still has CONFIG_SYSFS_DEPRECATED
defined; it's not just Debian at fault here.)

So having a boot-time parameter would make it much easier for
application programmers (who run distro kernels and who are unlikely
to want to compile their own custom kernel) to test to see what breaks
without CONFIG_SYS_DEPRECATED.

	   	     	     	      	- Ted
-

From: Matthew Wilcox
Date: Thursday, September 27, 2007 - 4:28 pm

How about a mount option?  That way people can test without a reboot:

mount -o remount,deprecated={yes,no} /sys

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-

From: Theodore Tso
Date: Thursday, September 27, 2007 - 7:21 pm

It would be nice if that would be easy to make work, but the problem
is that remounting /sysfs doesn't change the entries in the sysfs tree
that have already been made in the tree.  We could do something such
as creating an sysfs_create_link_deprecated() call which created a
kobject with a new flag indicating it's deprecated, so it could be
filtered out dynamically when /sys is remounted, or when some file
such as /sys/kernel/deprecated_sysfs_files has "0" or "1" written to
it.

The question is whether it's worth it, since we'd have to bloat the
kobject structure by 4 bytes (it currently doesn't have a flags field
from which we could borrow a bit), or whether it's OK just to make the
user reboot.  (I do agree it would be nicer if the user didn't have to
reboot, but most of the time they will need to test the initrd and
init scripts anyway.

							- Ted
-

From: Greg KH
Date: Thursday, September 27, 2007 - 8:22 pm

Unfortunatly, due to the way sysfs and kobjects are built up, this is
pretty impossible to do.

sorry,

greg k-h
-

From: Greg KH
Date: Thursday, September 27, 2007 - 8:21 pm

As discussed in the kernel summit talk about this very topic, Kay is

That's odd, SuSE and Gentoo have been working for quite some time just
fine with that option disabled :)

thanks,

greg k-h
-

From: Jens Axboe
Date: Thursday, September 27, 2007 - 4:41 pm

It's not checking EFBIG vs EOVERFLOW, it's checking one and not the
other. But I digress, not trying to NAK the patch, just voicing my
opinion on the matter. It's not something you can easily test and claim
good app coverage, though.

-- 
Jens Axboe

-

Previous thread: [PATCH] sysfs: backport of sysfs_readdir fix from 2.6.22.y to 2.6.16.y (CVE-2007-3104) by Miloslav Semler on Thursday, September 27, 2007 - 5:25 am. (2 messages)

Next thread: [PATCH] JBD2: debug code cleanup. by Jose R. Santos on Thursday, September 27, 2007 - 6:39 am. (1 message)