Re: copy_from_user again()

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Andi Kleen <andi@...>
Cc: <linux-kernel@...>
Date: Tuesday, June 17, 2008 - 7:41 pm

On Wed, 18 Jun 2008, Andi Kleen wrote:

Well, it doesn't become any less accurate in reporting, quite the reverse: 
now it *is* accurate in reporting what it actually *did*. Before, it would 
report something that simply wasn't _true_. Being closer to what you might 
_wish_ would happen doesn't make it accurate, if it doesn'y match reality!

However, you're right that the routine has always had (and I didn't change 
that) a 32-byte granularity in what it does and what it doesn't do in the 
unrolled part. 

Before, it _claimed_ to be more precise than it actually was. It claimed 
to be able to detect non-existing pages at a 8-byte granular level, but it 
never actually did the copy at that level, so the claim wasn't backed up 
by what it did. 

Which was why the users were unhappy: the write() system call _thought_ it 
had copied 16 bytes more than it actually had, resulting in a 16-byte hole 
(which happened to be zero-filled, but could actually in theory be leaking 
information) in the page cache. 

Which gets us back to the original bug report by Bron Gondwana: the end 
result was a corrupt page cache (and on-disk, for that matter), because 
__copy_user() claimed to have copied more than it actually did.

And yes, I do agree that it would be nicer to have byte-granular reporting 
and copying - and I even made that very clear in the email that contained 
the patch. But you must never report finer granularity than you actually 
copy, becuase at that point it's not "more accurate" - it's just plain 
WRONG.

So what I would actually prefer (as I outlined earlier) would be that the 
failure case woudl then fall back on trying to do a byte-by-byte slow copy 
until it hits the exact byte that fails.

However, that is a much bigger patch, and it's actually not something 
we've ever done. Even the original old x86-32 routines would fail at that, 
they just did it with a 4-byte granularity (so max 3 bytes of uncopied 
data close to the end of the page, rather than at a 32-byte (max 31 bytes 
of uncopied data close to the end of the page).

And as mentioned, the cases that really care (like mm/filemap.c) actually 
know to always try to do at least _one_ whole iovec entry. So they are ok 
with the lack of accuracy, because if the atomic version fails, they'll 
fall back to doing something slow and careful.

		Linus
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
copy_from_user again(), Andi Kleen, (Tue Jun 17, 6:19 pm)
Re: copy_from_user again(), Linus Torvalds, (Tue Jun 17, 7:41 pm)