Re: ext3: fix ext3_dx_readdir hash collision handling - Regression

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Friday, October 24, 2008 - 9:08 am

On Fri, 24 Oct 2008, Markus Trippelsdorf wrote:

Hmm. I may actually have hit the same problem. I attributed it to a git 
buglet, and left it for later, because the last few days were pretty crazy 
(closing the merge window is when people always send their last-minute 
stuff even if they shouldn't, but to make things worse I was also gone for 
a day-and-a-half).

The "git buglet" looks liek this:

 - build a kernel

 - do "git clean -dqfx". This fails with

	warning: failed to remove 'include/config/'

 - do "git clean -dqfx" again. And now it works - apparently because the 
   first invocation deleted enough of the big directory.

Doing an 'strace' to see what is going on, I see:

	..
	getdents(3, /* 132 entries */, 4096)    = 3888
	lstat("include/config/sgi", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
	open("include/config/sgi", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = 4
	fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
	getdents(4, /* 3 entries */, 4096)      = 80
	lstat("include/config/sgi/partition.h", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
	unlink("include/config/sgi/partition.h") = 0
	getdents(4, /* 0 entries */, 4096)      = 0
	close(4)                                = 0
	rmdir("include/config/sgi")             = 0
	lstat("include/config/sgi", 0x7fffdc4d2110) = -1 ENOENT (No such file or directory)
	close(3)                                = 0
	write(2, "warning: failed to remove \'include/config/\'\n", 44) = 44
	..

and notice how it does that

	lstat("include/config/sgi", ..

_twice_. That, in turn (knowing the git implementation), implies that it 
was returned twice from that first "getdents(3, ...)" call.

So what git clean does is to scan over the readdir() return values, see if 
it's a file it knows about, try to remove it if not, lstat() every entry, 
recurse into them if they are directories, and then remove it. If the 
lstat() fails, "git clean" will fail.

So what happens is that it sees that "sgi" entry _twice_ in the readdir 
output, traverse into it once (and delete it), and the second time when it 
does an lstat() it will obviously fail (because now it's deleted!), and 
that will make "git clean" fail the whole thing due to an unexpected 
error.

And I _think_ that what brings this on is:

 - caring about the error value

   This is why "rm -rf" works for you, but "rm -r" does not. With "-f", rm 
   simply won't care. And like "rm -r", "git clean" cares about error 
   values.

 - large enough directories that you have *multiple* "getdents()" calls.

 - likely: removing files in between getdents() calls.

Hmm. Ted? I have not tried to revert that commit that Markus pinpointed 
(6a897cf447a83c9c3fd1b85a1e525c02d6eada7d: "ext3: fix ext3_dx_readdir hash 
collision handling"), but now that I look at that "git bug", I am getting 
pretty damn sure that it's exactly the same issue, and it's not a git bug 
at all.

Markus basically has exactly the same thing:

  rm: cannot remove `linux-2.6.27.2/arch/alpha/include/asm/statfs.h': No such file or directory

and that ENOENT is almost certainly because "rm" already removed that 
filename _once_, and it's the second time that fails.

And yes, that commit is all about returning directory entries twice. It 
claims to fix it, but I bet it breaks it.

		Linus
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: ext3: fix ext3_dx_readdir hash collision handling - Re ..., Linus Torvalds, (Fri Oct 24, 9:08 am)
Re: [PATCH] Re: ext3: fix ext3_dx_readdir hash collision h ..., Markus Trippelsdorf, (Sat Oct 25, 5:25 am)