Re: 'setup_work_tree()' considered harmful

Previous thread: [PATCH v3/RFC] Remove the use of '--' in merge program invocation by Patrick Higgins on Monday, June 16, 2008 - 4:33 pm. (3 messages)

Next thread: [PATCH] gitweb: fix support for repository directories with spaces by Lea Wiemann on Monday, June 16, 2008 - 6:09 pm. (9 messages)
From: Linus Torvalds
Date: Monday, June 16, 2008 - 5:45 pm

[ Dscho cc'd because I think he is the primary culprits for this thing, I 
  think. Commit e90fdc39b6903502192b2dd11e5503cea721a1ad in particular, 
  methinks. ]

So because I was looking at system call traces for object creation (due to 
those "rename to the final resting place" patches I did that got merged 
recently), I've noticed that some commands have *much* worse system call 
patterns than others.

In particular, doing a "git add ." will use absolute pathnames for all git 
files, while a "git diff" will not. And this is quite noticeable - the 
absolute pathnames are not just longer, they have more path components in 
them. Making them a lot slower to look up and use.

Of course, "a lot" depends on things a bit, but it really is noticeable.

To test, I created a kernel tree (no .git), and did a "git init" followed 
by a "git add .". Here's the timings for current git head:

	[torvalds@woody kernel]$ time git add .

	real    0m8.377s
	user    0m6.556s
	sys     0m1.656s

After that, I fixed "write_loose_object()" to not unnecessarily try to 
open a git object file, because every single caller has already done a 
"has_sha1_file(sha1)" or "has_loose_object(sha1)" check before calling 
that function, so trying to open it again is just pointless.

As a result, git add sped up a tiny bit:

	[torvalds@woody kernel]$ time ~/git/git-add .

	real    0m8.341s
	user    0m6.588s
	sys     0m1.548s

but I'll admit that it's not exactly noticeable. Half a percentage point 
is not a big deal.

Then I created a hack that just made "setup_work_tree()" a no-op (because 
it does all that crazy stuff that forces GIT_DIR to be an absolute path 
etc). As a result I got:

	[torvalds@woody kernel]$ time ~/git/git-add .

	real    0m7.849s
	user    0m6.420s
	sys     0m1.296s

ie now we're talking about a 5%+ performance difference.

Of course, this is all for the hot-cache case, and it wouldn't be 
noticeable for a cold-cache case, but it really can be a real performance ...
From: Linus Torvalds
Date: Monday, June 16, 2008 - 5:52 pm

In case anybody cares, the obvious diff for this part is appended. The 
FIXME! was wrong anyway - any collision detection needs to be done by the 
callers when they do the "do we already have this object". Doing it in 
write_loose_object() is too late (not to mention the wrong place to try to 
do so anyway - not all callers generate new objects: some callers just 
turn a packed object into a loose one).

			Linus

---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 16 Jun 2008 17:17:10 -0700
Subject: [PATCH] write_loose_object: don't bother trying to read an old object

Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.

If something races with us on object creation, we handle that in the
final link() that moves it to the right place.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 sha1_file.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index f9ec069..4255099 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2160,20 +2160,6 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	static char tmpfile[PATH_MAX];
 
 	filename = sha1_file_name(sha1);
-	fd = open(filename, O_RDONLY);
-	if (fd >= 0) {
-		/*
-		 * FIXME!!! We might do collision checking here, but we'd
-		 * need to uncompress the old file and check it. Later.
-		 */
-		close(fd);
-		return 0;
-	}
-
-	if (errno != ENOENT) {
-		return error("sha1 file %s: %s\n", filename, strerror(errno));
-	}
-
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
 	if (fd < 0) {
 		if (errno == EPERM)
-- 
1.5.6.rc3.7.g336d0.dirty

--

From: Johannes Schindelin
Date: Tuesday, June 17, 2008 - 4:01 am

Hi,


Yes, I am not happy with work-tree.  In fact, I grew to positively hate 
it.  IMHO it was not well designed when it entered Git, and unfortunately 

Yes, that is true.  However, I think we need it for the case where 
work_tree is set (technically, we could try to be clever when work_tree is 
set in such a manner that we can operate with relative paths, but I think 
that is just not worth it).

So I am thinking about reverting to the old behavior, but _just_ for the 
common case that no work tree was set.

This might be more tricky than it used to be, because of the many special 
cases work trees require.

I briefly considered working on this now, but I simply do not have the 
time, and I seem to be unconcentrated these days.  Probably the best thing 
would be to scrap the whole work-tree thing and throw it out, admitting 
that it was a mistake to begin with.

Ciao,
Dscho

--

From: Mike Hommey
Date: Wednesday, June 18, 2008 - 2:05 am

Maybe using openat, fstatat, etc. when they are available, could be a
good thing, already, though it wouldn't help for other platforms.

Mike
--

From: Linus Torvalds
Date: Wednesday, June 18, 2008 - 9:26 am

I agree that openat() and friends would be nice for this (use a file 
descriptor instead of a string to point to object directories etc), but 
it's unportable enough that the pain of having to have two totally 
different access routines is just not worth it. It could be done, but it 
would need some really nifty abstraction layer.

We do already have _part_ of that abstraction layer with things like 
"git_path()" etc, but it would have to be extended upon quite a bit. 

		Linus
--

Previous thread: [PATCH v3/RFC] Remove the use of '--' in merge program invocation by Patrick Higgins on Monday, June 16, 2008 - 4:33 pm. (3 messages)

Next thread: [PATCH] gitweb: fix support for repository directories with spaces by Lea Wiemann on Monday, June 16, 2008 - 6:09 pm. (9 messages)