Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths

Previous thread: [PATCH 1/2] Add feature release instructions to MaintNotes addendum by rocketraman on Monday, March 30, 2009 - 1:35 am. (2 messages)

Next thread: Re: [PATCH] git-svn: don't output git commits in quiet mode by Junio C Hamano on Monday, March 30, 2009 - 2:44 am. (3 messages)
To: Eric Wong <normalperson@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>
Date: Monday, March 30, 2009 - 2:44 am

Actually, that was what I was trying to demonstrate to be false. Notice
the empty output from the first ls-tree with only -- and no other pathspec
on the command line. "--" should not match "--dashed/*" anything (but

I think that is an independent bug. Not just "--" but it appears "--d"
seems to hit it (and this is an ancient bug---even v1.0.0 seems to have

;-)

I suspect that ls-tree needs a fix, not about "--" but about the pathspec
filtering. It appears that the part that decides if a subtree is worth
traversing into uses the correct "is a pathspec pattern match leading path
components?" semantics (i.e. "--dashed" matches but "--" doesn't), but
after traversing into subtrees, the part that emits the output uses a
broken semantics "does the path have any pathspec patter as its prefix?"
It shouldn't check for "prefix", but for "leading path components", in
other words, the match must happen at directory boundaries.

And I do not think *this* bug is too late to fix. We should fix it.

--

To: Junio C Hamano <gitster@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>
Date: Monday, March 30, 2009 - 1:41 pm

From the ls-tree documentation, I was under the impression that "--"
matching "--dashed" was intended:

When paths are given, show them (note that this isn't really raw
pathnames, but rather a list of patterns to match).

It doesn't make sense to me match like this, either; but I do think it
was intended and it will break things if people depend on the
existing behavior.

--
Eric Wong
--

To: Eric Wong <normalperson@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>
Date: Monday, March 30, 2009 - 2:05 pm

Ok, but then the decision to descend into --dashed should be consistent
with that policy, no? Right now, it appears that giving "--" alone says
"Anything under --dashed can never match that pattern, so I wouldn't
bother recursing into it".
--

To: Junio C Hamano <gitster@...>
Cc: Björn <B.Steinbrink@...>, Anton Gyllenberg <anton@...>, <git@...>
Date: Monday, March 30, 2009 - 6:58 pm

Right. Except in the case when there are multiple files inside --dashed/
as Björn's email illustrated. So there seems to be a bug in the way
the number of files inside --dashed/ affects what "--" does when used
with "--dashed/1" (if --dashed/2 also exists). Very confusing :x

--
Eric Wong
--

To: Eric Wong <normalperson@...>
Cc: Anton Gyllenberg <anton@...>, Junio C Hamano <gitster@...>, <git@...>
Date: Tuesday, March 31, 2009 - 3:11 am

I guess that paragraph was meant to explain why "git ls-tree HEAD
Documentation" and "git ls-tree HEAD Documentation/" give different
results. The first one shows the entry for the tree object, while the
second one shows the contents of the tree object. In contrast to "ls"

It's not the number of files that matters. With just one file, you just
don't notice the buggy behaviour, because showing all files is the same
as showing the specified file.

And interestingly, the problem doesn't seem to be in
show_tree/show_recursive, but in match_tree_entry.

With "git ls-tree HEAD gitweb/git-favicon.png g" we descend into gitweb/
and at some point we get:

match = "g"
base = "gitweb/"

And we have:
if (baselen >= matchlen) {
if (strncmp(base, match, matchlen))
continue;
/* The base is a subdirectory of a path which was specified */
return 1;
}

So we return 1 there. The code doesn't do what the comment says, so I
guess we can be pretty sure that the behaviour is not intended.

Björn
--

To: Eric Wong <normalperson@...>
Cc: Anton Gyllenberg <anton@...>, Junio C Hamano <gitster@...>, <git@...>
Date: Tuesday, March 31, 2009 - 3:31 am

Yup, it's in match_tree_entry, you get the same thing with git show.
With git.git, you can try with:

git show 4fa535a -- Documentation/git-merge.txt D

I'll try to get a patch done, if noone beats me to it.

Björn
--

To: Eric Wong <normalperson@...>
Cc: Anton Gyllenberg <anton@...>, Junio C Hamano <gitster@...>, <git@...>
Date: Tuesday, March 31, 2009 - 5:41 am

Ah, crap, "git show" actually uses a different function,
tree_entry_interesting, which happens to have the same problem, but
needs a slightly different fix.

Björn
--

To: Junio C Hamano <gitster@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>, Eric Wong <normalperson@...>
Date: Tuesday, March 31, 2009 - 11:05 am

Previously the code did a simple prefix match, which means that it
treated for example "foo/" as a subdirectory of "f".

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
---
I'm not exactly happy with the commit message, but that's the best I
could come up with. Probably shows how little I know about that code :-/
The test suite still passes and I'll try to provide a new testcase
tonight or tommorow.

tree-diff.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 9f67af6..b05d0f4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -118,10 +118,16 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
continue;

/*
- * The base is a subdirectory of a path which
- * was specified, so all of them are interesting.
+ * If the base is a subdirectory of a path which
+ * was specified, all of them are interesting.
*/
- return 2;
+ if (!matchlen ||
+ base[matchlen] == '/' ||
+ match[matchlen - 1] == '/')
+ return 2;
+
+ /* Just a random prefix match */
+ continue;
}

/* Does the base match? */
--
1.6.2.1.426.gf94cd
--

To: Björn <B.Steinbrink@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>, Eric Wong <normalperson@...>
Date: Thursday, April 2, 2009 - 12:32 am

I'm planning to queue this.

From: Björn Steinbrink <B.Steinbrink@gmx.de>
Date: Tue, 31 Mar 2009 17:05:01 +0200
Subject: [PATCH] tree_entry_interesting: a pathspec only matches at directory boundary

Previously the code did a simple prefix match, which means that a
path in a directory "frotz/" would have matched with pathspec "f".

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
t/t4010-diff-pathspec.sh | 8 ++++++++
tree-diff.c | 12 +++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index ad3d9e4..4c4c8b1 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -62,4 +62,12 @@ test_expect_success \
'git diff-index --cached $tree -- file0/ >current &&
compare_diff_raw current expected'

+test_expect_success 'diff-tree pathspec' '
+ tree2=$(git write-tree) &&
+ echo "$tree2" &&
+ git diff-tree -r --name-only $tree $tree2 -- pa path1/a >current &&
+ >expected &&
+ test_cmp expected current
+'
+
test_done
diff --git a/tree-diff.c b/tree-diff.c
index 9f67af6..b05d0f4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -118,10 +118,16 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
continue;

/*
- * The base is a subdirectory of a path which
- * was specified, so all of them are interesting.
+ * If the base is a subdirectory of a path which
+ * was specified, all of them are interesting.
*/
- return 2;
+ if (!matchlen ||
+ base[matchlen] == '/' ||
+ match[matchlen - 1] == '/')
+ return 2;
+
+ /* Just a random prefix match */
+ continue;
}

/* Does the base match? */
--

To: Junio C Hamano <gitster@...>
Cc: Anton Gyllenberg <anton@...>, <git@...>, Eric Wong <normalperson@...>
Date: Thursday, April 2, 2009 - 7:38 am

Ah, thanks. Got busy with other stuff, and tried to fix another related
bug in ls-tree which made me forget to send the match_tree_entry fix :-/

That other ls-tree bug is with recursing into subdirectories, because
match_tree_entry matches even when the base is a subdirectory, but
ls-tree doesn't actually want that behaviour, I think. For example:

$ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl
100644 blob ddbe633 git-gui/macosx/AppMain.tcl

$ git ls-tree --abbrev HEAD git-gui/
100644 blob f96112d git-gui/.gitattributes
100644 blob 6483b21 git-gui/.gitignore
100755 blob b3f937e git-gui/GIT-VERSION-GEN
100644 blob 3ad8a21 git-gui/Makefile
100755 blob 12e117e git-gui/git-gui--askpass
100755 blob e018e07 git-gui/git-gui.sh
040000 tree f723285 git-gui/lib
040000 tree 73f3c34 git-gui/macosx
040000 tree 11cd1a0 git-gui/po
040000 tree 144728d git-gui/windows

$ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl git-gui/
100644 blob f96112d git-gui/.gitattributes
100644 blob 6483b21 git-gui/.gitignore
100755 blob b3f937e git-gui/GIT-VERSION-GEN
100644 blob 3ad8a21 git-gui/Makefile
100755 blob 12e117e git-gui/git-gui--askpass
100755 blob e018e07 git-gui/git-gui.sh
040000 tree f723285 git-gui/lib
100644 blob ddbe633 git-gui/macosx/AppMain.tcl
100644 blob b3bf15f git-gui/macosx/Info.plist
100644 blob 77d88a7 git-gui/macosx/git-gui.icns
040000 tree 11cd1a0 git-gui/po
040000 tree 144728d git-gui/windows

The last ls-tree shows all entries from git-gui/macosx/, beacuse the
first pattern makes it descend into that tree and all entries are
matched by the git-gui/ prefix. So the combined ls-tree shows more than
what the individual calls show. Seems wrong to me, but I'm unsure how to
tackle that, assuming that match_tree_entry is right in allowing any
base that's a subdirectory of a specified pathspec.

Björn
--

To: Björn <B.Steinbrink@...>
Cc: Anton Gyllenberg <anton@...>, Linus Torvalds <torvalds@...>, <git@...>, Eric Wong <normalperson@...>
Date: Thursday, April 2, 2009 - 12:41 am

Previously the code did a simple prefix match, which means that a path in
a directory "frotz/" would have matched with pathspec "f".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

* And this is a companion patch to fix ls-tree. The test case uses a
tree that has path3/1.txt and path3/2.txt in it.

The bug Eric diagnosed and worked around in git-svn makes the current
code show these two paths when pathspec "pa" and "path3/a" are given.
The presense of "path3/a" makes the tree walker traverse down to path3
subtree (in case something that matches "a" is in there---this is a
correct behaviour), but then in that subtree, "pa" incorrectly matches
"path3/1.txt".

This logic dates back to 0ca14a5 (Start adding interfaces to read in
partial trees, 2005-07-14). I think it is just a simple oversight and
we should fix it.

t/t3101-ls-tree-dirname.sh | 6 ++++++
tree.c | 8 ++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t3101-ls-tree-dirname.sh b/t/t3101-ls-tree-dirname.sh
index 4dd7d12..51cb4a3 100755
--- a/t/t3101-ls-tree-dirname.sh
+++ b/t/t3101-ls-tree-dirname.sh
@@ -135,4 +135,10 @@ test_expect_success \
EOF
test_output'

+test_expect_success 'ls-tree filter is leading path match' '
+ git ls-tree $tree pa path3/a >current &&
+ >expected &&
+ test_output
+'
+
test_done
diff --git a/tree.c b/tree.c
index 03e782a..d82a047 100644
--- a/tree.c
+++ b/tree.c
@@ -60,8 +60,12 @@ static int match_tree_entry(const char *base, int baselen, const char *path, uns
/* If it doesn't match, move along... */
if (strncmp(base, match, matchlen))
continue;
- /* The base is a subdirectory of a path which was specified. */
- return 1;
+ /* pathspecs match only at the directory boundaries */
+ if (!matchlen ||
+ base[matchlen] == '/' ||
+ match[matchlen - 1] == '/')
+ return 1;
+ continue;
}

/* Does the base matc...

To: Junio C Hamano <gitster@...>
Cc: Björn Steinbrink <B.Steinbrink@...>, Anton Gyllenberg <anton@...>, <git@...>, Eric Wong <normalperson@...>
Date: Thursday, April 2, 2009 - 12:36 pm

I have this suspicion that we should be able to write it more readably,
but yes:

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

because the current code is clearly buggy.

Linus
--

Previous thread: [PATCH 1/2] Add feature release instructions to MaintNotes addendum by rocketraman on Monday, March 30, 2009 - 1:35 am. (2 messages)

Next thread: Re: [PATCH] git-svn: don't output git commits in quiet mode by Junio C Hamano on Monday, March 30, 2009 - 2:44 am. (3 messages)