Ok, here's the patch I sent out earlier, except now it's been split up into a series fo five patches (and has some trivial one-liner cleanups here and there, mainly to shrink the patches a bit) It's a series of five patches: - Add 'df_name_compare()' helper function - Make 'traverse_tree()' use linked structure rather than 'const char *base' - Add return value to 'traverse_tree()' callback - Make 'traverse_trees()' traverse conflicting DF entries in parallel - Move 'unpack_trees()' over to 'traverse_trees()' interface with a more lines added than removed: cache.h | 1 + merge-tree.c | 58 ++++--- read-cache.c | 35 ++++ tree-walk.c | 59 ++++++- tree-walk.h | 23 ++- unpack-trees.c | 530 ++++++++++++++++++++++++++------------------------------ 6 files changed, 387 insertions(+), 319 deletions(-) but especially in this new split-up format, it's now more possible to see how the last patch actually simplifies things. --
This new helper is identical to base_name_compare(), except it compares conflicting directory/file entries as equal in order to help handling DF conflicts (thus the name). Note that while a directory name compares as equal to a regular file with the new helper, they then individually compare _differently_ to a filename that has a dot after the basename (because '\0' < '.' < '/'). So a directory called "foo/" will compare equal to a file "foo", even though "foo.c" will compare after "foo" and before "foo/" This will be used by routines that want to traverse the git namespace but then handle conflicting entries together when possible. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- cache.h | 1 + read-cache.c | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+), 0 deletions(-) diff --git a/cache.h b/cache.h index e230302..6eb16cb 100644 --- a/cache.h +++ b/cache.h @@ -536,6 +536,7 @@ extern int create_symref(const char *ref, const char *refs_heads_master, const c extern int validate_headref(const char *ref); extern int base_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2); +extern int df_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2); extern int cache_name_compare(const char *name1, int len1, const char *name2, int len2); extern void *read_object_with_reference(const unsigned char *sha1, diff --git a/read-cache.c b/read-cache.c index 657f0c5..bf649a3 100644 --- a/read-cache.c +++ b/read-cache.c @@ -351,6 +351,41 @@ int base_name_compare(const char *name1, int len1, int mode1, return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0; } +/* + * df_name_compare() is identical to base_name_compare(), except it + * compares conflicting directory/file entries as equal. Note that + * while a directory name compares as equal to a regular file, they + * then individually compare _differently_ to a filename that has + * a dot after ...
Uh, that screams just bloody murder at me with regard to working on sorted material. You'll never even get into the situation where the conflicting "equal" entries will be compared if you presorted them with something in between. Consider the case of a merge of A: blubb blubb.hi B: blubb.hi blubb/ Any traversal that is done reasonably efficiently will never compare blubb to blubb/ and I don't see how to get around this. Basically, I think you need a special traversal routine. This routine will push all non-directory entries during a parallel traverse into a might-conflict-queue, appending a slash in the course. The front of this queue then gets compared with the next processed entry. If it is larger, it is kept at the front, if it is equal, the conflict gets treated/resolved, if it is smaller, it gets popped (since it can't conflict anymore). -- David Kastrup, Kriemhildstr. 15, 44793 Bochum --
Correct. There _is_ no sort order that will find the conflict in a single pass, yet get the sorting right for this case, because there is no sort that can satisfy both the "get DF conflicts together" _and_ the "keep things in the right order" requirements. Btw, the old code was no better (and was probably worse) - it used "strcmp()" to order the things, so it effectively did the same thing, it just wasn't as obvious about it (and got the case of "foo/" vs "foo." Yes, we need to handle it in two passes. Which is actually hopefully not all that hard, but it is totally impossible (at least for me) with the old code that was so hard to follow. So what I did was to rewrite the code so that it can be followed and maintained, and now my plan is to simply put the result in a separate index. We've always really wanted to do that *anyway* (right now we have this "discard and re-read index" cruft in the callers and in the error cases to get back the index we overwrote when merging!), and I bet we could have done that with the old code too, but _I_ couldn't have done it. The problem with having a single index as both source and destination is that you cannot do a two-phase thing, because you are basically destroying one of your sources in your first phase (or put another way: in the second phase, you don't really know whether the index entry you now find was there to begin with, of whether you just generated it yourself earlier). So what that series of five patches do is to not fix a single bug, but to make the thing a bit easier for me to look at (and I think it's good for other reasons too - one less arbitrary tree traversal function!) I'm hoping that people can look at the patches and test them, and say "yeah, that looks like it does the same thing we used to", and then when I get the energy (hopefully later today) I'll just make it take a separate source and destination index. Linus --
Well, as I said: a single pass is ok when additionally supported by a FIFO keeping x around until x/ (or its place in the order of things) passes by. This will be O(1) with regards to comparisons, and typically cheap with regard to memory requirements (things get unfriendly if there are billions of files or even directories obeying the pattern x.*, but only with regard to memory, not speed). -- David Kastrup, Kriemhildstr. 15, 44793 Bochum --
This makes the calling convention a bit less obvious, but a lot more
flexible. Instead of allocating and extending a new 'base' string, we
just link the top-most name into a linked list of the 'info' structure
when traversing a subdirectory, and we can generate the basename by
following the list.
Perhaps even more importantly, the linked list of info structures also
gives us a place to naturally save off other information than just the
directory name.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
merge-tree.c | 51 +++++++++++++++++++++++++++------------------------
tree-walk.c | 35 +++++++++++++++++++++++++++++++++--
tree-walk.h | 20 ++++++++++++++++++--
3 files changed, 78 insertions(+), 28 deletions(-)
diff --git a/merge-tree.c b/merge-tree.c
index e083246..a3511b7 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -168,7 +168,13 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const unsi
return res;
}
-static void resolve(const char *base, struct name_entry *branch1, struct name_entry *result)
+static char *traverse_path(const struct traverse_info *info, const struct name_entry *n)
+{
+ char *path = xmalloc(traverse_path_len(info, n) + 1);
+ return make_traverse_path(path, info, n);
+}
+
+static void resolve(const struct traverse_info *info, struct name_entry *branch1, struct name_entry *result)
{
struct merge_list *orig, *final;
const char *path;
@@ -177,7 +183,7 @@ static void resolve(const char *base, struct name_entry *branch1, struct name_en
if (!branch1)
return;
- path = xstrdup(mkpath("%s%s", base, result->path));
+ path = traverse_path(info, result);
orig = create_entry(2, branch1->mode, branch1->sha1, path);
final = create_entry(0, result->mode, result->sha1, path);
@@ -186,9 +192,8 @@ static void resolve(const char *base, struct name_entry *branch1, struct name_en
add_merge_entry(final);
}
-static int unresolved_directory(const char *base, struct name_entry ...This makes the traverse_trees() entry comparator routine use the more
relaxed form of name comparison that considers files and directories
with the same name identical.
We pass in a separate mask for just the directory entries, so that the
callback routine can decide (if it wants to) to only handle one or the
other type, but generally most (all?) users are expected to really want
to see the case of a name 'foo' showing up in one tree as a file and in
another as a directory at the same time.
In particular, moving 'unpack_trees()' over to use this tree traversal
mechanism requires this.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
merge-tree.c | 2 +-
tree-walk.c | 8 ++++++--
tree-walk.h | 3 ++-
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/merge-tree.c b/merge-tree.c
index 8be0b9f..02fc10f 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -287,7 +287,7 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
* The successful merge rules are the same as for the three-way merge
* in git-read-tree.
*/
-static int threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
+static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *info)
{
/* Same in both? */
if (same_entry(entry+1, entry+2)) {
diff --git a/tree-walk.c b/tree-walk.c
index 7170e37..842cb6a 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -62,7 +62,7 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
static int entry_compare(struct name_entry *a, struct name_entry *b)
{
- return base_name_compare(
+ return df_name_compare(
a->path, tree_entry_len(a->path, a->sha1), a->mode,
b->path, tree_entry_len(b->path, b->sha1), b->mode);
}
@@ -142,6 +142,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
for (;;) {
unsigned long mask = ...This not only deletes more code than it adds, it gets rid of a
singularly hard-to-understand function (unpack_trees_rec()), and
replaces it with a set of smaller and simpler functions that use the
generic tree traversal mechanism to walk over one or more git trees in
parallel.
It's still not the most wonderful interface, and by no means is the new
code easy to understand either, but it's at least a bit less opaque.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
unpack-trees.c | 530 ++++++++++++++++++++++++++------------------------------
1 files changed, 249 insertions(+), 281 deletions(-)
diff --git a/unpack-trees.c b/unpack-trees.c
index 3e448d8..ee9be29 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -7,270 +7,12 @@
#include "progress.h"
#include "refs.h"
-#define DBRT_DEBUG 1
-
-struct tree_entry_list {
- struct tree_entry_list *next;
- unsigned int mode;
- const char *name;
- const unsigned char *sha1;
-};
-
-static struct tree_entry_list *create_tree_entry_list(struct tree_desc *desc)
-{
- struct name_entry one;
- struct tree_entry_list *ret = NULL;
- struct tree_entry_list **list_p = &ret;
-
- while (tree_entry(desc, &one)) {
- struct tree_entry_list *entry;
-
- entry = xmalloc(sizeof(struct tree_entry_list));
- entry->name = one.path;
- entry->sha1 = one.sha1;
- entry->mode = one.mode;
- entry->next = NULL;
-
- *list_p = entry;
- list_p = &entry->next;
- }
- return ret;
-}
-
-static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
-{
- int len1 = strlen(name1);
- int len2 = strlen(name2);
- int len = len1 < len2 ? len1 : len2;
- int ret = memcmp(name1, name2, len);
- unsigned char c1, c2;
- if (ret)
- return ret;
- c1 = name1[len];
- c2 = name2[len];
- if (!c1 && dir1)
- c1 = '/';
- if (!c2 && dir2)
- c2 = '/';
- ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
- if (c1 && c2 && !ret)
- ret = len1 - len2;
- return ret;
-}
-
static inline void remove_entry(int remove)
...This allows the callback to return an error value, but it can also
specify which of the tree entries that it actually used up by returning
a positive mask value.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
merge-tree.c | 9 +++++----
tree-walk.c | 22 +++++++++++++++-------
tree-walk.h | 4 ++--
3 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/merge-tree.c b/merge-tree.c
index a3511b7..8be0b9f 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -287,31 +287,32 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
* The successful merge rules are the same as for the three-way merge
* in git-read-tree.
*/
-static void threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
+static int threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
{
/* Same in both? */
if (same_entry(entry+1, entry+2)) {
if (entry[0].sha1) {
resolve(info, NULL, entry+1);
- return;
+ return mask;
}
}
if (same_entry(entry+0, entry+1)) {
if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {
resolve(info, entry+1, entry+2);
- return;
+ return mask;
}
}
if (same_entry(entry+0, entry+2)) {
if (entry[1].sha1 && !S_ISDIR(entry[1].mode)) {
resolve(info, NULL, entry+1);
- return;
+ return mask;
}
}
unresolved(info, entry);
+ return mask;
}
static void merge_trees(struct tree_desc t[3], const char *base)
diff --git a/tree-walk.c b/tree-walk.c
index f9f7d22..7170e37 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -135,8 +135,9 @@ char *make_traverse_path(char *path, const struct traverse_info *info, const str
return path;
}
-void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
+int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
{
+ int ret = 0;
struct name_entry *entry = xmalloc(n*sizeof(*entry));
...Oh, and sorry about the To: undisclosed-recipients: ; thing - I thought I'd just bounce the messages from the mbox I did with git-format-patch, but since those things didn't have any "To:" fields, that resulted in some rather ugly email headers. My bad. Linus --
This all looks good to me. -Daniel *This .sig left intentionally blank* --
There's a really stupid bug in function that compares filenames using
the the linked list structure which makes it not compare the first path
component when using "--prefix".
So it can only hit in the case of us having a "prefix" entry that makes
the name of the "root" info structure non-empty, and I suspect that
because of all the other horrid crud we do for --prefix=xyzzy handling in
builtin-read-tree.c you can't actually trigger this bug, but my other
cleanups (which I'll send out once I've tested them a bit more) will make
this bug trigger.
This fairly obvious patch fixes it by just making sure that we always end
up having a ->prev entry if we have a pathname component.
Linus
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Mar 6 15:44:48 2008 -0800
Fix tree-walking compare_entry() in the presense of --prefix
When we make the "root" tree-walk info entry have a pathname in it, we
need to have a ->prev pointer so that compare_entry will actually notice
and traverse into the root.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
tree-walk.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/tree-walk.c b/tree-walk.c
index 842cb6a..02e2aed 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -107,6 +107,7 @@ int tree_entry(struct tree_desc *desc, struct name_entry *entry)
void setup_traverse_info(struct traverse_info *info, const char *base)
{
int pathlen = strlen(base);
+ static struct traverse_info dummy;
memset(info, 0, sizeof(*info));
if (pathlen && base[pathlen-1] == '/')
@@ -114,6 +115,8 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
info->pathlen = pathlen ? pathlen + 1 : 0;
info->name.path = base;
info->name.sha1 = (void *)(base + pathlen + 1);
+ if (pathlen)
+ info->prev = &dummy;
}
char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)
--
