Re: [PATCH 12/16] subtree: rewriting outgoing commits

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Something to play with so we can evaluate which is the best strategy
for non-full clone (or whatever you call it).

The idea is the same: pack only enough to access a subtree, rewrite
commits at client side, rewrite again when pushing. However, I put
git-replace into the mix, so at least the commit SHA-1s look the same as
upstream's. git-subtree is not needed (although it's still an option).

With this, I can clone Documentation/ from git.git, update and push. I
haven't tested it further. Space consumption is 24MB (58MB for the full
repo). Not really impressive, but if one truly cares about disk
space, he/she should also use shallow clone.

Performance is impacted, due to bulk commit replacement. There is a
split second delay for every command. It's the price of replacing 24k
commits every time. I think the delay could be improved a little bit
(caching or mmap..)

Rewriting commits at clone takes time too. Doing individual object
writing takes lots of space and time. I put all new objects directly
to a pack now. Rewriting time now becomes quite acceptable (a few
seconds), although a deep subtree/repo may take longer. Rewriting on
demand can be considered in such cases.

Repo-care commands like fsck, repack, gc are left out for now.

Finally, it's more of a hack just to see how far I can go. It will
break things.

Nguyễn Thái Ngọc Duy (16):
  Add core.subtree
  list-objects: limit traversing within the given subtree if
    core.subtree is set
  parse_object: keep sha1 even when parsing replaced one
  Allow to invalidate a commit in in-memory object store
  Hook up replace-object to allow bulk commit replacement
  upload-pack: use a separate variable to control whether internal
    rev-list is used
  upload-pack: support subtree pack
  fetch-pack: support --subtree
  subtree: rewrite incoming commits
  clone: support subtree clone with parameter --subtree
  pack-objects: add --subtree (for pushing)
  subtree: rewriting outgoing commits
  Update commit_tree() interface to take ...
From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

This variable contains the subtree. When core_subtree is non-empty,
the behavior of git may be totally different.

Perhaps this should not stay in .git/config, but rather in .git/subtree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h       |    1 +
 config.c      |    3 +++
 environment.c |    2 ++
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index c9fa3df..04ebe6e 100644
--- a/cache.h
+++ b/cache.h
@@ -551,6 +551,7 @@ extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern const char *core_subtree;
 
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index cdcf583..86ded29 100644
--- a/config.c
+++ b/config.c
@@ -595,6 +595,9 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.subtree"))
+		return git_config_string(&core_subtree, var, value);
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 83d38d3..1365dd0 100644
--- a/environment.c
+++ b/environment.c
@@ -57,6 +57,8 @@ int core_apply_sparse_checkout;
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
 
+const char *core_subtree;
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 list-objects.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 8953548..1b25b54 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -61,12 +61,15 @@ static void process_tree(struct rev_info *revs,
 			 struct tree *tree,
 			 show_object_fn show,
 			 struct name_path *path,
-			 const char *name)
+			 const char *name,
+			 const char *subtree)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
 	struct name_entry entry;
 	struct name_path me;
+	const char *slash;
+	int subtree_len;
 
 	if (!revs->tree_objects)
 		return;
@@ -82,13 +85,21 @@ static void process_tree(struct rev_info *revs,
 	me.elem = name;
 	me.elem_len = strlen(name);
 
+	if (subtree) {
+		slash = strchr(subtree, '/');
+		subtree_len = slash ? slash - subtree : strlen(subtree);
+	}
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
-		if (S_ISDIR(entry.mode))
-			process_tree(revs,
-				     lookup_tree(entry.sha1),
-				     show, &me, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			if (!subtree || !strncmp(entry.path, subtree, subtree_len))
+				process_tree(revs,
+					     lookup_tree(entry.sha1),
+					     show, &me, entry.path,
+					     slash && slash[1] ? slash+1 : NULL);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.sha1,
 					show, &me, entry.path);
@@ -164,7 +175,7 @@ void traverse_commit_list(struct rev_info *revs,
 		}
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     NULL, name);
+				     NULL, name, core_subtree);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-- 
1.7.1.rc1.69.g24c2f7

--

From: Ævar Arnfjörð Bjarmason
Date: Sunday, August 1, 2010 - 4:30 am

> +       int subtree_len;

Shouldn't that be size_t? strlen returns size_t, and strncmp expects
size_t, not int.
--

From: Nguyen Thai Ngoc Duy
Date: Sunday, August 1, 2010 - 4:11 pm

Hmm.. yeah. The compiler didn't warn me. Anyway subtree_len should be
small enough (i.e. < PATH_MAX) that the type does not really matter.
-- 
Duy
--

From: Elijah Newren
Date: Sunday, August 1, 2010 - 9:21 pm

[Empty message]
From: Nguyen Thai Ngoc Duy
Date: Sunday, August 1, 2010 - 11:51 pm

Yes. Thanks. Will fix.
-- 
Duy
From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 object.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/object.c b/object.c
index 277b3dd..7adfda7 100644
--- a/object.c
+++ b/object.c
@@ -199,7 +199,7 @@ struct object *parse_object(const unsigned char *sha1)
 			return NULL;
 		}
 
-		obj = parse_object_buffer(repl, type, size, buffer, &eaten);
+		obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
 		if (!eaten)
 			free(buffer);
 		return obj;
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

This is needed if replacing object happens at run time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.c |   15 +++++++++++++++
 commit.h |    2 ++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index e9b0750..d1e30b2 100644
--- a/commit.c
+++ b/commit.c
@@ -315,6 +315,21 @@ int parse_commit(struct commit *item)
 	return ret;
 }
 
+int invalidate_commit(struct commit *item)
+{
+	if (!item)
+		return -1;
+
+	if (item->object.parsed) {
+		item->object.parsed = 0;
+		if (item->buffer) {
+			free(item->buffer);
+			item->buffer = NULL;
+		}
+	}
+	return 0;
+}
+
 struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p)
 {
 	struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
diff --git a/commit.h b/commit.h
index eb2b8ac..d8c01ea 100644
--- a/commit.h
+++ b/commit.h
@@ -41,6 +41,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size);
 
 int parse_commit(struct commit *item);
 
+int invalidate_commit(struct commit *item);
+
 struct commit_list * commit_list_insert(struct commit *item, struct commit_list **list_p);
 unsigned commit_list_count(const struct commit_list *l);
 struct commit_list * insert_by_date(struct commit *item, struct commit_list **list);
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

$GIT_DIR/subtree contains the commit mapping in subtree mode. It is large
enough that putting it in $GIT_DIR/refs/replace may slow git down
significantly. Even with this, there will be a split second delay for
every git command.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile         |    2 +
 replace_object.c |    5 ++
 subtree.c        |  117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h        |    2 +
 4 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100644 subtree.c
 create mode 100644 subtree.h

diff --git a/Makefile b/Makefile
index f33648d..0d13538 100644
--- a/Makefile
+++ b/Makefile
@@ -525,6 +525,7 @@ LIB_H += sigchain.h
 LIB_H += strbuf.h
 LIB_H += string-list.h
 LIB_H += submodule.h
+LIB_H += subtree.h
 LIB_H += tag.h
 LIB_H += transport.h
 LIB_H += tree.h
@@ -629,6 +630,7 @@ LIB_OBJS += sigchain.o
 LIB_OBJS += strbuf.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
+LIB_OBJS += subtree.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += trace.o
diff --git a/replace_object.c b/replace_object.c
index eb59604..5fe4099 100644
--- a/replace_object.c
+++ b/replace_object.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "sha1-lookup.h"
 #include "refs.h"
+#include "subtree.h"
 
 static struct replace_object {
 	unsigned char sha1[2][20];
@@ -82,6 +83,7 @@ static void prepare_replace_object(void)
 	if (replace_object_prepared)
 		return;
 
+	prepare_subtree_commit();
 	for_each_replace_ref(register_replace_ref, NULL);
 	replace_object_prepared = 1;
 }
@@ -99,6 +101,9 @@ const unsigned char *lookup_replace_object(const unsigned char *sha1)
 
 	prepare_replace_object();
 
+	if (core_subtree)
+		cur = subtree_lookup_object(cur);
+
 	/* Try to recursively replace the object */
 	do {
 		if (--depth < 0)
diff --git a/subtree.c b/subtree.c
new file mode 100644
index 0000000..601d827
--- /dev/null
+++ b/subtree.c
@@ -0,0 +1,117 @@
+#include "cache.h"
+#include ...
From: Junio C Hamano
Date: Monday, August 2, 2010 - 12:58 pm

I really do not like the use of "replace" for the purpose of narrow
clones.  While "replace" is about fixing a mistake by tweaking trees, a
desire to have a narrow clone at this moment is _not_ a mistake.  You may
want to have wider or full clone of the project tomorrow.  You may want to
push the result of committing on top of such a narrowed clone back to a
full repository.  My gut feeling is that that use of "replace" to stub out
the objects that you do not currently have would make it a nightmare when
you would want to widen (especially to widen over the wire while pushing
into a full repository on the other end), although I haven't looked at all
the patches in the series.

Can you back up a bit and give us a high-level overview of how various
operations in a narrowed clone should work, and how you achieve that
design goal?

Let's take an example of starting from git.git and narrow-clone only its
Documentation/ (as you seem to have used as a guinea-pig) subdirectory.
For the sake of simplicity, let's say the upstream project has only one
commit.

One plausible approach would be to have the commit, its top level tree
object, its Documentation/ tree object and all the blobs below that level,
while other blobs and trees that are reachable from the top level tree
object are left missing, but somehow are marked so that fsck would think
they are OK to be missing.  Your worktree would obviously be narrowed to
the same Documentation/ area, and unlike the narrow checkout codepath, you
do not widen on demand (unless you automatically fetch missing parts of
the tree, which I do not think you should do by default to help people who
work while at 30,000ft).  Instead, any operation that tries to modify
outside the "subtree" area should fail.
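
The "fail outside the subtree" rule above could be checked with a simple
path test. This is only a minimal sketch: the helper name path_in_subtree()
is hypothetical and does not appear in the series, and it assumes the
subtree is given with a trailing slash (as in "Documentation/").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patches: return 1 if `path`
 * lies inside `subtree`, 0 otherwise. `subtree` must end with '/',
 * so that "Documentation-old/x" is not mistaken for a path inside
 * "Documentation/".
 */
static int path_in_subtree(const char *path, const char *subtree)
{
	size_t len;

	assert(path != NULL && subtree != NULL);
	len = strlen(subtree);
	if (len == 0 || subtree[len - 1] != '/')
		return 0;	/* malformed subtree: treat everything as outside */
	return !strncmp(path, subtree, len);
}
```

An operation that touches a path for which this returns 0 would then be
rejected instead of being silently dropped.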

When you build a commit that represents a Documentation patch on top of
such a narrowed clone, because you have a full tree of Documentation/
area, you can come up with the updated tree object for that part of the
project.  If "subtree" mode (aka ...
From: Nguyen Thai Ngoc Duy
Date: Monday, August 2, 2010 - 3:42 pm

Indeed. My intention was "hey this repo is too big, I only need some
pieces of it. Let me grab something and do my work. (Then throw away
the cloned repo)". It's best used together with shallow clone to give
low download/disk space, and a minimum tree to fix something quick.

I'm not really sure if such repos are sustainable in the long run. And no,
I did not want to widen/narrow the tree (as it was meant to be a throw-away
tree). Now, thinking of widening: the way I do narrow clone is quite
similar to shallow clone. I hope the way a shallow clone is deepened can

Operations work as normal (as the incomplete clone is augmented to
become "normal"). In order to make it look normal, every time a new
commit comes in (either from another repository, or user creates a new
one), the commit needs to be processed/replaced, so that the repo

Changes outside the subtree area are dropped on the floor now, not


This is where git-replace comes in. I do not want to deal with full
flat index. Giving pointers to missing objects may make git commands
nervous. I rewrite the commit so that now it only has Documentation/
and nothing else (for which I have all needed objects). The index is
narrowed too. Because the index (even narrowed) is complete (i.e. all
entries reachable), most operations should work.

Then, to hide the helper commit from user, I replace the original
(full) commit with this new commit. So from outside git sees SHA-1 of
the original commit, but its content is from the helper one. These
helper commits guarantee git won't reach out for missing objects.
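
As an illustration of the replacement step described above, here is a
minimal sketch of looking up an original commit in a sorted mapping table
and returning the helper commit in its place. It is not the code from
subtree.c (which is truncated in this thread); struct commit_pair and
lookup_rewritten() are hypothetical names, and the table is assumed to be
sorted by the original SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

/* Hypothetical mapping entry: original commit -> rewritten helper commit. */
struct commit_pair {
	unsigned char original[HASH_LEN];
	unsigned char rewritten[HASH_LEN];
};

/*
 * Binary search over `map` (sorted by `original`). Returns the
 * rewritten SHA-1 when `sha1` is mapped, or `sha1` itself otherwise,
 * so callers can use the result unconditionally.
 */
static const unsigned char *lookup_rewritten(const struct commit_pair *map,
					     size_t nr,
					     const unsigned char *sha1)
{
	size_t lo = 0, hi = nr;

	assert(sha1 != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, map[mid].original, HASH_LEN);

		if (!cmp)
			return map[mid].rewritten;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return sha1;
}
```

With roughly 24k commits, each lookup costs about 15 comparisons; the
per-command delay mentioned in the cover letter comes from loading the
table, not from searching it.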

It's a trade off. Doing full index requires much more effort into git.
Using "git-subtree split", while freeing git developers to do other
things, might be inconvenient for users (without server support, full
repo must be downloaded, replaced SHA-1 from git-subtree cannot be

That's just a part of the story. Repository integrity is a
prerequisite in git from the beginning. git-merge operates directly on
trees so cache-tree won't help much. ...
From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index dc464d7..e432e83 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,8 +160,9 @@ static void create_pack_file(void)
 	ssize_t sz;
 	const char *argv[10];
 	int arg = 0;
+	int internal_rev_list = shallow_nr;
 
-	if (shallow_nr) {
+	if (internal_rev_list) {
 		memset(&rev_list, 0, sizeof(rev_list));
 		rev_list.proc = do_rev_list;
 		rev_list.out = -1;
@@ -187,7 +188,7 @@ static void create_pack_file(void)
 	argv[arg++] = NULL;
 
 	memset(&pack_objects, 0, sizeof(pack_objects));
-	pack_objects.in = shallow_nr ? rev_list.out : -1;
+	pack_objects.in = internal_rev_list ? rev_list.out : -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
 	pack_objects.git_cmd = 1;
@@ -197,7 +198,7 @@ static void create_pack_file(void)
 		die("git upload-pack: unable to fork git-pack-objects");
 
 	/* pass on revisions we (don't) want */
-	if (!shallow_nr) {
+	if (!internal_rev_list) {
 		FILE *pipe_fd = xfdopen(pack_objects.in, "w");
 		if (!create_full_pack) {
 			int i;
@@ -311,7 +312,7 @@ static void create_pack_file(void)
 		error("git upload-pack: git-pack-objects died with error.");
 		goto fail;
 	}
-	if (shallow_nr && finish_async(&rev_list))
+	if (internal_rev_list && finish_async(&rev_list))
 		goto fail;	/* error was already reported */
 
 	/* flush the data */
-- 
1.7.1.rc1.69.g24c2f7

--

From: Elijah Newren
Date: Sunday, August 1, 2010 - 9:25 pm

Hi,

<snip>

I've got the exact same changes in one of my in-progress-patches in my
sparse-clone branch.  That is, other than the variable name, but I
like yours better.  Needless to say, I agree with this change.  :-)
--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

With core_subtree turned on (capability "subtree", request "subtree"
from fetch-pack), traverse_commit_list will be in "subtree mode",
which will not go farther than the given subtree.

As a result, the pack is broken by design: it contains only enough
blobs/trees/commits to reach the given subtree.
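
The per-level decision made in "subtree mode" can be sketched as follows.
This is an illustration only: should_descend() is a hypothetical helper,
and unlike the strncmp() prefix test in the list-objects patch it requires
the whole directory name to match, which is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. `entry` is a directory name at the current tree
 * level; `subtree` is the remaining subtree path (for example
 * "Documentation/technical/"), or NULL when there is no limit.
 *
 * Returns 1 if the traversal should descend into `entry`. On a match,
 * *rest is set to the path remaining below `entry`, or to NULL once the
 * subtree itself has been reached and everything below it is wanted.
 */
static int should_descend(const char *entry, const char *subtree,
			  const char **rest)
{
	const char *slash;
	size_t len;

	assert(entry != NULL && rest != NULL);
	*rest = NULL;
	if (!subtree)
		return 1;	/* no limit: descend everywhere */

	slash = strchr(subtree, '/');
	len = slash ? (size_t)(slash - subtree) : strlen(subtree);
	if (strlen(entry) != len || memcmp(entry, subtree, len))
		return 0;	/* sibling directory: leave it out of the pack */

	if (slash && slash[1])
		*rest = slash + 1;	/* a trailing slash does not count */
	return 1;
}
```

Every directory for which this returns 0 is skipped, so its trees and
blobs never enter the pack.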

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 upload-pack.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index e432e83..9b6710a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,7 +160,7 @@ static void create_pack_file(void)
 	ssize_t sz;
 	const char *argv[10];
 	int arg = 0;
-	int internal_rev_list = shallow_nr;
+	int internal_rev_list = shallow_nr || core_subtree;
 
 	if (internal_rev_list) {
 		memset(&rev_list, 0, sizeof(rev_list));
@@ -505,6 +505,20 @@ static void receive_needs(void)
 		if (debug_fd)
 			write_in_full(debug_fd, line, len);
 
+		if (!prefixcmp(line, "subtree ")) {
+			int len;
+			char *subtree;
+			if (core_subtree)
+				die("sorry, only one subtree supported");
+			len = strlen(line+8);
+			subtree = malloc(len+1);
+			memcpy(subtree, line+8, len-1);
+			subtree[len-1] = '\0'; /* \n */
+			if (subtree[len-2] != '/')
+				die("subtree request must end with a slash");
+			core_subtree = subtree;
+			continue;
+		}
 		if (!prefixcmp(line, "shallow ")) {
 			unsigned char sha1[20];
 			struct object *object;
@@ -624,7 +638,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed subtree";
 	struct object *o = parse_object(sha1);
 
 	if (!o)
-- 
1.7.1.rc1.69.g24c2f7

--

From: Elijah Newren
Date: Sunday, August 1, 2010 - 9:27 pm

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:

I'm not sure users would understand this error message; perhaps
something more like "Fetching/cloning from a subtree-sparse repository
not supported"?
From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

This option requires a subtree-aware upload-pack. It simply passes the
subtree from the command line (or from $GIT_DIR/config) to upload-pack.
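
For illustration, the receiving side of that "subtree <path>\n" request
line could be parsed along these lines. This is a minimal sketch, not the
upload-pack code: parse_subtree_request() is a hypothetical helper that
returns NULL on malformed input instead of calling die(), and it checks
the allocation result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper. Parses a request line of the form
 * "subtree <path>/\n" and returns a malloc'ed copy of <path>/ that the
 * caller must free(). Returns NULL if the line is not a subtree
 * request, if the path is empty or lacks the trailing slash, or if
 * allocation fails.
 */
static char *parse_subtree_request(const char *line)
{
	static const char prefix[] = "subtree ";
	const size_t prefix_len = sizeof(prefix) - 1;
	size_t len;
	char *subtree;

	assert(line != NULL);
	if (strncmp(line, prefix, prefix_len))
		return NULL;
	line += prefix_len;

	len = strlen(line);
	if (len && line[len - 1] == '\n')
		len--;				/* drop the line terminator */
	if (!len || line[len - 1] != '/')
		return NULL;			/* path must end with a slash */

	subtree = malloc(len + 1);
	if (!subtree)
		return NULL;
	memcpy(subtree, line, len);
	subtree[len] = '\0';
	return subtree;
}
```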

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch-pack.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..7460ecc 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -237,6 +237,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
 	for_each_ref(rev_list_insert_ref, NULL);
 
 	fetching = 0;
+	if (core_subtree)
+		packet_buf_write(&req_buf, "subtree %s\n", core_subtree);
 	for ( ; refs ; refs = refs->next) {
 		unsigned char *remote = refs->old_sha1;
 		const char *remote_hex;
@@ -692,6 +694,8 @@ static struct ref *do_fetch_pack(int fd[2],
 
 	if (is_repository_shallow() && !server_supports("shallow"))
 		die("Server does not support shallow clients");
+	if (core_subtree && !server_supports("subtree"))
+		die("Server does not support subtree");
 	if (server_supports("multi_ack_detailed")) {
 		if (args.verbose)
 			fprintf(stderr, "Server supports multi_ack_detailed\n");
@@ -860,6 +864,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 				pack_lockfile_ptr = &pack_lockfile;
 				continue;
 			}
+			if (!prefixcmp(arg, "--subtree=")) {
+				core_subtree = arg + 10;
+				continue;
+			}
 			usage(fetch_pack_usage);
 		}
 		dest = (char *)arg;
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

This adds the main function, subtree_import(), which is intended to be
used by "git clone".

Because subtree packs are not complete, they are barely usable. The git
client will complain about missing objects here and there... Theoretically,
client code could be adapted to only look for objects within the
subtree. That was painful to try.

Alternatively, subtree_import() rewrites commits to have only the
specified subtree, sealing all broken paths. The git client now happily
works with these new commits.

However, users might not, because these are different commits with
different SHA-1s. They can't use those SHA-1s to communicate within
their team. To work around this, all original commits are replaced by
new commits using git-replace.

Of course this is still not perfect. Users may be able to send SHA-1
around, which is consistent. They may not do the same with tree SHA-1.

Rewriting/replacing commits takes time and space. For replacing _all_
commits, the current replace mechanism is not suitable, which is why
subtree_lookup_object() was introduced in previous patches.

For rewriting, writing a huge number of objects is slow. So
subtree_import() builds a pack for all new objects. These packs are
not optimized. But it does reduce wait time for rewriting.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 subtree.c |  244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h |    1 +
 2 files changed, 245 insertions(+), 0 deletions(-)

diff --git a/subtree.c b/subtree.c
index 601d827..8c075be 100644
--- a/subtree.c
+++ b/subtree.c
@@ -115,3 +115,247 @@ const unsigned char *subtree_lookup_object(const unsigned char *sha1)
 		return subtree_commit[pos]->sha1[1];
 	return sha1;
 }
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
+	z_stream stream;
+	void *in, *out;
+	unsigned long maxsize;
+
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_DEFAULT_COMPRESSION);
+	maxsize = deflateBound(&stream, size);
+
+	in = *pptr;
+	out = ...
From: Elijah Newren
Date: Sunday, August 1, 2010 - 9:37 pm

Hi,


It may have been painful, but personally I think it's still the right
way to do it.  Of course, that's a pretty easy thing for me to say,
since you're pretty far ahead of me and I haven't felt your pain yet.
Maybe I'll change my mind after trying it for a while, but I'm not


My compiler complains that you didn't typecast the return value from
strlen to an int.
--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

With all the preparation work, here comes --subtree. So clone away!

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index efb1e6f..43bc34b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
 #include "branch.h"
 #include "remote.h"
 #include "run-command.h"
+#include "subtree.h"
 
 /*
  * Overall FIXMEs:
@@ -78,6 +79,8 @@ static struct option builtin_clone_options[] = {
 		   "path to git-upload-pack on the remote"),
 	OPT_STRING(0, "depth", &option_depth, "depth",
 		    "create a shallow clone of that depth"),
+	OPT_STRING(0, "subtree", &core_subtree, "subtree",
+		   "subtree clone"),
 
 	OPT_END()
 };
@@ -515,6 +518,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	strbuf_reset(&value);
 
 	if (path && !is_bundle) {
+		if (core_subtree)
+			die("Local subtree clone does not work (now)");
 		refs = clone_local(path, git_dir);
 		mapped_refs = wanted_peer_refs(refs, refspec);
 	} else {
@@ -623,6 +628,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_disconnect(transport);
 	}
 
+	if (core_subtree) {
+		git_config_set("core.subtree", core_subtree);
+		subtree_import();
+	}
+
 	if (!option_no_checkout) {
 		struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 		struct unpack_trees_options opts;
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/pack-objects.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0e81673..5d7b277 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2277,6 +2277,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			grafts_replace_parents = 0;
 			continue;
 		}
+		if (!prefixcmp(arg, "--subtree=")) {
+			core_subtree = arg + 10;
+			continue;
+		}
 		usage(pack_usage);
 	}
 
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Which is exactly the opposite of rewriting incoming commits.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 subtree.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 subtree.h |    1 +
 2 files changed, 174 insertions(+), 0 deletions(-)

diff --git a/subtree.c b/subtree.c
index 8c075be..739ff5f 100644
--- a/subtree.c
+++ b/subtree.c
@@ -359,3 +359,176 @@ void subtree_import()
 	if (revs.pending.nr)
 		free(revs.pending.objects);
 }
+
+/*
+ * The opposite of narrow_tree(). Put the subtree back to the original tree.
+ */
+static int widen_tree(const unsigned char *sha1,
+		      unsigned char *newsha1,
+		      const unsigned char *subtree_sha1,
+		      const char *prefix)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	struct strbuf buffer;
+	const char *slash;
+	int subtree_len;
+	enum object_type type;
+	unsigned long size;
+	char *tree;
+
+	slash = strchr(prefix, '/');
+	subtree_len = slash ? slash - prefix : strlen(prefix);
+
+	tree = read_sha1_file(sha1, &type, &size);
+	if (type != OBJ_TREE)
+		die("%s is not a tree", sha1_to_hex(sha1));
+
+	init_tree_desc(&desc, tree, size);
+	strbuf_init(&buffer, 8192);
+	while (tree_entry(&desc, &entry)) {
+		strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');
+
+		if (S_ISDIR(entry.mode) &&
+		    subtree_len == strlen(entry.path) &&
+		    !strncmp(entry.path, prefix, subtree_len)) {
+			unsigned char newtree_sha1[20];
+
+			if (slash && slash[1]) /* trailing slash does not count */
+				widen_tree(entry.sha1, newtree_sha1, subtree_sha1,
+					   prefix+subtree_len+1);
+			else
+				/* replace the tree */
+				memcpy(newtree_sha1, subtree_sha1, 20);
+
+			strbuf_add(&buffer, newtree_sha1, 20);
+		}
+		else
+			strbuf_add(&buffer, entry.sha1, 20);
+	}
+	free(tree);
+
+	if (write_sha1_file(buffer.buf, buffer.len, tree_type, newsha1)) {
+		error("Could not write replaced tree for %s", ...
From: Elijah Newren
Date: Sunday, August 1, 2010 - 9:40 pm

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:

Again, gcc here complains that "subtree.c:390: warning: field
precision should have type ‘int’, but argument 4 has type ‘size_t’" --
typecast the return value of strlen to int?
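
A minimal sketch of the cast being suggested, in plain C. The helper
format_entry_header() is hypothetical and uses snprintf() rather than
git's strbuf_addf(); it only shows how the "%.*s" precision argument is
passed as an int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "<octal mode> <name>" into `buf`.
 * The precision argument of "%.*s" must be an int, so the size_t
 * returned by strlen() is cast explicitly. Returns what snprintf()
 * returns: the number of characters needed, excluding the NUL.
 */
static int format_entry_header(char *buf, size_t bufsize,
			       unsigned int mode, const char *name)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, bufsize, "%o %.*s",
			mode, (int)strlen(name), name);
}
```

Since the precision here equals the full length of a NUL-terminated
string, a plain "%s" would produce the same output and avoid the cast.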
--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

In subtree mode, you work on a narrowed tree. You make narrowed
commits. If you want to push upstream, you would need to put your
updated subtree back into the full tree again. Otherwise upstream would
complain that you deleted all trees but your subtree, which is not good.

In order to do that, commit_tree() now takes the base tree SHA-1. With
that, it can create upstream-compatible commits. It does not now,
though.
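
To make "putting the subtree back" concrete, here is a single-level
sketch that re-serialises a tree in git's "<mode> <name>\0<20-byte SHA-1>"
entry format while swapping in the updated subtree. It is an illustration
under stated assumptions, not the widen_tree() from this series: struct
entry and write_widened_tree() are hypothetical, entries are held in
memory, nested prefixes are not handled, and nothing is written to the
object store.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HASH_LEN 20
#define MODE_TREE 040000

/* Hypothetical in-memory tree entry, for illustration only. */
struct entry {
	unsigned int mode;
	const char *name;
	unsigned char sha1[HASH_LEN];
};

/*
 * Serialise `entries` into `out`, using `new_sha1` for the directory
 * named `subtree_name` and the original SHA-1 for everything else.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
static size_t write_widened_tree(unsigned char *out, size_t cap,
				 const struct entry *entries, size_t nr,
				 const char *subtree_name,
				 const unsigned char *new_sha1)
{
	size_t pos = 0, i;

	assert(out && entries && subtree_name && new_sha1);
	for (i = 0; i < nr; i++) {
		const struct entry *e = &entries[i];
		const unsigned char *id = e->sha1;
		int n;

		if (e->mode == MODE_TREE && !strcmp(e->name, subtree_name))
			id = new_sha1;	/* replace the tree */

		n = snprintf((char *)out + pos, cap - pos, "%o %s",
			     e->mode, e->name);
		if (n < 0 || (size_t)n + 1 + HASH_LEN > cap - pos)
			return 0;
		pos += (size_t)n + 1;	/* keep the NUL snprintf wrote */
		memcpy(out + pos, id, HASH_LEN);
		pos += HASH_LEN;
	}
	return pos;
}
```

Entries outside the subtree keep their upstream SHA-1s, which is why the
resulting tree no longer looks like a mass deletion to upstream.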

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/commit-tree.c |    2 +-
 builtin/commit.c      |    2 +-
 builtin/merge.c       |    4 ++--
 builtin/notes.c       |    2 +-
 commit.c              |    2 +-
 commit.h              |    2 +-
 notes-cache.c         |    2 +-
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 87f0591..88a6833 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
 	if (strbuf_read(&buffer, 0, 0) < 0)
 		die_errno("git commit-tree: failed to read");
 
-	if (!commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) {
+	if (!commit_tree(buffer.buf, tree_sha1, NULL, parents, commit_sha1, NULL)) {
 		printf("%s\n", sha1_to_hex(commit_sha1));
 		return 0;
 	}
diff --git a/builtin/commit.c b/builtin/commit.c
index 2bb30c0..6b4c678 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, parents, commit_sha1,
+	if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
 			fmt_ident(author_name, author_email, author_date,
 				IDENT_ERROR_ON_NO_NAME))) {
 		rollback_index_files();
diff --git a/builtin/merge.c b/builtin/merge.c
index 37ce4f5..8745b54 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -779,7 +779,7 @@ static int merge_trivial(void)
 	parent->next = ...
From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index 7121631..258d3fb 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "subtree.h"
 
 int save_commit_buffer = 1;
 
@@ -858,5 +859,12 @@ int commit_tree(const char *msg, unsigned char *tree, unsigned char *base_tree,
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
+
+	if (core_subtree && !result) {
+		unsigned char subtree_commit[20];
+		memcpy(subtree_commit, ret, 20);
+		result = subtree_export(subtree_commit, base_tree, ret);
+	}
+
 	return result;
 }
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/commit.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 6b4c678..c551d72 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
+	if (commit_tree(sb.buf, active_cache_tree->sha1,
+			parents ? parents->item->object.sha1 : NULL,
+			parents, commit_sha1,
 			fmt_ident(author_name, author_email, author_date,
 				IDENT_ERROR_ON_NO_NAME))) {
 		rollback_index_files();
-- 
1.7.1.rc1.69.g24c2f7

--

From: Nguyễn Thái Ngọc Duy
Date: Saturday, July 31, 2010 - 9:18 am

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/send-pack.c |    2 ++
 upload-pack.c       |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 481602d..fb1ad2b 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -53,6 +53,8 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 	int i;
 
 	i = 4;
+	if (core_subtree)
+		args->use_thin_pack = 0;
 	if (args->use_thin_pack)
 		argv[i++] = "--thin";
 	if (args->use_ofs_delta)
diff --git a/upload-pack.c b/upload-pack.c
index 9b6710a..c65a3cb 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -581,6 +581,9 @@ static void receive_needs(void)
 	if (!use_sideband && daemon_mode)
 		no_progress = 1;
 
+	if (core_subtree)
+		use_thin_pack = 0;
+
 	if (depth == 0 && shallows.nr == 0)
 		return;
 	if (depth > 0) {
-- 
1.7.1.rc1.69.g24c2f7
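
A note on the two hunks above: both force thin packs off whenever core_subtree is set. A thin pack may leave out delta base objects that the receiver is expected to have already; one plausible reading (not stated in the patch) is that a subtree repository cannot make that guarantee. The sketch below shows the same override applied while building a pack-objects argument list. struct pack_opts and build_pack_argv() are hypothetical names, and the four fixed leading arguments are an assumption about what precedes `i = 4` in send-pack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bundle; the field names echo the patch. */
struct pack_opts {
	int subtree;		/* stands in for core_subtree */
	int use_thin_pack;
	int use_ofs_delta;
};

/*
 * Fills argv (room for at least 8 entries) and returns the number of
 * arguments written, not counting the terminating NULL.
 */
static int build_pack_argv(struct pack_opts *o, const char **argv)
{
	int i = 0;

	assert(o != NULL && argv != NULL);

	/* Assumed fixed prefix of the pack-objects command line. */
	argv[i++] = "pack-objects";
	argv[i++] = "--all-progress-implied";
	argv[i++] = "--revs";
	argv[i++] = "--stdout";

	/* Same override as in the patch: subtree mode disables thin packs. */
	if (o->subtree)
		o->use_thin_pack = 0;

	if (o->use_thin_pack)
		argv[i++] = "--thin";
	if (o->use_ofs_delta)
		argv[i++] = "--delta-base-offset";
	argv[i] = NULL;
	return i;
}
```

With subtree set, "--thin" never reaches the argument list even if the caller asked for it; without subtree, the caller's choice is kept.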

--

From: Sverre Rabbelier
Date: Saturday, July 31, 2010 - 9:14 pm

Heya,



Can they be combined to create the fabled narrow checkout?

-- 
Cheers,

Sverre Rabbelier
--

From: Nguyen Thai Ngoc Duy
Date: Saturday, July 31, 2010 - 11:58 pm

Yes. For the record, --subtree=Documentation/ with --depth=1 made a pack of 5MB.
-- 
Duy
--

From: Sverre Rabbelier
Date: Sunday, August 1, 2010 - 1:05 pm

Heya,


I hope everybody is paying attention to these patches then! :)

-- 
Cheers,

Sverre Rabbelier
--

From: Elijah Newren
Date: Sunday, August 1, 2010 - 10:18 pm

Hi,


Very nice, it's awesome you're working on this.  I'm of the same
opinion that Shawn stated earlier, namely that I don't like the route
of rewriting commits on the fly like this (more on that later), but
it's really cool to see some ideas being tried and pushed to their

I tried it out, but I seem to be doing something wrong.  I applied
your patches to current master, and tried the following -- am I doing
something wrong or omitting any important steps?

$ git --version
git version 1.7.2.1.22.g236df

$ git clone file://$(pwd)/git fullclone
Cloning into fullclone...
warning: templates not found /home/newren/share/git-core/templates
remote: Counting objects: 96220, done.
remote: Compressing objects: 100% (24925/24925), done.
remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
Resolving deltas: 100% (70575/70575), done.
fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc

$ git clone --subtree=Documentation/ file://$(pwd)/git docclone
Cloning into docclone...
warning: templates not found /home/newren/share/git-core/templates
fatal: The remote end hung up unexpectedly
fatal: early EOF

58 MB for full repo?  What are you counting?  For me, I get 25M:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ ls -lh git/.git/objects/pack/*.pack
-r--r--r--. 1 newren newren 25M 2010-08-01 18:05
git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack

Are you counting the full checkout too or something?  If so, that
varies very wildly between systems, making it hard to compare numbers.
 (For me, 'du -hs git/' returns 44 MB.)  I'd like to be able to
duplicate your numbers and investigate further.  It seems to me that

I think it's a pretty nifty hack.  It's fun to see.  :-)  However, I
do have a number of reservations about the general strategy:  As
mentioned earlier, I'm not sure I like the on-the-fly commit
rewriting, as mentioned by Shawn in your ...
--

From: Nguyen Thai Ngoc Duy
Date: Monday, August 2, 2010 - 12:10 am

This one looks like the uninitialized case you pointed out in

Not sure. Does file:// use receive-pack/upload-pack? I tested it over

It's my own git.git, which probably has more topic branches plus other
junk. If you are only interested in numbers, playing with git
pack-objects is enough. You need the changes in list-objects.c and
builtin/pack-objects.c, then you can run

git pack-objects --stdout --subtree=foo/ > temp.pack


And it's also fun to try. I'd like to try it on larger repos but I


I agree. Being able to fetch from an incomplete repo is very nice.
Though I admit I don't know how to do it. I think sparse clone would

Looking forward to seeing sparse clone realized, although I think that
would be painful :-)
-- 
Duy
--

From: Nguyen Thai Ngoc Duy
Date: Monday, August 2, 2010 - 3:55 pm

My number 24MB was incorrect because process_tree() leaked too many
blobs. It should have been 16MB. Anyway I have updated my series and
put it here (to spam the git mailing list less)

http://repo.or.cz/w/git/pclouds.git/shortlog/refs/heads/subtree
(caveat: constantly rebased tree)

if you still want to play with it. For number lovers, fetching only
Documentation from linux-2.6.git took 94MB (full repo: 366MB). Yeah,
Documentation was an easy target.
-- 
Duy
--
