Something to play with so we can evaluate which is the best strategy
for non-full clone (or whatever you call it).
The idea is the same: pack only enough to access a subtree, rewrite
commits at client side, rewrite again when pushing. However I put
git-replace into the mix, so at least commit SHA-1s look the same as
upstream's. git-subtree is not needed (although it's still an option).
With this, I can clone Documentation/ from git.git, update and push. I
haven't tested it further. Space consumption is 24MB (58MB for full
repo). Not really impressive, but if one truly cares about disk
space, he/she should also use shallow clone.
Performance is impacted, due to bulk commit replacement. There is a
split second delay for every command. It's the price of replacing 24k
commits every time. I think the delay could be improved a little bit
(caching or mmap..)
Rewriting commits at clone takes time too. Writing objects
individually takes lots of space and time, so I now put all new
objects directly into a pack. Rewriting time becomes quite acceptable
(a few seconds), although a deep subtree/repo may take longer.
Rewriting on demand can be considered in such cases.
Repo-care commands like fsck, repack, gc are left out for now.
Finally, it's more of a hack just to see how far I can go. It will
break things.
Nguyễn Thái Ngọc Duy (16):
Add core.subtree
list-objects: limit traversing within the given subtree if
core.subtree is set
parse_object: keep sha1 even when parsing replaced one
Allow to invalidate a commit in in-memory object store
Hook up replace-object to allow bulk commit replacement
upload-pack: use a separate variable to control whether internal
rev-list is used
upload-pack: support subtree pack
fetch-pack: support --subtree
subtree: rewrite incoming commits
clone: support subtree clone with parameter --subtree
pack-objects: add --subtree (for pushing)
subtree: rewriting outgoing commits
Update commit_tree() interface to take ...
This variable contains the subtree. With core_subtree being non-empty,
the behavior of git may be totally different.
Perhaps this should not stay in .git/config, but rather in .git/subtree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
cache.h | 1 +
config.c | 3 +++
environment.c | 2 ++
3 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/cache.h b/cache.h
index c9fa3df..04ebe6e 100644
--- a/cache.h
+++ b/cache.h
@@ -551,6 +551,7 @@ extern int read_replace_refs;
extern int fsync_object_files;
extern int core_preload_index;
extern int core_apply_sparse_checkout;
+extern const char *core_subtree;
enum safe_crlf {
SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index cdcf583..86ded29 100644
--- a/config.c
+++ b/config.c
@@ -595,6 +595,9 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.subtree"))
+ return git_config_string(&core_subtree, var, value);
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
}
diff --git a/environment.c b/environment.c
index 83d38d3..1365dd0 100644
--- a/environment.c
+++ b/environment.c
@@ -57,6 +57,8 @@ int core_apply_sparse_checkout;
/* Parallel index stat data preload? */
int core_preload_index = 0;
+const char *core_subtree;
+
/* This is set by setup_git_dir_gently() and/or git_default_config() */
char *git_work_tree_cfg;
static char *work_tree;
--
1.7.1.rc1.69.g24c2f7
--
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
list-objects.c | 23 +++++++++++++++++------
1 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/list-objects.c b/list-objects.c
index 8953548..1b25b54 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -61,12 +61,15 @@ static void process_tree(struct rev_info *revs,
struct tree *tree,
show_object_fn show,
struct name_path *path,
- const char *name)
+ const char *name,
+ const char *subtree)
{
struct object *obj = &tree->object;
struct tree_desc desc;
struct name_entry entry;
struct name_path me;
+ const char *slash;
+ int subtree_len;
if (!revs->tree_objects)
return;
@@ -82,13 +85,21 @@ static void process_tree(struct rev_info *revs,
me.elem = name;
me.elem_len = strlen(name);
+ if (subtree) {
+ slash = strchr(subtree, '/');
+ subtree_len = slash ? slash - subtree : strlen(subtree);
+ }
+
init_tree_desc(&desc, tree->buffer, tree->size);
while (tree_entry(&desc, &entry)) {
- if (S_ISDIR(entry.mode))
- process_tree(revs,
- lookup_tree(entry.sha1),
- show, &me, entry.path);
+ if (S_ISDIR(entry.mode)) {
+ if (!subtree || !strncmp(entry.path, subtree, subtree_len))
+ process_tree(revs,
+ lookup_tree(entry.sha1),
+ show, &me, entry.path,
+ slash && slash[1] ? slash+1 : NULL);
+ }
else if (S_ISGITLINK(entry.mode))
process_gitlink(revs, entry.sha1,
show, &me, entry.path);
@@ -164,7 +175,7 @@ void traverse_commit_list(struct rev_info *revs,
}
if (obj->type == OBJ_TREE) {
process_tree(revs, (struct tree *)obj, show_object,
- NULL, name);
+ NULL, name, core_subtree);
continue;
}
if (obj->type == OBJ_BLOB) {
--
1.7.1.rc1.69.g24c2f7
--
> + int subtree_len; Shouldn't that be size_t? strlen returns size_t, and strncmp expects size_t, not int. --
Hmm.. yeah. The compiler didn't warn me. Anyway subtree_len should be
small enough (i.e. < PATH_MAX) that the type does not really matter.
-- Duy
--
Yes. Thanks. Will fix.
-- Duy
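For reference, the component matching discussed above could be sketched with size_t throughout. This is only an illustration, not code from the series: match_first_component() is a hypothetical helper name, and unlike the strncmp()-only test in the posted process_tree() hunk it also compares lengths, so an entry such as "Documentation2" would not be accepted for the subtree "Documentation/".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the posted patch).
 *
 * Return 1 if the tree entry name `name` equals the first path
 * component of `subtree` (e.g. "Documentation" against
 * "Documentation/howto/"), 0 otherwise.  On a match, *rest is set to
 * the remaining path, or to NULL when nothing but a trailing slash
 * is left.
 */
static int match_first_component(const char *name, const char *subtree,
				 const char **rest)
{
	const char *slash = strchr(subtree, '/');
	/* size_t, because strlen() returns it and strncmp() expects it */
	size_t len = slash ? (size_t)(slash - subtree) : strlen(subtree);

	if (strlen(name) != len || strncmp(name, subtree, len) != 0)
		return 0;

	*rest = (slash && slash[1]) ? slash + 1 : NULL;
	return 1;
}
```

A caller walking a tree would pass each directory entry name together with the current subtree path and recurse with *rest when the function returns 1.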
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
object.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/object.c b/object.c
index 277b3dd..7adfda7 100644
--- a/object.c
+++ b/object.c
@@ -199,7 +199,7 @@ struct object *parse_object(const unsigned char *sha1)
return NULL;
}
- obj = parse_object_buffer(repl, type, size, buffer, &eaten);
+ obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
if (!eaten)
free(buffer);
return obj;
--
1.7.1.rc1.69.g24c2f7
--
This is needed if replacing object happens at run time.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
commit.c | 15 +++++++++++++++
commit.h | 2 ++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/commit.c b/commit.c
index e9b0750..d1e30b2 100644
--- a/commit.c
+++ b/commit.c
@@ -315,6 +315,21 @@ int parse_commit(struct commit *item)
return ret;
}
+int invalidate_commit(struct commit *item)
+{
+ if (!item)
+ return -1;
+
+ if (item->object.parsed) {
+ item->object.parsed = 0;
+ if (item->buffer) {
+ free(item->buffer);
+ item->buffer = NULL;
+ }
+ }
+ return 0;
+}
+
struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p)
{
struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
diff --git a/commit.h b/commit.h
index eb2b8ac..d8c01ea 100644
--- a/commit.h
+++ b/commit.h
@@ -41,6 +41,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size);
int parse_commit(struct commit *item);
+int invalidate_commit(struct commit *item);
+
struct commit_list * commit_list_insert(struct commit *item, struct commit_list **list_p);
unsigned commit_list_count(const struct commit_list *l);
struct commit_list * insert_by_date(struct commit *item, struct commit_list **list);
--
1.7.1.rc1.69.g24c2f7
--
$GIT_DIR/subtree contains the commit mapping in subtree mode. It is
large enough that putting it in $GIT_DIR/refs/replace may slow git down
significantly. Even with this, there will be a split second delay for
every git command.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Makefile | 2 +
replace_object.c | 5 ++
subtree.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
subtree.h | 2 +
4 files changed, 126 insertions(+), 0 deletions(-)
create mode 100644 subtree.c
create mode 100644 subtree.h
diff --git a/Makefile b/Makefile
index f33648d..0d13538 100644
--- a/Makefile
+++ b/Makefile
@@ -525,6 +525,7 @@ LIB_H += sigchain.h
LIB_H += strbuf.h
LIB_H += string-list.h
LIB_H += submodule.h
+LIB_H += subtree.h
LIB_H += tag.h
LIB_H += transport.h
LIB_H += tree.h
@@ -629,6 +630,7 @@ LIB_OBJS += sigchain.o
LIB_OBJS += strbuf.o
LIB_OBJS += string-list.o
LIB_OBJS += submodule.o
+LIB_OBJS += subtree.o
LIB_OBJS += symlinks.o
LIB_OBJS += tag.o
LIB_OBJS += trace.o
diff --git a/replace_object.c b/replace_object.c
index eb59604..5fe4099 100644
--- a/replace_object.c
+++ b/replace_object.c
@@ -1,6 +1,7 @@
#include "cache.h"
#include "sha1-lookup.h"
#include "refs.h"
+#include "subtree.h"
static struct replace_object {
unsigned char sha1[2][20];
@@ -82,6 +83,7 @@ static void prepare_replace_object(void)
if (replace_object_prepared)
return;
+ prepare_subtree_commit();
for_each_replace_ref(register_replace_ref, NULL);
replace_object_prepared = 1;
}
@@ -99,6 +101,9 @@ const unsigned char *lookup_replace_object(const unsigned char *sha1)
prepare_replace_object();
+ if (core_subtree)
+ cur = subtree_lookup_object(cur);
+
/* Try to recursively replace the object */
do {
if (--depth < 0)
diff --git a/subtree.c b/subtree.c
new file mode 100644
index 0000000..601d827
--- /dev/null
+++ b/subtree.c
@@ -0,0 +1,117 @@
+#include "cache.h"
+#include ...
I really do not like the use of "replace" for the purpose of narrow
clones. While "replace" is about fixing a mistake by tweaking trees, a
desire to have a narrow clone at this moment is _not_ a mistake. You may
want to have wider or full clone of the project tomorrow. You may want
to push the result of committing on top of such a narrowed clone back
to a full repository.

My gut feeling is that that use of "replace" to stub out the objects
that you do not currently have would make it a nightmare when you would
want to widen (especially to widen over the wire while pushing into a
full repository on the other end), although I haven't looked at all the
patches in the series.

Can you back up a bit and give us a high-level overview of how various
operations in a narrowed clone should work, and how you achieve that
design goal?

Let's take an example of starting from git.git and narrow-clone only
its Documentation/ (as you seem to have used as a guinea-pig)
subdirectory. For the sake of simplicity, let's say the upstream
project has only one commit.

One plausible approach would be to have the commit, its top level tree
object, its Documentation/ tree object and all the blobs below that
level, while other blobs and trees that are reachable from the top
level tree object are left missing, but somehow are marked so that fsck
would think they are OK to be missing.

Your worktree would obviously be narrowed to the same Documentation/
area, and unlike the narrow checkout codepath, you do not widen on
demand (unless you automatically fetch missing parts of the tree, which
I do not think you should do by default to help people who work while
at 30,000ft). Instead, any operation that tries to modify outside the
"subtree" area should fail.

When you build a commit that represents a Documentation patch on top of
such a narrowed clone, because you have a full tree of Documentation/
area, you can come up with the updated tree object for that part of the
project.
If "subtree" mode (aka ...
Indeed. My intention was "hey this repo is too big, I only need some
pieces of it. Let me grab something and do my work. (Then throw away
the cloned repo)". It's best used together with shallow clone to give
low download/disk space, and a minimum tree to fix something quick. I'm
not really sure if such repos are sustainable in the long run. And no,
I did not want to widen/narrow the tree (as it was to be a throwaway
tree).

Now thinking of widening. The way I do narrow clone is quite similar
to shallow clone. I hope the way shallow clone is deepened can

Operations work as normal (as the incomplete clone is augmented to
become "normal"). In order to make it look normal, every time a new
commit comes in (either from another repository, or the user creates a
new one), the commit needs to be processed/replaced, so that the repo

Changes outside the subtree area are dropped on the floor now, not

This is where git-replace comes in. I do not want to deal with a full
flat index. Giving pointers to missing objects may make git commands
nervous. I rewrite the commit so that it now only has Documentation/
and nothing else (for which I have all needed objects). The index is
narrowed too. Because the index (even narrowed) is complete (i.e. all
entries reachable), most operations should work.

Then, to hide the helper commit from the user, I replace the original
(full) commit with this new commit. So from outside, git sees the SHA-1
of the original commit, but its content is from the helper one. These
helper commits guarantee git won't reach out for missing objects.

It's a trade off. Doing a full index requires much more effort in git.
Using "git-subtree split", while freeing git developers to do other
things, might be inconvenient for users (without server support, full
repo must be downloaded, replaced SHA-1 from git-subtree cannot be

That's just a part of the story. Repository integrity is a prerequisite
in git from the beginning. git-merge operates directly on trees so
cache-tree won't help much. ...
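To make the "git sees the SHA-1 of the original commit, but its content is from the helper one" step concrete, here is a rough sketch of a bulk lookup over a sorted in-memory mapping. It is an assumption about the shape of such a table, not the subtree.c code (which is elided above): struct commit_map_entry and lookup_rewritten() are made-up names, and the array is assumed to be sorted by the original SHA-1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry: [0] = original commit, [1] = rewritten one. */
struct commit_map_entry {
	unsigned char sha1[2][20];
};

/*
 * Binary search `map` (assumed sorted by sha1[0]) for `sha1`.
 * Returns the rewritten SHA-1 if a mapping exists, otherwise the
 * input pointer unchanged, mirroring how a replace lookup falls
 * through for objects that are not replaced.
 */
static const unsigned char *lookup_rewritten(const struct commit_map_entry *map,
					     size_t nr,
					     const unsigned char *sha1)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, map[mid].sha1[0], 20);

		if (!cmp)
			return map[mid].sha1[1];
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return sha1;
}
```

With roughly 24k commits a lookup like this is cheap; the per-command delay mentioned in the cover letter would more likely come from loading the table, which is where caching or mmap could help.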
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
upload-pack.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/upload-pack.c b/upload-pack.c
index dc464d7..e432e83 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,8 +160,9 @@ static void create_pack_file(void)
ssize_t sz;
const char *argv[10];
int arg = 0;
+ int internal_rev_list = shallow_nr;
- if (shallow_nr) {
+ if (internal_rev_list) {
memset(&rev_list, 0, sizeof(rev_list));
rev_list.proc = do_rev_list;
rev_list.out = -1;
@@ -187,7 +188,7 @@ static void create_pack_file(void)
argv[arg++] = NULL;
memset(&pack_objects, 0, sizeof(pack_objects));
- pack_objects.in = shallow_nr ? rev_list.out : -1;
+ pack_objects.in = internal_rev_list ? rev_list.out : -1;
pack_objects.out = -1;
pack_objects.err = -1;
pack_objects.git_cmd = 1;
@@ -197,7 +198,7 @@ static void create_pack_file(void)
die("git upload-pack: unable to fork git-pack-objects");
/* pass on revisions we (don't) want */
- if (!shallow_nr) {
+ if (!internal_rev_list) {
FILE *pipe_fd = xfdopen(pack_objects.in, "w");
if (!create_full_pack) {
int i;
@@ -311,7 +312,7 @@ static void create_pack_file(void)
error("git upload-pack: git-pack-objects died with error.");
goto fail;
}
- if (shallow_nr && finish_async(&rev_list))
+ if (internal_rev_list && finish_async(&rev_list))
goto fail; /* error was already reported */
/* flush the data */
--
1.7.1.rc1.69.g24c2f7
--
Hi, <snip> I've got the exact same changes in one of my in-progress-patches in my sparse-clone branch. That is, other than the variable name, but I like yours better. Needless to say, I agree with this change. :-) --
With core_subtree turned on (capability "subtree", request "subtree"
from fetch-pack), traverse_commit_list will be in "subtree mode",
which will not go farther than the given subtree.
As a result, the pack is broken by design: it contains only enough
blobs/trees/commits to reach the given subtree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
upload-pack.c | 18 ++++++++++++++++--
1 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/upload-pack.c b/upload-pack.c
index e432e83..9b6710a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -160,7 +160,7 @@ static void create_pack_file(void)
ssize_t sz;
const char *argv[10];
int arg = 0;
- int internal_rev_list = shallow_nr;
+ int internal_rev_list = shallow_nr || core_subtree;
if (internal_rev_list) {
memset(&rev_list, 0, sizeof(rev_list));
@@ -505,6 +505,20 @@ static void receive_needs(void)
if (debug_fd)
write_in_full(debug_fd, line, len);
+ if (!prefixcmp(line, "subtree ")) {
+ int len;
+ char *subtree;
+ if (core_subtree)
+ die("sorry, only one subtree supported");
+ len = strlen(line+8);
+ subtree = malloc(len+1);
+ memcpy(subtree, line+8, len-1);
+ subtree[len-1] = '\0'; /* \n */
+ if (subtree[len-2] != '/')
+ die("subtree request must end with a slash");
+ core_subtree = subtree;
+ continue;
+ }
if (!prefixcmp(line, "shallow ")) {
unsigned char sha1[20];
struct object *object;
@@ -624,7 +638,7 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
{
static const char *capabilities = "multi_ack thin-pack side-band"
" side-band-64k ofs-delta shallow no-progress"
- " include-tag multi_ack_detailed";
+ " include-tag multi_ack_detailed subtree";
struct object *o = parse_object(sha1);
if (!o)
--
1.7.1.rc1.69.g24c2f7
--
2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>: I'm not sure users would understand this error message; perhaps something more like "Fetching/cloning from a subtree-sparse repository not supported"?
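As a side note on the request parsing in receive_needs(), a more defensive version could look like the sketch below. It is not the patch's code: parse_subtree_request() is a hypothetical standalone helper, and it returns NULL instead of calling die(). The differences are that it checks the malloc() result and never indexes before the start of the buffer when the path is empty (the posted hunk reads subtree[len-2] even for a line that is just "subtree \n").

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a request line such as
 * "subtree Documentation/\n".
 *
 * Returns a newly allocated copy of the path (the caller frees it),
 * or NULL if the line is not a subtree request, the path is empty,
 * the path does not end with a slash, or allocation fails.
 */
static char *parse_subtree_request(const char *line)
{
	static const char prefix[] = "subtree ";
	const size_t prefix_len = sizeof(prefix) - 1;
	size_t len;
	char *path;

	if (strncmp(line, prefix, prefix_len) != 0)
		return NULL;
	line += prefix_len;

	len = strlen(line);
	if (len && line[len - 1] == '\n')
		len--;			/* drop the trailing newline */
	if (len == 0 || line[len - 1] != '/')
		return NULL;		/* request must end with a slash */

	path = malloc(len + 1);
	if (!path)
		return NULL;
	memcpy(path, line, len);
	path[len] = '\0';
	return path;
}
```

In upload-pack the NULL cases would presumably turn into die() calls with messages along the lines suggested in the review above.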
This option requires a subtree-aware upload-pack. It simply passes the
subtree from the command line (or from $GIT_DIR/config) to upload-pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/fetch-pack.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dbd8b7b..7460ecc 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -237,6 +237,8 @@ static int find_common(int fd[2], unsigned char *result_sha1,
for_each_ref(rev_list_insert_ref, NULL);
fetching = 0;
+ if (core_subtree)
+ packet_buf_write(&req_buf, "subtree %s\n", core_subtree);
for ( ; refs ; refs = refs->next) {
unsigned char *remote = refs->old_sha1;
const char *remote_hex;
@@ -692,6 +694,8 @@ static struct ref *do_fetch_pack(int fd[2],
if (is_repository_shallow() && !server_supports("shallow"))
die("Server does not support shallow clients");
+ if (core_subtree && !server_supports("subtree"))
+ die("Server does not support subtree");
if (server_supports("multi_ack_detailed")) {
if (args.verbose)
fprintf(stderr, "Server supports multi_ack_detailed\n");
@@ -860,6 +864,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
pack_lockfile_ptr = &pack_lockfile;
continue;
}
+ if (!prefixcmp(arg, "--subtree=")) {
+ core_subtree = arg + 10;
+ continue;
+ }
usage(fetch_pack_usage);
}
dest = (char *)arg;
--
1.7.1.rc1.69.g24c2f7
--
This adds the main function, subtree_import(), which is intended to be
used by "git clone".
Because subtree packs are not complete, they are barely usable. Git
client will cry out about missing objects here and there... Theoretically,
client code could be adapted to only look for objects within
subtree. That was painful to try.
Alternatively, subtree_import() rewrites commits to have only the
specified subtree, sealing all broken paths. Git client now happily
works with these new commits.
However, users might not, because they are different commits with
different SHA-1s. They can't use those SHA-1s to communicate within
their team. To work around this, all original commits are replaced by
new commits using git-replace.
Of course this is still not perfect. Users may be able to send commit
SHA-1s around, which are consistent. They may not do the same with tree
SHA-1s.
Rewriting/replacing commits takes time and space. For replacing _all_
commits, the current replace mechanism is not suitable, which is why
subtree_lookup_object() was introduced in previous patches.
For rewriting, writing a huge number of objects is slow. So
subtree_import() builds a pack for all new objects. These packs are
not optimized. But it does reduce wait time for rewriting.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
subtree.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
subtree.h | 1 +
2 files changed, 245 insertions(+), 0 deletions(-)
diff --git a/subtree.c b/subtree.c
index 601d827..8c075be 100644
--- a/subtree.c
+++ b/subtree.c
@@ -115,3 +115,247 @@ const unsigned char *subtree_lookup_object(const unsigned char *sha1)
return subtree_commit[pos]->sha1[1];
return sha1;
}
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
+ z_stream stream;
+ void *in, *out;
+ unsigned long maxsize;
+
+ memset(&stream, 0, sizeof(stream));
+ deflateInit(&stream, Z_DEFAULT_COMPRESSION);
+ maxsize = deflateBound(&stream, size);
+
+ in = *pptr;
+ out = ...
Hi,

It may have been painful, but personally I think it's still the right
way to do it. Of course, that's a pretty easy thing for me to say,
since you're pretty far ahead of me and I haven't felt your pain yet.
Maybe I'll change my mind after trying it for a while, but I'm not

My compiler complains that you didn't typecast the return value from
strlen to an int.
--
With all the preparation work, here comes --subtree. So clone away!
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/clone.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/builtin/clone.c b/builtin/clone.c
index efb1e6f..43bc34b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
#include "branch.h"
#include "remote.h"
#include "run-command.h"
+#include "subtree.h"
/*
* Overall FIXMEs:
@@ -78,6 +79,8 @@ static struct option builtin_clone_options[] = {
"path to git-upload-pack on the remote"),
OPT_STRING(0, "depth", &option_depth, "depth",
"create a shallow clone of that depth"),
+ OPT_STRING(0, "subtree", &core_subtree, "subtree",
+ "subtree clone"),
OPT_END()
};
@@ -515,6 +518,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
strbuf_reset(&value);
if (path && !is_bundle) {
+ if (core_subtree)
+ die("Local subtree clone does not work (now)");
refs = clone_local(path, git_dir);
mapped_refs = wanted_peer_refs(refs, refspec);
} else {
@@ -623,6 +628,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_disconnect(transport);
}
+ if (core_subtree) {
+ git_config_set("core.subtree", core_subtree);
+ subtree_import();
+ }
+
if (!option_no_checkout) {
struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
struct unpack_trees_options opts;
--
1.7.1.rc1.69.g24c2f7
--
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/pack-objects.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0e81673..5d7b277 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2277,6 +2277,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
grafts_replace_parents = 0;
continue;
}
+ if (!prefixcmp(arg, "--subtree=")) {
+ core_subtree = arg + 10;
+ continue;
+ }
usage(pack_usage);
}
--
1.7.1.rc1.69.g24c2f7
--
Which is exactly the opposite of rewriting incoming commits.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
subtree.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
subtree.h | 1 +
2 files changed, 174 insertions(+), 0 deletions(-)
diff --git a/subtree.c b/subtree.c
index 8c075be..739ff5f 100644
--- a/subtree.c
+++ b/subtree.c
@@ -359,3 +359,176 @@ void subtree_import()
if (revs.pending.nr)
free(revs.pending.objects);
}
+
+/*
+ * The opposite of narrow_tree(). Put the subtree back to the original tree.
+ */
+static int widen_tree(const unsigned char *sha1,
+ unsigned char *newsha1,
+ const unsigned char *subtree_sha1,
+ const char *prefix)
+{
+ struct tree_desc desc;
+ struct name_entry entry;
+ struct strbuf buffer;
+ const char *slash;
+ int subtree_len;
+ enum object_type type;
+ unsigned long size;
+ char *tree;
+
+ slash = strchr(prefix, '/');
+ subtree_len = slash ? slash - prefix : strlen(prefix);
+
+ tree = read_sha1_file(sha1, &type, &size);
+ if (type != OBJ_TREE)
+ die("%s is not a tree", sha1_to_hex(sha1));
+
+ init_tree_desc(&desc, tree, size);
+ strbuf_init(&buffer, 8192);
+ while (tree_entry(&desc, &entry)) {
+ strbuf_addf(&buffer, "%o %.*s%c", entry.mode, strlen(entry.path), entry.path, '\0');
+
+ if (S_ISDIR(entry.mode) &&
+ subtree_len == strlen(entry.path) &&
+ !strncmp(entry.path, prefix, subtree_len)) {
+ unsigned char newtree_sha1[20];
+
+ if (slash && slash[1]) /* trailing slash does not count */
+ widen_tree(entry.sha1, newtree_sha1, subtree_sha1,
+ prefix+subtree_len+1);
+ else
+ /* replace the tree */
+ memcpy(newtree_sha1, subtree_sha1, 20);
+
+ strbuf_add(&buffer, newtree_sha1, 20);
+ }
+ else
+ strbuf_add(&buffer, entry.sha1, 20);
+ }
+ free(tree);
+
+ if (write_sha1_file(buffer.buf, buffer.len, tree_type, newsha1)) {
+ error("Could not write replaced tree for %s", ...
2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
Again, gcc here complains that "subtree.c:390: warning: field precision
should have type ‘int’, but argument 4 has type ‘size_t’" -- typecast
the return value of strlen to int?
--
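The warning above comes from "%.*s", whose precision argument must be an int. A small standalone sketch of formatting one tree entry with the cast in place follows. format_tree_entry() is a hypothetical helper that writes into a caller-supplied buffer with snprintf() rather than using git's strbuf API, so treat it as an illustration of the cast and of the entry layout only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one tree entry in the on-disk layout
 * "<octal mode> <name>\0<20-byte sha1>" into buf.
 *
 * Returns the number of bytes written, or 0 if the entry does not
 * fit in bufsize bytes.
 */
static size_t format_tree_entry(unsigned char *buf, size_t bufsize,
				unsigned int mode, const char *name,
				const unsigned char *sha1)
{
	size_t name_len = strlen(name);
	int n;

	/* "%.*s" takes an int precision, hence the explicit cast. */
	n = snprintf((char *)buf, bufsize, "%o %.*s",
		     mode, (int)name_len, name);
	if (n < 0 || (size_t)n + 1 + 20 > bufsize)
		return 0;

	/* snprintf() already stored the NUL that separates name and sha1. */
	memcpy(buf + n + 1, sha1, 20);
	return (size_t)n + 1 + 20;
}
```

Since the whole name is printed anyway, plain "%s" would avoid the precision argument altogether; the cast is only needed when a length is passed explicitly.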
In subtree mode, you work on narrowed trees and make narrowed
commits. If you want to push upstream, you need to put your
updated subtree back into the full tree again. Otherwise upstream would
complain that you deleted all trees but your subtree, which is not good.
In order to do that, commit_tree() now takes the base tree SHA-1. With
that, it can create upstream-compatible commits. It does not do so yet,
though.
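As a toy illustration of "putting the updated subtree back into the full tree" (not the widen_tree() from this series, which works on real tree objects and recurses through nested paths), the sketch below copies a flat list of top-level entries and swaps in a new id for the one entry that matches. struct toy_entry and toy_widen() are invented for this example.

```c
#include <string.h>

/* Hypothetical in-memory stand-in for one top-level tree entry. */
struct toy_entry {
	const char *name;	/* entry name, e.g. "Documentation" */
	unsigned char id[20];	/* object id the entry points at */
};

/*
 * Copy the `nr` entries of `base` into `out`, replacing the id of the
 * entry named `subtree` with `new_id`.  All other entries keep the
 * ids from the base tree, which is what keeps the result compatible
 * with upstream.  Returns 1 if an entry was replaced, 0 otherwise.
 */
static int toy_widen(const struct toy_entry *base, size_t nr,
		     const char *subtree, const unsigned char *new_id,
		     struct toy_entry *out)
{
	size_t i;
	int replaced = 0;

	for (i = 0; i < nr; i++) {
		out[i] = base[i];
		if (!strcmp(base[i].name, subtree)) {
			memcpy(out[i].id, new_id, 20);
			replaced = 1;
		}
	}
	return replaced;
}
```

The base tree passed to commit_tree() plays the role of `base` here: everything outside the subtree is taken from it unchanged.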
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/commit-tree.c | 2 +-
builtin/commit.c | 2 +-
builtin/merge.c | 4 ++--
builtin/notes.c | 2 +-
commit.c | 2 +-
commit.h | 2 +-
notes-cache.c | 2 +-
7 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 87f0591..88a6833 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
if (strbuf_read(&buffer, 0, 0) < 0)
die_errno("git commit-tree: failed to read");
- if (!commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) {
+ if (!commit_tree(buffer.buf, tree_sha1, NULL, parents, commit_sha1, NULL)) {
printf("%s\n", sha1_to_hex(commit_sha1));
return 0;
}
diff --git a/builtin/commit.c b/builtin/commit.c
index 2bb30c0..6b4c678 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
exit(1);
}
- if (commit_tree(sb.buf, active_cache_tree->sha1, parents, commit_sha1,
+ if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
fmt_ident(author_name, author_email, author_date,
IDENT_ERROR_ON_NO_NAME))) {
rollback_index_files();
diff --git a/builtin/merge.c b/builtin/merge.c
index 37ce4f5..8745b54 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -779,7 +779,7 @@ static int merge_trivial(void)
parent->next = ...
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
commit.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/commit.c b/commit.c
index 7121631..258d3fb 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
#include "diff.h"
#include "revision.h"
#include "notes.h"
+#include "subtree.h"
int save_commit_buffer = 1;
@@ -858,5 +859,12 @@ int commit_tree(const char *msg, unsigned char *tree, unsigned char *base_tree,
result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
strbuf_release(&buffer);
+
+ if (core_subtree && !result) {
+ unsigned char subtree_commit[20];
+ memcpy(subtree_commit, ret, 20);
+ result = subtree_export(subtree_commit, base_tree, ret);
+ }
+
return result;
}
--
1.7.1.rc1.69.g24c2f7
--
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/commit.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index 6b4c678..c551d72 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1350,7 +1350,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
exit(1);
}
- if (commit_tree(sb.buf, active_cache_tree->sha1, NULL, parents, commit_sha1,
+ if (commit_tree(sb.buf, active_cache_tree->sha1,
+ parents ? parents->item->object.sha1 : NULL,
+ parents, commit_sha1,
fmt_ident(author_name, author_email, author_date,
IDENT_ERROR_ON_NO_NAME))) {
rollback_index_files();
--
1.7.1.rc1.69.g24c2f7
--
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/send-pack.c | 2 ++
upload-pack.c | 3 +++
2 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 481602d..fb1ad2b 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -53,6 +53,8 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
int i;
i = 4;
+ if (core_subtree)
+ args->use_thin_pack = 0;
if (args->use_thin_pack)
argv[i++] = "--thin";
if (args->use_ofs_delta)
diff --git a/upload-pack.c b/upload-pack.c
index 9b6710a..c65a3cb 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -581,6 +581,9 @@ static void receive_needs(void)
if (!use_sideband && daemon_mode)
no_progress = 1;
+ if (core_subtree)
+ use_thin_pack = 0;
+
if (depth == 0 && shallows.nr == 0)
return;
if (depth > 0) {
--
1.7.1.rc1.69.g24c2f7
--
Heya, Can they be combined to create the fabled narrow checkout? -- Cheers, Sverre Rabbelier --
Yes. For the record, --subtree=Documentation/ with --depth=1 made a pack of 5MB. -- Duy --
Heya, I hope everybody is paying attention to these patches then! :) -- Cheers, Sverre Rabbelier --
Hi,

Very nice, it's awesome you're working on this. I'm of the same opinion
that Shawn stated earlier, namely that I don't like the route of
rewriting commits on the fly like this (more on that later), but it's
really cool to see some ideas being tried and pushed to their

I tried it out, but I seem to be doing something wrong. I applied your
patches to current master, and tried the following -- am I doing
something wrong or omitting any important steps?

$ git --version
git version 1.7.2.1.22.g236df
$ git clone file://$(pwd)/git fullclone
Cloning into fullclone...
warning: templates not found /home/newren/share/git-core/templates
remote: Counting objects: 96220, done.
remote: Compressing objects: 100% (24925/24925), done.
remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
Resolving deltas: 100% (70575/70575), done.
fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc
$ git clone --subtree=Documentation/ file://$(pwd)/git docclone
Cloning into docclone...
warning: templates not found /home/newren/share/git-core/templates
fatal: The remote end hung up unexpectedly
fatal: early EOF

58 MB for full repo? What are you counting? For me, I get 25M:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ ls -lh git/.git/objects/pack/*.pack
-r--r--r--. 1 newren newren 25M 2010-08-01 18:05 git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack

Are you counting the full checkout too or something? If so, that varies
very wildly between systems, making it hard to compare numbers. (For
me, 'du -hs git/' returns 44 MB.) I'd like to be able to duplicate your
numbers and investigate further. It seems to me that

I think it's a pretty nifty hack. It's fun to see. :-) However, I do
have a number of reservations about the general strategy:

As mentioned earlier, I'm not sure I like the on-the-fly commit
rewriting, as mentioned by Shawn in your ...
This one looks like the uninitialized case you pointed out in

Not sure. Does file:// use receive-pack/upload-pack? I tested it over

It's my git.git, probably has more topic branches plus junk stuff. If
you are only interested in numbers, playing with git pack-objects is
enough. You need changes in list-objects.c and builtin/pack-objects.c,
then you can

git pack-objects --stdout --subtree=foo/ > temp.pack

And it's also fun to try. I'd like to try it on larger repos but I

I agree. Being able to fetch from an incomplete repo is very nice.
Though I admit I don't know how to do it. I think sparse clone would

Looking forward to seeing sparse clone realized. Although I think that
would be painful :-)
-- Duy
--
My number 24MB was incorrect because process_tree() leaked too many
blobs. It should have been 16MB. Anyway I have updated my series and
put it here (to spam the git mailing list less)

http://repo.or.cz/w/git/pclouds.git/shortlog/refs/heads/subtree

(caveat: constantly rebased tree) if you still want to play with it.
For number lovers, fetching only Documentation from linux-2.6.git took
94MB (full repo 366MB). Yeah, Documentation was an easy target.
-- Duy
--
