Re: [RFC PATCH] Reduce the number of connects when fetching

Previous thread: [PATCH] Build in ls-remote by Daniel Barkalow on Sunday, November 4, 2007 - 1:51 pm. (2 messages)

Next thread: [PATCH] Rearrange git-format-patch synopsis to improve clarity. by David Symonds on Sunday, November 4, 2007 - 3:27 pm. (7 messages)
From: Daniel Barkalow
Date: Sunday, November 4, 2007 - 2:28 pm

This shares the connection between getting the remote ref list and
getting objects in the first batch. (A second connection is still used
to follow tags)

It passes all of the tests, and should be fine, but I don't entirely 
understand the git protocol, so I'd like people who know it better to 
check this over. In particular, I don't know if there's a way to have the 
connection end up in a state where objects for more refs can be requested 
after some refs have been requested and the resulting objects read.

The idea is to keep the open connection in the data for the transport in 
between getting the list of refs and doing anything further. This 
therefore moves the connection-handling aspects outside of fetch-pack() 
and handles them primarily in transport.c.

This is on top of current next.
---
 builtin-fetch-pack.c |   74 ++++++++++++++++++++++++++-----------------------
 fetch-pack.h         |    2 +
 transport.c          |   41 +++++++++++++++++++--------
 3 files changed, 70 insertions(+), 47 deletions(-)

diff --git a/builtin-fetch-pack.c b/builtin-fetch-pack.c
index 862652b..7783c05 100644
--- a/builtin-fetch-pack.c
+++ b/builtin-fetch-pack.c
@@ -7,6 +7,7 @@
 #include "pack.h"
 #include "sideband.h"
 #include "fetch-pack.h"
+#include "remote.h"
 #include "run-command.h"
 
 static int transfer_unpack_limit = -1;
@@ -558,14 +559,14 @@ static int get_pack(int xd[2], char **pack_lockfile)
 }
 
 static struct ref *do_fetch_pack(int fd[2],
+		const struct ref *orig_ref,
 		int nr_match,
 		char **match,
 		char **pack_lockfile)
 {
-	struct ref *ref;
+	struct ref *ref = copy_ref_list(orig_ref);
 	unsigned char sha1[20];
 
-	get_remote_heads(fd[0], &ref, 0, NULL, 0);
 	if (is_repository_shallow() && !server_supports("shallow"))
 		die("Server does not support shallow clients");
 	if (server_supports("multi_ack")) {
@@ -583,10 +584,6 @@ static struct ref *do_fetch_pack(int fd[2],
 			fprintf(stderr, "Server supports side-band\n");
 		use_sideband = 1;
 ...
From: Junio C Hamano
Date: Monday, November 5, 2007 - 6:51 pm

The idea is very sound.  The scripted version of git-fetch used
a separate ls-remote only because peek-remote and fetch-pack

The upload-pack protocol goes "S: here are what I have, C: I
want these, C: I have these, S: ok, continue, C: I have these,
S: ok, continue, C: I have these, S: ok, I've heard enough, C:
done, S: packfile is here", so after packfile generation starts
there is nothing further the downloader can say.

Otherwise you would be able to do the tag following using the
same connection, but that is unfortunately not a case.
-

From: Daniel Barkalow
Date: Monday, November 5, 2007 - 8:04 pm

I figured that had to be the case, due to the way the protocol acts at the 

It would be nice if this could continue: "C: I also want these, S: ok, 
heard enough, C: done, S: another packfile is here"; we should be able to 
identify the end of the packfile on both ends to resume doing other 
things.

Or, maybe, "C: I also want these single objects, S: here's a thin pack of 
them", since it's exclusively tags pointing to objects we have just 
gotten.

	-Daniel
*This .sig left intentionally blank*
-

From: Junio C Hamano
Date: Monday, November 5, 2007 - 10:03 pm

You would need a protocol extension for this, as the protocol
defines all the remainder after the have-ack exchange to be
intended for unpack-objects, not just the data for a single pack
that immediately follows the exchange.  Mysteriously,
unpack-objects even has code to write out the remainder after
the pack data.

The protocol extension would probably need to depend on the
existence of sideband.  Make the sending side signal the end of
the pack data over the sideband after sending a pack, and then
both sides can go back to the "C: I want these too, S: Ok, here
it is" exchange.  You may optionally want to do another have-ack
exchange here, but for the purpose of tag following I suspect
there is no need for that.
-

From: Johannes Sixt
Date: Tuesday, November 6, 2007 - 1:13 am

How about:

S: here are what I have
C: I want these
C: want tags   <-- new
C: I have these
S: ok, continue
C: I have these
S: ok, continue
C: I have these
S: ok, these are the tags  <-- new
S: I've heard enough
C: done
S: packfile is here

The tags that the server provides are those (and only those[*]) that 
reference objects in the packfile that it's going to send.

[*] This way the client doesn't have to figure out which tags it wants; as a 
side-effect it won't accidentally fetch tags for objects that it happens to 
have in the repository, but aren't reachable from any ref (like what used to 
happen).

-- Hannes
-

From: Junio C Hamano
Date: Tuesday, November 6, 2007 - 1:34 am

Yes, but that shifts the burden to the sending side which is
always bad.  We want to make the client work as much as possible
when it is practical.


-

Previous thread: [PATCH] Build in ls-remote by Daniel Barkalow on Sunday, November 4, 2007 - 1:51 pm. (2 messages)

Next thread: [PATCH] Rearrange git-format-patch synopsis to improve clarity. by David Symonds on Sunday, November 4, 2007 - 3:27 pm. (7 messages)