[PATCH 5/6] http-fetch: Use index-pack rather than verify-pack to check packs

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Shawn O. Pearce
Date: Thursday, April 15, 2010 - 12:09 pm

To ensure we don't leave a corrupt pack file positioned as though
it were a valid pack file, run index-pack on the temporary pack
before we rename it to its final name.  If index-pack crashes out
when it discovers file corruption (e.g. GitHub's error HTML at the
end of the file), simply delete the temporary files to cleanup.

By waiting until the pack has been validated before we move it
to its final name, we eliminate a race condition where another
concurrent reader might try to access the pack at the same time
that we are still trying to verify its not corrupt.

Switching from verify-pack to index-pack is a change in behavior,
but it should turn out better for users.  The index-pack algorithm
tries to minimize disk seeks, as well as the number of times any
given object is inflated, by organizing its work along delta chains.
The verify-pack logic does not attempt to do this, thrashing the
delta base cache and the filesystem cache.

By recreating the index file locally, we also can automatically
upgrade from a v1 pack table of contents to v2.  This makes the
CRC32 data available for use during later repacks, even if the
server didn't have them on hand.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 cache.h               |    1 +
 http.c                |   42 ++++++++++++++++++++++++++++++++++--------
 http.h                |    1 -
 sha1_file.c           |   11 +++++++++--
 t/t5550-http-fetch.sh |   15 +++++++++++++++
 5 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 5eb0573..4150603 100644
--- a/cache.h
+++ b/cache.h
@@ -916,6 +916,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 extern void pack_report(void);
 extern int open_pack_index(struct packed_git *);
+extern void close_pack_index(struct packed_git *);
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *);
 extern void close_pack_windows(struct packed_git *);
 extern void unuse_pack(struct pack_window **);
diff --git a/http.c b/http.c
index 64e0c18..aa3e380 100644
--- a/http.c
+++ b/http.c
@@ -1,6 +1,7 @@
 #include "http.h"
 #include "pack.h"
 #include "sideband.h"
+#include "run-command.h"
 
 int data_received;
 int active_requests;
@@ -1000,11 +1001,13 @@ void release_http_pack_request(struct http_pack_request *preq)
 
 int finish_http_pack_request(struct http_pack_request *preq)
 {
-	int ret;
 	struct packed_git **lst;
 	struct packed_git *p = preq->target;
+	char *tmp_idx;
+	struct child_process ip;
+	const char *ip_argv[8];
 
-	p->pack_size = ftell(preq->packfile);
+	close_pack_index(p);
 	fclose(preq->packfile);
 	preq->packfile = NULL;
 	preq->slot->local = NULL;
@@ -1014,13 +1017,37 @@ int finish_http_pack_request(struct http_pack_request *preq)
 		lst = &((*lst)->next);
 	*lst = (*lst)->next;
 
-	ret = move_temp_to_file(preq->tmpfile, preq->filename);
-	if (ret)
-		return ret;
-	if (verify_pack(p))
+	tmp_idx = xstrdup(preq->tmpfile);
+	strcpy(tmp_idx + strlen(tmp_idx) - strlen(".pack.temp"),
+	       ".idx.temp");
+
+	ip_argv[0] = "index-pack";
+	ip_argv[1] = "-o";
+	ip_argv[2] = tmp_idx;
+	ip_argv[3] = preq->tmpfile;
+	ip_argv[4] = NULL;
+
+	memset(&ip, 0, sizeof(ip));
+	ip.argv = ip_argv;
+	ip.git_cmd = 1;
+	ip.no_stdin = 1;
+	ip.no_stdout = 1;
+
+	if (run_command(&ip)) {
+		unlink(preq->tmpfile);
+		unlink(tmp_idx);
+		free(tmp_idx);
 		return -1;
-	install_packed_git(p);
+	}
 
+	if (move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1))
+	 || move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
+		free(tmp_idx);
+		return -1;
+	}
+
+	install_packed_git(p);
+	free(tmp_idx);
 	return 0;
 }
 
@@ -1043,7 +1070,6 @@ struct http_pack_request *new_http_pack_request(
 	preq->url = strbuf_detach(&buf, NULL);
 
 	filename = sha1_pack_name(target->sha1);
-	snprintf(preq->filename, sizeof(preq->filename), "%s", filename);
 	snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp", filename);
 	preq->packfile = fopen(preq->tmpfile, "a");
 	if (!preq->packfile) {
diff --git a/http.h b/http.h
index 5c9441c..e4a8126 100644
--- a/http.h
+++ b/http.h
@@ -152,7 +152,6 @@ struct http_pack_request
 	struct packed_git *target;
 	struct packed_git **lst;
 	FILE *packfile;
-	char filename[PATH_MAX];
 	char tmpfile[PATH_MAX];
 	struct curl_slist *range_header;
 	struct active_request_slot *slot;
diff --git a/sha1_file.c b/sha1_file.c
index ff65328..820063e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -599,6 +599,14 @@ void unuse_pack(struct pack_window **w_cursor)
 	}
 }
 
+void close_pack_index(struct packed_git *p)
+{
+	if (p->index_data) {
+		munmap((void *)p->index_data, p->index_size);
+		p->index_data = NULL;
+	}
+}
+
 /*
  * This is used by git-repack in case a newly created pack happens to
  * contain the same set of objects as an existing one.  In that case
@@ -620,8 +628,7 @@ void free_pack_by_name(const char *pack_name)
 			close_pack_windows(p);
 			if (p->pack_fd != -1)
 				close(p->pack_fd);
-			if (p->index_data)
-				munmap((void *)p->index_data, p->index_size);
+			close_pack_index(p);
 			free(p->bad_object_sha1);
 			*pp = p->next;
 			free(p);
diff --git a/t/t5550-http-fetch.sh b/t/t5550-http-fetch.sh
index 78c31c9..bdac8d7 100755
--- a/t/t5550-http-fetch.sh
+++ b/t/t5550-http-fetch.sh
@@ -62,6 +62,21 @@ test_expect_success 'fetch packed objects' '
 	git clone $HTTPD_URL/dumb/repo_pack.git
 '
 
+test_expect_success 'fetch notices corrupt pack' '
+	cp -R "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git "$HTTPD_DOCUMENT_ROOT_PATH"/repo_bad1.git &&
+	(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_bad1.git &&
+	 p=`ls objects/pack/pack-*.pack` &&
+	 chmod u+w $p &&
+	 dd if=/dev/zero of=$p bs=256 count=1 seek=1
+	) &&
+	mkdir repo_bad1.git &&
+	(cd repo_bad1.git &&
+	 git --bare init &&
+	 test_must_fail git --bare fetch $HTTPD_URL/dumb/repo_bad1.git &&
+	 test 0 = `ls objects/pack/pack-*.pack | wc -l`
+	)
+'
+
 test_expect_success 'did not use upload-pack service' '
 	grep '/git-upload-pack' <"$HTTPD_ROOT_PATH"/access.log >act
 	: >exp
-- 
1.7.1.rc1.269.ga27c7

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
git fetch over http:// left my repo broken, Christian Halstrick, (Thu Apr 15, 2:51 am)
Re: git fetch over http:// left my repo broken, Michael J Gruber, (Thu Apr 15, 2:58 am)
Re: git fetch over http:// left my repo broken, Ilari Liusvaara, (Thu Apr 15, 4:33 am)
Re: git fetch over http:// left my repo broken, Ilari Liusvaara, (Thu Apr 15, 4:43 am)
Re: git fetch over http:// left my repo broken, Shawn O. Pearce, (Thu Apr 15, 7:15 am)
[PATCH 0/6] detect dumb HTTP pack file corruption, Shawn O. Pearce, (Thu Apr 15, 12:09 pm)
[PATCH 1/6] http.c: Remove bad free of static block, Shawn O. Pearce, (Thu Apr 15, 12:09 pm)
[PATCH 2/6] t5550-http-fetch: Use subshell for repository ..., Shawn O. Pearce, (Thu Apr 15, 12:09 pm)
[PATCH 5/6] http-fetch: Use index-pack rather than verify- ..., Shawn O. Pearce, (Thu Apr 15, 12:09 pm)
Re: [PATCH 0/6] detect dumb HTTP pack file corruption, Junio C Hamano, (Sat Apr 17, 10:56 am)
Re: [PATCH 0/6] detect dumb HTTP pack file corruption, Shawn O. Pearce, (Sat Apr 17, 12:11 pm)
Re: [PATCH v2 5/6] http-fetch: Use index-pack rather than ..., Shawn O. Pearce, (Sat Apr 17, 12:30 pm)
[PATCH v3 01/11] http.c: Remove bad free of static block, Shawn O. Pearce, (Sat Apr 17, 1:07 pm)
[PATCH v3 09/11] Allow parse_pack_index on temporary files, Shawn O. Pearce, (Sat Apr 17, 1:07 pm)
[PATCH v4 00/11] Resend sp/maint-dumb-http-pack-reidx, Shawn O. Pearce, (Mon Apr 19, 7:23 am)
[PATCH v4 09/11] Allow parse_pack_index on temporary files, Shawn O. Pearce, (Mon Apr 19, 7:23 am)
Re: [PATCH v4 00/11] Resend sp/maint-dumb-http-pack-reidx, Tay Ray Chuan, (Mon Apr 19, 7:46 am)
Re: [PATCH v4 00/11] Resend sp/maint-dumb-http-pack-reidx, Shawn O. Pearce, (Mon Apr 19, 7:49 am)
Re: [PATCH v4 00/11] Resend sp/maint-dumb-http-pack-reidx, Tay Ray Chuan, (Mon Apr 19, 9:33 pm)