Hi, some days back I fetched from a github repo with http protocol and afterwards my local repo was broken. Since the fetch was done by a cronjob I don't know whether the fetch reported an error. Problem is that one pack file was corrupted because the github servers put the repo I wanted to clone into some maintenance mode while I was fetching. The pack file includes at the end the html source code - which makes these files clearly corrupted. Git should detect this error and let the fetch fail, right? All this done on Linux (CentOS) with git version 1.6.6.1. Github repo was http://github.com/sonatype/sonatype-tycho.git. This is not easy to reproduce - because you have to hit the maintenance times of github. Linux wdfd00220954a.wdf.sap.corp 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 origin http://github.com/sonatype/sonatype-tycho.git (fetch) error: packfile ./objects/pack/pack-b5dceb0bae390d6540f881be686839f7e691fa0b.pack does not match index error: packfile ./objects/pack/pack-b5dceb0bae390d6540f881be686839f7e691fa0b.pack cannot be accessed error: packfile ./objects/pack/pack-b5dceb0bae390d6540f881be686839f7e691fa0b.pack does not match index fatal: packfile ./objects/pack/pack-b5dceb0bae390d6540f881be686839f7e691fa0b.pack cannot be accessed -rw-r--r-- 1 git git 766920900 Apr 13 16:22 ./objects/pack/pack-b5dceb0bae390d6540f881be686839f7e691fa0b.pack 0000000: 5041 434b 0000 0002 0000 4bef 962b 789c PACK......K..+x. 0000010: 9591 cb72 dc20 1045 f77c 457f 8067 8a41 ...r. .E.|E..g.A 0000020: 1a21 a552 a954 5cce 63e5 2adb 9b2c 7934 .!.R.T\.c.*..,y4 0000030: 2332 8856 00d9 99bf 37a3 781c 2f9d 0d05 #2.V....7.x./... # dump a section near the end of the corrupted file: html source code 2db625f6:d64b 2752 e70f 8f1a 6161 3c21 444f 4354 .K'R....aa<!DOCT 2db62606:5950 4520 6874 6d6c 2050 5542 4c49 4320 YPE html PUBLIC 2db62616:222d 2f2f 5733 432f 2f44 5444 2058 4854 "-//W3C//DTD XHT 2db62626:4d4c 2031 2e30 2054 7261 6e73 6974 696f ML 1.0 Transitio 2db62636:6e61 6c2f ...
Right. And Github should not pull your repo away from under your feet. But still, Git should be able to deal with broken servers. The problem is: If the server does not report any problem but simply serves a broken pack (with correct header), how should Git notice? It would require a fsck before accepting any new pack. If you move away the broken pack, do you get any dangling refs? Michael --
Pack trailer hash. Apparently dumb HTTP fetch needs to bypass pack to index conversion somehow since index-pack aborts if trailer hash check fails (not to mention other failures corrupt pack may cause). -Ilari --
Oddly enough, http.c runs verify_pack() after the download, but does so only after it swings the pack file into position. If verify_pack() fails, it leaves the corrupt pack file in the objects/pack directory. Talk about fail. I'll put together a patch shortly. -- Shawn. --
The filename variable here is pointing to a block of memory that was allocated by sha1_file.c and is also held in a static variable scoped within the sha1_pack_name() function. Doing a free() here is returning that memory to the allocator while we might still try to reuse it on a subsequent sha1_pack_name() invocation. That's not acceptable, so don't free it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- http.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/http.c b/http.c index 4814217..f26625e 100644 --- a/http.c +++ b/http.c @@ -1082,7 +1082,6 @@ struct http_pack_request *new_http_pack_request( return preq; abort: - free(filename); free(preq->url); free(preq); return NULL; -- 1.7.1.rc1.269.ga27c7 --
Change into the server repository's directory using a subshell, so we can return back to the top of the trash directory before doing anything more in the test script. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- t/t5550-http-fetch.sh | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/t/t5550-http-fetch.sh b/t/t5550-http-fetch.sh index 8cfce96..78c31c9 100755 --- a/t/t5550-http-fetch.sh +++ b/t/t5550-http-fetch.sh @@ -55,9 +55,10 @@ test_expect_success 'http remote detects correct HEAD' ' test_expect_success 'fetch packed objects' ' cp -R "$HTTPD_DOCUMENT_ROOT_PATH"/repo.git "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && - cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && - git --bare repack && - git --bare prune-packed && + (cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && + git --bare repack && + git --bare prune-packed + ) && git clone $HTTPD_URL/dumb/repo_pack.git ' -- 1.7.1.rc1.269.ga27c7 --
Always remove the struct packed_git from the active list, even
if the rename of the temporary file fails.
While we are here, simplify the code a bit by using a common
local variable name ("p") to hold the relevant packed_git.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/http.c b/http.c
index f26625e..4558f11 100644
--- a/http.c
+++ b/http.c
@@ -1002,8 +1002,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
{
int ret;
struct packed_git **lst;
+ struct packed_git *p = preq->target;
- preq->target->pack_size = ftell(preq->packfile);
+ p->pack_size = ftell(preq->packfile);
if (preq->packfile != NULL) {
fclose(preq->packfile);
@@ -1011,18 +1012,17 @@ int finish_http_pack_request(struct http_pack_request *preq)
preq->slot->local = NULL;
}
- ret = move_temp_to_file(preq->tmpfile, preq->filename);
- if (ret)
- return ret;
-
lst = preq->lst;
- while (*lst != preq->target)
+ while (*lst != p)
lst = &((*lst)->next);
*lst = (*lst)->next;
- if (verify_pack(preq->target))
+ ret = move_temp_to_file(preq->tmpfile, preq->filename);
+ if (ret)
+ return ret;
+ if (verify_pack(p))
return -1;
- install_packed_git(preq->target);
+ install_packed_git(p);
return 0;
}
--
1.7.1.rc1.269.ga27c7
--
This series tries to better detect and avoid corrupted pack and idx files downloaded over the dumb HTTP transport. It addresses the GitHub repository maintainence causing corruption issue reported today by Christian Halstrick. Shawn O. Pearce (6): http.c: Remove bad free of static block t5550-http-fetch: Use subshell for repository operations http.c: Tiny refactoring of finish_http_pack_request http.c: Drop useless != NULL test in finish_http_pack_request http-fetch: Use index-pack rather than verify-pack to check packs http-fetch: Use temporary files for pack-*.idx until verified cache.h | 3 +- http.c | 118 +++++++++++++++++++++++++++++++++---------------- http.h | 1 - pack-check.c | 15 +++++-- pack.h | 1 + sha1_file.c | 17 +++++-- t/t5550-http-fetch.sh | 37 ++++++++++++++- 7 files changed, 140 insertions(+), 52 deletions(-) --
Hmph, I am getting failures from "[index v2] 6 verify-pack detects CRC mismatch" in t5302 when this is applied to 'maint' (or when the result is merged to 'master'). --
Oops. Well, I need to respin the series anyway to address Tay Ray Chuan's comments. Clearly I failed to run the full test suite before sending this series. I promise to run the full suite before resending. :-) -- Shawn. --
The test preq->packfile != NULL is always true. If packfile was
actually NULL when entering this function the ftell() above would
crash out with a SIGSEGV, resulting in never reaching this point.
Simplify the code by just removing the conditional.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/http.c b/http.c
index 4558f11..64e0c18 100644
--- a/http.c
+++ b/http.c
@@ -1005,12 +1005,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
struct packed_git *p = preq->target;
p->pack_size = ftell(preq->packfile);
-
- if (preq->packfile != NULL) {
- fclose(preq->packfile);
- preq->packfile = NULL;
- preq->slot->local = NULL;
- }
+ fclose(preq->packfile);
+ preq->packfile = NULL;
+ preq->slot->local = NULL;
lst = preq->lst;
while (*lst != p)
--
1.7.1.rc1.269.ga27c7
--
To ensure we don't leave a corrupt pack file positioned as though it were a valid pack file, run index-pack on the temporary pack before we rename it to its final name. If index-pack crashes out when it discovers file corruption (e.g. GitHub's error HTML at the end of the file), simply delete the temporary files to cleanup. By waiting until the pack has been validated before we move it to its final name, we eliminate a race condition where another concurrent reader might try to access the pack at the same time that we are still trying to verify its not corrupt. Switching from verify-pack to index-pack is a change in behavior, but it should turn out better for users. The index-pack algorithm tries to minimize disk seeks, as well as the number of times any given object is inflated, by organizing its work along delta chains. The verify-pack logic does not attempt to do this, thrashing the delta base cache and the filesystem cache. By recreating the index file locally, we also can automatically upgrade from a v1 pack table of contents to v2. This makes the CRC32 data available for use during later repacks, even if the server didn't have them on hand. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- cache.h | 1 + http.c | 42 ++++++++++++++++++++++++++++++++++-------- http.h | 1 - sha1_file.c | 11 +++++++++-- t/t5550-http-fetch.sh | 15 +++++++++++++++ 5 files changed, 59 insertions(+), 11 deletions(-) diff --git a/cache.h b/cache.h index 5eb0573..4150603 100644 --- a/cache.h +++ b/cache.h @@ -916,6 +916,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1, extern void pack_report(void); extern int open_pack_index(struct packed_git *); +extern void close_pack_index(struct packed_git *); extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *); extern void close_pack_windows(struct packed_git *); extern void ...
Since the particular byte that overwrites the pack is irrelevant, please make this: printf %0256d 0 | dd of=$p bs=256 count=1 seek=1 for the benefit of us poor Windowsers who do not have /dev/zero. Perhaps you want to add conv=notrunc. Ditto in patch 6/6. Thanks, -- Hannes --
Verify that a downloaded pack-*.idx file is consistent and valid
as an index file before we rename it into its final destination.
This prevents a corrupt index file from later being treated as a
usable file, confusing readers.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 2 +-
http.c | 62 +++++++++++++++++++++++++++++++-----------------
pack-check.c | 15 ++++++++---
pack.h | 1 +
sha1_file.c | 6 +++-
t/t5550-http-fetch.sh | 15 ++++++++++++
6 files changed, 72 insertions(+), 29 deletions(-)
diff --git a/cache.h b/cache.h
index 4150603..0d101e4 100644
--- a/cache.h
+++ b/cache.h
@@ -905,7 +905,7 @@ struct extra_have_objects {
extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
extern int server_supports(const char *feature);
-extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
extern void prepare_packed_git(void);
extern void reprepare_packed_git(void);
diff --git a/http.c b/http.c
index aa3e380..2d88034 100644
--- a/http.c
+++ b/http.c
@@ -897,47 +897,65 @@ int http_fetch_ref(const char *base, struct ref *ref)
}
/* Helpers for fetching packs */
-static int fetch_pack_index(unsigned char *sha1, const char *base_url)
+static char *fetch_pack_index(unsigned char *sha1, const char *base_url)
{
- int ret = 0;
- char *hex = xstrdup(sha1_to_hex(sha1));
- char *filename;
- char *url = NULL;
+ char *url, *tmp;
struct strbuf buf = STRBUF_INIT;
- if (has_pack_index(sha1)) {
- ret = 0;
- goto cleanup;
- }
-
if (http_is_verbose)
- fprintf(stderr, "Getting index for pack %s\n", hex);
+ fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
end_url_with_slash(&buf, base_url);
- strbuf_addf(&buf, "objects/pack/pack-%s.idx", ...Hi
It probably should be mentioned in the commit message or elsewhere that
as fetch_and_setup_pack_index() now checks for the pack index locally
The conflation of "ret" as the result of both verify_pack_index() and
move_temp_to_file() is pretty confusing.
Also, perhaps the below could be squashed in to reduce if()'s on tmp_idx.
-- >8 --
diff --git a/http.c b/http.c
index 6b7b899..e5bb54a 100644
--- a/http.c
+++ b/http.c
@@ -945,35 +945,37 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
{
struct packed_git *new_pack;
char *tmp_idx = NULL;
+ int ret;
- if (!has_pack_index(sha1)) {
- tmp_idx = fetch_pack_index(sha1, base_url);
- if (!tmp_idx)
- return -1;
+ if (has_pack_index(sha1)) {
+ new_pack = parse_pack_index(sha1, NULL);
+ if (!new_pack)
+ return -1; /* parse_pack_index() already issued error message */
+ goto add_pack;
}
+ tmp_idx = fetch_pack_index(sha1, base_url);
+ if (!tmp_idx)
+ return -1;
+
new_pack = parse_pack_index(sha1, tmp_idx);
if (!new_pack) {
- if (tmp_idx) {
- unlink(tmp_idx);
- free(tmp_idx);
- }
+ unlink(tmp_idx);
+ free(tmp_idx);
+
return -1; /* parse_pack_index() already issued error message */
}
- if (tmp_idx) {
- int ret;
-
- ret = verify_pack_index(new_pack);
- if (!ret) {
- close_pack_index(new_pack);
- ret = move_temp_to_file(tmp_idx, sha1_pack_index_name(sha1));
- }
- free(tmp_idx);
- if (ret)
- return -1;
+ ret = verify_pack_index(new_pack);
+ if (!ret) {
+ close_pack_index(new_pack);
+ ret = move_temp_to_file(tmp_idx, sha1_pack_index_name(sha1));
}
+ free(tmp_idx);
+ if (ret)
+ return -1;
+add_pack:
new_pack->next = *packs_head;
*packs_head = new_pack;
return 0;
--
Cheers,
Ray Chuan
--
The filename variable here is pointing to a block of memory that was allocated by sha1_file.c and is also held in a static variable scoped within the sha1_pack_name() function. Doing a free() here is returning that memory to the allocator while we might still try to reuse it on a subsequent sha1_pack_name() invocation. That's not acceptable, so don't free it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- http.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/http.c b/http.c index 4814217..f26625e 100644 --- a/http.c +++ b/http.c @@ -1082,7 +1082,6 @@ struct http_pack_request *new_http_pack_request( return preq; abort: - free(filename); free(preq->url); free(preq); return NULL; -- 1.7.1.rc1.269.ga27c7 --
The test preq->packfile != NULL is always true. If packfile was
actually NULL when entering this function the ftell() above would
crash out with a SIGSEGV, resulting in never reaching this point.
Simplify the code by just removing the conditional.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/http.c b/http.c
index 4558f11..64e0c18 100644
--- a/http.c
+++ b/http.c
@@ -1005,12 +1005,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
struct packed_git *p = preq->target;
p->pack_size = ftell(preq->packfile);
-
- if (preq->packfile != NULL) {
- fclose(preq->packfile);
- preq->packfile = NULL;
- preq->slot->local = NULL;
- }
+ fclose(preq->packfile);
+ preq->packfile = NULL;
+ preq->slot->local = NULL;
lst = preq->lst;
while (*lst != p)
--
1.7.1.rc1.269.ga27c7
--
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format. Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 1 +
sha1_file.c | 11 +++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/cache.h b/cache.h
index 5eb0573..4150603 100644
--- a/cache.h
+++ b/cache.h
@@ -916,6 +916,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
extern void pack_report(void);
extern int open_pack_index(struct packed_git *);
+extern void close_pack_index(struct packed_git *);
extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *);
extern void close_pack_windows(struct packed_git *);
extern void unuse_pack(struct pack_window **);
diff --git a/sha1_file.c b/sha1_file.c
index ff65328..820063e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -599,6 +599,14 @@ void unuse_pack(struct pack_window **w_cursor)
}
}
+void close_pack_index(struct packed_git *p)
+{
+ if (p->index_data) {
+ munmap((void *)p->index_data, p->index_size);
+ p->index_data = NULL;
+ }
+}
+
/*
* This is used by git-repack in case a newly created pack happens to
* contain the same set of objects as an existing one. In that case
@@ -620,8 +628,7 @@ void free_pack_by_name(const char *pack_name)
close_pack_windows(p);
if (p->pack_fd != -1)
close(p->pack_fd);
- if (p->index_data)
- munmap((void *)p->index_data, p->index_size);
+ close_pack_index(p);
free(p->bad_object_sha1);
*pp = p->next;
free(p);
--
1.7.1.rc1.269.ga27c7
--
Verify that a downloaded pack-*.idx file is consistent and valid
as an index file before we rename it into its final destination.
This prevents a corrupt index file from later being treated as a
usable file, confusing readers.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 56 ++++++++++++++++++++++++++++++++++--------------
t/t5550-http-fetch.sh | 15 +++++++++++++
2 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/http.c b/http.c
index a7352b7..9d68d02 100644
--- a/http.c
+++ b/http.c
@@ -897,18 +897,11 @@ int http_fetch_ref(const char *base, struct ref *ref)
}
/* Helpers for fetching packs */
-static int fetch_pack_index(unsigned char *sha1, const char *base_url)
+static char *fetch_pack_index(unsigned char *sha1, const char *base_url)
{
- int ret = 0;
- char *filename;
- char *url = NULL;
+ char *url, *tmp;
struct strbuf buf = STRBUF_INIT;
- if (has_pack_index(sha1)) {
- ret = 0;
- goto cleanup;
- }
-
if (http_is_verbose)
fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
@@ -916,26 +909,55 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
strbuf_addf(&buf, "objects/pack/pack-%s.idx", sha1_to_hex(sha1));
url = strbuf_detach(&buf, NULL);
- filename = sha1_pack_index_name(sha1);
- if (http_get_file(url, filename, 0) != HTTP_OK)
- ret = error("Unable to get pack index %s\n", url);
+ strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(sha1));
+ tmp = strbuf_detach(&buf, NULL);
+
+ if (http_get_file(url, tmp, 0) != HTTP_OK) {
+ error("Unable to get pack index %s\n", url);
+ free(tmp);
+ tmp = NULL;
+ }
-cleanup:
free(url);
- return ret;
+ return tmp;
}
static int fetch_and_setup_pack_index(struct packed_git **packs_head,
unsigned char *sha1, const char *base_url)
{
struct packed_git *new_pack;
+ char *tmp_idx = NULL;
+ int ret;
+
+ if (has_pack_index(sha1)) {
+ new_pack = parse_pack_index(sha1, NULL);
+ if ...Hi, On Sat, 17 Apr 2010 13:07:44 -0700 Perhaps this should be added in: Check that we do not have the pack index file before invoking fetch_and_setup_pack_index(); that way, we can do without the has_pack_index() check in fetch_and_setup_pack_index(). -- Cheers, Ray Chuan --
This is a resend of the last half of the series, from patch 6/11
to the end, to address some minor review comments.
Junio, I think you need to reset my branch to 0da8b2e7c80a6d
("http.c: Don't store destination name in structures"), and
then apply this group.
Total series diff since v3 is this shocking one line change, most
of the edits were to commit messages:
diff --git a/http.c b/http.c
index 83f6047..0813c9e 100644
--- a/http.c
+++ b/http.c
@@ -1028,7 +1028,6 @@ int finish_http_pack_request(struct http_pack_request *preq)
const char *ip_argv[8];
close_pack_index(p);
- unlink(sha1_pack_index_name(p->sha1));
fclose(preq->packfile);
preq->packfile = NULL;
@@ -1062,6 +1061,8 @@ int finish_http_pack_request(struct http_pack_request *preq)
return -1;
}
+ unlink(sha1_pack_index_name(p->sha1));
+
if (move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1))
|| move_temp_to_file(tmp_idx, sha1_pack_index_name(p->sha1))) {
free(tmp_idx);
--
Hi,
On Mon, 19 Apr 2010 07:23:04 -0700
the small patch below could also be applied to the rebased topic branch.
-->8--
From: Tay Ray Chuan <rctay89@gmail.com>
Subject: [PATCH] http.c::new_http_pack_request: do away with the temp variable filename
Now that the temporary variable char *filename is only used in one
place, do away with it and just call sha1_pack_name() directly.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
diff --git a/http.c b/http.c
index c75eb95..110cff9 100644
--- a/http.c
+++ b/http.c
@@ -1027,7 +1027,6 @@ int finish_http_pack_request(struct http_pack_request *preq)
struct http_pack_request *new_http_pack_request(
struct packed_git *target, const char *base_url)
{
- char *filename;
long prev_posn = 0;
char range[RANGE_HEADER_SIZE];
struct strbuf buf = STRBUF_INIT;
@@ -1042,8 +1041,8 @@ struct http_pack_request *new_http_pack_request(
sha1_to_hex(target->sha1));
preq->url = strbuf_detach(&buf, NULL);
- filename = sha1_pack_name(target->sha1);
- snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp", filename);
+ snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp",
+ sha1_pack_name(target->sha1));
preq->packfile = fopen(preq->tmpfile, "a");
if (!preq->packfile) {
error("Unable to open local file %s for pack",
--
--
Cheers,
Ray Chuan
--
FWIW, Acked-by: Shawn O. Pearce <spearce@spearce.org> -- Shawn. --
Hi Junio, I noticed that the inlined patch (in message <20100419224643.00001ff1@unknown>) being acked here wasn't applied to the topic branch 'sp/maint-dumb-http-pack-reidx' in pu; just a heads-up in case you've missed something. -- Cheers, Ray Chuan --
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format. Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
No change from v3.
cache.h | 1 +
sha1_file.c | 11 +++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/cache.h b/cache.h
index 6e54993..0eba039 100644
--- a/cache.h
+++ b/cache.h
@@ -911,6 +911,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
extern void pack_report(void);
extern int open_pack_index(struct packed_git *);
+extern void close_pack_index(struct packed_git *);
extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *);
extern void close_pack_windows(struct packed_git *);
extern void unuse_pack(struct pack_window **);
diff --git a/sha1_file.c b/sha1_file.c
index c23cc5e..4e82654 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -606,6 +606,14 @@ void unuse_pack(struct pack_window **w_cursor)
}
}
+void close_pack_index(struct packed_git *p)
+{
+ if (p->index_data) {
+ munmap((void *)p->index_data, p->index_size);
+ p->index_data = NULL;
+ }
+}
+
/*
* This is used by git-repack in case a newly created pack happens to
* contain the same set of objects as an existing one. In that case
@@ -627,8 +635,7 @@ void free_pack_by_name(const char *pack_name)
close_pack_windows(p);
if (p->pack_fd != -1)
close(p->pack_fd);
- if (p->index_data)
- munmap((void *)p->index_data, p->index_size);
+ close_pack_index(p);
free(p->bad_object_sha1);
*pp = p->next;
free(p);
--
1.7.1.rc1.279.g22727
--
The dumb HTTP transport should verify an index is completely valid
before trying to use it. That requires checking the header/footer
but also checking the complete content SHA-1. All of this logic is
already in the front half of verify_pack, so pull it out into a new
function that can be reused.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
No change from v3.
pack-check.c | 15 ++++++++++++---
pack.h | 1 +
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/pack-check.c b/pack-check.c
index 166ca70..395fb95 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -133,14 +133,13 @@ static int verify_packfile(struct packed_git *p,
return err;
}
-int verify_pack(struct packed_git *p)
+int verify_pack_index(struct packed_git *p)
{
off_t index_size;
const unsigned char *index_base;
git_SHA_CTX ctx;
unsigned char sha1[20];
int err = 0;
- struct pack_window *w_curs = NULL;
if (open_pack_index(p))
return error("packfile %s index not opened", p->pack_name);
@@ -154,8 +153,18 @@ int verify_pack(struct packed_git *p)
if (hashcmp(sha1, index_base + index_size - 20))
err = error("Packfile index for %s SHA1 mismatch",
p->pack_name);
+ return err;
+}
+
+int verify_pack(struct packed_git *p)
+{
+ int err = 0;
+ struct pack_window *w_curs = NULL;
+
+ err |= verify_pack_index(p);
+ if (!p->index_data)
+ return -1;
- /* Verify pack file */
err |= verify_packfile(p, &w_curs);
unuse_pack(&w_curs);
diff --git a/pack.h b/pack.h
index b759a23..880f9c2 100644
--- a/pack.h
+++ b/pack.h
@@ -57,6 +57,7 @@ struct pack_idx_entry {
extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, unsigned char *sha1);
extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
+extern int verify_pack_index(struct packed_git *);
extern int ...Verify that a downloaded pack-*.idx file is consistent and valid
as an index file before we rename it into its final destination.
This prevents a corrupt index file from later being treated as a
usable file, confusing readers.
Check that we do not have the pack index file before invoking
fetch_pack_index(); that way, we can do without the has_pack_index()
check in fetch_pack_index().
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Added paragraph describing the move of has_pack_index() to the
commit message.
No code change from v3.
http.c | 56 ++++++++++++++++++++++++++++++++++--------------
t/t5550-http-fetch.sh | 15 +++++++++++++
2 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/http.c b/http.c
index 2ebd679..0813c9e 100644
--- a/http.c
+++ b/http.c
@@ -897,18 +897,11 @@ int http_fetch_ref(const char *base, struct ref *ref)
}
/* Helpers for fetching packs */
-static int fetch_pack_index(unsigned char *sha1, const char *base_url)
+static char *fetch_pack_index(unsigned char *sha1, const char *base_url)
{
- int ret = 0;
- char *filename;
- char *url = NULL;
+ char *url, *tmp;
struct strbuf buf = STRBUF_INIT;
- if (has_pack_index(sha1)) {
- ret = 0;
- goto cleanup;
- }
-
if (http_is_verbose)
fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
@@ -916,26 +909,55 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
strbuf_addf(&buf, "objects/pack/pack-%s.idx", sha1_to_hex(sha1));
url = strbuf_detach(&buf, NULL);
- filename = sha1_pack_index_name(sha1);
- if (http_get_file(url, filename, 0) != HTTP_OK)
- ret = error("Unable to get pack index %s\n", url);
+ strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(sha1));
+ tmp = strbuf_detach(&buf, NULL);
+
+ if (http_get_file(url, tmp, 0) != HTTP_OK) {
+ error("Unable to get pack index %s\n", url);
+ free(tmp);
+ tmp = NULL;
+ }
-cleanup:
...The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped. However, the dumb HTTP client
needs to verify a pack index before its moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available. So permit the caller to specify the
exact path of the index file.
For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
No change from v3.
cache.h | 2 +-
http.c | 2 +-
sha1_file.c | 3 +--
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index 0eba039..7db23ef 100644
--- a/cache.h
+++ b/cache.h
@@ -900,7 +900,7 @@ struct extra_have_objects {
extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
extern int server_supports(const char *feature);
-extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
extern void prepare_packed_git(void);
extern void reprepare_packed_git(void);
diff --git a/http.c b/http.c
index 95e3b8b..9c62632 100644
--- a/http.c
+++ b/http.c
@@ -932,7 +932,7 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
if (fetch_pack_index(sha1, base_url))
return -1;
- new_pack = parse_pack_index(sha1);
+ new_pack = parse_pack_index(sha1, sha1_pack_index_name(sha1));
if (!new_pack)
return -1; /* parse_pack_index() already issued error message */
new_pack->next = *packs_head;
diff --git a/sha1_file.c b/sha1_file.c
index 4e82654..9f3f514 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -845,9 +845,8 @@ struct packed_git ...To ensure we don't leave a corrupt pack file positioned as though
it were a valid pack file, run index-pack on the temporary pack
before we rename it to its final name. If index-pack crashes out
when it discovers file corruption (e.g. GitHub's error HTML at the
end of the file), simply delete the temporary files to cleanup.
By waiting until the pack has been validated before we move it
to its final name, we eliminate a race condition where another
concurrent reader might try to access the pack at the same time
that we are still trying to verify its not corrupt.
Switching from verify-pack to index-pack is a change in behavior,
but it should turn out better for users. The index-pack algorithm
tries to minimize disk seeks, as well as the number of times any
given object is inflated, by organizing its work along delta chains.
The verify-pack logic does not attempt to do this, thrashing the
delta base cache and the filesystem cache.
By recreating the index file locally, we also can automatically
upgrade from a v1 pack table of contents to v2. This makes the
CRC32 data available for use during later repacks, even if the
server didn't have them on hand.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
Moved unlink of index to after the index-pack is successful,
per Tay Ray Chuan's request.
Removed Junio SOB line since the logic changed.
http.c | 44 +++++++++++++++++++++++++++++++++++++-------
t/t5550-http-fetch.sh | 15 +++++++++++++++
2 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/http.c b/http.c
index 9c62632..2ebd679 100644
--- a/http.c
+++ b/http.c
@@ -1,6 +1,7 @@
#include "http.h"
#include "pack.h"
#include "sideband.h"
+#include "run-command.h"
int data_received;
int active_requests;
@@ -998,11 +999,14 @@ void release_http_pack_request(struct http_pack_request *preq)
int finish_http_pack_request(struct http_pack_request *preq)
{
- int ret;
struct packed_git **lst;
struct packed_git *p ...Hi, Looks good. Acked-by: Tay Ray Chuan <rctay89@gmail.com> -- Cheers, Ray Chuan --
Most of the time the dumb HTTP transport is run without the verbose
flag set, so we only need the result of sha1_to_hex(sha1) once, to
construct the pack URL. Don't bother with an unnecessary malloc,
copy, free chain of this buffer.
If verbose is set, we'll format the SHA-1 twice now. But this
tiny extra CPU time spent is nothing compared to the slowdown that
is usually imposed by the verbose messages being sent to the tty,
and is entirely trivial compared to the latency involved with the
remote HTTP server sending something as big as a pack file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Message fixed, Acked-by added.
No code change from v3.
http.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/http.c b/http.c
index 7ee1ba5..95e3b8b 100644
--- a/http.c
+++ b/http.c
@@ -899,7 +899,6 @@ int http_fetch_ref(const char *base, struct ref *ref)
static int fetch_pack_index(unsigned char *sha1, const char *base_url)
{
int ret = 0;
- char *hex = xstrdup(sha1_to_hex(sha1));
char *filename;
char *url = NULL;
struct strbuf buf = STRBUF_INIT;
@@ -910,10 +909,10 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
}
if (http_is_verbose)
- fprintf(stderr, "Getting index for pack %s\n", hex);
+ fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
end_url_with_slash(&buf, base_url);
- strbuf_addf(&buf, "objects/pack/pack-%s.idx", hex);
+ strbuf_addf(&buf, "objects/pack/pack-%s.idx", sha1_to_hex(sha1));
url = strbuf_detach(&buf, NULL);
filename = sha1_pack_index_name(sha1);
@@ -921,7 +920,6 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
ret = error("Unable to get pack index %s\n", url);
cleanup:
- free(hex);
free(url);
return ret;
}
--
1.7.1.rc1.279.g22727
--
The dumb HTTP transport should verify an index is completely valid
before trying to use it. That requires checking the header/footer
but also checking the complete content SHA-1. All of this logic is
already in the front half of verify_pack, so pull it out into a new
function that can be reused.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
pack-check.c | 15 ++++++++++++---
pack.h | 1 +
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/pack-check.c b/pack-check.c
index 166ca70..395fb95 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -133,14 +133,13 @@ static int verify_packfile(struct packed_git *p,
return err;
}
-int verify_pack(struct packed_git *p)
+int verify_pack_index(struct packed_git *p)
{
off_t index_size;
const unsigned char *index_base;
git_SHA_CTX ctx;
unsigned char sha1[20];
int err = 0;
- struct pack_window *w_curs = NULL;
if (open_pack_index(p))
return error("packfile %s index not opened", p->pack_name);
@@ -154,8 +153,18 @@ int verify_pack(struct packed_git *p)
if (hashcmp(sha1, index_base + index_size - 20))
err = error("Packfile index for %s SHA1 mismatch",
p->pack_name);
+ return err;
+}
+
+int verify_pack(struct packed_git *p)
+{
+ int err = 0;
+ struct pack_window *w_curs = NULL;
+
+ err |= verify_pack_index(p);
+ if (!p->index_data)
+ return -1;
- /* Verify pack file */
err |= verify_packfile(p, &w_curs);
unuse_pack(&w_curs);
diff --git a/pack.h b/pack.h
index d268c01..bb27576 100644
--- a/pack.h
+++ b/pack.h
@@ -57,6 +57,7 @@ struct pack_idx_entry {
extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, unsigned char *sha1);
extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
+extern int verify_pack_index(struct packed_git *);
extern int verify_pack(struct packed_git *);
extern void fixup_pack_header_footer(int, ...Always remove the struct packed_git from the active list, even
if the rename of the temporary file fails.
While we are here, simplify the code a bit by using a common
local variable name ("p") to hold the relevant packed_git.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/http.c b/http.c
index f26625e..4558f11 100644
--- a/http.c
+++ b/http.c
@@ -1002,8 +1002,9 @@ int finish_http_pack_request(struct http_pack_request *preq)
{
int ret;
struct packed_git **lst;
+ struct packed_git *p = preq->target;
- preq->target->pack_size = ftell(preq->packfile);
+ p->pack_size = ftell(preq->packfile);
if (preq->packfile != NULL) {
fclose(preq->packfile);
@@ -1011,18 +1012,17 @@ int finish_http_pack_request(struct http_pack_request *preq)
preq->slot->local = NULL;
}
- ret = move_temp_to_file(preq->tmpfile, preq->filename);
- if (ret)
- return ret;
-
lst = preq->lst;
- while (*lst != preq->target)
+ while (*lst != p)
lst = &((*lst)->next);
*lst = (*lst)->next;
- if (verify_pack(preq->target))
+ ret = move_temp_to_file(preq->tmpfile, preq->filename);
+ if (ret)
+ return ret;
+ if (verify_pack(p))
return -1;
- install_packed_git(preq->target);
+ install_packed_git(p);
return 0;
}
--
1.7.1.rc1.269.ga27c7
--
The destination name within the object store is easily computed
on demand, reusing a static buffer held by sha1_file.c. We don't
need to copy the entire path into the request structure for safe
keeping, when it can be easily reformatted after the download has
been completed.
This reduces the size of the per-request structure, and removes
yet another PATH_MAX based limit.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http-walker.c | 2 +-
http.c | 14 ++++++--------
http.h | 2 --
3 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/http-walker.c b/http-walker.c
index ef99ae6..8ca76d0 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -510,7 +510,7 @@ static int fetch_object(struct walker *walker, struct alt_base *repo, unsigned c
ret = error("File %s has bad hash", hex);
} else if (req->rename < 0) {
ret = error("unable to write sha1 filename %s",
- req->filename);
+ sha1_file_name(req->sha1));
}
release_http_object_request(req);
diff --git a/http.c b/http.c
index 64e0c18..c75eb95 100644
--- a/http.c
+++ b/http.c
@@ -1014,7 +1014,7 @@ int finish_http_pack_request(struct http_pack_request *preq)
lst = &((*lst)->next);
*lst = (*lst)->next;
- ret = move_temp_to_file(preq->tmpfile, preq->filename);
+ ret = move_temp_to_file(preq->tmpfile, sha1_pack_name(p->sha1));
if (ret)
return ret;
if (verify_pack(p))
@@ -1043,7 +1043,6 @@ struct http_pack_request *new_http_pack_request(
preq->url = strbuf_detach(&buf, NULL);
filename = sha1_pack_name(target->sha1);
- snprintf(preq->filename, sizeof(preq->filename), "%s", filename);
snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp", filename);
preq->packfile = fopen(preq->tmpfile, "a");
if (!preq->packfile) {
@@ -1133,7 +1132,6 @@ struct http_object_request *new_http_object_request(const char *base_url,
freq->localfile = -1;
filename = sha1_file_name(sha1);
- snprintf(freq->filename, sizeof(freq->filename), "%s", ...Hi,
On Sat, 17 Apr 2010 13:07:38 -0700
now that there's a single user of char *filename, we might as well do
away with it.
PS. I think having the below as a separate patch is better than
squashing it in, as it might be detrimental to patch #05's readability
in the latter case.
-->8--
From: Tay Ray Chuan <rctay89@gmail.com>
Subject: [PATCH] http.c::new_http_pack_request: do away with the temp variable filename
Now that the temporary variable char *filename is only used in one
place, do away with it and just call sha1_pack_name() directly.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
diff --git a/http.c b/http.c
index c75eb95..110cff9 100644
--- a/http.c
+++ b/http.c
@@ -1027,7 +1027,6 @@ int finish_http_pack_request(struct http_pack_request *preq)
struct http_pack_request *new_http_pack_request(
struct packed_git *target, const char *base_url)
{
- char *filename;
long prev_posn = 0;
char range[RANGE_HEADER_SIZE];
struct strbuf buf = STRBUF_INIT;
@@ -1042,8 +1041,8 @@ struct http_pack_request *new_http_pack_request(
sha1_to_hex(target->sha1));
preq->url = strbuf_detach(&buf, NULL);
- filename = sha1_pack_name(target->sha1);
- snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp", filename);
+ snprintf(preq->tmpfile, sizeof(preq->tmpfile), "%s.temp",
+ sha1_pack_name(target->sha1));
preq->packfile = fopen(preq->tmpfile, "a");
if (!preq->packfile) {
error("Unable to open local file %s for pack",
--
--
Cheers,
Ray Chuan
--
Most of the time the dumb HTTP transport is run without the verbose
flag set, so we only need the result of sha1_to_hex(sha1) once, to
construct the pack URL. Don't bother with an unnecessary malloc,
copy, free chain of this buffer.
If verbose is set, we'll format the SHA-1 twice now. But this
tiny extra CPU time spent is nothing compared to the slowdown that
is usually imposed by the verbose messages being sent to the tty,
and its entirely trivial compared to the latency involved with the
remote HTTP server sending something as big as a pack file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/http.c b/http.c
index c75eb95..1a52740 100644
--- a/http.c
+++ b/http.c
@@ -899,7 +899,6 @@ int http_fetch_ref(const char *base, struct ref *ref)
static int fetch_pack_index(unsigned char *sha1, const char *base_url)
{
int ret = 0;
- char *hex = xstrdup(sha1_to_hex(sha1));
char *filename;
char *url = NULL;
struct strbuf buf = STRBUF_INIT;
@@ -910,10 +909,10 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
}
if (http_is_verbose)
- fprintf(stderr, "Getting index for pack %s\n", hex);
+ fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
end_url_with_slash(&buf, base_url);
- strbuf_addf(&buf, "objects/pack/pack-%s.idx", hex);
+ strbuf_addf(&buf, "objects/pack/pack-%s.idx", sha1_to_hex(sha1));
url = strbuf_detach(&buf, NULL);
filename = sha1_pack_index_name(sha1);
@@ -921,7 +920,6 @@ static int fetch_pack_index(unsigned char *sha1, const char *base_url)
ret = error("Unable to get pack index %s\n", url);
cleanup:
- free(hex);
free(url);
return ret;
}
--
1.7.1.rc1.269.ga27c7
--
On Sat, 17 Apr 2010 13:07:39 -0700 Minor nit: s/its/is/, ie. "is entirely trivial". Acked-by: Tay Ray Chuan <rctay89@gmail.com> -- Cheers, Ray Chuan --
Change into the server repository's directory using a subshell, so we can return back to the top of the trash directory before doing anything more in the test script. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- t/t5550-http-fetch.sh | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/t/t5550-http-fetch.sh b/t/t5550-http-fetch.sh index 8cfce96..78c31c9 100755 --- a/t/t5550-http-fetch.sh +++ b/t/t5550-http-fetch.sh @@ -55,9 +55,10 @@ test_expect_success 'http remote detects correct HEAD' ' test_expect_success 'fetch packed objects' ' cp -R "$HTTPD_DOCUMENT_ROOT_PATH"/repo.git "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && - cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && - git --bare repack && - git --bare prune-packed && + (cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && + git --bare repack && + git --bare prune-packed + ) && git clone $HTTPD_URL/dumb/repo_pack.git ' -- 1.7.1.rc1.269.ga27c7 --
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped. However, the dumb HTTP client
needs to verify a pack index before its moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available. So permit the caller to specify the
exact path of the index file.
For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 2 +-
http.c | 2 +-
sha1_file.c | 3 +--
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index 4150603..0d101e4 100644
--- a/cache.h
+++ b/cache.h
@@ -905,7 +905,7 @@ struct extra_have_objects {
extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
extern int server_supports(const char *feature);
-extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
extern void prepare_packed_git(void);
extern void reprepare_packed_git(void);
diff --git a/http.c b/http.c
index 1a52740..9f3dfc1 100644
--- a/http.c
+++ b/http.c
@@ -932,7 +932,7 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
if (fetch_pack_index(sha1, base_url))
return -1;
- new_pack = parse_pack_index(sha1);
+ new_pack = parse_pack_index(sha1, sha1_pack_index_name(sha1));
if (!new_pack)
return -1; /* parse_pack_index() already issued error message */
new_pack->next = *packs_head;
diff --git a/sha1_file.c b/sha1_file.c
index 820063e..74bba79 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -838,9 +838,8 @@ struct packed_git *add_packed_git(const char *path, int path_len, int local)
return p;
...To ensure we don't leave a corrupt pack file positioned as though
it were a valid pack file, run index-pack on the temporary pack
before we rename it to its final name. If index-pack crashes out
when it discovers file corruption (e.g. GitHub's error HTML at the
end of the file), simply delete the temporary files to cleanup.
By waiting until the pack has been validated before we move it
to its final name, we eliminate a race condition where another
concurrent reader might try to access the pack at the same time
that we are still trying to verify its not corrupt.
Switching from verify-pack to index-pack is a change in behavior,
but it should turn out better for users. The index-pack algorithm
tries to minimize disk seeks, as well as the number of times any
given object is inflated, by organizing its work along delta chains.
The verify-pack logic does not attempt to do this, thrashing the
delta base cache and the filesystem cache.
By recreating the index file locally, we also can automatically
upgrade from a v1 pack table of contents to v2. This makes the
CRC32 data available for use during later repacks, even if the
server didn't have them on hand.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
http.c | 43 ++++++++++++++++++++++++++++++++++++-------
t/t5550-http-fetch.sh | 15 +++++++++++++++
2 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/http.c b/http.c
index 9f3dfc1..a7352b7 100644
--- a/http.c
+++ b/http.c
@@ -1,6 +1,7 @@
#include "http.h"
#include "pack.h"
#include "sideband.h"
+#include "run-command.h"
int data_received;
int active_requests;
@@ -998,11 +999,15 @@ void release_http_pack_request(struct http_pack_request *preq)
int finish_http_pack_request(struct http_pack_request *preq)
{
- int ret;
struct packed_git **lst;
struct packed_git *p = preq->target;
+ char *tmp_idx;
+ struct child_process ip;
+ const char ...Hi, On Sat, 17 Apr 2010 13:07:43 -0700 I think this should be done later, after we have run index-pack -- Cheers, Ray Chuan --
To ensure we don't leave a corrupt pack file positioned as though it were a valid pack file, run index-pack on the temporary pack before we rename it to its final name. If index-pack crashes out when it discovers file corruption (e.g. GitHub's error HTML at the end of the file), simply delete the temporary files to cleanup. By waiting until the pack has been validated before we move it to its final name, we eliminate a race condition where another concurrent reader might try to access the pack at the same time that we are still trying to verify its not corrupt. Switching from verify-pack to index-pack is a change in behavior, but it should turn out better for users. The index-pack algorithm tries to minimize disk seeks, as well as the number of times any given object is inflated, by organizing its work along delta chains. The verify-pack logic does not attempt to do this, thrashing the delta base cache and the filesystem cache. By recreating the index file locally, we also can automatically upgrade from a v1 pack table of contents to v2. This makes the CRC32 data available for use during later repacks, even if the server didn't have them on hand. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- cache.h | 1 + http.c | 42 ++++++++++++++++++++++++++++++++++-------- http.h | 1 - sha1_file.c | 11 +++++++++-- t/t5550-http-fetch.sh | 15 +++++++++++++++ 5 files changed, 59 insertions(+), 11 deletions(-) diff --git a/cache.h b/cache.h index 5eb0573..4150603 100644 --- a/cache.h +++ b/cache.h @@ -916,6 +916,7 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1, extern void pack_report(void); extern int open_pack_index(struct packed_git *); +extern void close_pack_index(struct packed_git *); extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned int *); extern void close_pack_windows(struct packed_git *); extern void ...
Hi, Hmm, when moving the pack index file, should we unlink() the old, downloaded one first? -- Cheers, Ray Chuan --
Doesn't seem worth it. I just started trying to rework this with a strbuf and I just don't see any benefit here. We know the tmpfile ends with ".pack.temp" when we created this request structure. So Yup, good point, thanks. -- Shawn. --
Verify that a downloaded pack-*.idx file is consistent and valid
as an index file before we rename it into its final destination.
This prevents a corrupt index file from later being treated as a
usable file, confusing readers.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 2 +-
http.c | 62 +++++++++++++++++++++++++++++++-----------------
pack-check.c | 15 ++++++++---
pack.h | 1 +
sha1_file.c | 6 +++-
t/t5550-http-fetch.sh | 15 ++++++++++++
6 files changed, 72 insertions(+), 29 deletions(-)
diff --git a/cache.h b/cache.h
index 4150603..0d101e4 100644
--- a/cache.h
+++ b/cache.h
@@ -905,7 +905,7 @@ struct extra_have_objects {
extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
extern int server_supports(const char *feature);
-extern struct packed_git *parse_pack_index(unsigned char *sha1);
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
extern void prepare_packed_git(void);
extern void reprepare_packed_git(void);
diff --git a/http.c b/http.c
index aa3e380..2d88034 100644
--- a/http.c
+++ b/http.c
@@ -897,47 +897,65 @@ int http_fetch_ref(const char *base, struct ref *ref)
}
/* Helpers for fetching packs */
-static int fetch_pack_index(unsigned char *sha1, const char *base_url)
+static char *fetch_pack_index(unsigned char *sha1, const char *base_url)
{
- int ret = 0;
- char *hex = xstrdup(sha1_to_hex(sha1));
- char *filename;
- char *url = NULL;
+ char *url, *tmp;
struct strbuf buf = STRBUF_INIT;
- if (has_pack_index(sha1)) {
- ret = 0;
- goto cleanup;
- }
-
if (http_is_verbose)
- fprintf(stderr, "Getting index for pack %s\n", hex);
+ fprintf(stderr, "Getting index for pack %s\n", sha1_to_hex(sha1));
end_url_with_slash(&buf, base_url);
- strbuf_addf(&buf, "objects/pack/pack-%s.idx", ...At least with smart transports do verify pack hash (since git-index-pack does check it). Maybe dumb HTTP transport grabs the index and pack and installs the downloaded versions (FAIL) instead of generating index from the pack (which would noitice the corruption)... -Ilari --
