Re: My git repo is broken, how to fix it ?

Previous thread: [PATCH] contrib/continuous: a continuous integration build manager by Shawn O. Pearce on Monday, March 19, 2007 - 9:33 pm. (2 messages)

Next thread: New index-pack "keep" violates "never overwrite" by Shawn O. Pearce on Monday, March 19, 2007 - 10:38 pm. (3 messages)
From: Linus Torvalds
Date: Monday, March 19, 2007 - 10:34 pm

Ok, this is different from what I expected. 

Since your pack-file seems to pass its own internal SHA1 checks, it means 
that the object was likely already corrupt when it was written out to the pack. 
What's interesting is that it seems to unpack, but then the SHA1 of the 
unpacked object doesn't match.

The reason I say that's interesting is that it would seem to mean that the 
zlib CRC/adler check didn't trigger - which probably means that the object 
was corrupted *before* it was compressed (but after it was originally 
SHA1-summed), or the compression itself was corrupting it (e.g. a libz 
problem).

And since the SHA1 of the pack-file matches, the thing was apparently 
also written out "correctly" after compression (but by that "correctly" I 
obviously mean that the *corrupted* data was written out). 
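[Editor's sketch, not part of the original message.] The layering described above can be reproduced in a few lines. This is a minimal illustration, assuming the standard git object naming (SHA-1 over a "<type> <size>\0" header plus the data) and using Python's zlib in place of git's own use of libz. The helper name git_object_sha1 and the single-bit flip are hypothetical stand-ins for whatever actually went wrong in the reported repository.

```python
import hashlib
import zlib


def git_object_sha1(obj_type: str, data: bytes) -> str:
    """Name an object the way git does: SHA-1 over "<type> <size>\\0" + data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


original = b"hello world\n"
name = git_object_sha1("blob", original)  # computed while the data was still good

# Hypothetical in-memory corruption: one bit flips after hashing, before deflate.
corrupted = bytearray(original)
corrupted[0] ^= 0x01
stored = zlib.compress(bytes(corrupted))

# zlib's Adler-32 was computed over the already-corrupt bytes, so inflate succeeds.
inflated = zlib.decompress(stored)
zlib_ok = inflated == bytes(corrupted)

# Only re-hashing the inflated data against the object name shows the problem.
sha1_ok = git_object_sha1("blob", inflated) == name

# By contrast, damage introduced after compression is normally caught by zlib.
damaged = bytearray(stored)
damaged[-1] ^= 0xFF  # clobber the last byte of the trailing Adler-32 checksum
try:
    zlib.decompress(bytes(damaged))
    late_damage_detected = False
except zlib.error:
    late_damage_detected = True

print(zlib_ok, sha1_ok, late_damage_detected)
```

Here the zlib check passes, the SHA1 comparison fails, and only the post-compression damage is caught by zlib, which is consistent with corruption that happened before (or during) compression.
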

Sadly, by the time it's in a pack-file, it is *really* hard to figure out 
what went wrong: I see your unpacked data, but it's really the packed raw 
objects that I wanted to look at, in case there would be some pattern in 
the actual corruption (the corruption will then result in random crud when 
actually unpacking, which is why the unpacked data isn't that interesting, 
simply because there's no pattern left to analyze - it got inflated to 

I doubt autocrlf affects anything here, it's only used at checkin and 
checkout time, and it wouldn't affect the raw internal git objects.

More interesting would be whether you are using any of the other flags 
that actually affect internal git object packing: "use_legacy_headers" in 
particular? If we have a bug there, that could be nasty.

But to really look at this we should probably add a "really_careful" flag 
that actually re-verifies the SHA1 on read so that we'd catch these kinds 

Ok, no problem. I added back the git list (but not your attachments, 
obviously) but as explained above, there is not a lot I can do with the 
unpacked data, I'd like to see the actual "raw" stuff.

I'm hoping somebody has any ideas. We really ...
From: Alexander Litvinov
Date: Monday, March 19, 2007 - 11:55 pm

I will try to stop using git-gc for some time to find out broken loose 
These are all my config options:
$ git config -l
user.name=Alexander Litvinov
user.email=XXX
core.logallrefupdates=true
core.filemode=false
core.autocrlf=true
diff.color=auto
status.color=auto
apply.whitespace=strip
core.repositoryformatversion=0
core.filemode=false
core.bare=false
remote.origin.url=/home/lan/src/XXX
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.master.remote=origin
branch.master.merge=refs/heads/master
branch.XXX.remote=origin


I can live with such a slowdown, since cygwin is not fast anyway and I am ready to wait 
right now. I don't think the situation can become really worse than it is now :-)


From: Junio C Hamano
Date: Tuesday, March 20, 2007 - 12:42 am

At least, we could do something like this to catch the breakage
when we (re)pack, to prevent damage from propagating.


diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 73d448b..5d0692a 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -65,6 +65,7 @@ static int no_reuse_delta;
 static int local;
 static int incremental;
 static int allow_ofs_delta;
+static int revalidate_sha1;
 
 static struct object_entry **sorted_by_sha, **sorted_by_type;
 static struct object_entry *objects;
@@ -974,8 +975,31 @@ static void add_preferred_base(unsigned char *sha1)
 	it->pcache.tree_size = size;
 }
 
-static void check_object(struct object_entry *entry)
+static void check_object(struct object_entry *entry, int ith, unsigned *last)
 {
+	if (revalidate_sha1) {
+		unsigned char sha1[20];
+		enum object_type type;
+		unsigned long size;
+		void *buf;
+
+		buf = read_sha1_file(entry->sha1, &type, &size);
+		hash_sha1_file(buf, size, typename(type), sha1);
+		if (hashcmp(sha1, entry->sha1))
+			die("'%s': hash mismatch", sha1_to_hex(entry->sha1));
+		free(buf);
+
+		if (progress) {
+			unsigned percent = ith * 100 / nr_objects;
+			if (percent != *last || progress_update) {
+				fprintf(stderr, "%4u%% (%u/%u) done\r",
+					percent, ith, nr_objects);
+				progress_update = 0;
+				*last = percent;
+			}
+		}
+	}
+
 	if (entry->in_pack && !entry->preferred_base) {
 		struct packed_git *p = entry->in_pack;
 		struct pack_window *w_curs = NULL;
@@ -1082,10 +1106,16 @@ static void get_object_details(void)
 {
 	uint32_t i;
 	struct object_entry *entry;
+	unsigned last_percent = 999;
+
+	if (progress && revalidate_sha1)
+		fprintf(stderr, "Revalidating %u objects.\n", nr_objects);
 
 	prepare_pack_ix();
 	for (i = 0, entry = objects; i < nr_objects; i++, entry++)
-		check_object(entry);
+		check_object(entry, i+1, &last_percent);
+	if (progress && revalidate_sha1)
+		fputc('\n', stderr);
 
 	if (nr_objects == nr_result) {
 		/*
@@ ...
From: Nicolas Pitre
Date: Tuesday, March 20, 2007 - 8:23 am

I think it would be better to retest the SHA1 when we're about to 
_write_ the object out to the pack, replacing check_pack_inflate() and 
revalidate_loose_object() with the full SHA1 check, and also testing objects 
whose data isn't reused from a pack.  And make it conditional on 
!pack_to_stdout like we already do, of course.


Nicolas
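
[Editor's sketch, not part of the original message.] The idea of checking at write time can be illustrated roughly as follows. This is only a sketch under stated assumptions: write_verified, HashMismatch and the verify parameter are hypothetical names, verify stands in for the !pack_to_stdout condition, and the output is a plain sequence of zlib streams rather than git's real pack format.

```python
import hashlib
import io
import zlib


def git_object_sha1(obj_type, data):
    """Name an object the way git does: SHA-1 over "<type> <size>\\0" + data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class HashMismatch(Exception):
    """Raised when an object's content no longer matches its name."""


def write_verified(out, entries, verify=True):
    """Write (expected_hex, obj_type, data) entries to out.

    With verify set, each object is re-hashed immediately before it is
    written, so corrupt content is refused instead of being propagated.
    """
    written = 0
    for expected, obj_type, data in entries:
        if verify and git_object_sha1(obj_type, data) != expected:
            raise HashMismatch(f"'{expected}': hash mismatch")
        out.write(zlib.compress(data))  # simplified; not the real pack encoding
        written += 1
    return written


out = io.BytesIO()
good = b"int main(void) { return 0; }\n"
good_name = git_object_sha1("blob", good)
write_verified(out, [(good_name, "blob", good)])  # content matches its name

bad = b"int main(void) { return 1; }\n"  # content has drifted from its name
try:
    write_verified(out, [(good_name, "blob", bad)])
    refused = False
except HashMismatch:
    refused = True
```

Because the check runs before anything is emitted for that object, the output still contains only the good object after the failed call.
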