Ok, this is different from what I expected. Since your pack-file seems to pass its own internal SHA1 checks, it means that the object was likely corrupt already when it was written out in the pack.

What's interesting is that it seems to unpack, but then the SHA1 of the unpacked object doesn't match. The reason I say that's interesting is that it would seem to mean that the zlib CRC/adler check didn't trigger - which probably means that the object was corrupted *before* it was compressed (but after it was originally SHA1-summed), or the compression itself was corrupting (e.g. a libz problem).

And since the SHA1 of the pack-file matches, the thing was apparently also written out "correctly" after compression (but by that "correctly" I obviously mean that the *corrupted* data was written out).

Sadly, by the time it's in a pack-file, it is *really* hard to figure out what went wrong: I see your unpacked data, but it's really the packed raw objects that I wanted to look at, in case there would be some pattern in the actual corruption (the corruption will then result in random crud when actually unpacking, which is why the unpacked data isn't that interesting, simply because there's no pattern left to analyze - it got inflated to

I doubt autocrlf affects anything here, it's only used at checkin and checkout time, and it wouldn't affect the raw internal git objects.

More interesting might be if you might be using any of the other flags that actually affect internal git object packing: "use_legacy_headers" in particular? If we have a bug there, that could be nasty.

But to really look at this we should probably add a "really_careful" flag that actually re-verifies the SHA1 on read so that we'd catch these kinds

Ok, no problem. I added back the git list (but not your attachments, obviously) but as explained above, there is not a lot I can do with the unpacked data, I'd like to see the actual "raw" stuff.

I'm hoping somebody has any ideas. We really ...
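To make the reasoning about the zlib check concrete, here is a minimal toy sketch in C, not git code. It assumes a stand-in name hash (64-bit FNV-1a instead of SHA-1) and a hand-written Adler-32 in place of a real zlib stream; `struct stored`, `store_object()`, `stream_check_ok()` and `name_check_ok()` are hypothetical names invented for this illustration. The point it tries to show: the stream checksum is computed over whatever bytes the compressor is handed, so corruption that happens after the object was named but before compression passes the stream check and is only caught by re-hashing the content.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adler-32 as used in zlib streams: computed over the UNCOMPRESSED bytes. */
static uint32_t adler32_sum(const unsigned char *buf, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* ASSUMPTION: 64-bit FNV-1a stands in for SHA-1 to keep the sketch short. */
static uint64_t name_hash(const unsigned char *buf, size_t len)
{
	uint64_t h = 14695981039346656037ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Hypothetical toy "object": name, stream checksum, and the stored bytes. */
struct stored {
	uint64_t name;          /* computed from the original, good data */
	uint32_t stream_check;  /* computed when the data is "compressed" */
	unsigned char data[64];
	size_t len;
};

/*
 * Name the object from the good data, optionally flip one bit, and only
 * then record the stream checksum, mimicking corruption that happens
 * between hashing and compression. Returns 0 on success, -1 if too large.
 */
static int store_object(struct stored *o, const unsigned char *good,
			size_t len, int corrupt_before_compress)
{
	if (len > sizeof(o->data))
		return -1;
	o->name = name_hash(good, len);
	memcpy(o->data, good, len);
	o->len = len;
	if (corrupt_before_compress && len > 0)
		o->data[0] ^= 0x01;
	o->stream_check = adler32_sum(o->data, o->len);
	return 0;
}

/* What an inflate-time check can see: data vs. the stream checksum. */
static int stream_check_ok(const struct stored *o)
{
	return adler32_sum(o->data, o->len) == o->stream_check;
}

/* What a "really careful" read would do: re-hash and compare to the name. */
static int name_check_ok(const struct stored *o)
{
	return name_hash(o->data, o->len) == o->name;
}
```

Storing an object with `corrupt_before_compress` set leaves `stream_check_ok()` true while `name_check_ok()` fails, which matches the symptom described above: the data inflates cleanly but no longer hashes to its own name.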
I will try to stop using git-gc for some time to find out broken loose

These are all my config options:

$ git config -l
user.name=Alexander Litvinov
user.email=XXX
core.logallrefupdates=true
core.filemode=false
core.autocrlf=true
diff.color=auto
status.color=auto
apply.whitespace=strip
core.repositoryformatversion=0
core.filemode=false
core.bare=false
remote.origin.url=/home/lan/src/XXX
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.master.remote=origin
branch.master.merge=refs/heads/master
branch.XXX.remote=origin

I can live with such a slowdown, since cygwin is not fast anyway and I am ready to wait. I don't think the situation can become really worse than it is now :-)

At least, we could do something like this to catch the breakage
when we (re)pack, to prevent damage from propagating.

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 73d448b..5d0692a 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -65,6 +65,7 @@ static int no_reuse_delta;
 static int local;
 static int incremental;
 static int allow_ofs_delta;
+static int revalidate_sha1;
 
 static struct object_entry **sorted_by_sha, **sorted_by_type;
 static struct object_entry *objects;
@@ -974,8 +975,31 @@ static void add_preferred_base(unsigned char *sha1)
 	it->pcache.tree_size = size;
 }
 
-static void check_object(struct object_entry *entry)
+static void check_object(struct object_entry *entry, int ith, unsigned *last)
 {
+	if (revalidate_sha1) {
+		unsigned char sha1[20];
+		enum object_type type;
+		unsigned long size;
+		void *buf;
+
+		buf = read_sha1_file(entry->sha1, &type, &size);
+		hash_sha1_file(buf, size, typename(type), sha1);
+		if (hashcmp(sha1, entry->sha1))
+			die("'%s': hash mismatch", sha1_to_hex(entry->sha1));
+		free(buf);
+
+		if (progress) {
+			unsigned percent = ith * 100 / nr_objects;
+			if (percent != *last || progress_update) {
+				fprintf(stderr, "%4u%% (%u/%u) done\r",
+					percent, ith, nr_objects);
+				progress_update = 0;
+				*last = percent;
+			}
+		}
+	}
+
 	if (entry->in_pack && !entry->preferred_base) {
 		struct packed_git *p = entry->in_pack;
 		struct pack_window *w_curs = NULL;
@@ -1082,10 +1106,16 @@ static void get_object_details(void)
 {
 	uint32_t i;
 	struct object_entry *entry;
+	unsigned last_percent = 999;
+
+	if (progress && revalidate_sha1)
+		fprintf(stderr, "Revalidating %u objects.\n", nr_objects);
 
 	prepare_pack_ix();
 	for (i = 0, entry = objects; i < nr_objects; i++, entry++)
-		check_object(entry);
+		check_object(entry, i+1, &last_percent);
+	if (progress && revalidate_sha1)
+		fputc('\n', stderr);
 
 	if (nr_objects == nr_result) {
 		/*
@@ ...

I think it would be better to retest the SHA1 when we're about to _write_ the object out to the pack, replacing check_pack_inflate() and revalidate_loose_object() with the full SHA1 check, and also testing objects whose data isn't reused from a pack. And make it conditional on !pack_to_stdout like we already do, of course.

Nicolas
