A few months ago I produced a set of patches to allow git to work even
in the presence of pack corruption given that the corrupted objects have
a good duplicate in the object store. Turns out that this work was
rather incomplete and covered only a limited set of cases.This series extend coverage to all cases I could think about, and make
repack-objects able to create a good pack in such conditions to "fix"
the corruption without having to perform a full repack.Yes, this is all about the small and trivial patch I posted a while ago
that I intended to repost with a test case. Well, the test failed
miserably, resulting in this series before it finally all passed. ;-)builtin-pack-objects.c | 79 +++++++++++++++++-----
builtin-unpack-objects.c | 2 +
cache.h | 2 +-
index-pack.c | 2 +-
pack-revindex.c | 3 +-
sha1_file.c | 85 ++++++++++++++++++-----
t/t5302-pack-index.sh | 3 +-
t/t5303-pack-corruption-resilience.sh | 96 +++++++++++++++++++++++++--
8 files changed, 223 insertions(+), 49 deletions(-)Nicolas
--
Abstract
--------With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. this, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.The Context
-----------The Git database is all checksumed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly great...
Very nicely done. I've never seen a commit message that needs its own
This ought to belong to cache.h or some other header file. Perhaps you
did this to avoid unnecessary recompilation (we've discussed this ats/incure/incur/;
--
No, I just felt this wasn't a really public thing to expose at large.
But I don't feel really strongly about that.Revised patch follows.
--------
From: Nicolas Pitre <nico@cam.org>
Date: Tue, 28 Oct 2008 20:58:42 -0400
Subject: [PATCH] close another possibility for propagating pack corruptionAbstract
--------With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.The Context
-----------The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset wh...
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed. Catch those.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 4 ++--
builtin-unpack-objects.c | 2 ++
index-pack.c | 2 +-
sha1_file.c | 2 +-
4 files changed, 6 insertions(+), 4 deletions(-)diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 0366277..d4c721b 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1038,10 +1038,10 @@ static void check_object(struct object_entry *entry)
c = buf[used_0++];
ofs = (ofs << 7) + (c & 127);
}
- if (ofs >= entry->in_pack_offset)
+ ofs = entry->in_pack_offset - ofs;
+ if (ofs <= 0 || ofs >= entry->in_pack_offset)
die("delta base offset out of bound for %s",
sha1_to_hex(entry->idx.sha1));
- ofs = entry->in_pack_offset - ofs;
if (reuse_delta && !entry->preferred_base) {
struct revindex_entry *revidx;
revidx = find_pack_revindex(p, ofs);
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 9f4bdd3..47ed610 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -370,6 +370,8 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
base_offset = (base_offset << 7) + (c & 127);
}
base_offset = obj_list[nr].offset - base_offset;
+ if (base_offset <= 0 || base_offset >= obj_list[nr].offset)
+ die("offset value out of bound for delta base object");delta_data = get_data(delta_size);
if (dry_run || !delta_data) {
diff --git a/index-pack.c b/index-pack.c
index fe75332..60ed41a 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -338,7 +338,7 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
b...
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 2 +-
cache.h | 2 +-
sha1_file.c | 20 +++++++++++---------
3 files changed, 13 insertions(+), 11 deletions(-)diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index d4c721b..9e249c9 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1002,7 +1002,7 @@ static void check_object(struct object_entry *entry)
* We want in_pack_type even if we do not reuse delta
* since non-delta representations could still be reused.
*/
- used = unpack_object_header_gently(buf, avail,
+ used = unpack_object_header_buffer(buf, avail,
&entry->in_pack_type,
&entry->size);diff --git a/cache.h b/cache.h
index a3c77f0..1a9edf3 100644
--- a/cache.h
+++ b/cache.h
@@ -751,7 +751,7 @@ extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t
extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
-extern unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
+extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
extern const char *packed_object_info_detail(struct packed_git *, off_t, u...
In the same spirit as commit 8eca0b47ff, let's try to survive a pack
corruption by making packed_object_info() able to fall back to alternate
packs or loose objects.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
sha1_file.c | 36 ++++++++++++++++++++++++++++++------
1 files changed, 30 insertions(+), 6 deletions(-)diff --git a/sha1_file.c b/sha1_file.c
index 7698177..384a430 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1314,8 +1314,10 @@ unsigned long get_size_from_delta(struct packed_git *p,
} while ((st == Z_OK || st == Z_BUF_ERROR) &&
stream.total_out < sizeof(delta_head));
inflateEnd(&stream);
- if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head))
- die("delta data unpack-initial failed");
+ if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
+ error("delta data unpack-initial failed");
+ return 0;
+ }/* Examine the initial part of the delta to figure out
* the result size.
@@ -1382,15 +1384,29 @@ static int packed_delta_info(struct packed_git *p,
off_t base_offset;base_offset = get_delta_base(p, w_curs, &curpos, type, obj_offset);
+ if (!base_offset)
+ return OBJ_BAD;
type = packed_object_info(p, base_offset, NULL);
+ if (type <= OBJ_NONE) {
+ struct revindex_entry *revidx = find_pack_revindex(p, base_offset);
+ const unsigned char *base_sha1 =
+ nth_packed_object_sha1(p, revidx->nr);
+ mark_bad_packed_object(p, base_sha1);
+ type = sha1_object_info(base_sha1, NULL);
+ if (type <= OBJ_NONE)
+ return OBJ_BAD;
+ }/* We choose to only get the type of the base object and
* ignore potentially corrupt pack file that expects the delta
* based on a base with a wrong size. This saves tons of
* inflate() calls.
*/
- if (sizep)
+ if (sizep) {
*sizep = get_size_from_delta(p, w_curs, curpos);
+ if (*sizep == 0)
+ type = OBJ_BAD;
+ }return type;
}
@@ -1500,8 +1516,9 @@ static int packed_object_info(struct ...
The check_object() function tries to get away with the least amount of
pack access possible when it already has partial information on given
object rather than calling the more costly packed_object_info().When things don't look right, it should just give up and fall back to
packed_object_info() directly instead of die()'ing.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 23 +++++++++++++++++------
1 files changed, 17 insertions(+), 6 deletions(-)diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9e249c9..b595d04 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1005,6 +1005,8 @@ static void check_object(struct object_entry *entry)
used = unpack_object_header_buffer(buf, avail,
&entry->in_pack_type,
&entry->size);
+ if (used == 0)
+ goto give_up;/*
* Determine if this is a delta and if so whether we can
@@ -1016,6 +1018,8 @@ static void check_object(struct object_entry *entry)
/* Not a delta hence we've already got all we need. */
entry->type = entry->in_pack_type;
entry->in_pack_header_size = used;
+ if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
+ goto give_up;
unuse_pack(&w_curs);
return;
case OBJ_REF_DELTA:
@@ -1032,16 +1036,20 @@ static void check_object(struct object_entry *entry)
ofs = c & 127;
while (c & 128) {
ofs += 1;
- if (!ofs || MSB(ofs, 7))
- die("delta base offset overflow in pack for %s",
- sha1_to_hex(entry->idx.sha1));
+ if (!ofs || MSB(ofs, 7)) {
+ error("delta base offset overflow in pack for %s",
+ sha1_to_hex(entry->idx.sha1));
+ goto give_up;
+ }
c = buf[used_0++];
ofs = (ofs << 7) + (c & 127);
}
ofs = entry->in_pack_offset - ofs;
- if (ofs <= 0 || ofs >= entry->in_pack_offset)
- die("delta base offset out of bound for %s",
- sha1_to_hex(e...
It currently calls die() whenever given offset is not found thinking
that such thing should never happen. But this offset may come from a
corrupted pack whych _could_ happen and not be found. Callers should
deal with this possibility gracefully instead.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 2 ++
pack-revindex.c | 3 ++-
sha1_file.c | 18 ++++++++++++------
3 files changed, 16 insertions(+), 7 deletions(-)diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index b595d04..963b432 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1053,6 +1053,8 @@ static void check_object(struct object_entry *entry)
if (reuse_delta && !entry->preferred_base) {
struct revindex_entry *revidx;
revidx = find_pack_revindex(p, ofs);
+ if (!revidx)
+ goto give_up;
base_ref = nth_packed_object_sha1(p, revidx->nr);
}
entry->in_pack_header_size = used + used_0;
diff --git a/pack-revindex.c b/pack-revindex.c
index 6096b62..1de53c8 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -140,7 +140,8 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
else
lo = mi + 1;
} while (lo < hi);
- die("internal error: pack revindex corrupt");
+ error("bad offset for revindex");
+ return NULL;
}void discard_revindex(void)
diff --git a/sha1_file.c b/sha1_file.c
index 384a430..9ce1df0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1388,9 +1388,12 @@ static int packed_delta_info(struct packed_git *p,
return OBJ_BAD;
type = packed_object_info(p, base_offset, NULL);
if (type <= OBJ_NONE) {
- struct revindex_entry *revidx = find_pack_revindex(p, base_offset);
- const unsigned char *base_sha1 =
- nth_packed_object_sha1(p, revidx->nr);
+ struct revindex_entry *revidx;
+ const unsigned char *base_sha1;
+ revidx = find_pack_revindex(p, base_offset);
+ if (!revidx)
+ return OBJ_BAD;
+ base_sha1...
When the pack data to be reused is found to be bad, let's fall back to
full object access through the generic path which has its own strategies
to find alternate object sources in that case. This allows for "fixing"
a corrupted pack simply by copying either another pack containing the
object(s) found to be bad, or the loose object itself, into the object
store and launch a repack without the need for -f.Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 28 +++++++++++++++++++---------
1 files changed, 19 insertions(+), 9 deletions(-)diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 963b432..826c762 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -277,6 +277,7 @@ static unsigned long write_object(struct sha1file *f,
*/if (!to_reuse) {
+ no_reuse:
if (!usable_delta) {
buf = read_sha1_file(entry->idx.sha1, &type, &size);
if (!buf)
@@ -358,20 +359,30 @@ static unsigned long write_object(struct sha1file *f,
struct revindex_entry *revidx;
off_t offset;- if (entry->delta) {
+ if (entry->delta)
type = (allow_ofs_delta && entry->delta->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
- reused_delta++;
- }
hdrlen = encode_header(type, entry->size, header);
+
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
datalen = revidx[1].offset - offset;
if (!pack_to_stdout && p->index_version > 1 &&
- check_pack_crc(p, &w_curs, offset, datalen, revidx->nr))
- die("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1));
+ check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) {
+ error("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1));
+ unuse_pack(&w_curs);
+ goto no_reuse;
+ }
+
offset += entry->in_pack_header_size;
datalen -= entry->in_pack_header_size;
+ if (!pack_to_stdout && p->index_v...
Signed-off-by: Nicolas Pitre <nico@cam.org>
---
t/t5303-pack-corruption-resilience.sh | 96 ++++++++++++++++++++++++++++++---
1 files changed, 89 insertions(+), 7 deletions(-)diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh
index 31b20b2..ac181ea 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -41,11 +41,17 @@ create_new_pack() {
git verify-pack -v ${pack}.pack
}+do_repack() {
+ pack=`printf "$blob_1\n$blob_2\n$blob_3\n" |
+ git pack-objects $@ .git/objects/pack/pack` &&
+ pack=".git/objects/pack/pack-${pack}"
+}
+
do_corrupt_object() {
ofs=`git show-index < ${pack}.idx | grep $1 | cut -f1 -d" "` &&
ofs=$(($ofs + $2)) &&
chmod +w ${pack}.pack &&
- dd if=/dev/zero of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs &&
+ dd of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs &&
test_must_fail git verify-pack ${pack}.pack
}@@ -60,7 +66,7 @@ test_expect_success \
test_expect_success \
'create corruption in header of first object' \
- 'do_corrupt_object $blob_1 0 &&
+ 'do_corrupt_object $blob_1 0 < /dev/zero &&
test_must_fail git cat-file blob $blob_1 > /dev/null &&
test_must_fail git cat-file blob $blob_2 > /dev/null &&
test_must_fail git cat-file blob $blob_3 > /dev/null'
@@ -119,7 +125,7 @@ test_expect_success \
'create corruption in header of first delta' \
'create_new_pack &&
git prune-packed &&
- do_corrupt_object $blob_2 0 &&
+ do_corrupt_object $blob_2 0 < /dev/zero &&
git cat-file blob $blob_1 > /dev/null &&
test_must_fail git cat-file blob $blob_2 > /dev/null &&
test_must_fail git cat-file blob $blob_3 > /dev/null'
@@ -134,6 +140,15 @@ test_expect_success \
git cat-file blob $...
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| debian developer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Adrian Bunk | Re: LSM conversion to static interface |
git: | |
| Gerrit Renker | [PATCH 26/37] dccp: Integration of dynamic feature activation - part 1 (socket set... |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Frans Pop | svc: failed to register lockdv1 RPC service (errno 97). |
| Linus Torvalds | Re: [GIT]: Networking |
