Re: support for large packs and 64-bit offsets

Previous thread: Re: [PATCH 4/8] git-repack --max-pack-size: add fixup_header_footer() by Junio C Hamano on Sunday, April 8, 2007 - 8:04 pm. (9 messages)

Next thread: [PATCH 07/10] index-pack: learn about pack index version 2 by Nicolas Pitre on Monday, April 9, 2007 - 1:06 am. (2 messages)
To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Monday, April 9, 2007 - 1:06 am

This patch series adds support for large packs to GIT. It creates a new
index format to accommodate both larger offsets and to store a checksum
for each raw object in a pack. More details is provided in each patch
commit message.

This is not a replacement for the pack size limiting feature. I think that
GIT should support both: large packs and split packs.

Nicolas

-

To: Nicolas Pitre <nico@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, April 9, 2007 - 1:19 pm

I like this series. I haven't had time to test it myself yet,
and probably won't be able to do so before Junio merges it into a
pu or next release. But it looks OK on a first read.

It is unfortunate that we are changing the index file format without
also bringing in packv4 support at the same time. I have just been
too swamped in useless bulls**t in day-job work to spend time on
Git lately.

--
Shawn.
-

To: Shawn O. Pearce <spearce@...>
Cc: Nicolas Pitre <nico@...>, Junio C Hamano <junkio@...>, <git@...>
Date: Monday, April 9, 2007 - 2:02 pm

Me too.. I just wonder whether it should have some more test coverage:
- force v2 index by default
- force a mode where a random smattering of index entries are done using
64-bit notation (even if they obviously won't need the high 33 bits)

As it is, nobody would normally even *use* the new format, much less the
new capabilities.. And if it's not used, it's not tested, and it's going
to have bugs.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, Shawn O. Pearce <spearce@...>, <git@...>
Date: Monday, April 9, 2007 - 2:26 pm

Easy enough. That's what I did for testing of course. And I'd hope it
would be always on by default eventually. The only reason why it is not

That could be achieved with an additional criteria, like the offset's
LSB value which should be pretty random.

Once Junio merges the series in a branch of its own, we could have a
separate patch not in that branch that simply forces those features on
in branch 'next'. People would only need to remenber not publishing
repos with that version using dumb protocols.

Nicolas
-

To: Linus Torvalds <torvalds@...>
Cc: Junio C Hamano <junkio@...>, Shawn O. Pearce <spearce@...>, <git@...>
Date: Monday, April 9, 2007 - 3:46 pm

Something like this, although I suppose this might require a more
permanent solution for the test suite:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 8cf2871..8d20ea9 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -607,8 +607,12 @@ static void write_index_file(off_t last_obj_offset)
uint32_t array[256];
uint32_t index_version;

+#if 0
/* if last object's offset is >= 2^31 we should use index V2 */
index_version = (last_obj_offset >> 31) ? 2 : 1;
+#else
+ index_version = 2;
+#endif

/* index versions 2 and above need a header */
if (index_version >= 2) {
@@ -664,8 +668,15 @@ static void write_index_file(off_t last_obj_offset)
list = sorted_by_sha;
for (i = 0; i < nr_objects; i++) {
struct object_entry *entry = *list++;
+#if 0
uint32_t offset = (entry->offset <= 0x7fffffff) ?
entry->offset : (0x80000000 | nr_large_offset++);
+#else
+ /* force some offsets to the 64-bit table for testing */
+ uint32_t offset =
+ (!(entry->offset & 1) && entry->offset <= 0x7fffffff) ?
+ entry->offset : (0x80000000 | nr_large_offset++);
+#endif
offset = htonl(offset);
sha1write(f, &offset, 4);
}
@@ -675,7 +686,11 @@ static void write_index_file(off_t last_obj_offset)
while (nr_large_offset) {
struct object_entry *entry = *list++;
uint64_t offset = entry->offset;
+#if 0
if (offset > 0x7fffffff) {
+#else
+ if (offset & 1 || offset > 0x7fffffff) {
+#endif
uint32_t split[2];
split[0] = htonl(offset >> 32);
split[1] = htonl(offset & 0xffffffff);
diff --git a/index-pack.c b/index-pack.c
index a833f64..1e6db2b 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -718,8 +718,13 @@ static const char *write_index_file(const char *index_name, unsigned char *sha1)
die("unable to create %s: %s", index_name, strerror(errno));
f = sha1fd(fd, index_name);

+#if 0
/* if last object...

To: Nicolas Pitre <nico@...>
Cc: Junio C Hamano <junkio@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Monday, April 9, 2007 - 2:34 pm

Better to force features by config file option or command line
option, then by code hacked into a branch. This way we can enable
the feature all of the time in a test case, and push it. E.g. the
sliding window controls; nobody really uses them in practice (I
don't think anyway) but they are handy in the test cases to force
particular conditions, without creating huge temporary packfiles.

--
Shawn.
-

To: Shawn O. Pearce <spearce@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, April 9, 2007 - 1:32 pm

Well... I still did index v2 with pack v4 in mind. The diference
between index v2 and v3 would be minimal.

Pack v4 is coming along. Slowly but still coming.

Nicolas
-

To: Nicolas Pitre <nico@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, April 9, 2007 - 1:43 pm

I take it you are working on it alone at this point? I'd love
to get back into it, but I don't think I've got the cycles for at
least a couple of weeks.

Here's something we didn't think about, but that occurred to me today
when reading this series: If we move the SHA-1 table out of the index
and into the packfile (like we are planning) dumb commit-walkers
(http-fetch) will have problems. Right now they download the
indexes of every available packfile to determine if they need to
download the corresponding packfile to obtain a needed object.

Moving the SHA-1 table from the index into the packfile will mean
the client cannot do this `optimization'. Instead it will need to
perform a byte-range request for part of the packfile to decide
if it needs to fetch the remainder of that packfile; or it must
download the entire packfile. Since not all HTTP servers support
byte-range requests the former may not always be viable and the
latter is obviously not a good idea.

--
Shawn.
-

To: Shawn O. Pearce <spearce@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 3:49 pm

If we really care about older dumb clients, one option is to
generate not .idx but .idx2, and have a corresponding .idx only
to support them. But at that point, it's probably cleaner to
have an explicit option to produce .idx file of a particular
version, and tell people to pack public repositories they expect
older dumb clients to access with that option to keep things
backward compatible.

-

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 3:53 pm

Sure, fine. But I think you missed my point above - right now if
we move the SHA-1 table out of the .idx file I'm not sure we know
how to support the dumb clients *at all*. Even if they understand
the latest-and-greatest file formats...

--
Shawn.
-

To: Shawn O. Pearce <spearce@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 4:18 pm

We do an incremental .keep pack for packs 100 objects or more.
If .idx omits SHA-1 values but keeps the crc and offset, that
would be around 12 bytes per object (but you may need an index
into the real SHA-1 table in the .pack file, I dunno) so that
would be 1200 bytes. If we duplicate SHA-1 table also in .idx
that would make it 32-byte per object, totalling 3200 bytes,
which admittedly is near 3-fold increase, but it may be worth if
we want to avoid the hassle.

-

To: Shawn O. Pearce <spearce@...>
Cc: Junio C Hamano <junkio@...>, <git@...>
Date: Monday, April 9, 2007 - 4:02 pm

The table could live in both the pack and the index for those repos
expected to be exportable through dumb protocols.

Nicolas
-

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, Let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.

While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-count-objects.c | 2 +-
builtin-fsck.c | 2 +-
builtin-pack-objects.c | 4 ++--
cache.h | 8 ++++----
pack-check.c | 4 ++--
sha1_file.c | 17 ++++++-----------
sha1_name.c | 2 +-
7 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 6263d8a..ff90ebd 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -111,7 +111,7 @@ int cmd_count_objects(int ac, const char **av, const char *prefix)
for (p = packed_git; p; p = p->next) {
if (!p->pack_local)
continue;
- packed += num_packed_objects(p);
+ packed += p->num_objects;
num_pack++;
}
printf("count: %lu\n", loose);
diff --git a/builtin-fsck.c b/builtin-fsck.c
index 4d8b66c..44a02d3 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -653,7 +653,7 @@ int cmd_fsck(int argc, char **argv, const char *prefix)
verify_pack(p, 0);

for (p = packed_git; p; p = p->next) {
- uint32_t i, num = num_packed_objects(p);
+ uint32_t i, num = p->num_objects;
for (i = 0; i < num; i++)
fsck_sha1(nth_packed_object_sha1(p, i));
}
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 45ac3e4..6bff17b 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -164,7 +164,7 @@ static int cmp_offset(const void *a_, const void *b_)
static void prepare_pack_revindex(struct pack_revindex *rix)
{
struct packed_git *p = rix-&gt...

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.

It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 2 +-
builtin-unpack-objects.c | 2 +-
git-compat-util.h | 8 ++++++++
index-pack.c | 2 +-
sha1_file.c | 2 +-
5 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 6bff17b..ee607a0 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1014,7 +1014,7 @@ static void check_object(struct object_entry *entry)
ofs = c & 127;
while (c & 128) {
ofs += 1;
- if (!ofs || ofs & ~(~0UL >> 7))
+ if (!ofs || MSB(ofs, 7))
die("delta base offset overflow in pack for %s",
sha1_to_hex(entry->sha1));
c = buf[used_0++];
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 3956c56..63f7db6 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -209,7 +209,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
base_offset = c & 127;
while (c & 128) {
base_offset += 1;
- if (!base_offset || base_offset & ~(~0UL >> 7))
+ if (!base_offset || MSB(base_offset, 7))
die("offset value overflow for delta base object");
pack = fill(1);
c = *pack;
diff --git a/git-compat-util.h b/git-compat-util.h
index 139fc19..bcfcb35 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -13,6 +13,14 @@

#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))

+#ifdef __GNUC__
+#define TYPEOF(x) (__typeof__(x))
+#else
+#define TYPEOF(x)
+#endif
+
+#define MSB(x, bits) ((x) & TYPEOF(x)(~0U...

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.

Better be safe than sorry.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 19 +++++++++++++------
builtin-unpack-objects.c | 17 +++++++++++------
index-pack.c | 14 ++++++++++----
3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index ee607a0..d0be879 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -369,7 +369,7 @@ static int revalidate_loose_object(struct object_entry *entry,
return check_loose_inflate(map, mapsize, size);
}

-static off_t write_object(struct sha1file *f,
+static unsigned long write_object(struct sha1file *f,
struct object_entry *entry)
{
unsigned long size;
@@ -503,16 +503,23 @@ static off_t write_one(struct sha1file *f,
struct object_entry *e,
off_t offset)
{
+ unsigned long size;
+
+ /* offset is non zero if object is written already. */
if (e->offset || e->preferred_base)
- /* offset starts from header size and cannot be zero
- * if it is written already.
- */
return offset;
- /* if we are deltified, write out its base object first. */
+
+ /* if we are deltified, write out base object first. */
if (e->delta)
offset = write_one(f, e->delta, offset);
+
e->offset = offset;
- return offset + write_object(f, e);
+ size = write_object(f, e);
+
+ /* make sure off_t is sufficiently large not to wrap */
+ if (offset > offset + size)
+ die("pack too large for current definition of off_t");
+ return offset + size;
}

static void write_pack_file(void)
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 63f7db6..f821906 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -...

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.

The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.

In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects which are not deflated and
therefore not validated when repacking actually making the pack data reuse
still not as safe as it could be.

Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying which is extremely costly, which
cost the data reuse optimization was designed to avoid in the first place.

So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.

Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:

Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead ...

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

Same as previous patch but for index-pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
index-pack.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/index-pack.c b/index-pack.c
index 66fb0bc..d33f723 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -15,6 +15,7 @@ struct object_entry
off_t offset;
unsigned long size;
unsigned int hdr_size;
+ uint32_t crc32;
enum object_type type;
enum object_type real_type;
unsigned char sha1[20];
@@ -86,6 +87,7 @@ static unsigned char input_buffer[4096];
static unsigned int input_offset, input_len;
static off_t consumed_bytes;
static SHA_CTX input_ctx;
+static uint32_t input_crc32;
static int input_fd, output_fd, pack_fd;

/* Discard current buffer used content. */
@@ -128,6 +130,7 @@ static void use(int bytes)
{
if (bytes > input_len)
die("used more bytes than were available");
+ input_crc32 = crc32(input_crc32, input_buffer + input_offset, bytes);
input_len -= bytes;
input_offset += bytes;

@@ -224,8 +227,10 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
unsigned long size;
off_t base_offset;
unsigned shift;
+ void *data;

obj->offset = consumed_bytes;
+ input_crc32 = crc32(0, Z_NULL, 0);

p = fill(1);
c = *p;
@@ -276,7 +281,9 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
}
obj->hdr_size = consumed_bytes - obj->offset;

- return unpack_entry_data(obj->offset, obj->size);
+ data = unpack_entry_data(obj->offset, obj->size);
+ obj->crc32 = input_crc32;
+ return data;
}

static void *get_data_from_pack(struct object_entry *obj)
@@ -521,7 +528,7 @@ static void parse_pack_objects(unsigned char *sha1)
fputc('\n', stderr);
}

-static int write_compressed(int fd, void *in, unsigned int size)
+static int write_compressed(int fd, void *in, unsigned int size, uint32_t *obj_crc)
{
z_stream stream;
unsigned long maxsi...

To: Junio C Hamano <junkio@...>
Cc: Nicolas Pitre <nico@...>, <git@...>
Date: Monday, April 9, 2007 - 1:06 am

Pack index version 2 goes as follows:

- 8 bytes of header with signature and version.

- 256 entries of 4-byte first-level fan-out table.

- Table of sorted 20-byte SHA1 records for each object in pack.

- Table of 4-byte CRC32 entries for raw pack object data.

- Table of 4-byte offset entries for objects in the pack if offset is
representable with 31 bits or less, otherwise it is an index in the next
table with top bit set.

- Table of 8-byte offset entries indexed from previous table for offsets
which are 32 bits or more (optional).

- 20-byte SHA1 checksum of sorted object names.

- 20-byte SHA1 checksum of the above.

The object SHA1 table is all contiguous so future pack format that would
contain this table directly won't require big changes to the code. It is
also tighter for slightly better cache locality when looking up entries.

Support for large packs exceeding 31 bits in size won't impose an index
size bloat for packs within that range that don't need a 64-bit offset.
And because newer objects which are likely to be the most frequently used
are located at the beginning of the pack, they won't pay the 64-bit offset
lookup at run time either even if the pack is large.

Right now an index version 2 is created only when the biggest offset in a
pack reaches 31 bits. It might be a good idea to always use index version
2 eventually to benefit from the CRC32 it contains when reusing pack data
while repacking.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
builtin-pack-objects.c | 85 ++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 03e36f0..c7e0a42 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -169,13 +169,33 @@ static void prepare_pack_revindex(struct pack_revindex *rix)
int i;
const char *index = p->index_data;

- index += 4 * 256;
rix->revindex = xmalloc(sizeof(*rix->revinde...

To: Nicolas Pitre <nico@...>
Cc: <git@...>
Date: Monday, April 9, 2007 - 1:32 am

Although write_pack_file() iterates objects[] array in the
ascending order of index and calls write_one() for each of them,
in the presense of "we write base object before delta" rule, is
it always true that the last object in the objects[] array has
the largest offset?

-

To: Junio C Hamano <junkio@...>
Cc: <git@...>
Date: Monday, April 9, 2007 - 10:54 am

Oops, indeed it is not. It is true in the index-pack but not here.

Please could you amend this patch with the following:

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index c7e0a42..8cf2871 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -548,11 +548,11 @@ static off_t write_one(struct sha1file *f,
return offset + size;
}

-static void write_pack_file(void)
+static off_t write_pack_file(void)
{
uint32_t i;
struct sha1file *f;
- off_t offset;
+ off_t offset, last_obj_offset = 0;
struct pack_header hdr;
unsigned last_percent = 999;
int do_progress = progress;
@@ -575,6 +575,7 @@ static void write_pack_file(void)
if (!nr_result)
goto done;
for (i = 0; i < nr_objects; i++) {
+ last_obj_offset = offset;
offset = write_one(f, objects + i, offset);
if (do_progress) {
unsigned percent = written * 100 / nr_result;
@@ -592,9 +593,11 @@ static void write_pack_file(void)
if (written != nr_result)
die("wrote %u objects while expecting %u", written, nr_result);
sha1close(f, pack_file_sha1, 1);
+
+ return last_obj_offset;
}

-static void write_index_file(void)
+static void write_index_file(off_t last_obj_offset)
{
uint32_t i;
struct sha1file *f = sha1create("%s-%s.%s", base_name,
@@ -605,7 +608,7 @@ static void write_index_file(void)
uint32_t index_version;

/* if last object's offset is >= 2^31 we should use index V2 */
- index_version = (objects[nr_result-1].offset >> 31) ? 2 : 1;
+ index_version = (last_obj_offset >> 31) ? 2 : 1;

/* index versions 2 and above need a header */
if (index_version >= 2) {
@@ -1769,6 +1772,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (reuse_cached_pack(object_list_sha1))
;
else {
+ off_t last_obj_offset;
if (nr_result)
prepare_pack(window, depth);
if (progress == 1 && pack_to_stdout) {
@@ -1778,9 +1782,9 @@ int cmd_pack_objects(int argc, const char **argv, const ...

Previous thread: Re: [PATCH 4/8] git-repack --max-pack-size: add fixup_header_footer() by Junio C Hamano on Sunday, April 8, 2007 - 8:04 pm. (9 messages)

Next thread: [PATCH 07/10] index-pack: learn about pack index version 2 by Nicolas Pitre on Monday, April 9, 2007 - 1:06 am. (2 messages)