index-pack died on pread

Previous thread: [ANNOUNCE]: PyGit and libgit-thin by Luiz Fernando N. Capitulino on Monday, July 23, 2007 - 8:35 am. (5 messages)

Next thread: [PATCH] Test case for "git diff" outside a git repo by Steven Grimm on Monday, July 23, 2007 - 9:22 am. (1 message)
To: GIT <git@...>
Date: Monday, July 23, 2007 - 8:52 am

Hello,

it's more and more common that I get an index-pack death for pread
that returns 0... Did anybody encountered the same?

Some more details:
# uname -a
HP-UX aa B.11.11 U 9000/800 1009938148 unlimited-user license
# git --version
git version 1.5.2.4
# git-clone git://git.kernel.org/pub/scm/git/git
Initialized empty Git repository in /home/tpiiuser/mr/git/.git/
remote: Generating pack...
remote: Done counting 55910 objects.
remote: Deltifying 55910 objects...
remote: 100% (55910/55910) done
Indexing 55910 objects...
remote: Total 55910 (delta 39003), reused 55304 (delta 38552)
100% (55910/55910) done
Resolving 39003 deltas...
fatal: cannot pread pack file: No such file or directory (n=0,
errno=2, fd=3, ptr=40452958, len=428, rdy=0, off=123601)
fatal: index-pack died with error code 128
fetch-pack from 'git://git.kernel.org/pub/scm/git/git' failed.

... please note that I added printing details to die() for pread.

HPUX pread manpage seems to be here:
http://modman.unixdev.net/?sektion=2&page=pread&manpath=HP-UX-11.11

Michal

PS: Please CC me.
-

To: Michal Rokos <michal.rokos@...>
Cc: GIT <git@...>
Date: Monday, July 23, 2007 - 1:04 pm

Ok, that's bogus. When "n" is zero, the errno (and thus the error string)
is not changed by pread, so that's a very misleading error report.

So what seems to have happened is that the pack-file is too short, so we
got a return value of 0, and then reported it as if it had an errno.

The reason for returning zero from pread would be:

- broken pread. I don't think HPUX should be a problem, so that's
probably not it.

- the pack-file got truncated

- the offset is corrupt, and points to beyond the size of the packfile.

In this case, since the offset is just 123601, I suspect it's a truncation
issue, and your pack-file is simply corrupt. Either because of some

One thing to look out for is that the "git.kernel.org" machines aren't the
"primary" ones, and the data gets mirrored from other machines. If the
mirroring is incomplete, I could imagine that the remote side simply ended
up terminating the connection, and you ended up with a partial pack-file.

Some of the kernel.org machines ran out of disk space the other day, so
maybe you happened to hit it in an unlucky window. Does it still happen?

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Wednesday, July 25, 2007 - 7:15 pm

m

To: Robin Rosenberg <robin.rosenberg.lists@...>
Cc: Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Wednesday, July 25, 2007 - 7:44 pm

Interesting.

It's true that pread() is used much less than normal reads, and maybe the
cygwin pread() is indeed broken. But it's intriguing how apparently both
HP-UX and Cygwin are showing the same breakage.

But if git was doing something odd with pread(), then NO_PREAD shouldn't
work either, because that just enables the exact same code, except we now
supply a pread emulation library function (which does it using "lseek()":
not a *good* emulation, but good enough for the very limited use that git
puts it to).

So yeah, it looks like pread() is broken under cygwin and hpux-11.11.

Googling for "pread" "cygwin" does seem to show that it's been buggy at
times. Eg:

Fixes since 1.5.21-1:

...

- Fix pread in case current file offset is non-zero. (Hideki Iwamoto)

but I'm a bit surprised that hp-ux has the bug. pread isn't *that*
complex.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Thursday, July 26, 2007 - 8:42 am

Maybe because neither _has_ POSIX pread? This is cygwin's pread, I believe:

http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/cygwin/fhandler_disk_fil...

Windows has means to implement it natively, but the interface is so ugly
(see OVERLAPPED, if interested), that it is no wonder Cygwin chose to
just use lseek.
-

To: Alex Riesen <raa.lkml@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Thursday, July 26, 2007 - 12:13 pm

HP-UX? No pread()? It wouldn't link if it didn't have pread(). So it
clearly has pread(), it's just somehow broken.

And no, git doesn't need "POSIX pread". Look at what git does if you say

I'm not saying that's great programming, but the "git_pread()" that git
will use in the absense of a real pread() is actually even *less* of a
POSIX pread, since it doesn't even try to save/restore the old position
(it knows that git doesn't care).

So I don't think that explains why git doesn't like cygwin's pread. I
suspect an earlier version of cygwin had an even more broken version of
pread at some point.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, Alex Riesen <raa.lkml@...>, GIT <git@...>
Date: Thursday, July 26, 2007 - 11:43 pm

If you mean the offset associated with fd, we actually do.

The original HP-UX error is confusing, as we ask pread() to
transfer 428 bytes and it returns 0 (not returning -1 with
EINTR). Return value of zero is understandable, if the starting
position is at or after the EOF, but the offset is 123601 and
56k objects packed from git.git repository should be longer than
that, so that also sounds implausible.

-

To: Junio C Hamano <gitster@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, Alex Riesen <raa.lkml@...>, GIT <git@...>
Date: Friday, July 27, 2007 - 1:36 am

Ahh, for some reason I thought we didn't (probably because the user likely

Yeah. It is suspicious.

If somebody could run git under gdb on HP-UX (preferably compiled
statically), and just disassemble the pread() thing, it would be
interesting.

PA-RISC assembly is *almost* entirely unreadable, but it might show
whether hpux-11.11 actually has a pread() system call or whether it is
doing the "emulate with lseek" and maybe obviously buggily at that..

It really isn't that complex a system call. So I'm surprised at bugs
there, and that makes me worry that there is something in git that
triggers this.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, Alex Riesen <raa.lkml@...>, Junio C Hamano <gitster@...>, GIT <git@...>
Date: Friday, July 27, 2007 - 9:38 am

The (only) user in Git does care indeed. The sequence of events goes
like this:

- Receive pack from remote location, parsing objects along and writing
a copy to the local pack file, as well as computing the SHA1 of non
delta objects.

- When all objects are parsed, the received pack SHA1 is verified.
If SHA1 doesn't match due to corruption in received data then everything
stops here with a "pack is corrupted (SHA1 mismatch)" error.

- For each received deltas: resolve those deltas to compute their
object SHA1. This is where pread() is involved on the local pack
file. If pread() fails to return proper data then the data won't
inflate properly and everything stops here with a "serious inflate
inconsistency" error.

- [OPTIONAL] Complete a thin pack with missing base objects for deltas
that weren't resolved yet. This involves pread() again, but it
_also_ append data to the same local pack file in the process. In
this case it is important that pread() restores the file position
when it returns or the appended objects won't be written where they
should.

This is optional because a thin pack is not received all the time,
not on a clone for example.

- Write trailing SHA1 to the local pack before moving it to its final
location. This also relies on pread() restoring the file position on
the local pack file or the trailing pack SHA1 won't be written where
it should.

So if pread() doesn't properly restore the file position then local pack
corruption will occur and the pack will be unusable. If pread() doesn't

Well, we have usage cases for a real pread() as well as our own
emulation which work. And the emulated pread() works in all cases so
far, so...

Nicolas
-

To: GIT <git@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, Alex Riesen <raa.lkml@...>, Junio C Hamano <gitster@...>, GIT <git@...>
Date: Friday, July 27, 2007 - 5:50 am

Maybe this may help:

I compiled Git under Linux with NO_PREAD=1, and also _disabled_
file position save/restore in git_pread(). Now (I clone small repo to
save traffic):

moonlight:/tmp$ git-clone git://people.freedesktop.org/~keithp/parsecvs
Initialized empty Git repository in /tmp/parsecvs/.git/
remote: Generating pack...
remote: Done counting 476 objects.
remote: Deltifying 476 objects.
remote: 100% (476/476) done
Indexing 476 objects...
remote: Total 476 (delta 339), reused 0 (delta 0)
100% (476/476) done
Resolving 339 deltas...
100% (339/339) done
error: packfile /tmp/parsecvs/.git/objects/pack/pack-dda2f32249fab26059a035bd273dce9feaf6bade.pack does not match index
error: packfile /tmp/parsecvs/.git/objects/pack/pack-dda2f32249fab26059a035bd273dce9feaf6bade.pack cannot be accessed
error: packfile /tmp/parsecvs/.git/objects/pack/pack-dda2f32249fab26059a035bd273dce9feaf6bade.pack does not match index
error: packfile /tmp/parsecvs/.git/objects/pack/pack-dda2f32249fab26059a035bd273dce9feaf6bade.pack cannot be accessed
fatal: failed to unpack tree object HEAD

Errors are different, but seems Git is sensitive to pread() that messes
file position. The problem may be in index-pack.c:

static const char *open_pack_file(const char *pack_name)
{
if (from_stdin) {
...
pack_fd = output_fd;
} else {
...
pack_fd = input_fd;
}
...
}

There's no dup() call, so when we mess pack_fd (that is used in
pread() only), we also mess one more file descriptor that is used
sequentially (output_fd in my case), and so may corrupt the pack.

--
Tomash Brechko
-

To: GIT <git@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, Alex Riesen <raa.lkml@...>, Junio C Hamano <gitster@...>
Date: Friday, July 27, 2007 - 6:33 am

I was wrong on the dup() part, since dup()'ed descriptors share the
same file position. Anyway, if my guess is right, the fix would
probably be not to use broken pread() that messes file position,
rather than to be ready and workaround that.

--
Tomash Brechko
-

To: Linus Torvalds <torvalds@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Thursday, July 26, 2007 - 12:51 pm

Not that. I meant the value read returned wasn't checked nor returned.
Suppose read failed (on Windows it happens all the time, especially if you
stress it a bit) - you'll never know it did and the buffer will contain either
garbage or previous data. Now imagine we _did_ have a past-eof

Yep, that could be another reason.
-

To: Alex Riesen <raa.lkml@...>
Cc: Robin Rosenberg <robin.rosenberg.lists@...>, Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Thursday, July 26, 2007 - 2:02 pm

So? The point is, that *works*. We do it ourselves.

"emulated with lseek" is trivial, and should work. It worries me that two
totally independent systems are showing problems. There's something else
going on here.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Michal Rokos <michal.rokos@...>, GIT <git@...>
Date: Monday, July 23, 2007 - 2:03 pm

I doubt it can be that. pread() is always used on pack data that we
already received and validated, and part of the validation is the final
pack SHA1. No code path leads to pread() before the final pack SHA1 is
tested OK.

The only way for the received pack to be truncated and pread() to fail
is if write_or_die() somehow failed to write the pack data without Git
noticing.

Nicolas
-

To: Michal Rokos <michal.rokos@...>
Cc: GIT <git@...>
Date: Monday, July 23, 2007 - 11:32 am

strange. pread(2) should not return ENOENT. Not in HP-UX
not anywhere.

Could you recompile with NO_PREAD=1 and try again?
Maybe HP-UX pread(2) implementation is just broken.
-

To: Alex Riesen <raa.lkml@...>
Cc: GIT <git@...>
Date: Wednesday, July 25, 2007 - 4:07 pm

Alex,

When I tried to recompile with NO_PREAD=1 the problem disappeared.

Do you want me to try something more with it?

Michal
-

To: Michal Rokos <michal.rokos@...>
Cc: Junio C Hamano <junkio@...>, Linus Torvalds <torvalds@...>, GIT <git@...>
Date: Wednesday, July 25, 2007 - 4:48 pm

No. Put NO_PREAD=1 in your config.mak on HP-UX.

-

Previous thread: [ANNOUNCE]: PyGit and libgit-thin by Luiz Fernando N. Capitulino on Monday, July 23, 2007 - 8:35 am. (5 messages)

Next thread: [PATCH] Test case for "git diff" outside a git repo by Steven Grimm on Monday, July 23, 2007 - 9:22 am. (1 message)