Look at the diffs between ad6571a78ac74e9fa27e581834709067dba459af and it's parent with and without rename detection enabled. (In linux-2.6 git tree) (formated for narrow screens) $ REV=ad6571a78ac74e9fa27e581834709067dba459af 0000000000000000000000000000000000000000 D include/asm-ppc64/termios.h 0000000000000000000000000000000000000000 D include/asm-ppc/termios.h Notice how the the fact that include/asm-ppc64/termios.h is deleted gets lost? Looks broken to me. -Wayne -
It looks broken to me, too. I rebuilt from a reasonably ancient source (v0.99) and re-run the test but I could not get it to produce 'A' for include/asm-powerpc/termios.h. So I rewound it further to 4d235c8044a638108b67e22f94b2876657130fc8 commit, which is really ancient version, but it still says it is renamed from asm-ppc64 directory. FWIW, all the v0.99* tagged versions seem to detect that rename correctly and not lose anything in my tests. Which version of git do you run and on what platform? It might be that something in the diffcore chain is broken in non-i386 and/or non-GNU/Linux and/or non-GCC environment. Shoot, I thought it would be a good practice-case for me to use 'git bisect' in reverse to find the commit that fixed a bug ;-). My copy of linux-2.6 repository for testing is fully packed so I could not try the commit that introduced diffcore-rename.c, but that is what I really wanted to try. -
I suspect that what is deleted is not asm-ppc64/termios.h but asm-ppc/termios.h. The below is an output without grep which seems to confuse things. $ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af | sed -ne 's/^.* \([AMCRDU]\)/\1/p' R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h R098 include/asm-ppc64/termios.h include/asm-powerpc/termios.h D include/asm-ppc/mman.h D include/asm-ppc/termbits.h D include/asm-ppc/termios.h $ git-diff-tree -r ad6571a78ac74e9fa27e581834709067dba459af | sed -ne 's/^.* \([AMCRDU]\)/\1/p' A include/asm-powerpc/mman.h A include/asm-powerpc/termbits.h A include/asm-powerpc/termios.h D include/asm-ppc/mman.h D include/asm-ppc/termbits.h D include/asm-ppc/termios.h D include/asm-ppc64/mman.h D include/asm-ppc64/termbits.h D include/asm-ppc64/termios.h The first 3 A and last 3 D are accounted for in the -M output as renames from asm-ppc64 to asm-powerpc. Middle 3 D from asm-ppc are shown in the -M output. So I do not think we are losing anything. Am I missing something? -
Odd. I get the same answer on my x86 box: $ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af | sed -ne 's/^.* \([AMCRDU]\)/\1/p' R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h R098 include/asm-ppc64/termios.h include/asm-powerpc/termios.h D include/asm-ppc/mman.h D include/asm-ppc/termbits.h D include/asm-ppc/termios.h But here is the output on my quad xeon running in 64-bit mode: (fedora core 2) $ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af | sed -ne 's/^.* \([AMCRDU]\)/\1/p' R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h A include/asm-powerpc/termios.h D include/asm-ppc/mman.h D include/asm-ppc/termbits.h D include/asm-ppc/termios.h This is the same version of git both rebuilt just for this test. However, I noticed a whole collection of errors from valgrind when I run this command line: ==13457== Invalid read of size 4 ==13457== at 0x805402C: locate_rename_dst (diffcore-rename.c:28) ==13457== by 0x805464B: diffcore_rename (diffcore-rename.c:356) ==13457== by 0x805249D: diffcore_std (diff.c:1093) ==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273) ==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298) ==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363) ==13457== by 0x804A884: main (diff-tree.c:551) ==13457== Address 0x1BBD781C is 20 bytes inside a block of size 71 free'd ==13457== at 0x1B9003C3: free (vg_replace_malloc.c:235) ==13457== by 0x805205B: diff_free_filepair (diff.c:775) ==13457== by 0x805422A: diffcore_rename (diffcore-rename.c:415) ==13457== by 0x805249D: diffcore_std (diff.c:1093) ==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273) ==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298) ==13457== by 0x804A30B: diff_tre...
I get even more, including: ==3234== Use of uninitialised value of size 4 ==3234== at 0x80507C1: alloc_filespec (diff.c:224) ==3234== by 0x8052387: diff_addremove (diff.c:1144) ==3234== by 0x8049B74: show_file (diff-tree.c:97) ==3234== by 0x8049E17: diff_tree (diff-tree.c:118) Damn, too bad valgrind doesn't work on ppc64, so I can't use it on my main machine. It seems to be in development on ppc32, so maybe some day. I'll look at it on my other machines instead, Linus -
How about... today? :-) My port of Valgrind-2.4.1 to ppc32 works pretty well. You can get it from: http://www.valgrind.org/downloads/variants.html?pmk I assume you're compiling git as 32-bit executables on your G5. I don't see any reason why the git binaries would need to be 64-bit. Regards, Paul. -
Well, git seems to assume it can mmap() the entirety of any file under its control, so a 64-bit git could handle larger files. Still, I'm using 32-bit git on ppc64. -hpa -
But beware that the delta code would break since it currently won't cope with any file offset larger than 32 bits. I reserved the zero byte to prefix any extended encoding but felt it wasn't really needed yet and I just didn't bother coding it. Nicolas -
Right now git isn't 64-bit clean anyway (well, it should _work_ fine on 64-bit architectures, but it just can't handle files >32 bits). I think the core object format should be all perfectly ok (since all the sizes etc are in ascii), so this is not a huge deal. If you have an existing git archive and suddenly notice that you need 32+ bit file formats, you don't have to throw your archive away and re-generate it from scratch in some new format. In fact, I think even the streaming pack format is 64-bit-safe: it has binary sizes, but they are all length-encoded, and I think the data structure is safe. Also, the index file - in order to not be unnecessarily big, and to be able to use standard htonl/ntohl helpers etc - uses 32-bit lengths for files. Even that _that_ should be 64-bit safe, because the length isn't actually _used_ for anything but to verify the curret state, and as with the timestamps etc, it's ok to "only" check the low 32 bits. So the on-disk format should be capable of handling 64-bit entities. HOWEVER. I might be wrong. I tried to think about it, but I didn't care too much, because I think git would suck at truly huge files. If only because compressing them will take forever. And the _implementation_ uses "unsigned int" and has never been tested with anything else, so it would need a lot of testing. Also, the "pack index" file can only handle 32-bit offsets - you can make a pack-file that is bigger than that, and it should be fine from a _streaming_ standpoint (ie in the way we use them for network transport), but you can't index them in .git/objects/packs. Which isn't a disaster: you might choose to say that you never pack huge files. That might be ok for some cases (maybe the huge file is a one-off satellite picture). It would suck if the huge file is a incrementally created log-file, where packing really would be nice. IF we ever hit this, and IF people decide that git actually makes sense for those kinds of files, we CAN cha...
Which is a valid thing to do because git also assumes that a file offset fits within 32-bit as far as I know. Each object in a packfile should also fit in 32-bit offset and the size of one packfile also needs to be within 32-bit offset. I was unsure if the last limitation would cause problems in the real life but we will see in a couple of years ;-). -
I use whatever the defaults are. And yes, it seems to me -m32. Linus -
FWIW, I think I fixed the leak that patch we reverted was attempting to fix, and on my i386 box valgrind seems much happier. -
I meant to say explicitly that it runs quite happily under a ppc64 kernel, but only does 32-bit executables at present. Paul. -
It works, but it does end up complaining about things like ==23756== Invalid read of size 4 ==23756== at 0x25A38990: strlen (in /lib/libc-2.3.5.so) .. ==23756== Address 0x25B86754 is 3 bytes after a block of size 17 alloc'd which seems to be just strlen prefetching the next word or something like that. But it's still nicer than not having it at all, even if it appears I'll have to do some filtering of my own. Linus -
The strlen() in glibc for ppc is unbearably clever hand-coded
assembly, which loads up 8 bytes at a time (once it has the address
8-byte aligned), and does various ANDs and ORs and ADDs and
conditional branches. If some of the 8 bytes aren't defined, it will
in many cases branch one way or the other based on the undefined
bytes, but end up computing the same result on either branch.
Valgrind is right in that strlen is loading up some bytes that are
past the end of a malloc'd block. In fact those bytes don't end up
affecting the result, and in fact the load couldn't cause a segfault,
but it's not surprising that Valgrind can't see that, since the value
of the extra bytes can actually affect whether a conditional branch is
taken or not, but we end up with the same result either way.
Valgrind sets LD_PRELOAD so that you get a simple Valgrind-supplied
set of string functions, including strlen, from vgpreload_memcheck.so
rather than the fancy glibc ones. However, that doesn't seem to catch
the calls to strlen from inside glibc - the call from vfprintf is a
direct branch rather than going through the PLT, for instance.
I could add a suppression to suppress all errors in strlen, but that
would mean you would miss real errors, where the string is not
null-terminated within the malloc'd block, and strlen runs off the
end.
I wish I had a good answer for this problem, but I don't. Maybe we
need a debugging version of glibc that doesn't use the fancy
bit-fiddling algorithms in the string functions.
(Just for interest: here are the comments from strlen.S:
1) Given a word 'x', we can test to see if it contains any 0 bytes
by subtracting 0x01010101, and seeing if any of the high bits of each
byte changed from 0 to 1. This works because the least significant
0 byte must have had no incoming carry (otherwise it's not the least
significant), so it is 0x00 - 0x01 == 0xff. For all other
byte values, either they have the high bit set initially, or when
...Just curious: Why is it using LD_PRELOAD and not the VGs symbol redirection mechanism, which should catch strlen even if used inside of glibc? Josef -
Valgrind-2.4.1 does symbol redirects for stpcpy and strnlen in libc.so.6, and stpcpy and strchr in ld-linux.so.2. I could extend that list on ppc to include strlen et al., but I don't know if that would solve the problem for ld.so, since strlen doesn't appear in the symbol table for ld.so. Paul. -
This simplifies and fixes the initialization of a "diff_filespec" when
allocated.
The old code would not initialize "sha1_valid". Noticed by valgrind.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
This does not fix the issue Wayne saw, but I'm not going to look at the
later valgrind errors before I've fixed the first ones.
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -214,14 +214,10 @@ struct diff_filespec *alloc_filespec(con
{
int namelen = strlen(path);
struct diff_filespec *spec = xmalloc(sizeof(*spec) + namelen + 1);
+
+ memset(spec, 0, sizeof(*spec));
spec->path = (char *)(spec + 1);
- strcpy(spec->path, path);
- spec->should_free = spec->should_munmap = 0;
- spec->xfrm_flags = 0;
- spec->size = 0;
- spec->data = NULL;
- spec->mode = 0;
- memset(spec->sha1, 0, 20);
+ memcpy(spec->path, path, namelen+1);
return spec;
}
-
| Bron Gondwana | BUG: mmapfile/writev spurious zero bytes (x86_64/not i386, bisected, reproducable) |
| Greg Kroah-Hartman | [PATCH 012/196] nozomi driver |
| J.C. Pizarro | Re: linux+glibc memory allocator, poor performance |
| Pavel Roskin | ndiswrapper and GPL-only symbols redux |
git: | |
| Andy Parkins | svn:externals using git submodules |
| Linus Torvalds | Re: fatal: Out of memory, malloc failed |
| Peter Karlsson | RCS keyword expansion |
| Dennis Schridde | Odd number of elements in anonymous hash |
| Jarek Poplawski | [PATCH 00/14]: Killing qdisc->ops->requeue(). |
| jamal | [PATCH 2/3][NET_BATCH] net core use batching |
| Patrick McHardy | pkt_sched: add DRR scheduler |
| Marcel Holtmann | Bluetooth fixes for 2.6.27 |
| Charlie Clark | openbsd fail2ban |
| Hari | DHCP question |
| Richard Stallman | Real men don't attack straw men |
| Nick Guenther | Re: When will OpenBSD support UTF8? |
