Hi,

Being a pervert who abuses the way Subversion doesn't really deal with
branches and tags, I'm not actually a user of git-svn or git-svnimport,
because they can't easily cope with my perversion. So I'm writing a script
to do the conversion for me, and since I also like to learn new things
when I'm coding, I'm writing it in Ruby.

Anyway, one of the things I'm trying to convert is my svk repository
for the Debian packaging of xulrunner (so, a significant subset of the
Mozilla tree). It doesn't involve a lot of revisions (around 280,
because I only imported releases or CVS snapshots), but it does involve
a lot of files (roughly 20k).

The first thing I noticed a while ago, when twisting the svk repo around
so that git-svn could somehow import it, is that running git-svn was in
my case significantly slower than svnadmin dump | svnadmin load
(more than twice as slow).

And now, with my own script, I got the same kind of slowdown. So I
investigated, and it didn't take long to realize that replacing
git-hash-object with a simple reimplementation in Ruby was *way* faster.
Since git-hash-object is most probably the command you run the most when
you import a remote repository, it is not much of a surprise that forking
thousands of times is a huge performance waste.

So, just for the record, I did a lame hack of git-svn to see what kind
of speedup could be had there. You can find this lame hack as a patch
below. I did some tests (with a 1.5.2.1 release) and here are the
results, importing only the trunk (192 revisions), with no checkout, and
with stdout redirected to /dev/null:

original git-svn:
  real 25m1.871s
  user 8m51.593s
  sys  12m31.659s

patched git-svn:
  real 14m45.870s
  user 7m31.928s
  sys  4m1.047s

Some notes about the patch:
- I've not looked at the rest of the code to see if there is a way to
  get the size of the file up front, so that the SHA-1 sum and the
  compression could be done in one pass, without copying the whole file
  into memory.
- The object creation in the .git/objects directory is not as safe as
  what git-hash-object does: the patch writes the final file directly
  and doesn't check for errors.

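Both points can be sketched together in Ruby, assuming the size is known before reading. Hashing and deflating happen chunk by chunk in a single pass, and the object is written to a temporary file that is only renamed into place once it is complete. The function name store_blob_stream, the temporary file naming and the chunk size are assumptions made for this example, not what git-hash-object literally does. Since git-svn also wants an MD5 of the data, the same loop feeds that digest too.

```ruby
require 'digest/sha1'
require 'digest/md5'
require 'zlib'
require 'fileutils'

# Hash and deflate `size` bytes read from `io` in one pass, then move the
# finished object into place. Returns [sha1_hex, md5_hex].
# (Hypothetical helper, for illustration only.)
def store_blob_stream(git_dir, io, size, chunk = 65536)
  header = "blob #{size}\0".b
  sha1 = Digest::SHA1.new
  md5 = Digest::MD5.new
  sha1.update(header)                 # the header is part of the SHA-1 only
  objects = File.join(git_dir, 'objects')
  FileUtils.mkdir_p(objects)
  tmp = File.join(objects, "tmp_obj_#{Process.pid}_#{rand(1 << 32)}")
  z = Zlib::Deflate.new
  begin
    File.open(tmp, File::WRONLY | File::CREAT | File::EXCL | File::BINARY) do |out|
      out.write(z.deflate(header))
      seen = 0
      while (buf = io.read(chunk))    # read returns nil at end of input
        seen += buf.bytesize
        sha1.update(buf)
        md5.update(buf)
        out.write(z.deflate(buf))
      end
      raise "expected #{size} bytes, got #{seen}" unless seen == size
      out.write(z.finish)
    end
    id = sha1.hexdigest
    dir = File.join(objects, id[0, 2])
    FileUtils.mkdir_p(dir)
    final = File.join(dir, id[2..-1])
    if File.exist?(final)
      File.delete(tmp)                # object already stored, drop our copy
    else
      File.rename(tmp, final)         # readers never see a partial object
    end
    [id, md5.hexdigest]
  rescue
    File.delete(tmp) if File.exist?(tmp)
    raise
  ensure
    z.close
  end
end
```

The temporary file lives in the objects directory so that the rename stays on one filesystem. If the size is not known up front, the header cannot be hashed first, and this approach falls back to buffering as the patch does.
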
Some notes about git-svn:
- A few lines above the patched zone, the file is already read once to
  compute the MD5 sum. It should be possible to do the SHA-1 and MD5 sums
  and the deflate in just one pass.
- It might be worth testing whether git-cat-file is called a lot. If so,
  implementing a simple git-cat-file equivalent that works for unpacked
  objects could improve speed.

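A git-cat-file equivalent for unpacked objects could be as small as the following Ruby sketch. The name read_loose_object is made up for this example. It returns nil when the object has no loose file, in which case a caller would still have to fall back to the real git-cat-file, since the object may live in a pack.

```ruby
require 'zlib'

# Read a loose (unpacked) object and return [type, content], or nil if
# there is no loose file for this id.
# (Hypothetical helper, for illustration only.)
def read_loose_object(git_dir, id)
  path = File.join(git_dir, 'objects', id[0, 2], id[2..-1])
  return nil unless File.file?(path)
  raw = Zlib::Inflate.inflate(File.binread(path))
  # A loose object is "<type> <size>\0<content>" once inflated.
  header, content = raw.split("\0", 2)
  type, size = header.split(' ', 2)
  raise "corrupt object #{id}" unless content && size.to_i == content.bytesize
  [type, content]
end
```

Whether this helps depends on how often git-cat-file is actually forked, which would need to be measured first.
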
The same obviously applies to git-cvsimport and other scripts that call
git-hash-object a lot.

Mike

diff --git a/git-svn.perl b/git-svn.perl
index d3c8cd0..202c228 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -2417,6 +2417,8 @@ use warnings;
use Carp qw/croak/;
use IO::File qw//;
use Digest::MD5;
+use Digest::SHA1;
+use Compress::Zlib;
# file baton members: path, mode_a, mode_b, pool, fh, blob, base
sub new {
@@ -2603,15 +2605,26 @@ sub close_file {
$buf eq 'link ' or die "$path has mode 120000",
"but is not a link\n";
}
- defined(my $pid = open my $out,'-|') or die "Can't fork: $!\n";
- if (!$pid) {
- open STDIN, '<&', $fh or croak $!;
- exec qw/git-hash-object -w --stdin/ or croak $!;
+ my $size = 0;
+ my $buf = "";
+ while (my $read = sysread $fh, my $tmp, 4096) {
+ $size += $read;
+ $buf .= $tmp;
}
- chomp($hash = do { local $/; <$out> });
- close $out or croak $!;
+ my $sha1 = Digest::SHA1->new;
+ $sha1->add("blob $size\0");
+ $sha1->add($buf);
+ $hash = $sha1->hexdigest;
close $fh or croak $!;
$hash =~ /^[a-f\d]{40}$/ or die "not a sha1: $hash\n";
+ my $blob_dir = "$ENV{GIT_DIR}/objects/" . substr($hash, 0, 2);
+ my $blob_file = $blob_dir . "/" . substr($hash, 2);
+ if (! -f $blob_file) {
+ mkdir $blob_dir unless -d $blob_dir;
+ open BLOB, ">$blob_file";
+ print BLOB compress("blob $size\0" . $buf);
+ close BLOB;
+ }
close $fb->{base} or croak $!;
} else {
$hash = $fb->{blob} or die "no blob information\n";
