I just imported an svn repository with about 120 tags and 140
branches, and with some repacking got the pack file down to a
comfortable 80 MB. However, .git is over 600 MB, owing to about 520 MB
of git-svn metadata. (This wasn't a problem when I only tracked a
handful of branches, since they're only a few megs apiece.)
There appear to be two kinds of metadata that take up a significant
fraction of the space.
* An index file is saved for each branch and tag. I presume this
corresponds to the branch head, and is used to speed up importing
of new revisions to that branch. However, recreating an index with
git-read-tree is very fast, so I don't think these need to be
saved between git-svn runs.
* A "rev_db" file is saved for each branch and tag. This is a text
file with one sha1 per line -- I seem to remember that line X of
this file is the commit sha1 of svn revision X. For revisions that
didn't touch this branch/tag, there's a line of 40 zeros. And
since every revision touches just one branch, it's almost all
zeros unless the number of branches is very small.
This could probably be stored _much_ more efficiently. Just
gzipping it with the standard options shrinks it by between a
factor of 4 (for one of the busiest branches) and 300 (for a tag,
which is written just once). But I understand that we need quick
random access here?
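To make the random-access point concrete, here is a minimal sketch of
a lookup against such a file. It assumes the layout I described from
memory (fixed 41-byte records: 40 hex digits plus a newline, with the
record at offset X * 41 belonging to svn revision X); whether git-svn
numbers the lines from 0 or 1 is an assumption here. The names
lookup_commit and build_example_rev_db are made up for illustration
and are not git-svn functions.

```python
import io

SHA1_HEX_LEN = 40
RECORD_LEN = SHA1_HEX_LEN + 1  # 40 hex digits plus a newline
NULL_SHA1 = "0" * SHA1_HEX_LEN


def lookup_commit(rev_db, revision):
    """Return the commit sha1 for an svn revision, or None if absent.

    Assumes record number == revision number (an assumption, see above).
    """
    rev_db.seek(revision * RECORD_LEN)
    record = rev_db.read(SHA1_HEX_LEN).decode("ascii")
    # A short read means the revision is past the end of the file;
    # 40 zeros mean the revision did not touch this branch/tag.
    if len(record) < SHA1_HEX_LEN or record == NULL_SHA1:
        return None
    return record


def build_example_rev_db(commits, last_revision):
    """Build an in-memory rev_db-like file; `commits` maps revision -> sha1."""
    lines = [commits.get(rev, NULL_SHA1) for rev in range(last_revision + 1)]
    return io.BytesIO("".join(line + "\n" for line in lines).encode("ascii"))
```

The lookup is a single seek and read, which is presumably why the
format is so simple, but the file grows by 41 bytes per repository
revision for every branch and tag, whether or not the revision
touched it.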
The index files should be easy enough to erase between runs, if they
indeed just correspond to the branch head. The rev_db files are
trickier; exactly what kind of lookups are required? Could it perhaps
be done with just one file, instead of one per branch/tag?
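If the only lookup needed is "svn revision to commit sha1", one
possible alternative is to store just the revisions that actually
have a commit, sorted by revision, and binary-search them. The sketch
below is only an illustration under that assumption; the record
layout (4-byte revision plus 20-byte raw sha1) and the names
pack_sparse and SparseRevDb are hypothetical, not anything git-svn
provides.

```python
import bisect
import struct

# Hypothetical record layout: 4-byte big-endian revision, 20-byte raw sha1.
RECORD = struct.Struct(">I20s")


def pack_sparse(commits):
    """Pack a {revision: hex sha1} mapping into sorted fixed-width records."""
    return b"".join(
        RECORD.pack(rev, bytes.fromhex(sha1))
        for rev, sha1 in sorted(commits.items())
    )


class SparseRevDb:
    """Read-only view over packed records, searchable by revision."""

    def __init__(self, data):
        self.data = data
        self.count = len(data) // RECORD.size

    def __len__(self):
        return self.count

    def __getitem__(self, i):
        # Lets bisect treat the packed records as a sorted list of revisions.
        return RECORD.unpack_from(self.data, i * RECORD.size)[0]

    def lookup(self, revision):
        """Return the hex sha1 for `revision`, or None if it has no record."""
        i = bisect.bisect_left(self, revision)
        if i < self.count and self[i] == revision:
            return RECORD.unpack_from(self.data, i * RECORD.size)[1].hex()
        return None
```

This costs 24 bytes per revision that touched the branch instead of
41 bytes per revision in the whole repository, at the price of an
O(log n) search instead of a single seek. It doesn't cover other
kinds of lookups, if git-svn needs them.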
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
