Commit ancestry grafting is a local repository issue and even if you manage to lie to your local git that the 300,000th commit is the epoch, the commit object you send out to the downloader would record its true parent (or parents, if it is a merge), so the downloader would want to go further back. And no, rewriting that commit and feeding a parentless commit to the downloader is not an option, because such a commit object would have a different object name and unpack-objects would be unhappy.

If you choose not to have full history in your public repository for whatever reason (ISP server diskquota comes to mind) that is OK, but be honest about it to your downloaders. Tell them that you do not have the full history, and they first need to clone from some other repository you started your development upon, in order to use what you added upon. "This repository does not have all the history -- please first clone from XX repository (you need at least xxx commit), and then do another 'git pull' from here", or something like that.

It _might_ work if you tell your downloader to have a proper graft file in his repository to cauterize the commit ancestry chain _before_ he pulls from you, though. I haven't tried it (and honestly I did not feel that is something important to support, so it might work by accident but that is not by

Maybe you did not use grafts properly to cauterize? I tried the following and am getting expected results. I did not have patience to do 300,000, so I cut things at #4, though.

-- 8< --
#!/bin/sh

rm -fr .git
git init-db
echo 0 >path
git add path
for i in 1 2 3 4 5 6 7
do
	echo $i >path
	git commit -a -m "Iteration #$i"
	git tag "iter#$i"
done
git checkout -b mine iter#4
for i in A B C D
do
	echo $i >path
	git commit -a -m "Alternate #$i"
	git tag "alt#$i"
done
git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin

git-rev-parse iter#4 >.git/info/grafts
echo...
I'm a bit curious about how this was done for the public kernel repo. I'd like to import glibc to git, but keeping history since 1972 seems a bloody waste, really.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
That's exactly my point. Furthermore, making your downloaders import that
useless history spreads this waste.
I guess the kernel repo will encounter this problem in the short term. It's
getting bigger and bigger, and developers may get bored of dealing with so
many useless objects. But I'm not saying that it's a bad thing to keep
that history. It would just be nice to allow developers who don't
care about old history to get rid of it.
Thanks
--
Franck
Ach, no. The current kernel repo only has history since April 17 (around 155 MB of objects, with less than optimal packing), when it started using git for versioning. The kernel repo also sees a lot of very rapid development.

The full kernel tree, with history since 1991 or some such, is about 3.2 GB. It was for this reason that the early history was dropped. I don't think another drop will be necessary any time soon, since incremental updates are fairly cheap over git and git+ssh. Only gitk suffers, but

You could of course create a new repository with the files from the version you want, but then you'd have a hard time merging the two repos if you ever want to import the old history.

Linus, is this what you did with the public kernel repo?

--
Andreas Ericsson
Just to make sure this is corrected, the 3.2GB was for a fully unpacked tree, which is still fairly bad in the current tree. The historical tree, packed, runs about 266M in a single pack.

It's always possible to use a "graft" to tie the history together, and if you really need to merge changes across the boundary, my graft-ripple (in the archives) tool can make it happen, though it does some ... nasty things to the history tree in the process. (It might be useful on a throwaway tree to provide a way to merge, then, from which a set of diffs could be taken and applied back on an un-messy tree.)

--
Ryan Anderson
  sometimes Pug Majere
Dear diary, on Thu, Jan 19, 2006 at 02:44:15PM CET, I got a letter

There is some "accurate" history only from the moment the kernel got tracked in BK, and it is certainly far less.

The question is, what is the "official" kernel history repository? There is at least

	http://www.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

with a 251M pack and

	http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git

with a 165M pack - IIRC the latter is obsoleted by the former and perhaps should be blasted to prevent confusion?

Getting a little offtopic here... Linus, would it be deemed useful to have the script I've pasted in <20060119130519.GB28365@pasky.or.cz> (earlier in this thread) in the kernel's scripts/ directory, pointing at the canonical history repository?

--
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
Dear diary, on Thu, Jan 19, 2006 at 12:10:23PM CET, I got a letter

FWIW, with the ELinks GIT repository we just started from scratch and then converted the old CVS repository, and provided this script in contrib/grafthistory.sh:

	#!/bin/sh
	#
	# Graft the ELinks development history to the current tree.
	#
	# Note that this will download about 80M.

	if [ -z "`which wget 2>/dev/null`" ]; then
		echo "Error: You need to have wget installed so that I can fetch the history." >&2
		exit 1
	fi

	[ "$GIT_DIR" ] || GIT_DIR=.git
	if ! [ -d "$GIT_DIR" ]; then
		echo "Error: You must run this from the project root (or set GIT_DIR to your .git directory)." >&2
		exit 1
	fi
	cd "$GIT_DIR"

	echo "[grafthistory] Downloading the history"
	mkdir -p objects/pack
	cd objects/pack
	wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d...
	wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d...

	echo "[grafthistory] Setting up the grafts"
	cd ../..
	mkdir -p info
	# master
	echo 0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633 >>info/grafts
	# REL_0_10
	echo 43a9a406737fd22a8558c47c74b4ad04d4c92a2b 730242dcf2cdeed13eae7e8b0c5f47bb03326792 >>info/grafts

	echo "[grafthistory] Refreshing the dumb server info wrt. new packs"
	cd ..
	git-update-server-info

So you check out the ELinks repository and if you want the full history you just run this script and it does everything for you.

--
				Petr "Pasky" Baudis
Thanks Junio for answering
well, dealing with a repo that has more than 300,000 objects becomes a
I'm not trying to hide anything from or lie to my downloaders. I just want
them to avoid dealing with totally pointless history. My work was started
recently and is based on the current XX repository. IMHO storing and dealing
with objects which are more than 10 years old is useless.
I don't see why it is so bad to create a "grafted" repository? I want
it to be small but still want to merge by using git-resolve with XX
Well in my graft file I did:
$ cat > .git/info/grafts
<shaid> <shaid>
$
From reading "Documentation/repository-layout.txt", I thought that would
have been the right thing to do. If I do the same as you did, i.e.:
$ cat > .git/info/grafts
<shaid>
$
It works.
Thanks
--
Franck
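
As a rough illustration of the two graft file forms discussed above: each line of .git/info/grafts names a commit, followed by the parents git should pretend it has. A line holding a single object name therefore makes that commit look parentless, which is what cuts the history. The sketch below builds a throwaway repository (the temporary directory, the file name "path" and the commit messages are made up for the example) and cuts at the third of five commits. Recent git versions deprecate the grafts file in favour of `git replace --graft`, so read this as a sketch of the file format rather than a recommendation.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical file name "path".
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
for i in 1 2 3 4 5
do
	echo "$i" >path
	git add path
	git commit -q -m "Iteration #$i"
done

# HEAD~2 is the third of the five commits.
cut=$(git rev-parse HEAD~2)

# One object name alone on a line: "pretend this commit has no parents".
# Extra names on the same line would be the parents to pretend it has.
mkdir -p .git/info
echo "$cut" >.git/info/grafts

graft_lines=$(wc -l <.git/info/grafts | tr -d ' ')
graft_fields=$(awk '{ print NF }' .git/info/grafts)
echo "graft lines: $graft_lines, fields on the line: $graft_fields"

# With a git that still honours grafts, this walk should stop at the
# cut point instead of reaching the first commit.
git rev-list HEAD | wc -l
```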
Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter

Were the objects packed? It would be interesting to have some data about how GIT performs with that many objects...

--
				Petr "Pasky" Baudis
The historical linux archive has a lot more than 300,000 objects. In fact, even the _current_ kernel archive has almost 200,000 objects.

Maybe somebody was thinking "commits", not "objects". Something with 300,000 commits is indeed a pretty big project.

Anyway, from a scalability standpoint, git should have no problem at all with tons of objects, as long as you pack the old history. There are a few things that get slower:

 - if you end up doing things that look at history, they are obviously at least linear in history size. Often there are other downsides too (using lots of memory).

   Example: try even just a simple "gitk" on the (regular, new) kernel archive, and it will take a while before the whole thing has been done. Of course, you'll see the top entries interactively, so mostly you won't care, but I routinely limit it some way just to make it not make the CPU fans come on.

   So I do something like

	gitk --since=1.week.ago
	gitk v2.6.15..

   instead of plain gitk, just because it makes operations cheaper.

 - a full clone takes a long time. Git _could_ fairly easily have an extension to add a date specifier to clone too:

	git clone --since=1.month.ago <source> <dst>

   and just leave any older stuff (you could always fetch it later), but we've just never done it. Maybe we should. It _should_ be pretty simple to do from a conceptual standpoint.

but "everyday" operations shouldn't slow down from having a long history. I can still apply 4-5 patches a second to the kernel archive, for example, as you can see from

	git log --pretty=fuller | grep CommitDate | less -S

and looking for one of the patch series I've applied from Andrew..

		Linus
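
The same "limit the walk" idea applies to the command-line tools, since a range such as `v2.6.15..` only visits commits that are not reachable from the tag. A minimal sketch on a throwaway repository follows; the tag name "v1", the file name and the commit messages are invented for the example and stand in for a real release tag.

```shell
#!/bin/sh
# Sketch only: throwaway repository; the tag name "v1" is made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
for i in 1 2 3 4 5 6
do
	echo "$i" >path
	git add path
	git commit -q -m "Iteration #$i"
	# Tag the third commit so there is something to limit against.
	if [ "$i" = 3 ]; then git tag v1; fi
done

# Walk the whole history versus only what came after the tag.
all=$(git rev-list HEAD | wc -l | tr -d ' ')
recent=$(git rev-list v1..HEAD | wc -l | tr -d ' ')
echo "all commits: $all, commits after v1: $recent"
```

On a six-commit toy history the difference is trivial, but the limited walk never touches anything older than the tag, which is where the saving comes from on a long history.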
That would be great! Something like:
git clone --since=v2.6.15 <src> <dst>
would be very useful for me. How would it work? Does it automatically
but it's really a pain to run for example git-repack or git-prune commands.
Thanks
--
Franck
I think we'd have to set up the grafts file, yes.

However, it's actually less of an advantage than you'd think: especially for long development histories, the incremental packing is very very efficient. In contrast, if you only get recent versions, there's nothing to be incremental against, so the size of the pack will not be that much smaller.

So getting just a tenth of the development history will _not_ cause the pack to be just a tenth in size. It's probably closer to half the size of the full history.

Anyway, it's _conceptually_ something that git wouldn't have any problems with, but that doesn't mean that it's totally trivial either.

The easiest way to do it (by far) would be to expand the native git protocol with a "get all objects of this one version" or something like that, and then you'd just do a "pull and mark all unknown commits in the grafts file".

So in effect, instead of getting the whole history pack, you'd get a pack that contains _one_ version (no history at all), and then (if you want to) you can get a pack that gets all stuff that isn't reachable from that one (ie "newer").

That would have the advantage that it's quite possible that many users might want to do just

	git clone --only=v2.6.15 <source> <target>

which would do that "one single version" variant of the clone. Then, later on, you could just do

	git pull --graft-unknown <source> <target>

to update the history.

Anybody want to try that? It would be a new command to "git-daemon" (instead of "git-upload-pack", you'd do a new "git-upload-version" command internally: it would look a lot like upload-pack, and use the same

Well, you really don't need to do that very often.

		Linus
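
The `--only` and `--graft-unknown` options above are proposals, not existing commands. Later git releases did gain something close to the "one single version" clone in the form of shallow clones (`git clone --depth`), which record the cut-off commits in a `shallow` file instead of the grafts file. A minimal sketch on a throwaway repository, with the directory names "src" and "dst" made up for the example:

```shell
#!/bin/sh
# Sketch only: "src" and "dst" are throwaway directories made up here.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
(
	cd src
	git config user.name "Example"
	git config user.email "example@example.invalid"
	for i in 1 2 3 4 5
	do
		echo "$i" >path
		git add path
		git commit -q -m "Iteration #$i"
	done
)

# --depth is ignored for plain local paths, so go through a file:// URL.
git clone -q --depth 1 "file://$tmp/src" dst

shallow_count=$(git -C dst rev-list HEAD | wc -l | tr -d ' ')
echo "commits visible in the shallow clone: $shallow_count"

# The older history can still be fetched later, if wanted.
git -C dst fetch -q --unshallow
full_count=$(git -C dst rev-list HEAD | wc -l | tr -d ' ')
echo "commits after --unshallow: $full_count"
```

This matches the "get one version now, fetch the rest later" shape of the proposal, though the wire protocol details differ from the "git-upload-version" idea sketched in the message.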
Dear diary, on Thu, Jan 19, 2006 at 05:58:09PM CET, I got a letter

Eek. I was burnt by git-count-objects' misleading name. I guess

	git-rev-list --objects --all | wc -l

should give accurate results - 145941 for kernel repository back from

Yes. I receive wishes for this from time to time and it is buried somewhere deep in my TODO list. I'm not sure how happy the GIT tools will be about invalid parent references.

--
				Petr "Pasky" Baudis
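
To make the difference between the two counts concrete: `git count-objects` reports only loose (unpacked) objects, so it drops once history is repacked, while the rev-list pipeline counts every reachable object regardless of packing. A minimal sketch on a throwaway repository follows; the file name and commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch only: throwaway repository with three commits to one file.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
for i in 1 2 3
do
	echo "$i" >path
	git add path
	git commit -q -m "Iteration #$i"
done

# Every reachable object: here 3 commits, 3 trees and 3 blobs.
total=$(git rev-list --objects --all | wc -l | tr -d ' ')

# Pack everything and drop the now-redundant loose copies.
git repack -a -d -q

# "count:" in the verbose output is the number of loose objects only.
loose_after=$(git count-objects -v | awk '/^count:/ { print $2 }')
echo "reachable objects: $total, loose objects after repack: $loose_after"
```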
