Re: [QUESTION] about .git/info/grafts file

Previous thread: [PATCH] git-fetch-pack: really do not ask for funny refs by Johannes Schindelin on Wednesday, January 18, 2006 - 7:24 pm. (2 messages)

Next thread: /etc in git? by Adam Hunt on Wednesday, January 18, 2006 - 11:43 pm. (13 messages)
To: Franck <vagabon.xyz@...>
Cc: Git Mailing List <git@...>
Date: Wednesday, January 18, 2006 - 8:40 pm

Commit ancestry grafting is a local repository issue and even if
you manage to lie to your local git that 300,000th commit is the
epoch, the commit object you send out to the downloader would
record its true parent (or parents, if it is a merge), so the
downloader would want to go further back.  And no, rewriting
that commit and feeding a parentless commit to the downloader is
not an option, because such a commit object would have different
object name and unpack-objects would be unhappy.
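
To see why a rewritten commit cannot keep its name: a commit's object name is the SHA-1 of a short header plus the commit text, and that text includes the "parent" lines. The sketch below recomputes the name by hand, with and without a parent line. The parent id, author identity and timestamp are made up for illustration, and it assumes a `sha1sum` utility is available.

```shell
#!/bin/sh
# Sketch: git names a commit by the SHA-1 of "commit <size>\0<body>".
# The parent id, identity and timestamp below are made-up placeholders.

tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # git's well-known empty tree

commit_body() {
    # $1 = parent id, or empty for a parentless (root) commit.
    echo "tree $tree"
    if [ -n "$1" ]; then
        echo "parent $1"
    fi
    echo "author A U Thor <author@example.com> 1137600000 +0000"
    echo "committer A U Thor <author@example.com> 1137600000 +0000"
    echo
    echo "Iteration #5"
}

commit_name() {
    # Read a commit body on stdin; print the object name it would get.
    body_file=$(mktemp) || return 1
    cat >"$body_file"
    size=$(wc -c <"$body_file" | tr -d ' ')
    { printf 'commit %s\0' "$size"; cat "$body_file"; } | sha1sum | cut -d' ' -f1
    rm -f "$body_file"
}

with_parent=$(commit_body 1111111111111111111111111111111111111111 | commit_name)
without_parent=$(commit_body "" | commit_name)

echo "with parent line:    $with_parent"
echo "without parent line: $without_parent"
```

The two names differ even though tree, author, date and message are identical, which is why a parentless copy of the commit cannot stand in for the original.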

If you choose not to have full history in your public repository
for whatever reason (ISP server diskquota comes to mind) that is
OK, but be honest about it to your downloaders.  Tell them that
you do not have the full history, and they first need to clone
from some other repository you started your development upon, in
order to use what you added upon.  "This repository does not
have all the history -- please first clone from XX repository
(you need at least xxx commit), and then do another 'git pull'
from here", or something like that.

It _might_ work if you tell your downloader to have a proper
graft file in his repository to cauterize the commit ancestry
chain _before_ he pulls from you, though.  I haven't tried it
(and honestly I did not feel that is something important to
support, so it might work by accident but that is not by

Maybe you did not use grafts properly to cauterize?  I tried the
following and am getting expected results.  I did not have
patience to do 300,000, so I cut things at #4, though.

-- 8< -- 

#!/bin/sh

rm -fr .git
git init-db
echo 0 >path
git add path

for i in 1 2 3 4 5 6 7
do
	echo $i >path
	git commit -a -m "Iteration #$i"
	git tag "iter#$i"
done


git checkout -b mine iter#4

for i in A B C D
do
	echo $i >path
	git commit -a -m "Alternate #$i"
	git tag "alt#$i"
done

git log --pretty=oneline --topo-order
echo merge base is `git merge-base master mine` | git name-rev --stdin

git-rev-parse iter#4 >.git/info/grafts
echo...
To: Junio C Hamano <junkio@...>
Cc: Franck <vagabon.xyz@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 7:10 am

I'm a bit curious about how this was done for the public kernel repo. 
I'd like to import glibc to git, but keeping history since 1972 seems a 
bloody waste, really.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 9:31 am

That's exactly my point. Furthermore, making your downloaders import that
useless history spreads this waste.

I guess the kernel repo will run into this problem in the short term. It's
getting bigger and bigger, and developers may get tired of dealing with so
many useless objects. I'm not saying that it's a bad thing to keep
that history. It would just be nice to allow developers who don't
care about old history to get rid of it.

Thanks
--
               Franck
-
To: Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 9:44 am

Ach, no. The current kernel repo only has history since April 17 (around 
155 MB of objects, with less than optimal packing), when it started 
using git for versioning. The kernel repo also sees a lot of very rapid 
development.

The full kernel tree, with history since 1991 or some such, is about 3.2 
GB. It was for this reason that the early history was dropped. I don't 
think another drop will be necessary any time soon, since incremental 
updates are fairly cheap over git and git+ssh. Only gitk suffers, but 

You could of course create a new repository with the files from the 
version you want, but then you'd have a hard time merging the two repos 
if you ever want to import the old history.

Linus, is this what you did with the public kernel repo?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: Git Mailing List <git@...>
Date: Friday, January 20, 2006 - 4:48 pm

Just to make sure this is corrected, the 3.2GB was for a fully unpacked
tree, which is still fairly bad in the current tree.

The historical tree, packed, runs about 266M in a single pack.

It's always possible to use a "graft" to tie the history together, and
if you really need to merge changes across the boundary, my graft-ripple
(in the archives) tool can make it happen, though it does some ... nasty
things to the history tree in the process.  (It might be useful on a
throwaway tree to provide a way to merge, then, from which a set of
diffs could be taken and applied back on an un-messy tree.)

-- 

Ryan Anderson
  sometimes Pug Majere
-
To: Andreas Ericsson <ae@...>, <torvalds@...>
Cc: Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 1:45 pm

Dear diary, on Thu, Jan 19, 2006 at 02:44:15PM CET, I got a letter

There is some "accurate" history only from the moment the kernel got
tracked in BK, and it is certainly far less.

The question is, what is the "official" kernel history repository?
There is at least

	http://www.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

with a 251M pack and

	http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git

with a 165M pack - IIRC the latter is obsoleted by the former and
perhaps should be blasted to prevent confusion?

Getting a little offtopic here... Linus, would it be deemed useful to
have the script I've pasted in <20060119130519.GB28365@pasky.or.cz>
(earlier in this thread) in the kernel's scripts/ directory, pointing at
the canonical history repository?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
-
To: Andreas Ericsson <ae@...>
Cc: Junio C Hamano <junkio@...>, Franck <vagabon.xyz@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 9:05 am

Dear diary, on Thu, Jan 19, 2006 at 12:10:23PM CET, I got a letter

FWIW, with the ELinks GIT repository we just started from scratch and
then converted the old CVS repository, and provided this script in
contrib/grafthistory.sh:


#!/bin/sh
#
# Graft the ELinks development history to the current tree.
#
# Note that this will download about 80M.

if [ -z "`which wget 2>/dev/null`" ]; then
  echo "Error: You need to have wget installed so that I can fetch the history." >&2
  exit 1
fi

[ "$GIT_DIR" ] || GIT_DIR=.git
if ! [ -d "$GIT_DIR" ]; then
  echo "Error: You must run this from the project root (or set GIT_DIR to your .git directory)." >&2
  exit 1
fi
cd "$GIT_DIR"

echo "[grafthistory] Downloading the history"
mkdir -p objects/pack
cd objects/pack
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d...
wget -c http://elinks.cz/elinks-history.git/objects/pack/pack-0d6c5c67aab3b9d5d9b245da5929c15d...

echo "[grafthistory] Setting up the grafts"
cd ../..
mkdir -p info
# master
echo 0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633 >>info/grafts
# REL_0_10
echo 43a9a406737fd22a8558c47c74b4ad04d4c92a2b 730242dcf2cdeed13eae7e8b0c5f47bb03326792 >>info/grafts

echo "[grafthistory] Refreshing the dumb server info wrt. new packs"
cd ..
git-update-server-info


So you check out the ELinks repository and if you want the full history
you just run this script and it does everything for you.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
-
To: Junio C Hamano <junkio@...>
Cc: Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 6:51 am

Thanks Junio for answering


well, dealing with a repo that has more than 300,000 objects becomes a

I'm not trying to hide anything from my downloaders or lie to them. I just
want them to avoid dealing with totally pointless history. My work was started
recently and is based on the current XX repository. IMHO storing and dealing
with objects that are more than 10 years old is useless.

I don't see why it is so bad to create a "grafted" repository. I want
it to be small but still want to merge by using git-resolve with XX

Well, in my graft file I did:

                    $ cat > .git/info/grafts
                    <shaid> <shaid>

                    $

From reading "Documentation/repository-layout.txt", I thought that would
have been the right thing to do. If I do the same as you did, i.e.:

                    $ cat > .git/info/grafts
                    <shaid>

                    $

it works.

Thanks
--
               Franck
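
The format at issue is one graft per line: a commit id followed by the ids git should treat as its parents, separated by spaces. A line holding a single id makes that commit a root. A minimal sketch of that reading follows. `describe_graft` is a hypothetical helper, not a git command, and the self-parent check is only a guess at one way a two-id line could go wrong (the thread does not say which ids were used). The example ids are the ones from the ELinks script elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: interpret one line of .git/info/grafts ("<commit> [<parent>...]").
# describe_graft is a hypothetical helper written for this illustration.

describe_graft() {
    line=$1
    # Split on whitespace: the first id is the commit, the rest its parents.
    # (Object names are plain hex, so unquoted splitting is safe here.)
    set -- $line
    if [ $# -eq 0 ]; then
        echo "empty line"
        return 1
    fi
    commit=$1
    shift
    for id in "$commit" "$@"; do
        case $id in
            *[!0-9a-f]*) echo "not a hex object name: $id"; return 1 ;;
        esac
        if [ ${#id} -ne 40 ]; then
            echo "not 40 hex digits: $id"
            return 1
        fi
    done
    for parent in "$@"; do
        if [ "$parent" = "$commit" ]; then
            echo "commit listed as its own parent"
            return 1
        fi
    done
    if [ $# -eq 0 ]; then
        echo "commit $commit becomes a root (no parents)"
    else
        echo "commit $commit gets $# parent(s)"
    fi
}

describe_graft "0f6d4310ad37550be3323fab80456e4953698bf0 06135dc2b8bb7ed2e441305bdaa82048396de633"
describe_graft "0f6d4310ad37550be3323fab80456e4953698bf0"
```

The second call is the "cauterizing" case: a lone id cuts the ancestry chain at that commit.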
-
To: Franck <vagabon.xyz@...>
Cc: Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 9:09 am

Dear diary, on Thu, Jan 19, 2006 at 11:51:22AM CET, I got a letter

Were the objects packed? It would be interesting to have some data about
how GIT performs with that many objects...

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
-
To: Petr Baudis <pasky@...>
Cc: Franck <vagabon.xyz@...>, Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 12:58 pm

The historical linux archive has a lot more than 300,000 objects. In fact, 
even the _current_ kernel archive has almost 200,000 objects.

Maybe somebody was thinking "commits", not "objects". Something with 
300,000 commits is indeed a pretty big project.

Anyway, from a scalability standpoint, git should have no problem at all 
with tons of objects, as long as you pack the old history. There are a few 
things that get slower:

 - if you end up doing things that look at history, they are obviously at 
   least linear in history size. Often there are other downsides too 
   (using lots of memory).

   Example: try even just a simple "gitk" on the (regular, new) kernel 
   archive, and it will take a while before the whole thing has been done. 
   Of course, you'll see the top entries interactively, so mostly you 
   won't care, but I routinely limit it some way just to make it not make 
   the CPU fans come on. So I do something like

	gitk --since=1.week.ago
	gitk v2.6.15..

   instead of plain gitk, just because it makes operations cheaper.

 - a full clone takes a long time. Git _could_ fairly easily have an 
   extension to add a date specifier to clone too:

	git clone --since=1.month.ago <source> <dst>

   and just leave any older stuff (you could always fetch it later), but 
   we've just never done it. Maybe we should. It _should_ be pretty simple 
   to do from a conceptual standpoint.

but "everyday" operations shouldn't slow down from having a long history. 
I can still apply 4-5 patches a second to the kernel archive, for example, 
as you can see from

	git log --pretty=fuller | grep CommitDate | less -S

and looking for one of the patch series I've applied from Andrew..

		Linus
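
Git did later gain a feature along these lines: shallow clones, via `git clone --depth <n>` (and, in newer releases, `--shallow-since=<date>`). A minimal sketch against a throwaway local repository follows. The paths, identity and commit count are made up, and it assumes a reasonably recent git on the PATH.

```shell
#!/bin/sh
# Sketch: clone only the most recent commits of a repository.
# Everything lives in a throwaway temp directory; names are illustrative.
tmp=$(mktemp -d) || exit 1

git init -q "$tmp/src"
for i in 1 2 3 4 5
do
    echo "$i" >"$tmp/src/path"
    git -C "$tmp/src" add path
    git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
        commit -q -m "Iteration #$i"
done

# A file:// URL is needed here: plain local-path clones ignore --depth.
git clone -q --depth 2 "file://$tmp/src" "$tmp/dst"

full_count=$(git -C "$tmp/src" rev-list --count HEAD)
shallow_count=$(git -C "$tmp/dst" rev-list --count HEAD)
echo "source has $full_count commits, shallow clone sees $shallow_count"

rm -rf "$tmp"
```

The clone records where history was cut in its own `.git/shallow` file rather than in `info/grafts`, but the effect on history walking is similar.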
-
To: Linus Torvalds <torvalds@...>
Cc: Petr Baudis <pasky@...>, Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 1:33 pm

That would be great! Something like:

        git clone --since=v2.6.15 <src> <dst>

would be very useful for me. How would it work? Does it automatically

but it's really a pain to run for example git-repack or git-prune commands.

Thanks
--
               Franck
-
To: Franck <vagabon.xyz@...>
Cc: Petr Baudis <pasky@...>, Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 1:49 pm

I think we'd have to set up the grafts file, yes. However, it's actually 
less of an advantage than you'd think: especially for long development 
histories, the incremental packing is very very efficient. In contrast, if 
you only get recent versions, there's nothing to be incremental against, 
so the size of the pack will not be that much smaller.

So getting just a tenth of the development history will _not_ cause the 
pack to be just a tenth in size. It's probably closer to half the size of 
the full history.

Anyway, it's _conceptually_ something that git wouldn't have any problems 
with, but that doesn't mean that it's totally trivial either. The easiest 
way to do it (by far) would be to expand the native git protocol with a 
"get all objects of this one version" or something like that, and then 
you'd just do a "pull and mark all unknown commits in the grafts file".

So in effect, instead of getting the whole history pack, you'd get a pack 
that contains _one_ version (no history at all), and then (if you want to) 
you can get a pack that gets all stuff that isn't reachable from that one 
(ie "newer").

That would have the advantage that it's quite possible that many users 
might want to do just

	git clone --only=v2.6.15 <source> <target>

which would do that "one single version" variant of the clone. Then, later 
on, you could just do

	git pull --graft-unknown <source> <target>

to update the history.

Anybody want to try that? It would be a new command to "git-daemon" 
(instead of "git-upload-pack", you'd do a new "git-upload-version" 
command internally: it would look a lot like upload-pack, and use the same 

Well, you really don't need to do that very often.

		Linus
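
The two-step flow described here (fetch one version now, fill in history later) resembles what shallow clones later made possible: `git clone --depth 1` followed by `git fetch --unshallow`. These are not the `--only` and `--graft-unknown` options proposed above, which were hypothetical. A sketch on a throwaway local repository, with made-up paths and identity:

```shell
#!/bin/sh
# Sketch: take a single version first, then fetch the rest of the history.
# Everything lives in a throwaway temp directory; names are illustrative.
tmp=$(mktemp -d) || exit 1

git init -q "$tmp/src"
for i in 1 2 3 4 5
do
    echo "$i" >"$tmp/src/path"
    git -C "$tmp/src" add path
    git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
        commit -q -m "Iteration #$i"
done

# Step 1: one version, no history (file:// so that --depth is honoured).
git clone -q --depth 1 "file://$tmp/src" "$tmp/dst"
before=$(git -C "$tmp/dst" rev-list --count HEAD)

# Step 2: later, turn the shallow clone into a complete one.
git -C "$tmp/dst" fetch -q --unshallow
after=$(git -C "$tmp/dst" rev-list --count HEAD)

echo "commits visible before: $before, after: $after"

rm -rf "$tmp"
```

To start from a tagged release rather than the tip, `git clone` also accepts `--branch <tag>` alongside `--depth`.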
-
To: Linus Torvalds <torvalds@...>
Cc: Franck <vagabon.xyz@...>, Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Thursday, January 19, 2006 - 1:30 pm

Dear diary, on Thu, Jan 19, 2006 at 05:58:09PM CET, I got a letter

Eek. I was burnt by git-count-objects' misleading name. I guess

	git-rev-list --objects --all | wc -l

should give accurate results - 145941 for kernel repository back from

Yes. I get requests for this from time to time and it is buried somewhere
deep in my TODO list. I'm not sure how happy the GIT tools will be about
invalid parent references.
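
On the object-count point above: `git count-objects` reports only loose (unpacked) objects unless given `-v`, whereas walking the history with `rev-list --objects --all` lists every reachable object. A small sketch on a throwaway repository shows the gap once things are packed. The paths and identity are made up, and it assumes a reasonably recent git on the PATH.

```shell
#!/bin/sh
# Sketch: compare "loose object" counts with the number of reachable objects.
# Everything lives in a throwaway temp directory; names are illustrative.
tmp=$(mktemp -d) || exit 1

git init -q "$tmp/repo"
for i in 1 2 3
do
    echo "$i" >"$tmp/repo/path"
    git -C "$tmp/repo" add path
    git -C "$tmp/repo" -c user.name=Example -c user.email=example@example.com \
        commit -q -m "Iteration #$i"
done

# Pack everything and drop the now-redundant loose copies.
git -C "$tmp/repo" repack -a -d -q

# Every commit, tree and blob reachable from any ref.
reachable=$(git -C "$tmp/repo" rev-list --objects --all | wc -l | tr -d ' ')
# Loose objects only, versus objects stored in packs.
loose=$(git -C "$tmp/repo" count-objects -v | sed -n 's/^count: //p')
in_pack=$(git -C "$tmp/repo" count-objects -v | sed -n 's/^in-pack: //p')

echo "reachable: $reachable, loose: $loose, in packs: $in_pack"

rm -rf "$tmp"
```

Three commits that each change one file give three commits, three trees and three blobs, all of which end up in the pack.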

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams
-