Re: [PATCH 1/2] Delete ref $frotz by moving ref file to "deleted-$frotz~ref".

Previous thread: Re: VCS comparison table by Jakub Narebski on Saturday, October 14, 2006 - 9:40 am. (452 messages)

Next thread: Re: VCS comparison table by Jakub Narebski on Saturday, October 14, 2006 - 1:20 pm. (13 messages)
From: Junio C Hamano
Date: Saturday, October 14, 2006 - 11:47 am

Before actually implementing delete_ref(), we discussed this
"deleted-refs/" idea, but I do not think it is a direction we
would want to go.

Ref deletion is an operation that happens far more rarely than
updates and lookups, and I deliberately chose to optimize for
the latter.

There are valid reasons to delete refs, and the most frequent
one would be to discard throw-away temporary branches you may
have needed to switch to when your work was interrupted.  But
even counting that kind of deletion, I imagine that you would
not be creating and removing more than one branch per 10
commits you create, and I also imagine you would be
invoking no fewer than 5 operations that inspect project
history, such as git-log and git-diff, between commits you make.

An operation to build a new commit itself needs at least two
lookups (one to see what's current upfront, and another to see
nobody touched it upon lockless update).  Most history-related
operations at least need to look at one (typically, HEAD), and
any refname you use to spell the name of an object or revision
range (e.g. "v2.6.17..v2.6.18~10" needs to look at tags/v2.6.17
and tags/v2.6.18).  Optimizing for the deletion path at the expense
of giving even a tiny penalty to the lookup path is optimizing for
the wrong case, and that is why I rejected the deleted-refs/ idea
when I originally did the delete_ref() implementation.
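
As a rough check of the ratio argued above, the following back-of-the-envelope calculation uses only the lower bounds quoted in this message (one branch deletion per 10 commits, two lookups per commit, five history inspections between commits, at least one lookup per inspection). The constant and function names are illustrative, not anything in git.

```python
# Rough lookup-vs-deletion estimate using only the figures quoted above.
COMMITS = 10                 # window: ten commits
DELETIONS = 1                # at most one branch deleted per ten commits
LOOKUPS_PER_COMMIT = 2       # read the current value, then re-check on update
INSPECTIONS_PER_COMMIT = 5   # git-log, git-diff, ... between commits
LOOKUPS_PER_INSPECTION = 1   # at least one ref (typically HEAD)


def lookups_in_window():
    """Lower bound on ref lookups performed while making COMMITS commits."""
    commit_lookups = COMMITS * LOOKUPS_PER_COMMIT
    inspection_lookups = (
        COMMITS * INSPECTIONS_PER_COMMIT * LOOKUPS_PER_INSPECTION
    )
    return commit_lookups + inspection_lookups


total = lookups_in_window()
print(total, "lookups per", DELETIONS, "deletion")  # 70 lookups per 1 deletion
```

Even with these conservative figures, lookups outnumber deletions by roughly 70 to 1, so a small per-lookup penalty can outweigh a large per-deletion saving.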

Having said that, I definitely think there is still room
for optimization in the current implementation.  For example, I
do not recall offhand if I made the code unconditionally
repack without the deleted one, or only repack when we know the
ref being deleted exists in the packed refs file.  The latter
obviously would be more efficient and if we currently do not do
that, making it do so would be a very welcome change.  Especially,
given that the latest code does not pack branch heads by
default, when a temporary throw-away branch is discarded, it is
far more likely that it is not packed and we do not ...
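
A minimal sketch of the "only repack when the ref is actually packed" optimization described above, written in Python rather than git's C for brevity. It assumes a simplified packed-refs file with one "<sha> <refname>" entry per line; the function name mirrors delete_ref() but the body is hypothetical and is not git's implementation. Peeled "^" lines and locking are not handled.

```python
import os
import tempfile


def delete_ref(git_dir, refname):
    """Delete refname; rewrite packed-refs only if it lists the ref.

    Returns True when packed-refs had to be rewritten, False otherwise.
    """
    # Remove the loose ref file, if there is one.
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        os.remove(loose)

    packed = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(packed):
        return False

    with open(packed) as f:
        lines = f.readlines()

    # Keep every line whose second field is not the ref being deleted.
    kept = [ln for ln in lines if ln.split()[1:2] != [refname]]
    if len(kept) == len(lines):
        return False  # ref was not packed: nothing to rewrite

    # Write the new contents to a temporary file, then rename into place.
    fd, tmp = tempfile.mkstemp(dir=git_dir)
    with os.fdopen(fd, "w") as f:
        f.writelines(kept)
    os.replace(tmp, packed)
    return True
```

With this shape, deleting an unpacked throw-away branch costs one unlink plus one read of packed-refs, and the rewrite is paid only when the ref was actually packed.
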
From: Christian Couder
Date: Monday, October 16, 2006 - 9:26 pm

I think that the ref deletion usage depends on the policy of people using 
git, and there may be people that delete a ref very often.

For example, when git becomes a major SCM, there may be people working on 
big projects that want to create a new branch for each new bug and then 
delete the branch when the code on the bug branch has been integrated into 

The operations that inspect project history may use a ref cache or something 
so that a lookup on the disk may not be needed. So only the ref creation or 

The lookup code is already using cached packed refs. It could also use 
cached loose and deleted refs, so the lookup penalty would be very very 
small. By the way, the OS may already cache loose and deleted ref file stat 
information, so that may right now be a very small penalty.

And at least, algorithmically speaking, with my patch the deletion path is 
now independent of the number of existing refs, so it's much better (while 
the lookup path stays the same).
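
To make the trade-off under discussion concrete, here is a hedged sketch of a marker-based scheme in Python. The "deleted-refs/" directory layout, the function names, and the dict standing in for cached packed refs are all assumptions for illustration; the actual patch may lay files out differently. Re-creating a ref after deletion would also need to remove the marker, which this sketch omits.

```python
import os


def _read(path):
    with open(path) as f:
        return f.read().strip()


def delete_ref_marker(git_dir, refname):
    """Delete by leaving a marker; cost does not depend on the ref count."""
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        os.remove(loose)
    marker = os.path.join(git_dir, "deleted-refs", refname)
    os.makedirs(os.path.dirname(marker), exist_ok=True)
    open(marker, "w").close()  # empty file: its existence is the signal


def resolve_ref(git_dir, refname, packed):
    """Look up a ref: loose file, then deleted marker, then packed refs.

    `packed` is a dict standing in for the cached packed-refs contents.
    """
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        return _read(loose)
    # Extra filesystem check on every lookup that misses the loose file.
    if os.path.exists(os.path.join(git_dir, "deleted-refs", refname)):
        return None
    return packed.get(refname)
```

In this sketch deletion never touches the packed refs, while every lookup that falls through to the packed refs first pays for one additional existence check.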

If there are thousands of refs and a heavy I/O load, rewriting the packed 
ref file for each deletion means writing to disk something that may not fit 
in the disk cache. It may be very bad.

My patch also has a few added benefits, like making it possible to have a 
read-only packed ref file while still letting people delete refs. It also 
makes it possible to resurrect a deleted ref, or to control branch deletion 
rights on a per ref directory basis (though that may not be very useful).

And the fact that the packed ref file (which may be read-only for added 
safety) is not rewritten each time a ref is deleted makes things much safer 
if there are many users working on the same git repository.

Christian.
-

From: Junio C Hamano
Date: Monday, October 16, 2006 - 9:41 pm

I would say that is a very valid way to work with git,
regardless of the size of project.  Now, how often would you
create such a per-bug branch and delete one, compared to the
number of operations that would require ref lookups?  Your
example actually supports what I've said -- optimizing for

Stop and think about what you are saying.  What's a ref cache?
We do not have such a beast today (unless you equate it with
packed-refs file), and you would need to design and implement
it, but think about how you make that operate.  You would need
to invalidate it when you delete a ref using the deleted-ref/
approach; that's not much different from repacking packed-refs
file without the ref you just deleted, no?

Of course you can argue that instead of repacking you always
stat deleted-ref/ hierarchy; in other words, you can argue that
you can make deletion path faster by penalizing the lookup path.

So I do not think using "ref cache" (whatever it is, and however

If the goal is to optimize for deletion path, then that is
true.  My point is that we do not want to optimize for deletion
path at the expense of more costly lookup path.


-

From: Shawn Pearce
Date: Monday, October 16, 2006 - 10:07 pm

I agree completely with Junio.  I make a lot of temporary "throw
away" branches in Git; often they live on disk for 5/10 minutes at
most before getting deleted again.  I also make a smaller number
(but still significant) of longer lived branches that hang around
for days or weeks before getting deleted.

In the former case (throw away) I wouldn't want those refs added
to the packed refs file.  They just don't live around long enough
to make it worth it.  And when I delete them I want them gone.
So moving them off to a 'deleted-refs' directory to indicate they
are gone is just delaying the removal.  Not something I want.

In the latter case (longer lived) I don't mind if I have to sit
through an extra 500 ms to rewrite the entire packed refs file
during a ref delete operation.  I lived with the branch for weeks;
I can probably spare a second to finally get rid of it once it's
gone upstream.  Heck, the push to move that branch upstream might
actually take longer to unpack the loose objects contained on that

Absolutely.  I figure I do ref lookups at least 3x the number of ref
deletes I perform.  And that's just thinking about the sequence of
commands I commonly perform against my "throw away" branches which
live for at most 10 minutes, let alone my longer lived branches
that hang around for weeks.

-- 
Shawn.
-
