Before actually implementing delete_ref(), we discussed this
"deleted-refs/" idea, but I do not think it is a direction we
would want to go.

Ref deletion is an operation that happens far, far more rarely
than updates and lookups, and I deliberately chose to optimize
for the latter.

There are valid reasons to delete refs, and the most frequent
one would be to discard throw-away temporary branches you may
have needed to switch to when your work was interrupted. But
even counting that kind of deletion, I imagine that you would
not be creating and removing more than one branch for every 10
commits you create, and I also imagine you would be
invoking no fewer than 5 operations that inspect project
history, such as git-log and git-diff, between commits you make.

An operation to build a new commit itself needs at least two
lookups (one to see what's current upfront, and another to see
nobody touched it upon lockless update). Most history-related
operations need to look at at least one ref (typically HEAD), plus
any refname you use to spell the name of an object or revision
range (e.g. "v2.6.17..v2.6.18~10" needs to look at tags/v2.6.17
and tags/v2.6.18). Optimizing the deletion path at the expense
of even a tiny penalty to the lookup path is optimizing for the
wrong case, and that is why I rejected the deleted-refs/ idea when I
originally did the delete_ref() implementation.

Having said that, I definitely think there is still room
for optimization in the current implementation. For example, I
do not recall offhand if I made the code unconditionally
repack without the deleted one, or only repack when we know the
ref being deleted exists in the packed refs file. The latter
obviously would be more efficient, and if we currently do not do
that, making it do so would be a very welcome change. Especially
given that the latest code does not pack branch heads by
default, when a temporary throw-away branch is discarded, it is
far more likely that it is not packed and we do not need ...
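
The "only repack when the ref is actually packed" idea above can be
sketched roughly as follows. This is an illustration in Python, not
git's C implementation: the function name delete_ref is borrowed from
the message, but its signature, the absence of locking, and the
simplified parsing of the packed-refs file are all assumptions made
for the example.

```python
import os
import tempfile


def delete_ref(git_dir, refname):
    """Delete `refname`; return True only if packed-refs had to be rewritten.

    Hypothetical sketch: no lock file is taken, and packed-refs is parsed
    in a simplified way ("<sha1> <refname>" entries, "#" header lines,
    "^<sha1>" peeled lines).
    """
    packed_path = os.path.join(git_dir, "packed-refs")
    lines = []
    if os.path.exists(packed_path):
        with open(packed_path) as f:
            lines = f.read().splitlines()

    kept, found, drop_peeled = [], False, False
    for line in lines:
        if line.startswith("^"):
            # A "^<sha1>" line holds the peeled value of the entry before it.
            if not drop_peeled:
                kept.append(line)
            continue
        drop_peeled = False
        if not line.startswith("#") and line.split(" ", 1)[-1] == refname:
            found = True
            drop_peeled = True
            continue
        kept.append(line)

    if found:
        # Only now pay for the rewrite: write a temporary file, then rename.
        fd, tmp = tempfile.mkstemp(dir=git_dir)
        with os.fdopen(fd, "w") as f:
            f.writelines(line + "\n" for line in kept)
        os.replace(tmp, packed_path)

    # Remove the loose file last, so a stale packed value is never exposed.
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        os.remove(loose)
    return found
```

For a throw-away branch that was never packed, this sketch only reads
the packed refs file and unlinks the loose file; the rewrite happens
only when the ref being deleted is found among the packed entries.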
I think that ref deletion usage depends on the policy of the people using
git, and there may be people who delete refs very often.

For example, when git becomes a major SCM, there may be people working on
big projects who want to create a new branch for each new bug and then
delete the branch when the code on the bug branch has been integrated into ...

The operations that inspect project history may use a ref cache or something
so that a lookup on the disk may not be needed. So only the ref creation or ...

The lookup code is already using cached packed refs. It could also use
cached loose and deleted refs, so the lookup penalty would be very, very
small. By the way, the OS may already cache loose and deleted ref file stat
information, so the penalty may already be very small right now.

And at least, algorithmically speaking, with my patch the deletion path is
now independent of the number of existing refs, so it's much better (while
the lookup path stays the same).

If there are thousands of refs and a heavy I/O load, rewriting the packed
ref file for each deletion means writing to disk something that may not fit
in the disk cache. It may be very bad.

My patch also has a few added benefits, like making it possible to have a
read-only packed ref file while still letting people delete refs. It also
makes it possible to resurrect a deleted ref, or to control branch deletion
rights on a per-ref-directory basis (though that may not be very useful).

And the fact that the packed ref file (which may be read-only for added
safety) is not rewritten each time a ref is deleted makes things much safer
if there are many users working on the same git repository.

Christian.
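
To make the trade-off under discussion concrete, here is a minimal
sketch of a marker-based scheme in Python. The thread does not spell
out how the patch lays out deleted-refs/, so the layout used here (an
empty marker file at deleted-refs/<refname>), both function names, and
the in-memory dict standing in for cached packed refs are assumptions
made purely for illustration.

```python
import os


def delete_ref_with_marker(git_dir, refname):
    """Delete a ref without touching packed-refs (hypothetical layout)."""
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        os.remove(loose)
    # Cost is independent of how many refs exist: one marker file is created.
    marker = os.path.join(git_dir, "deleted-refs", refname)
    os.makedirs(os.path.dirname(marker), exist_ok=True)
    open(marker, "w").close()


def resolve_ref(git_dir, refname, packed):
    """Return (value, probes): the ref value or None, and the number of
    filesystem existence checks the lookup needed.

    `packed` is a dict standing in for already-cached packed refs.
    """
    probes = 1
    loose = os.path.join(git_dir, refname)
    if os.path.exists(loose):
        with open(loose) as f:
            return f.read().strip(), probes
    # Extra check that the marker scheme adds to every lookup of a
    # ref that has no loose file.
    probes += 1
    if os.path.exists(os.path.join(git_dir, "deleted-refs", refname)):
        return None, probes
    return packed.get(refname), probes
```

In this sketch deletion never rewrites the packed data, and removing
the marker would make a still-packed ref visible again. The price is
the second existence check on lookups that fall through to the packed
refs, which is the penalty the two sides of the thread weigh differently.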
-
I would say that is a very valid way to work with git,
regardless of the size of the project. Now, how often would you
create such a per-bug branch and delete one, compared to the
number of operations that would require ref lookups? Your
example actually supports what I've said -- optimizing for ...

Stop and think about what you are saying. What's a ref cache?
We do not have such a beast today (unless you equate it with
the packed-refs file), and you would need to design and implement
it, but think about how you would make that operate. You would need
to invalidate it when you delete a ref using the deleted-refs/
approach; that's not much different from repacking the packed-refs
file without the ref you just deleted, no?

Of course you can argue that instead of repacking you always
stat the deleted-refs/ hierarchy; in other words, you can argue that
you can make the deletion path faster by penalizing the lookup path.

So I do not think using "ref cache" (whatever it is, and however ...

If the goal is to optimize for the deletion path, then that is
true. My point is that we do not want to optimize for the deletion
path at the expense of a more costly lookup path.
-
I agree completely with Junio. I make a lot of temporary "throw
away" branches in Git; often they live on disk for 5 to 10 minutes at
most before getting deleted again. I also make a smaller number
(but still significant) of longer lived branches that hang around
for days or weeks before getting deleted.

In the former case (throw away) I wouldn't want those refs added
to the packed refs file. They just don't live long enough
to make it worth it. And when I delete them I want them gone.
So moving them off to a 'deleted-refs' directory to indicate they
are gone is just delaying the removal. Not something I want.

In the latter case (longer lived) I don't mind if I have to sit
through an extra 500 ms to rewrite the entire packed refs file
during a ref delete operation. I lived with the branch for weeks;
I can probably spare a second to finally get rid of it once it's
gone upstream. Heck, the push to move that branch upstream might
actually take longer to unpack the loose objects contained on that ...

Absolutely. I figure I do ref lookups at least 3x the number of ref
deletes I perform. And that's just thinking about the sequence of
commands I commonly perform against my "throw away" branches which
live for at most 10 minutes, let alone my longer lived branches
that hang around for weeks.

--
Shawn.
-
