Recently I was referred to the Grammar Police as the git-pack-objects
progress message 'Deltifying %u objects' is considered to be not
proper English to at least some small but vocal segment of the
English speaking population. Techncially we are applying delta
compression to these objects at this stage, so the new term is
slightly more acceptable to the Grammar Police but is also just
as correct.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
To go on top of your 'np/progress' series that is currently in next.
I think it might keep us out of trouble with certain individuals...
builtin-pack-objects.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 15d3750..f79fc0b 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1718,7 +1718,7 @@ static void prepare_pack(int window, int depth)
if (nr_deltas && n > 1) {
unsigned nr_done = 0;
if (progress)
- start_progress(&progress_state, "Deltifying objects",
+ start_progress(&progress_state, "Delta compressing objects",
nr_deltas);
qsort(delta_list, n, sizeof(*delta_list), type_size_sort);
ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
--
1.5.3.4.1249.g895be
-Boo. I _like_ "deltifying". Sure, it's probably not in the dictionary, but that's how languages change: saying "delta compressing" all the time will get awkward, so people invent a new word using existing rules to explain a common phenomenon. Anyway, if you want to please the Grammar Police, should it not be "Delta-compressing"? "Delta" is not an adverb here, but rather the phrase acts as a compound verb (i.e., the two words work in place of a single verb). Although "Delta-compressing objects" just looks stupid. -Peff -
Yet that progress display isn't solely about "delta compressing". It also includes the search for best object match in order to keep the smallest delta possible. I'd like to add my own boo, but since I'm not a native speaker I won't argue too much. Nicolas -
In fact, isn't that progress meter _solely_ about finding the best matches? The actual deltification (that is, the creation of the deltas and writing of them to the packfile happens during the writing phase -- unless, of course, we've cached the deltas during the search phase). Perhaps one of: Finding deltas Finding delta candidates Matching objects or something similar (though I don't especially like any of them, I think you get the idea). -Peff -
Well, to find which combination is the smallest you actually have to I think we might sidestep the issue entirely by remaining somewhat vague and simply saying "compressing objects" for that phase. This is the part responsible for the reduction of a Git repository from 3GB down to 200MB anyway. Nicolas -
OK. I liked it a little more specific, but perhaps users really don't care what's going on. And it seems there is some support of simply "compressing", so that's probably reasonable. I think that is going to make the statistics line doubly confusing, though, since we never even use the word "delta" (or any wacky verbifications based on it), and then they get a line about numbers of new and reused deltas. -Peff -
Well, at least the statistics are real, and we're not inventing anything here. Nicolas -
My eyes have gotten used to "Deltifying" but I have to admit that in my early Git days I thought it looked damn odd. Today I'm far too familiar with Git to really notice this as a problem now. I really don't have an opinion either way. Actually I think the entire progress system of git-pack-objects should be reduced even further so that users aren't exposed to the internal phases we are going through. Most of them just don't care. They just Yea, that is just stupid. And is why I didn't use a - in my patch. *1*: Yes, yes, I know that's hard to predict and impossible in the really important cases (like network transfer of not yet compressed data). So just telling them that we are working and have recently done something useful towards finishing our goal is more than enough. Its like putting mirrors on the wall in a lobby; folks won't notice how long it takes for the elevator to arrive as they are too busy preening themselves. -- Shawn. -
Well, with my latest patches in that area, the typical progress on screen has been cut in half. And the different phases are intertaining. Nicolas -
Yup. Your patches were a big improvement. But I'm now sitting here
wondering if we shouldn't just allow a progress meter to overwrite
the prior one. Then you only see the current task and progress,
or the final output if we have nothing further to say about that.
Hmmph. Maybe something like this:
I like how it comes out in the end, but it really screws with the
sideband protocol. Like horribly. I got a whole ton of "remote:
remote: remote: remote: remote: remote:" during a remote clone.
diff --git a/progress.c b/progress.c
index 7629e05..099fc14 100644
--- a/progress.c
+++ b/progress.c
@@ -37,8 +37,6 @@ static void clear_progress_signal(void)
static int display(struct progress *progress, unsigned n, int done)
{
- char *eol;
-
if (progress->delay) {
if (!progress_update || --progress->delay)
return 0;
@@ -55,18 +53,26 @@ static int display(struct progress *progress, unsigned n, int done)
}
progress->last_value = n;
- eol = done ? ", done. \n" : " \r";
- if (progress->total) {
+ if (done) {
+ size_t c = strlen(progress->title);
+ while (c--)
+ fputc(' ', stderr);
+ fputs(" ", stderr);
+ for (; n > 0; n /= 10)
+ fputs(" ", stderr);
+ fputc('\r', stderr);
+ return 1;
+ } else if (progress->total) {
unsigned percent = n * 100 / progress->total;
if (percent != progress->last_percent || progress_update) {
progress->last_percent = percent;
fprintf(stderr, "%s: %3u%% (%u/%u)%s", progress->title,
- percent, n, progress->total, eol);
+ percent, n, progress->total, " \r");
progress_update = 0;
return 1;
}
} else if (progress_update) {
- fprintf(stderr, "%s: %u%s", progress->title, n, eol);
+ fprintf(stderr, "%s: %u%s", progress->title, n, " \r");
progress_update = 0;
return 1;
}
--
Shawn.
-And then you've lost some diagnostic clue (the absolute numbers) about the actual number of objects that were listed for "deltification" for example. And imagine that you see the progress moving slowly because the remote server is a NSLU2, but it says 80%. Then you go for a coffee and the progress says 20% when you return because it now has moved to a different phase. Rather counter intuitive. Nicolas -
Leave the "Total" line. Add to it the number of objects we had to Yea, I didn't consider that. That's where you need to show the number of steps and which one you are on, so the meter looks more like: Step 1/3: Counting objects: .... \r Step 2/4: Compressing objects: ... \r Step 3/3: Writing objects: .... \r only all smashed into one line of course, so only the most recent one is being displayed. -- Shawn. -
Looks good, although as Nicholas pointed out you might not know in advanced how many steps there are. So I still think that Nicholas' original point is valid is here, and if you overwrite old progress bars with new ones then you may falsely give the impression of "going backwards". Cheers, Wincent -
Yet you might not know in advance how many steps there'll be. You might or might not have the deltification phase (I simply can't let that term go...), pack indexing also have 1 or 2 steps, and if objects are unpacked instead then you have only one step. Given the asynchronous nature of the sideband messages, I think that could only create messed up displays. Some messages are terminated with \n and others with \r. Nicolas -
Obligatory me too: I totally agree with Nicolas here. -Peff -
OK, I will confess I found it a little odd at first, but I think it's a
straightforward and playful extension of the language, which is
something I like. But you know, we have the corporate git customers to
On a similar note, some complaints with progress meters, even after
recent patches:
- When fetching, one progress meter says "Indexing" which, while
technically true, is almost certainly blocking on "Downloading". In
fact, it is not clear from the existing messages exactly _when_ we
are downloading, and when we are just computing, which is something
I think a user might want to know. Objections to changing this
(though perhaps index-pack will need to be told when it is
downloading and when it is just indexing)? Objections to a
throughput indicator?
- Running git-gc, we now get something like:
Counting objects: 62317, done.
Deltifying objects: 100% (18042/18042), done.
Writing objects: 100% (62317/62317), done.
Total 62317 (delta 43861), reused 61404 (delta 43036)
Pack pack-32f8ac40c1a5ec146e45c657cb16f53fdd354095 created.
Removing unused objects 100%...
Done.
Can we get rid of total statistics (I think this is useful for some
power users, but perhaps there should be a verbosity level), the
name of the pack file (same deal), and the totally useless "Done."?
-Peff
-Agreed for the pack name. Certainly no one cares. Maybe the "Removing unused objects" should use the common progress infrastructure? It could even use the delayed interface, just like when checking out files, so no progress at all is displayed when that operation completes within a certain delay. And the removal of unused objects is usually quick. But I like the statistics. They might be pretty handy to diagnoze performance issues on remote servers for example. Nicolas -
Discussion on the list tonight came to the conclusion that showing the name of the packfile we just created during git-repack is not a very useful message for any end-user. For the really technical folk who need to have the name of the newest packfile they can use something such as `ls -t .git/objects/pack | head -2` to find the most recently created packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- Nicolas Pitre <nico@cam.org> wrote: > On Thu, 18 Oct 2007, Jeff King wrote: > > Can we get rid of total statistics (I think this is useful for some > > power users, but perhaps there should be a verbosity level), the > > name of the pack file (same deal), and the totally useless "Done."? > > Agreed for the pack name. Certainly no one cares. This makes it so. git-repack.sh | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/git-repack.sh b/git-repack.sh index e72adc4..7220635 100755 --- a/git-repack.sh +++ b/git-repack.sh @@ -83,9 +83,6 @@ for name in $names ; do fullbases="$fullbases pack-$name" chmod a-w "$PACKTMP-$name.pack" chmod a-w "$PACKTMP-$name.idx" - if test "$quiet" != '-q'; then - echo "Pack pack-$name created." - fi mkdir -p "$PACKDIR" || exit for sfx in pack idx -- 1.5.3.4.1249.g895be -
Are you volunteering (I think you know the progress code best)? They are by far the most useful of the three lines I mentioned, but I just wonder if they are a bit meaningless and cluttery for light users. We can always cut the others and see how it looks. -Peff -
Frankly, I think effort should be spent on the refs update display at this point. Something that looks like: * refs/heads/origin: fast forward to branch 'master' of git://gi t.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 old..new: 66ffb04..4fa4d23 [ note that I arbitrarily cut the long line before the 80th column to show the effect within an email ] You usually get long lines that gets wrapped, so that means 3 lines of screen space for one updated branches. Is the "66ffb04..4fa4d23" information really useful? Might someone ever care? Nicolas -
The reason its formatted the way it is today is someone can grab
that line into a copy-paste buffer and throw it onto a "git-log"
or "gitk" command line with ease to see what new stuff has come in.
Me, I just use the reflog if I care (`origin@{1}..origin`) to see
what a fetch changed in the tracking branch.
However I *don't* need the remote branch name or the remote URL,
especially if we are storing it into a tracking branch. That's most
likely coming from a configured remote that the user fetches from
frequently. I don't think about Linus' URL, I think about the fact
that in my linux-2.6 repository his directory is my origin remote.
Maybe something like this would be more useful:
* origin: fast-forwarded: 66ffb04..4fa4d23
Or if you are using refs/remotes style tracking branches:
* origin/master: fast-forwarded: 66ffb04..4fa4d23
* origin/pu: forcing update: 66ffb04..4fa4d23
Too terse? Yea, probably. But it is a whole lot shorter.
My other pet peeve here is the display from send-pack and
receive-pack when you push a ref. Hello?
updating 'refs/heads/master'
from de61e42b539ffbd28d2a2ba827bb0eb79767057b
to d7e56dbc4f60f6bd238e8612783541d89f006fb7
...
refs/heads/master: de61e42b539ffbd28d2a2ba827bb0eb79767057b -> d7e56dbc4f60f6bd238e8612783541d89f006fb7
That's like 4 too many SHA-1 strings for the average user.
--
Shawn.
-I have used it occasionally when tracking repos to see what new commits have happened. Usually I use a separate branch to mark "what I've seen" (i.e., fetch, gitk origin..master, pull), but if it's a branch that I'm not actively tracking, the display is useful. What is really useless in that line is the fact that _every_ ref is going to have the name of the remote, even though we only support fetching from one remote at a time. Perhaps something like: Fetching from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 * refs/heads/origin: fast forward to branch 'master' although that URL is almost a line by itself. :) Also, why do we abbreviate "refs/heads/master" from the remote, but we don't abbreviate refs/heads/origin for the local? Maybe something like: * local heads/origin -> remote heads/master (fast forward) or for separate remote * local remotes/origin/master -> remote heads/master (fast forward) Thoughts? -Peff -
Maybe we should have a shortcut notation for <ref>@{1}..<ref> instead?
I end up using that all the time since the fetch result has long
scrolled off the screen when I want to look at what was fetched.
Usully this llooks like:
git pull
git log @{1}..
That looks fine for a push. I'd say "remote foo -> local bar" for a
fetch.
Nicolas
-The fetch side is easy to change on top of the db/fetch-pack series that is currently in next. What's hard is the remote side. receive-pack on the remote side doesn't have the local's refname but it prints its own message upon a successful update. -- Shawn. -
I must confess to never using reflogs in that case...for some reason
they just never come to mind. But now that you and Shawn mention them,
there's really no reason to leave the hash..hash for the fetch.
As for a shortcut notation, what about allowing '..' notation inside a
reflog. I.e., <ref>@{a..b} is the same as <ref>@{a}..<ref>@{b}? So you
could perhaps do origin/master@{1..}?
I'm not sure if there are syntactic issues with parsing out the '..' (or
Yes, I was tempted to suggest that. I think it might need some context,
Agreed.
-Peff
-I'd love it, but the way our current SHA1 parser works, I don't think it
can really do it.
Basically, we currently assume that a SHA1 expression always expands to a
*single* SHA1.
And then, on top of that SHA1 expression parser, we then have a totally
separate logic (which is *not* part of the SHA1 expression parser itself)
that handles the "a..b" and "a...b" cases.
In other words, all the magic "head@{xyz}" logic is all in sha1_name.c,
but that never handles any ranges at all.
And then the range handling is a separate thing in revision.c and
builtin-rev-parse.c.
So I think your syntax is wonderful, but it would require moving the
complex range handling into sha1_name.c, and would also require that file
to get a more complex interface (right now all the sha1_name.c routines
just take the "fill in this one SHA1 hash" approach, but ".." and "..."
means that you have multiple objects *and* you can mark one of them as
See above: I don't think we have syntax problems per se: it's just that
right now the "complex" parser (the one that knows about ^, ~, and @{x}
etc) simply cannot do anything but a single SHA1 due to internal interface
issues.
Linus
-Ah, right. I hadn't thought of that. While it would be a nice convenience feature, it's probably not worth the deep internal hackery that would be required. -Peff -
What about a preprocessor that could match <1>@{<2>..<3>} in the
argument list and substitute that with <1>@{<2>}..<1>@{<3>} before it is
actually parsed?
Nicolas
-Could be done, but I almost think it would be better to just make the sha1_name.c interfaces take some extended data structure which allows looking up multiple SHA1's at the same time. Sure, we'd leave the "simple" interfaces around for users that simply want just one SHA1 value, but that would allow us to remove duplicate code from revision.c and rev-parse.c. And it would allow us to generally make the SHA1 parsing fancier: there may well be other expressions that are worth doing. Linus -
Btw, I knew I had wanted this in the past, but I had forgotten why. I now
remembered.
The thing is, sometimes you want an expression for "all the parents of X".
We don't have that, and again, the current internal implementation of
sha1_name.c makes it essentially impossible to do within that interface
(ie you can do it on *top* of that interface in revision.c, but it cannot
be a general SHA1 expression).
So wouldn't it be nice to have a "commit^*" expression to go with
"commit^" and "commit^2" etc? One that would name all the parents. It's
useful, for example, for saying that you still want to see that commit,
but not any of its parents:
git log commit^*..
could basically work to show that commit (and all subsequent commits), but
not the commits leading up to it. Right now, you can't easily say that in
the git "sha1 expression algebra".
There are some other cases where you'd like to have things expand to more
than one commit. We currently do those with special flags, like --all, but
in many ways it would be really nice to be able to do SHA1 operations on
them. If we were to make the SHA1 arithmetic able to handle multiple
SHA1's (instead of just one), I could see us being able to do things like
git diff {master,pu}:Makefile
as a way of saying
git diff master:Makefile pu:Makefile
which already works - simply because we could make the ":name" be able to
operate on multiple commit SHA1's and turn them into multiple blob (or
tree) SHA1's.
(The above may not sound very useful, but
git diff {ORIG_HEAD...MERGE_HEAD}:file
would essentially expand to "base version of file", ORIG_HEAD:file and
MERGE_HEAD:file, and we could fairly easily teach diff to do a nice
three-way diff for things like this! So it does have potential to be a
reasonably powerful model)
Linus
-Yes! I agree entirely. This is actually not very difficult. I think the only time we run `git-index-pack --stdin` is from within git-fetch-pack and git-receive-pack. These are the only two points where index-pack's stdin is attached to a network socket and not to a file. Its also where you'd want this to say "Transferring", "Uploading" or "Downloading". Really the important one to change here is probably the call in fetch-pack.c as that is the most visible and most time consuming operation for the average user (think git-clone on a large project). The same change probably should also be made for unpack-objects as fetch-pack/receive-pack may have chosen to use that if the object Yea. I keep forgetting to write a patch to do this. I've had much the same thought as you. The verbosity should probably be controlled like merge-recursive's is, but should default to not showing the "Total" line or the "Pack .. created" line. For the average user there isn't any valuable information in either line. I also think that the progress meter of git-prune-packed should be fixed to use the standard progress meter system. And maybe also be delayed so it doesn't trip if its going to be very quick. -- Shawn. -
This is not very considerate to non-native speakers, though, who might not grasp the neogolism. Perhaps just "compressing" if it gets awkward. Sam. -
I would have thought it would be easier for non-native speakers, since it has a precise meaning which I assumed we defined in a glossary (but now I see that we don't, though that can be corrected). But of course, I'm a native speaker (and AIUI, so are you), so perhaps we should wait for somebody else to comment. But I do agree that making sense to non-native speakers should be a priority when phrasing things like this. -Peff -
Forward thinking, that's probably most sensible, since git 4.7 might not use delta compression, but maybe wavelet compression, or other scheme entirely. Using deltas is an implementation detail, after all. Dave. -
Git already uses two types of compression (zlib on all objects, deltas between objects in packfiles). So just saying "compressing" is actually a bit ambiguous, and I think noting that what we are _actually_ doing right now is delta compression is worthwhile. -Peff -
True. But then, zlib compression is also delta compressing and huffman coding. Git's delta compression is quite similar; it just uses a larger window than the sliding 64k gzip one. Sam.` -
Heh. I just remembered that my packv4 topic had a phase it called "Dictifying objects". Which ran before "Deltifying objects". It was just another form of compression, but it was also timeconsuming and an entirely different loop so it got its own progress meter. I'm leaning to making it just say "Compressing objects". Simple, to the point, reasonably describes the action, and most people will understand what it means: "Oh, time to go get coffee if that number there is reeeealy big..." :-) -- Shawn. -
I think this is sensible. Nicolas -
| Oleg Nesterov | Re: [PATCH, RFC] reimplement flush_workqueue() |
| Linus Torvalds | Re: Linux 2.6.27-rc8 |
| Pavel Roskin | ndiswrapper and GPL-only symbols redux |
| Greg Kroah-Hartman | [PATCH 017/196] aoechr: Convert from class_device to device |
git: | |
| David Symonds | Re: git and binary files |
| Matthieu Moy | git push to a non-bare repository |
| Felipe Oliveira Carvalho | Re: [RFC] Zit: the git-based single file content tracker |
| Jakub Narebski | Re: [VOTE] git versus mercurial (for DragonflyBSD) |
| Patrick McHardy | netfilter 05/29: netns ebtables: part 2 |
| Templin, Fred L | [Resend][PATCH 01/05] ipv6: RFC4214 Support (4) |
| Laszlo Attila Toth | [PATCHv7 0/5 + 3] Interface group patches |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Han Boetes | shutdown gets stuck at `syncing discs...' |
| Leon Dippenaar | New tcp stack attack |
| Richard Stallman | Real men don't attack straw men |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
