Hi,
This patch series replaces the series I sent out a while ago, which added
"commit annotations". Since "commit notes" was liked much better, here
they are.
It picks up the same idea, having a pseudo-branch whose revisions contain
a .git/objects/??/* like file structure, and whose blobs are the commit
notes.
By default, that pseudo-branch is "refs/notes/commits", but it is
overridable by the config variable core.notesRef, which in turn can be
overridden by the environment variable GIT_NOTES_REF. If the given ref
does not exist yet, it is interpreted as empty.
The biggest obstacle was a thinko about the scalability. Tree objects
take free form name entries, and therefore a binary search by name is not
possible.
Patch 6/6 is only a WIP patch, but it shows the road ahead. It adds code
to generate .git/notes-index from refs/notes/commits (or any other ref you
specify as notes ref), which is reused until refs/notes/commits^{tree}
changes. Patch 6/6 is only meant to assess which data structure yields
best performance, and how big the costs are.
However, as long as there are no public, fetchable commit notes, I think
the first 5 patches are safe for application and testing.
Ciao,
Dscho
Johannes Schindelin (6):
Rename git_one_line() to git_line_length() and export it
Introduce commit notes
Add git-notes
Add a test script for "git notes"
Document git-notes
notes: add notes-index for a substantial speedup.
.gitignore | 1 +
Documentation/cmd-list.perl | 1 +
Documentation/config.txt | 15 ++
Documentation/git-notes.txt | 45 ++++
Makefile | 5 +-
cache.h | 1 +
commit.c | 15 +-
commit.h | 1 +
config.c | 5 +
environment.c | 1 +
git-notes.sh | 61 ++++++
notes.c ...I might be misunderstanding, but in the case of the notes tree objects isn't it true that the name entries aren't free form, but are guaranteed to be of a fixed length form: XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX In which case you can binary search? Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com -
Hmph, you are right. In this sequence: hex = sha1_to_hex(commit->object.sha1); snprintf(name, sizeof(name), "%s:%.*s/%.*s", notes_ref_name, 2, hex, 38, hex + 2); if (get_sha1(name, sha1)) return; Instead, we could read the tree object by hand in the commit that is referenced by notes_ref_name, which has uniform two letter names for subtrees which can be binary searched, open the tree for that entry, again by hand, and do another binary search because that tree has uniform 38-letter names. That certainly could be done. Sounds like a "fun" project for some definition of the word. -
Hi, I disagree. One disadvantage to using tree objects is that it is much easier to have pilot errors. You could even make a new working tree checking out refs/notes/commits and change/add/remove files. Ciao, Dscho -
I suspect you read me wrong. I was saying that it is possible to use a specialized tree object parser in place of get_sha1() only in the above code to read the tree objects that represents a 'note'. You obviously would want to do a sanity check such as: - The size of the tree object your customized tree parser is fed is multiple of expected entry size (mode word + 20 SHA1 + 2 + NUL for fan-out, replace 2 with 38 for lower level); - mode word for the entry is sane (an entry in the fan-out tree would point at a tree object, an entry in lower level would point at a blob); - The name part (2 or 38) are lowercase hexadecimal strings; -
Hi, In which case it is not _that_ attractive any more, since you - have to have a fallback anyway, and - have a relatively complex thing. Instead, I want to go with the hash map approach, if only to have a O(1) behaviour instead of O(log N). Ciao, Dscho -
Actually, this commit adds two methods for a notes index: - a sorted list with a fan out to help binary search, and - a modified hash table. It also adds a test which is used to determine the best algorithm. --- Not signed off because this is not suitable to be applied as-is. It is only meant to test the different approaches. notes.c | 392 ++++++++++++++++++++++++++++++++++++-- t/t3302-notes-index-expensive.sh | 118 ++++++++++++ 2 files changed, 490 insertions(+), 20 deletions(-) create mode 100755 t/t3302-notes-index-expensive.sh diff --git a/notes.c b/notes.c index 5d1bb1a..5a90abf 100644 --- a/notes.c +++ b/notes.c @@ -1,10 +1,370 @@ #include "cache.h" #include "commit.h" +#include "tree-walk.h" #include "notes.h" #include "refs.h" static int initialized; +/* + * There are two choices of data structure for the notes index. + * + * A) Fan out enhanced sorted list. + * + * This is a regular sorted list with a 256 entry fan out. In other words, + * every time an entry is looked up, a binary search is performed over the + * sublist defined by the first byte of the SHA-1. + * + * The disadvantage is an average runtime logarithmic in the number of + * commit notes. The advantages are a compact representation on disk, + * and a _guaranteed_ logarithmic runtime. + * + * You could even squeeze out one more byte per entry, since the + * first byte is known from the fan out list. This would complicate our + * algorithm, though. + * + * B) Hash + * + * This is not your classical hash. It is _mostly_ like a hash, with + * a few notable exceptions: + * + * - it is possibly larger than size suggests: since it is file based, + * it is easier to write at the end than to wrap around. + * + * - as a consequence we can make the entries _strictly_ sorted. This + * is not only nice to look at, but makes incremental updates much, + * much easier. + * + * The disadvantages of a hash is its loose packing. In order to ope...
I know this is a nice backwards compatible way to organize notes, and to make them reasonably efficiently found, but I'd almost rather just see them crammed into the packfile alongside of the commit it annotates, so that the packfile reader can quickly find the annotation at the same time it finds the commit. aka packv4. Ok, enough dreaming for today. -- Shawn. -
Hi, Yes, I also dream of having the time to play with packv4. If you read my comments in the commit-annotation thread, you'll see that I stated that packv4 would solve the problem, too. The reason I did this series was not to push commit notes, but to make good for stalling Johan's efforts. Including a proof that the commit notes as I introduced them can be relatively cheap, too. Ciao, Dscho -
Hi, [this explains what Patch 6/6 is all about:] If GIT_NOTES_TIMING_TESTS is set, t3302 will output some timing data. It will create three repositories, the first with 10 commits and a commit note for each, the second with 100, the third with 1000. For each repository, it times "git log" 100 times in several modes: - with GIT_NOTES_REF set to a non-existing ref (should be equivalent to the timings without this patch series), - with no .git/notes-index, - recreating .git/notes-index as a hash map _every_ time, - creating .git/notes-index as a hash map, and using it the rest of the time, - recreating .git/notes-index as a sorted list _every_ time, and - creating .git/notes-index as a sorted list only the first time, and then using it to find the notes by binary search. Here is the output: * expecting success: create_repo 10 * ok 1: setup 10 * expecting success: test_notes 10 diff --git a/expect b/output * ok 2: notes work * expecting success: time_notes 100 no-notes 0.13user 0.08system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+45271minor)pagefaults 0swaps no-cash 0.14user 0.13system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+49421minor)pagefaults 0swaps hash-cache-create 0.16user 0.24system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+73111minor)pagefaults 0swaps hash-cache 0.10user 0.08system 0:00.18elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+45660minor)pagefaults 0swaps sorted-list-cache-create 0.23user 0.17system 0:00.40elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+72056minor)pagefaults 0swaps sorted-list-cache 0.12user 0.08system 0:00.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+46854minor)pagefaults 0swaps * ok 3: notes timing * expecting success: create_repo 100 * ok 1: setup 100 * expecting success: test_notes 100 diff --...
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Documentation/cmd-list.perl | 1 + Documentation/git-notes.txt | 45 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-notes.txt diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 2143995..f05e291 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -140,6 +140,7 @@ git-mergetool ancillarymanipulators git-mktag plumbingmanipulators git-mktree plumbingmanipulators git-mv mainporcelain +git-notes mainporcelain git-name-rev plumbinginterrogators git-pack-objects plumbingmanipulators git-pack-redundant plumbinginterrogators diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt new file mode 100644 index 0000000..331ed89 --- /dev/null +++ b/Documentation/git-notes.txt @@ -0,0 +1,45 @@ +git-notes(1) +============ + +NAME +---- +git-notes - Add commit notes + +SYNOPSIS +-------- +[verse] +'git-notes' (edit | show) [commit + +DESCRIPTION +----------- +This command allows you to add notes to commit messages, after the +fact. To discern these notes from the message stored in the commit +object, the notes are indented like the message, after an unindented +line saying "Notes:". + +To enable commit notes, you have to set the config variable +core.notesRef to something like "refs/notes/commits". This setting +can be overridden by the environment variable "GIT_NOTES_REF". + + +SUBCOMMANDS +----------- + +edit:: + Edit the notes for a given commit (defaults to HEAD). + +show:: + Show the notes for a given commit (defaults to HEAD). + + +Author +------ +Written by Johannes Schindelin <johannes.schindelin@gmx.de>...
Incidentally, a test for "git notes" implies a test for the whole commit notes machinery. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- t/t3301-notes.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 63 insertions(+), 0 deletions(-) create mode 100755 t/t3301-notes.sh diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh new file mode 100755 index 0000000..eb50191 --- /dev/null +++ b/t/t3301-notes.sh @@ -0,0 +1,63 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test commit notes' + +. ./test-lib.sh + +test_expect_success setup ' + : > a1 && + git add a1 && + test_tick && + git commit -m 1st && + : > a2 && + git add a2 && + test_tick && + git commit -m 2nd +' + +cat > fake_editor.sh << EOF +echo "\$MSG" > "\$1" +echo "\$MSG" >& 2 +EOF +chmod a+x fake_editor.sh +VISUAL="$(pwd)"/fake_editor.sh +export VISUAL + + +test_expect_success 'need notes ref' ' + ! MSG=1 git notes edit && + ! MSG=2 git notes show +' + +test_expect_success 'create notes' ' + git config core.notesRef refs/notes/commits && + MSG=b1 git notes edit && +cat .git/new-notes && +test b1 = "$(cat .git/new-notes)" && + test 1 = $(git ls-tree refs/notes/commits | wc -l) && + test b1 = $(git notes show) && + git show HEAD^ && + ! git notes show HEAD^ +' + +cat > expect << EOF +commit 268048bfb8a1fb38e703baceb8ab235421bf80c5 +Author: A U Thor <author@example.com> +Date: Thu Apr 7 15:14:13 2005 -0700 + + 2nd + +Notes: + b1 +EOF + +test_expect_success 'show notes' ' + ! (git cat-file commit HEAD | grep b1) && + git log -1 > output && + git diff expect output +' + +test_done -- 1.5.3.rc1.2718.gd2dc9-dirty -
This script allows you to edit and show commit notes easily.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
.gitignore | 1 +
Makefile | 2 +-
git-notes.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 1 deletions(-)
create mode 100755 git-notes.sh
diff --git a/.gitignore b/.gitignore
index 20ee642..125613f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -83,6 +83,7 @@ git-mktag
git-mktree
git-name-rev
git-mv
+git-notes
git-pack-redundant
git-pack-objects
git-pack-refs
diff --git a/Makefile b/Makefile
index 119d949..10a9342 100644
--- a/Makefile
+++ b/Makefile
@@ -213,7 +213,7 @@ SCRIPT_SH = \
git-merge-resolve.sh git-merge-ours.sh \
git-lost-found.sh git-quiltimport.sh git-submodule.sh \
git-filter-branch.sh \
- git-stash.sh
+ git-stash.sh git-notes.sh
SCRIPT_PERL = \
git-add--interactive.perl \
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..e0ad0b9
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" &&
+ die "No notes ref set."
+
+COMMIT=$(git rev-parse --verify --default HEAD "$2")
+NAME=$(echo $COMMIT | sed "s/^../&\//")
+
+case "$1" in
+edit)
+ MESSAGE="$GIT_DIR"/new-notes
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
+
+ GIT_INDEX_FILE="$MESSAGE".idx
+ export GIT_INDEX_FILE
+
+ CURRENT_HEAD=$(git show-ref $GIT_NOTES_REF | cut -f 1 -d ' ')
+ if [ -z "$CURRENT_HEAD" ]; then
+ PARENT=
+ else
+ PARENT="-p $OLDTIP"
+ git read-tree $GIT_NOTES_REF || die "Could not read index"
+ git cat-file blob :$NAME >> "$MESSAGE" 2> /dev/null
+ fi
+
+ ${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
+
+ grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
+ mv "$MES...Commit notes are blobs which are shown together with the commit message. These blobs are taken from the notes ref, which you can configure by the config variable core.notesRef, which in turn can be overridden by the environment variable GIT_NOTES_REF. The notes ref is a branch which contains trees much like the loose object trees in .git/objects/. In other words, to get at the commit notes for a given SHA-1, take the first two hex characters as directory name, and the remaining 38 hex characters as base name, and look that up in the notes ref. The rationale for putting this information into a ref is this: we want to be able to fetch and possibly union-merge the notes, maybe even look at the date when a note was introduced, and we want to store them efficiently together with the other objects. There is one severe shortcoming, though. Since tree objects can contain file names of a variable length, it is not possible to do a binary search for the correct base name in the tree object's contents. Therefore this approach does not scale well, because the average lookup time will be proportional to the number of commit objects, and therefore the slowdown will be quadratic in that number. However, a remedy is near: in a later commit, a .git/notes-index will be introduced, a cached mapping from commits to commit notes, to be written when the tree name of the notes ref changes. In case that notes-index cannot be written, the current (possibly slow) code will come into effect again. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Documentation/config.txt | 15 +++++++++++ Makefile | 3 +- cache.h | 1 + commit.c | 5 +++ config.c | 5 +++ environment.c | 1 + notes.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ notes.h | 9 ++++++ 8 files changed, 102 insertions(+), 1 deletions(-) create mode 100644 notes.c ...
I wonder if it is worth using the fan-out tree structure for the underlying "note" trees, as the notes-index would be the primary way to access them. Not that I've looked at the code too deeply with an intention of possibly including it early. I was hoping to see fixes to d/f code in merge-recursive from either you or Alex instead ;-) -
Actually now I think about it I think this was a stupid suggestion. Creation of a new note in a reasonably well populated note tree would be made 256-fold more efficient by having the fan-out, as write-tree does not have to recompute the other 255 tree objects thanks to the cache-tree data being fresh. -
Hi, The fan-out tree is a nice fallback solution when you cannot write the Well, yeah. I was kind of trying to cool off from my unpleasant unpack_trees() experience. But I'll look into the issue again this week. Promise. Ciao, Dscho -
The function get_one_line() really returns the line length, not the
whole line, but it is really useful, so do not hide it in commit.c,
under the wrong name.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
commit.c | 10 +++++-----
commit.h | 1 +
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/commit.c b/commit.c
index 4c5dfa9..0c350bc 100644
--- a/commit.c
+++ b/commit.c
@@ -458,7 +458,7 @@ void clear_commit_marks(struct commit *commit, unsigned int mark)
/*
* Generic support for pretty-printing the header
*/
-static int get_one_line(const char *msg, unsigned long len)
+int get_line_length(const char *msg, unsigned long len)
{
int ret = 0;
@@ -950,7 +950,7 @@ static void pp_header(enum cmit_fmt fmt,
for (;;) {
const char *line = *msg_p;
char *dst;
- int linelen = get_one_line(*msg_p, *len_p);
+ int linelen = get_line_length(*msg_p, *len_p);
unsigned long len;
if (!linelen)
@@ -1041,7 +1041,7 @@ static void pp_title_line(enum cmit_fmt fmt,
title = xmalloc(title_alloc);
for (;;) {
const char *line = *msg_p;
- int linelen = get_one_line(line, *len_p);
+ int linelen = get_line_length(line, *len_p);
*msg_p += linelen;
*len_p -= linelen;
@@ -1118,7 +1118,7 @@ static void pp_remainder(enum cmit_fmt fmt,
int first = 1;
for (;;) {
const char *line = *msg_p;
- int linelen = get_one_line(line, *len_p);
+ int linelen = get_line_length(line, *len_p);
*msg_p += linelen;
*len_p -= linelen;
@@ -1214,7 +1214,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
/* Skip excess blank lines at the beginning of body, if any... */
for (;;) {
- int linelen = get_one_line(msg, len);
+ int linelen = get_line_length(msg, len);
int ll = linelen;
if (!linelen)
break;
diff --git a/commit.h b/commit.h
index 467872e..fc6df23 100644
--- a/commit.h
+++ b/commit.h
@@ -60,6 +60,7 @@ enum cmit_fmt {
CMIT_FMT_UNSPECIFIED,
};
+extern int get_line...
| Jan Kundrát | kswapd high CPU usage with no swap |
| Renato S. Yamane | Error -71 on device descriptor read/all |
| Linus Torvalds | Linux 2.6.27-rc8 |
| Alex Chiang | [PATCH 0/7] Fixups for duplicate slot names |
git: | |
| Peter Karlsson | RCS keyword expansion |
| Wink Saville | Resolving conflicts |
| Andreas Hildebrandt | CVS-$Id:$ replacement in git? |
| Olivier Marin | [PATCH] builtin-rerere: fix conflict markers parsing |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Felipe Alfaro Solana | quagga-0.99.11 |
| Pc Nicolas | OpenBSD 4.4 httpd reverse proxy |
| Martin Schröder | pkg_add with http? |
| Evgeniy Polyakov | [resend take 2 4/4] DST Makefile/Kconfig files. |
| Volker Armin Hemmann | build error with 2.6.27.6+reiser4+ehci-hub patch. ERROR: "mii_ethtool_gset" [drive... |
| Krzysztof Oledzki | Re: Error: an inet prefix is expected rather than "0/0". |
| David Madore | atl1e Ethernet driver not seeing packets sent to 33:33:00:00:00:01 multicast |
| Block Sub System query | 1 hour ago | Linux kernel |
| kernel module to intercept socket creation | 2 hours ago | Linux kernel |
| Image size changing during each build | 2 hours ago | Linux kernel |
| Creating a device from a kernel module (mknod style) | 3 hours ago | Linux kernel |
| Soft lock bug | 7 hours ago | Linux kernel |
| sysctl - dynamic registration problem | 13 hours ago | Linux kernel |
| Question on swap as ramdisk partition | 16 hours ago | Linux kernel |
| serial driver xmit problem | 20 hours ago | Linux kernel |
| Generic Netlink subsytem | 21 hours ago | Linux kernel |
| 'Report spam filter error' page broken | 23 hours ago | KernelTrap Suggestions and Feedback |
