On Wednesday 29 July 2009, Johannes Schindelin wrote:
Thanks for the ideas. I will look into this once I have a set of
performance tests that I feel give a better picture of the notes parsing
performance.
I've just started. As I said above, t3302 (which has no fanout) is not
negatively affected by my current implementation. However this does not
say much (only that the current draft doesn't screw up completely...).
I've started making some scripts for more accurately testing the
performance of the notes parsing code at different fanout levels.
At the end of this email you will find a patch with my current state of
these scripts. (This patch is not meant to be included in the jh/notes
topic. It is only extra scripts for those interested in testing the
performance of this particular piece of code.)
METHODOLOGY
So far, there are 3 scripts:
- create_test_repo.sh: Creates a repo with 100,000 commits and one note
object per commit. Also creates 3 notes trees, each holding all
100,000 notes, but in different fanout structures:
- refs/notes/fanout0: all 100,000 notes directly inside the root tree.
- refs/notes/fanout1: notes are stored in a 2/38 structure (i.e. split
into 256 subtrees within the root tree).
- refs/notes/fanout2: notes are stored in a 2/2/36 structure (i.e. an
additional 256-fanout within each of the 256 subtrees).
- verify_correctness.sh: Verify that the "git log" output for each of the
3 notes refs is as expected. This is just to verify that the notes
parsing code is actually doing the right job.
- test_performance.sh: For each of the 3 notes refs, run "git log -n 10"
100 times, and report the time used. I feel this gives a more accurate
impression of the real-world performance of the notes parsing code,
since we:
- Test the code at several fanout levels
- Only lookup _some_ of the notes (looking up _all_ of the notes, i.e.
"git log" without -n, is clearly the worst-case scenario for any code
that loads subtree on-demand).
There is also a config script - config.sh - where you can tweak all of
the above test parameters.
PRELIMINARY RESULTS
Running the above test_performance.sh on the current state of jh/notes
give the following output on my machine:
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
30.56user 2.08system 0:32.71elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1038490minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
1.64user 0.20system 0:01.88elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+104883minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.48user 0.18system 0:00.65elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+94283minor)pagefaults 0swaps
...which indicates that even with my straightforward first draft based on
a simple linked list, there are huge performance wins in using one or more
fanout levels.
THE SCRIPTS
---
notes_performance/config.sh | 9 +++
notes_performance/create_test_repo.sh | 90 +++++++++++++++++++++++++++++++
notes_performance/test_performance.sh | 31 +++++++++++
notes_performance/verify_correctness.sh | 32 +++++++++++
4 files changed, 162 insertions(+), 0 deletions(-)
create mode 100644 notes_performance/config.sh
create mode 100755 notes_performance/create_test_repo.sh
create mode 100755 notes_performance/test_performance.sh
create mode 100755 notes_performance/verify_correctness.sh
diff --git a/notes_performance/config.sh b/notes_performance/config.sh
new file mode 100644
index 0000000..13abae8
--- /dev/null
+++ b/notes_performance/config.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+GIT=../../git
+
+num_commits=100000
+repo_dir=test_repo
+
+log_length=10
+log_reps=100
diff --git a/notes_performance/create_test_repo.sh b/notes_performance/create_test_repo.sh
new file mode 100755
index 0000000..fee3535
--- /dev/null
+++ b/notes_performance/create_test_repo.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Create a test repo with $num_commits commits, and one note object per commit.
+# The repo has the following refs:
+# - refs/heads/master pointing to the last commit
+# - refs/notes/fanout0 referencing all notes with no fanout
+# - refs/notes/fanout1 referencing all notes with 2/38 fanout
+# - refs/notes/fanout2 referencing all notes with 2/2/36 fanout
+
+if [ -d $repo_dir ]; then
+ echo "Skipping repo creation, $repo_dir already exists"
+ exit 1
+fi
+
+mkdir $repo_dir &&
+cd $repo_dir &&
+$GIT init &&
+nr=0 &&
+echo "Creating $num_commits commits..." 1>&2 &&
+while [ $nr -lt $num_commits ]; do
+ nr=$(($nr+1)) &&
+ cat <<INPUT_END
+commit refs/heads/master
+committer Foo Bar <foobar@example.com> 1234567890 +0000
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+done |
+$GIT fast-import --quiet &&
+echo "done" 1>&2 &&
+(
+ echo "Creating $num_commits notes..." 1>&2 &&
+ nr=0 &&
+ while [ $nr -lt $num_commits ]; do
+ nr=$(($nr+1)) &&
+ cat <<INPUT_END
+blob
+mark :$nr
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ done &&
+ echo "done" 1>&2 &&
+ for fanout_levels in 0 1 2; do
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ echo "Creating notes tree with fanout level $fanout_levels..." 1>&2 &&
+ cat <<INPUT_END &&
+commit $notes_ref
+committer Foo Bar <foobar@example.com> 1234567890 +0000
+data <<COMMIT
+notes with fanout level $fanout_levels
+COMMIT
+
+INPUT_END
+
+ nr=$num_commits &&
+ $GIT rev-list refs/heads/master |
+ while read sha1; do
+ case $fanout_levels in
+ 0)
+ note_path=$sha1
+ ;;
+ 1)
+ note_path="${sha1:0:2}/${sha1:2}"
+ ;;
+ 2)
+ note_path="${sha1:0:2}/${sha1:2:2}/${sha1:4}"
+ ;;
+ esac &&
+ echo "M 100644 :$nr $note_path" &&
+ nr=$(($nr-1))
+ done &&
+ echo "done" 1>&2
+ done
+) |
+$GIT fast-import --quiet &&
+$GIT gc
diff --git a/notes_performance/test_performance.sh b/notes_performance/test_performance.sh
new file mode 100755
index 0000000..65a5dbf
--- /dev/null
+++ b/notes_performance/test_performance.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Test "git log -n $log_length" for each of the 3 notes refs
+# - refs/notes/fanout0
+# - refs/notes/fanout1
+# - refs/notes/fanout2
+
+if [ ! -d $repo_dir ]; then
+ echo "Cannot test performance, $repo_dir missing"
+ exit 1
+fi
+
+cd $repo_dir &&
+cat > time_notes <<EOF &&
+ i=0 &&
+ while [ \$i -lt $log_reps ]; do
+ $GIT log -n $log_length refs/heads/master >/dev/null &&
+ i=\$((\$i+1))
+ done
+
+EOF
+
+for fanout_levels in 0 1 2; do
+ echo "Timing $log_reps reps of 'git log -n $log_length refs/heads/master >/dev/null' at fanout level $fanout_levels..." &&
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ $GIT config core.notesRef $notes_ref &&
+ /usr/bin/time sh time_notes &&
+ echo
+done
diff --git a/notes_performance/verify_correctness.sh b/notes_performance/verify_correctness.sh
new file mode 100755
index 0000000..ca40223
--- /dev/null
+++ b/notes_performance/verify_correctness.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Verify proper contents of git log for each of the 3 notes refs
+# - refs/notes/fanout0
+# - refs/notes/fanout1
+# - refs/notes/fanout2
+
+if [ ! -d $repo_dir ]; then
+ echo "Cannot verify correctness, $repo_dir missing"
+ exit 1
+fi
+
+cd $repo_dir &&
+for fanout_levels in 0 1 2; do
+ echo "Verifying correct git log at fanout level $fanout_levels" &&
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ $GIT config core.notesRef $notes_ref &&
+ $GIT log | grep "^ " > output &&
+ nr=$num_commits &&
+ while [ $nr -gt 0 ]; do
+ echo " commit #$nr" &&
+ echo " note for commit #$nr" &&
+ nr=$(($nr-1));
+ done > expect &&
+ echo "done" &&
+ diff -u expect output || {
+ echo "Failed verification of fanout level $fanout_levels"
+ exit 1
+ }
+done
--
1.6.4.rc3.138.ga6b98.dirty
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html