Re: [PATCH 0/4] Add more tests of cvsimport

Previous thread: none

Next thread: You Received an Ecard by Graham on Thursday, February 19, 2009 - 10:53 pm. (1 message)
From: Michael Haggerty
Date: Thursday, February 19, 2009 - 10:18 pm

The test suite for "git cvsimport" is pretty limited, and I would like
to improve the situation.  This patch series contains the first of
what I hope will eventually be several additions to the "git
cvsimport" test suite.

I am the maintainer of cvs2svn/cvs2git.  Most of the new tests will
probably use fragments from the cvs2svn test suite.  I should admit
that part of my motivation for adding tests to the "git cvsimport"
test suite is to document its weaknesses, which do not seem to be
especially well known.

Patch 1 splits out some code into a library usable by multiple
CVS-related tests.

Patch 2 changes the library to add the -f option when invoking cvs (to
make it ignore the user's ~/.cvsrc file).

Patch 3 adds a new test to t9600, namely to compare the entire module
as checked out by CVS vs. git.

Patch 4 adds a new test script t9601 that tests "git cvsimport"'s
handling of CVS vendor branches.  One of these tests fails due to an
actual bug.

These ideas in the patches are logically independent of each other,
but each patch assumes that the previous patches have been applied.

I would like to point out a few things about these patches that seem a
little bit unprecedented in the git test suite.  If other approaches
would be preferred, please let me know.

The first is that I would like to introduce a library that can be used
by the "git cvsimport" tests in the t96xx series, simply to avoid code
duplication.  I put this library in t/t96xx/cvs-lib.sh, to hopefully
make its role clear.  The library has to be sourced from the main test
directory.  (It sources test-lib.sh indirectly.)

The second is that the new test script uses a small CVS repository
that is part of the test suite (i.e., the *,v files are committed
directly into the git source tree).  This is different than the
approach of t9600, which creates its own test CVS repository using CVS
commands.  The reasons for this are:

- t9600 wants to test incremental import, so it *has to* create the
  repository ...
From: Michael Haggerty
Date: Thursday, February 19, 2009 - 10:18 pm

For now the "library" just includes code (moved from
t/t9600-cvsimport.sh) that checks whether the prerequisites for "git
cvsimport" are installed.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t9600-cvsimport.sh |   29 +----------------------------
 t/t96xx/cvs-lib.sh   |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 28 deletions(-)
 create mode 100644 t/t96xx/cvs-lib.sh

diff --git a/t/t9600-cvsimport.sh b/t/t9600-cvsimport.sh
index d2379e7..b4b9896 100755
--- a/t/t9600-cvsimport.sh
+++ b/t/t9600-cvsimport.sh
@@ -1,37 +1,10 @@
 #!/bin/sh
 
 test_description='git cvsimport basic tests'
-. ./test-lib.sh
+. ./t96xx/cvs-lib.sh
 
 CVSROOT=$(pwd)/cvsroot
 export CVSROOT
-unset CVS_SERVER
-# for clean cvsps cache
-HOME=$(pwd)
-export HOME
-
-if ! type cvs >/dev/null 2>&1
-then
-	say 'skipping cvsimport tests, cvs not found'
-	test_done
-	exit
-fi
-
-cvsps_version=`cvsps -h 2>&1 | sed -ne 's/cvsps version //p'`
-case "$cvsps_version" in
-2.1 | 2.2*)
-	;;
-'')
-	say 'skipping cvsimport tests, cvsps not found'
-	test_done
-	exit
-	;;
-*)
-	say 'skipping cvsimport tests, unsupported cvsps version'
-	test_done
-	exit
-	;;
-esac
 
 test_expect_success 'setup cvsroot' 'cvs init'
 
diff --git a/t/t96xx/cvs-lib.sh b/t/t96xx/cvs-lib.sh
new file mode 100644
index 0000000..bfc1c12
--- /dev/null
+++ b/t/t96xx/cvs-lib.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+. ./test-lib.sh
+
+unset CVS_SERVER
+# for clean cvsps cache
+HOME=$(pwd)
+export HOME
+
+if ! type cvs >/dev/null 2>&1
+then
+	say 'skipping cvsimport tests, cvs not found'
+	test_done
+	exit
+fi
+
+cvsps_version=`cvsps -h 2>&1 | sed -ne 's/cvsps version //p'`
+case "$cvsps_version" in
+2.1 | 2.2*)
+	;;
+'')
+	say 'skipping cvsimport tests, cvsps not found'
+	test_done
+	exit
+	;;
+*)
+	say 'skipping cvsimport tests, unsupported cvsps version'
+	test_done
+	exit
+	;;
+esac
-- 
1.6.1.3

--

From: Michael Haggerty
Date: Thursday, February 19, 2009 - 10:18 pm

A user's ~/.cvsrc file can change the basic behavior of CVS commands.
Therefore we should ignore it in order to ensure consistent results
from the test suite.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t9600-cvsimport.sh |   16 ++++++++--------
 t/t96xx/cvs-lib.sh   |    3 +++
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/t/t9600-cvsimport.sh b/t/t9600-cvsimport.sh
index b4b9896..66393ae 100755
--- a/t/t9600-cvsimport.sh
+++ b/t/t9600-cvsimport.sh
@@ -6,12 +6,12 @@ test_description='git cvsimport basic tests'
 CVSROOT=$(pwd)/cvsroot
 export CVSROOT
 
-test_expect_success 'setup cvsroot' 'cvs init'
+test_expect_success 'setup cvsroot' '$CVS init'
 
 test_expect_success 'setup a cvs module' '
 
 	mkdir "$CVSROOT/module" &&
-	cvs co -d module-cvs module &&
+	$CVS co -d module-cvs module &&
 	cd module-cvs &&
 	cat <<EOF >o_fortuna &&
 O Fortuna
@@ -30,13 +30,13 @@ egestatem,
 potestatem
 dissolvit ut glaciem.
 EOF
-	cvs add o_fortuna &&
+	$CVS add o_fortuna &&
 	cat <<EOF >message &&
 add "O Fortuna" lyrics
 
 These public domain lyrics make an excellent sample text.
 EOF
-	cvs commit -F message &&
+	$CVS commit -F message &&
 	cd ..
 '
 
@@ -74,7 +74,7 @@ translate to English
 
 My Latin is terrible.
 EOF
-	cvs commit -F message &&
+	$CVS commit -F message &&
 	cd ..
 '
 
@@ -92,8 +92,8 @@ test_expect_success 'update cvs module' '
 
 	cd module-cvs &&
 		echo 1 >tick &&
-		cvs add tick &&
-		cvs commit -m 1
+		$CVS add tick &&
+		$CVS commit -m 1
 	cd ..
 
 '
@@ -111,7 +111,7 @@ test_expect_success 'cvsimport.module config works' '
 
 test_expect_success 'import from a CVS working tree' '
 
-	cvs co -d import-from-wt module &&
+	$CVS co -d import-from-wt module &&
 	cd import-from-wt &&
 		git cvsimport -a -z0 &&
 		echo 1 >expect &&
diff --git a/t/t96xx/cvs-lib.sh b/t/t96xx/cvs-lib.sh
index bfc1c12..6738901 100644
--- a/t/t96xx/cvs-lib.sh
+++ b/t/t96xx/cvs-lib.sh
@@ -14,6 +14,9 @@ then
 	exit
 ...
From: Michael Haggerty
Date: Thursday, February 19, 2009 - 10:18 pm

Test added for completeness (it passes).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t9600-cvsimport.sh |    2 ++
 t/t96xx/cvs-lib.sh   |   25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/t/t9600-cvsimport.sh b/t/t9600-cvsimport.sh
index 66393ae..dad9d49 100755
--- a/t/t9600-cvsimport.sh
+++ b/t/t9600-cvsimport.sh
@@ -121,4 +121,6 @@ test_expect_success 'import from a CVS working tree' '
 
 '
 
+test_expect_success 'test entire HEAD' 'test_cmp_branch_tree master'
+
 test_done
diff --git a/t/t96xx/cvs-lib.sh b/t/t96xx/cvs-lib.sh
index 6738901..0136b36 100644
--- a/t/t96xx/cvs-lib.sh
+++ b/t/t96xx/cvs-lib.sh
@@ -32,3 +32,28 @@ case "$cvsps_version" in
 	exit
 	;;
 esac
+
+test_cvs_co () {
+	# Usage: test_cvs_co BRANCH_NAME
+	if [ "$1" = "master" ]
+	then
+		$CVS co -P -d module-cvs-"$1" -A module
+	else
+		$CVS co -P -d module-cvs-"$1" -b "$1" module
+	fi
+}
+
+test_git_co_branch () {
+	# Usage: test_git_co BRANCH_NAME
+	(cd module-git && git checkout "$1")
+}
+
+test_cmp_branch_tree () {
+	# Usage: test_cmp_branch_tree BRANCH_NAME
+	# Check BRANCH_NAME out of CVS and git and make sure that all
+	# of the files and directories are identical.
+
+	test_cvs_co "$1" &&
+	test_git_co_branch "$1" &&
+	diff -r -x .git -x CVS module-cvs-"$1" module-git
+}
-- 
1.6.1.3

--

From: Michael Haggerty
Date: Thursday, February 19, 2009 - 10:18 pm

CVS's handling of vendor branches is tricky; add some tests to check
whether revisions added via "cvs imports" then imported to git via
"git cvsimport" are reflected correctly on master.

One of these tests fail and is therefore marked "test_expect_failure".
Cvsimport doesn't realize that subsequent changes on a vendor branch
affect master as long as the vendor branch is the default branch.

The test CVS repository used for these tests is derived from cvs2svn's
test suite.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t9601-cvsimport-vendor-branch.sh                 |   86 ++++++++++++++++++++
 t/t9601/cvsroot/CVSROOT/.gitignore                 |    1 +
 t/t9601/cvsroot/module/added-imported.txt,v        |   44 ++++++++++
 t/t9601/cvsroot/module/imported-anonymously.txt,v  |   42 ++++++++++
 .../module/imported-modified-imported.txt,v        |   76 +++++++++++++++++
 t/t9601/cvsroot/module/imported-modified.txt,v     |   59 +++++++++++++
 t/t9601/cvsroot/module/imported-once.txt,v         |   43 ++++++++++
 t/t9601/cvsroot/module/imported-twice.txt,v        |   60 ++++++++++++++
 t/t96xx/cvs-lib.sh                                 |    6 ++
 9 files changed, 417 insertions(+), 0 deletions(-)
 create mode 100755 t/t9601-cvsimport-vendor-branch.sh
 create mode 100644 t/t9601/cvsroot/CVSROOT/.gitignore
 create mode 100644 t/t9601/cvsroot/module/added-imported.txt,v
 create mode 100644 t/t9601/cvsroot/module/imported-anonymously.txt,v
 create mode 100644 t/t9601/cvsroot/module/imported-modified-imported.txt,v
 create mode 100644 t/t9601/cvsroot/module/imported-modified.txt,v
 create mode 100644 t/t9601/cvsroot/module/imported-once.txt,v
 create mode 100644 t/t9601/cvsroot/module/imported-twice.txt,v

diff --git a/t/t9601-cvsimport-vendor-branch.sh b/t/t9601-cvsimport-vendor-branch.sh
new file mode 100755
index 0000000..179898b
--- /dev/null
+++ b/t/t9601-cvsimport-vendor-branch.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+
+# Description of the files in the ...
From: Jeff King
Date: Thursday, February 19, 2009 - 11:25 pm

Great. I agree the test suite is terrible for cvsimport; what little is

I don't think it is a problem to document cvsimport's weakness. It is
clear from list traffic that it has shortcomings, and IMHO documenting
them clearly and rigorously with test cases is the first step to fixing
them (or admitting that people should just use something else ;) ).

The only downside I see is that it bloats git's test suite a bit (and
cvs tests are often slow to run). We can always make them optional,
I suppose.

I do wonder, though, whether it would be simpler to make a "cvs import
test suite" that could pluggably test cvs2svn, git-cvsimport, or other
converters. Then you could test each on the exact same set of test
repos. And abstracting "OK, now make a repository from this cvsroot"
wouldn't be that hard for each command (I wouldn't think, but obviously

That is definitely a good first step, though the usual naming convention

The code in t9600 (which gets moved to lib-cvs in your patch 1) sets



I think that's fine. There are other places in the test suite where
things that are a pain to produce are just included as content (e.g.,
see some of the SVN tests in the 9100 series).


It's best to leave them in, I think, to create as realistic a test as
possible. But you should mark the paths as "we don't care about
whitespace" using gitattributes. I.e.,:

diff --git a/t/t9601/.gitattributes b/t/t9601/.gitattributes
new file mode 100644
index 0000000..562b12e
--- /dev/null
+++ b/t/t9601/.gitattributes
@@ -0,0 +1 @@
+* -whitespace

-Peff
--

From: Michael Haggerty
Date: Friday, February 20, 2009 - 3:21 am

Many tests only need to compare the contents of branches/tags checked
out of CVS with those checked out of the converted repository.  Such
tests could pretty easily be made agnostic about what conversion tool
they are testing and also, for that matter, the type of the target VCS
(git, SVN, hg, ...).  Testing incremental conversions at that level of
detail would not be much harder.

But other tests would be harder to write in a neutral fashion.  For
example, the cvs2svn test suite has tests of log messages, character-set
conversions of metadata, correct commit ordering, branching topology, etc.

In fact, one of the many things I haven't gotten around to yet is
writing a nontrivial test suite for cvs2git.  I'd like to share as much
as possible with the cvs2svn test suite, but there is a lot there that


That's a good question.  I just checked, and empirically cvs uses .cvsrc
from my true home directory even if HOME is set differently.  So I think


Cool, I didn't know about that feature.  I'll include that in v2 as well.

Thanks for the feedback!

BTW, I don't want to trash "git cvsimport".  I'm not brave enough even
to try to implement incremental conversions in cvs2git.  So the fact
that cvsimport sometimes works is already impressive :-)  I also
understand that its limitations come from those of cvsps, another
impressive but flawed tool (which in turn is being used outside of its
design limits).  But I hope to raise awareness that cvsps-based tools
are not the best choice for "one-shot" conversions, and maybe work
against people's tendency to use the "default" tool unless it obviously
blows up.

Michael
--

From: Jeff King
Date: Friday, February 20, 2009 - 8:00 am

OK. I haven't looked at it and you have, so I will accept your

Yuck. But if that's the way it works, then I think your patch is the


Agreed. I have seen that advice given on the list several times, and it
seems to be working for people. So it really is about the right tool for
the job, IMHO.

-Peff
--

From: Samuel Lucas Vaz de Mello
Date: Friday, February 20, 2009 - 2:26 pm

Michael,

If I run cvs2git several times against a live cvs repo (using the same configuration), wouldn't it perform an incremental import?
Is there anything that would make it produce different commits for the history?

I've just made a simple test here performing 2 imports (the 2nd with a dozen of new commits not in the 1st) and it seemed to work fine.

I know that it will take the same time/memory as the first import, but is there something that can break the repository or produce wrong data?

Thanks,

 - Samuel




--

From: Michael Haggerty
Date: Friday, February 20, 2009 - 11:32 pm

Cool, I'd never thought of that.  It's certainly not by design, but as
you've discovered, the interaction of cvs2git and git *almost* combine
to give you an incremental import.

Alas, it is only "almost".  There are many things that can happen in a
CVS repository that would cause the overlapping part of the history to
disagree between runs of cvs2svn.  The nastiest are things that a VCS
shouldn't really even allow, but are common in CVS, like

- Retroactively adding a file to a branch or tag.  (This is a
much-beloved feature of CVS.)  Since CVS doesn't record the timestamp
when a symbol is added to a file, cvs2git tries (subject to the
constraints of other timestamps) to group all such changes into a single
changeset.  So the creation of the symbol would look different in runs N
vs N+1 of cvs2git--containing different files and likely with a
different timestamp.

- Renaming a file "with history" by renaming or copying the associated
*,v file in the repository.  This retroactively changes the entire
history of that file and thus of all changesets that involved changes to
that file.

- Changing the "text vs binary" or keyword expansion mode of a file.
These properties apply to all revisions of a file, and therefore also
have a retroactive effect.

But even aside from these retroactive changes, the output of cvs2git is
not deterministic in any practical sense (though I've tried to make it
deterministic given *identical* input).  The problem is that there are
so many ambiguities in a CVS history (because CVS doesn't record enough
information) that cvs2git has to use heuristics to decide what
individual file events should be grouped together as commits.  The
trickiest part is that the graph of naively inferred changesets can have
cycles in it, and cvs2git uses several heuristics to decide how to split
up changesets so as to remove the cycles.  (See our design notes [1] for
all the hairy details.)  The CVS commits made between runs N and N+1
could easily change some of the ...
From: Ferry Huberts (Pelagic)
Date: Friday, February 20, 2009 - 1:27 am

Hi,

 I'm actually working on coming up with a patch for a bug I hit that has to to do with safecrlf=true. Maybe now I should coordinate with you?

 See this thread
 http://article.gmane.org/gmane.comp.version-control.git/110152

 Ferry

 Michael Haggerty wrote:    The test suite for "git cvsimport" is pretty limited, and I would like to improve the situation.  This patch series contains the first of what I hope will eventually be
several additions to the "git cvsimport" test suite.  I am the maintainer of cvs2svn/cvs2git.  Most of the new tests will probably use fragments from the cvs2svn test suite.  I should admit that
part of my motivation for adding tests to the "git cvsimport" test suite is to document its weaknesses, which do not seem to be especially well known.  Patch 1 splits out some code into a library
usable by multiple CVS-related tests.  Patch 2 changes the library to add the -f option when invoking cvs (to make it ignore the user's ~/.cvsrc file).  Patch 3 adds a new test to t9600, namely to
compare the entire module as checked out by CVS vs. git.  Patch 4 adds a new test script t9601 that tests "git cvsimport"'s handling of CVS vendor branches.  One of these tests fails due to an
actual bug.  These ideas in the patches are logically independent of each other, but each patch assumes that the previous patches have been applied.  I would like to point out a few things about
these patches that seem a little bit unprecedented in the git test suite.  If other approaches would be preferred, please let me know.  The first is that I would like to introduce a library that can
be used by the "git cvsimport" tests in the t96xx series, simply to avoid code duplication.  I put this library in t/t96xx/cvs-lib.sh, to hopefully make its role clear.  The library has to be
sourced from the main test directory.  (It sources test-lib.sh indirectly.)  The second is that the new test script uses a small CVS repository that is part of the test suite (i.e., the *,v files
are committed ...
From: Michael Haggerty
Date: Saturday, February 21, 2009 - 6:05 am

I am only adding some tests of "git cvsimport"; I definitely don't plan
to become a "git cvsimport" hacker.  But we can certainly work together
on the test infrastructure if it will help you.

Michael
--

From: Ferry Huberts (Pelagic)
Date: Saturday, February 21, 2009 - 6:19 am

I'd like to add a couple of tests to t9600 (as per Johannes' suggestion) 
to test my patch

Ferry
--

Previous thread: none

Next thread: You Received an Ecard by Graham on Thursday, February 19, 2009 - 10:53 pm. (1 message)