Re: synchronizing incremental git changes to cvs

Previous thread: [PATCH] fmt-patch: add --check option by Johannes Schindelin on Saturday, May 20, 2006 - 2:43 pm. (1 message)

Next thread: gitweb.cgi, $my_uri and too many slashes in url by Brandon Philips on Saturday, May 20, 2006 - 3:58 pm. (1 message)
From: Jim Meyering
Date: Saturday, May 20, 2006 - 3:13 pm

Hello,

Can anyone point me at code to mirror a git repository to cvs?
I'd like to develop using git, and have a commit hook mirror the
day-to-day changes (tags/commits) made in the git repo to a
cvs repository.  The idea is that the only way changes get into
the cvs repo is via the git commit hook.

I've experimented with git-cvsexportcommit and found a few bugs: it
couldn't handle simple things like adding a file in a new directory (I
fixed that, along with a few other minor problems).  Adding an empty file
in git still gets a patch application error on the cvs side, but I can
live with that for now.  More seriously, making a change on a git branch
mistakenly tries to apply the delta on the cvs trunk.  None of this is
particularly hard to fix -- or even critical, as long as you don't care
about branches.  I'm just hoping someone has already produced something
more robust.  From the looks of darcs/tailor, it doesn't handle the use
of git as a source.

Why am I interested?  I want to switch the development of GNU coreutils
from cvs to git.  I would also like to continue making the repository
available via cvs, for the sake of continuity.  At worst, I can always
cut the CVS cord, but that's a last resort.

Jim
-

From: Johannes Schindelin
Date: Saturday, May 20, 2006 - 4:05 pm

Hi,


If you only want to make a cvs repository available for tracking the 
project, git-cvsserver is what you want. It is even faster than the 
original cvs...

Ciao,
Dscho

-

From: Jim Meyering
Date: Sunday, May 21, 2006 - 6:40 am

That might work if I had sufficient access to the system hosting the
public CVS repository.  But there are restrictions (like no ssh access).
Currently I rsync the master repo to an intermediate site, from which
it is periodically pulled by savannah.  Paranoia on both sides.

If I end up leaving savannah, can someone propose a good site,
i.e., secure, yet with git and rsync access?

I haven't made the leap to git yet, but git-cvsimport (from git-1.3.2)
seems to do a very good job of converting the cvs module (89MB).

FYI, here are some stats on the resulting git repository:

Size (nothing repacked):
  1051MB (du -sh, actual, on reiserfs 3)
   708MB (du --apparent-size)

Size repacked (via git-repack -a -d && git-prune-packed):
    65MB (du --si -s)

20k+ patchsets (counted by cvsps)
40k+ revisions (counted by cvs ... rlog cu|grep -c '^revision')

While repacking, git said something about more than 100K objects.
There were 120K files under .git/ before repacking.
-

From: Pavel Roskin
Date: Monday, May 22, 2006 - 9:29 am

Hello, Jim!


I believe you have a very good reason to talk to decision makers at the FSF.
Savannah is very poorly maintained, and I actually took one of my
projects (Orinoco driver) to SourceForge Subversion.

If losing a Linux driver is next to nothing, losing GNU coreutils is a
big deal for the GNU development site.  You are likely to be heard if

Subversion is as easy as CVS for potential users, but it has a useful
"log" command if nothing else.  It also has real changesets, which

Sorry, I don't know of any free git hosting services, but here's what you can do:

1) Pressure Savannah to support git
2) Use arch on Savannah
3) Move to Subversion on SourceForge, GNA.org or Berlios and use git-svn

-- 
Regards,
Pavel Roskin

-

From: Martin Langhoff
Date: Saturday, May 20, 2006 - 5:09 pm

I've thought a couple of times about writing an exporter that would
replay things into a true CVS repo, but it's truly not worth it. We've
already got git-cvsserver that does all that -- better for me to focus

cvsexportcommit is clearly for manual usage, not for automagic usage.
It is a bit rough, (and I'd like to see your patches to it!) but it
wants to be driven by a smarter script to, for instance, know what

git-cvsserver is the word. It currently tracks the git repo itself
pretty well (perfectly, AFAICS) and it also tracks a git tree that is
actually imported daily from CVS -- doing

    CVSrepo ->cvsimport -> GIT -> cvsserver -> CVS checkout

git-cvsserver works great for anon cvs access (does pserver) and
TortoiseCVS and cli cvs work great with it. Eclipse works well, but it
has been quite hard to get 'right'. Optionally, it can support users
with commit rights via ssh. It does track git 'heads' but they don't
show up as branches, they show up as different modules. So to get
a checkout of the master branch, you do:

    cvs -d :pserver:anonymous@foo.com:/var/foo.git co master

hope that helps!




martin
-

From: Jim Meyering
Date: Sunday, May 21, 2006 - 9:37 am

Thanks, but I'd rather do primary development directly using git,
rather than with CVS.
-

From: Junio C Hamano
Date: Sunday, May 21, 2006 - 11:21 am

I do not use the automated tools myself, but I sync the day-job
work in my git repository to CVS at work.  I do not develop with
CVS but use it merely as a publishing medium.  However, other
people can make commits into CVS, in which case I have to slurp
their changes back into my git repository.

 (0) Bootstrap.  I did not use git-cvsimport myself (this repository
     started before the tool was written).  Instead:

     . cvs checkout the tip of the CVS development history

     . "git init-db", edit .gitignore to ignore CVS, and "git
       add ."

     . "git commit -m epoch"; the git side of development
       history in this repository starts at that point for me. 

     . "git branch origin"; the tip of CVS repository is kept
       track of with this branch.  I work in "master".

     I think I could have done the above with git-cvsimport,

 (1) Beginning of the day.  In case other people did work on
     the CVS side, I do this:

     . "git checkout origin", "cvs -q update".  If there is no
       change, go to step (3).

     . add any new files with "git add", and update the "origin"
       branch with "git commit -a -m 'from CVS'".

 (2) Merge others' work into my git master branch.

     . "git checkout master" and "git pull . origin"; resolve
       conflicts as needed.

 (3) Do my work.

     . "git checkout master" if I haven't done so.     

     . hack away, grow "master" branch using full power of git
       including the use of topic branches etc.

 (4) Publish, when "master" changes are ready.

     . To avoid conflicts with other people working on CVS,
       perform (1) again to make sure "origin" matches the tip
       of CVS.

     . "git checkout origin", "git pull . master".

     . generate the consolidated log I am about to push back to
       CVS with "git log --no-merges ORIG_HEAD.. | git shortlog >L".

     . add any new files with "cvs add", and "cvs commit -F L"

     . go back to (3) and continue.

This can be extended to ...
-

From: Jim Meyering
Date: Sunday, May 21, 2006 - 2:31 pm

...

Thank you for describing the process you use.
However, since I don't have to allow independent cvs commits,
I hope to bend git-cvsexportcommit to my needs.

I haven't yet tried to restrict the mirroring to commits on a specific
git branch.  So far, in the toy example I'm using to test things,
I have this in .git/hooks/post-commit:

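#!/bin/sh
# post-commit: export the commit just made to the CVS working copy
# and, with -c, commit it there if the patch applies cleanly
sha1_id=$(git-rev-parse --verify HEAD)
cvsdir=/var/tmp/work-c    # checked-out CVS working directory
cd "$cvsdir" && GIT_DIR=/var/tmp/git-experiment/work-g/.git \
    git-cvsexportcommit -v -c -p "$sha1_id"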

I'll clean up and post the changes I've made to git-cvsexportcommit.

I'll send one report separately.
-

From: Jim Meyering
Date: Sunday, May 21, 2006 - 2:35 pm

In a very shallow audit, I spotted code where overflow was not detected.
But it's hardly critical.

Currently,

  git-diff HEAD HEAD

is equivalent to this

  git-diff HEAD HEAD~18446744073709551616   # aka 2^64
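The equivalence comes from the digit loop in get_sha1_1() quoted below: num is an int, so 18446744073709551616 is accumulated modulo the word size. A minimal sketch of that arithmetic, assuming a 32-bit accumulator that wraps (guaranteed for the uint32_t used here, but not for git's signed int, where overflow is undefined behaviour); parse_wrapping is a hypothetical name, not a git function:

```c
#include <assert.h>
#include <stdint.h>

/* Same digit loop as get_sha1_1(), but with uint32_t so that the
 * wraparound modulo 2^32 is well defined. */
static uint32_t parse_wrapping(const char *s)
{
	uint32_t num = 0;

	while (*s)
		num = num * 10 + (uint32_t)(*s++ - '0');
	return num;
}
```

Because 2^64 is a multiple of 2^32, parse_wrapping("18446744073709551616") yields 0, i.e. HEAD~0, which is HEAD itself.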

Exercising git-rev-parse directly, currently I get this:

  $ git-rev-parse --no-flags --sq HEAD~18446744073709551616
  '639ca5497279607665847f2e3a11064441a8f2a6'

It'd be better to produce a diagnostic and fail:

  $ ./git-rev-parse --no-flags --sq -- HEAD~18446744073709551616 > /dev/null
  fatal: ambiguous argument 'HEAD~18446744073709551616': unknown revision or filename

The code in question is in sha1_name.c (get_sha1_1):

               int num = 0;
               ...
               while (cp < name + len)
                       num = num * 10 + *cp++ - '0';

Looking at how to fix it, my first reflex was to replace that loop
with this one:

		while (cp < name + len) {
			int tmp = num * 10 + *cp++ - '0';
			if (INT_MAX / 10 < num || tmp < num)
				return -1;
			num = tmp;
		}

But INT_MAX is used nowhere else, so I wonder if git avoids using
it for some reason.  At least `make check' gripes about __INT_MAX__.
Anyhow, here's the patch I used.  With it, git still passes `make test'.

diff --git a/sha1_name.c b/sha1_name.c
index dc68355..c813ba0 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -429,8 +429,12 @@ static int get_sha1_1(const char *name,
 		int num = 0;
 		int len1 = cp - name;
 		cp++;
-		while (cp < name + len)
-			num = num * 10 + *cp++ - '0';
+		while (cp < name + len) {
+			int tmp = num * 10 + *cp++ - '0';
+			if (INT_MAX / 10 < num || tmp < num)
+				return -1;
+			num = tmp;
+		}
 		if (has_suffix == '^') {
 			if (!num && len1 == len - 1)
 				num = 1;
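Note that this patch computes num * 10 + digit before testing it, so with a signed int the overflow (undefined behaviour) has already happened by the time tmp < num is evaluated. A minimal alternative sketch, not what was posted: compare against INT_MAX before multiplying. parse_generation and its [cp, end) interface are hypothetical, and the digit check is an extra assumption added to keep the function self-contained:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: parse the decimal digits in [cp, end) into *out.
 * Returns 0 on success, -1 on a non-digit or if the value would exceed
 * INT_MAX.  The bound is tested before multiplying, so no signed
 * overflow ever occurs. */
static int parse_generation(const char *cp, const char *end, int *out)
{
	int num = 0;

	while (cp < end) {
		int digit = *cp++ - '0';

		if (digit < 0 || digit > 9)
			return -1;
		/* num * 10 + digit <= INT_MAX  <=>  num <= (INT_MAX - digit) / 10 */
		if (num > (INT_MAX - digit) / 10)
			return -1;
		num = num * 10 + digit;
	}
	*out = num;
	return 0;
}
```

With a 32-bit int, "2147483647" is accepted as INT_MAX, while "2147483648" and "18446744073709551616" are rejected.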
-

From: Jim Meyering
Date: Sunday, May 21, 2006 - 11:57 pm

This is another one of those `would be nice' sort of changes.
Probably not worth much at this early stage in development, but
eventually worth changing.

There are about 20 uses of atoi, and most calls can return
a usable result in spite of an invalid input -- just because
atoi returns the same thing for "99" as "99-and-any-suffix".
It would be better not to ignore invalid inputs.
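A stricter replacement can be built on strtol() with an end pointer and errno, which together detect both a trailing suffix and a range error. The sketch below is only an assumption about how such a helper might look (strict_atoi is a hypothetical name, not an existing git function):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical strict replacement for atoi(): stores the value in *out
 * and returns 0 only if the whole string is a base-10 int; returns -1
 * on empty input, trailing garbage, or a value outside int's range. */
static int strict_atoi(const char *s, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, 10);
	if (end == s || *end != '\0')
		return -1;	/* no digits, or a suffix like "-and-..." */
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -1;	/* does not fit in long, or not in int */
	*out = (int)val;
	return 0;
}
```

strict_atoi("99", &v) succeeds with v == 99, while "99-and-any-suffix" and the empty string are rejected instead of silently yielding 99 and 0.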

-------------------
Also, integer overflow in object.c can cause trouble.
When the xrealloc byte count exceeds 2^32 (for a 32-bit int),
xrealloc will happily return a buffer of the requested (small) size,
but the following memset will scribble zeroes far beyond the end
of that new buffer.

static int nr_objs;
int obj_allocs;
...
void created_object(const unsigned char *sha1, struct object *obj)
{
...
	if (obj_allocs - 1 <= nr_objs * 2) {
		int i, count = obj_allocs;
		obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs);
		objs = xrealloc(objs, obj_allocs * sizeof(struct object *));
		memset(objs + count, 0, (obj_allocs - count)
				* sizeof(struct object *));

But this may be only theoretical, because the problem doesn't strike
until there are over 250M objects (assuming 32-bit int and 8-byte pointers).
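One way to close that hole is to test both the element count and the byte count before multiplying. A hypothetical sketch (grow_ptr_array is not a git function; it uses plain realloc() and returns -1 where git's xrealloc() would die). It keeps the growth policy shown above, 32 slots first and then doubling, and zero-fills the new slots as the original memset does:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct object;	/* opaque here: only pointers to it are stored */

/* Grow *arr from *alloc slots to 32 or 2 * *alloc slots, zeroing the new
 * ones.  Returns -1, leaving *arr and *alloc untouched, if the new count
 * would overflow int, the byte count would overflow size_t, or realloc
 * fails. */
static int grow_ptr_array(struct object ***arr, int *alloc)
{
	int old = *alloc;
	int new_alloc;
	struct object **tmp;

	if (old < 32)
		new_alloc = 32;
	else if (old > INT_MAX / 2)
		return -1;			/* 2 * old would overflow int */
	else
		new_alloc = 2 * old;

	if ((size_t)new_alloc > SIZE_MAX / sizeof(struct object *))
		return -1;			/* byte count would overflow size_t */

	tmp = realloc(*arr, (size_t)new_alloc * sizeof(struct object *));
	if (!tmp)
		return -1;
	memset(tmp + old, 0, (size_t)(new_alloc - old) * sizeof(struct object *));
	*arr = tmp;
	*alloc = new_alloc;
	return 0;
}
```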
-

From: Morten Welinder
Date: Monday, May 22, 2006 - 6:16 am

atoi has undefined behaviour for "99-and-any-suffix".  You might
get lucky and get back 99, but you might also get a random value
or a core dump.

Morten
-

From: Jim Meyering
Date: Monday, May 22, 2006 - 6:31 am

I've never heard of that.
POSIX says that atoi(str) is equivalent to:

    (int) strtol(str, (char **)NULL, 10)
    except that the handling of errors may differ.
    If the value cannot be represented, the behavior is undefined.

Since strtol works fine with such a suffix, and since 99 can be
represented, I don't see why there would be any undefined behavior.

Do you know of an implementation for which `atoi ("99-and-any-suffix")'
does anything other than return 99?
-

From: Jeff King
Date: Monday, May 22, 2006 - 6:37 am

Where do you get that from? The standard claims that it converts "the
initial portion of the string pointed to" (7.20.1.2). Furthermore, atoi
is equivalent to strtol with a base of 10 (with the exception of range
errors). From 7.20.1.4, paragraph 2:
  The strtol [...] functions [...] decompose the input string into three
  parts: an initial, possibly empty, sequence of white-space characters
  [...], a subject sequence resembling an integer represented in some
  radix determined by the value of base, and a final string of one or
  more unrecognized characters...
If no conversion can be performed (i.e., you feed it garbage with no
number), zero is returned.
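This reading can be checked directly: strtol() reports where the subject sequence ended, so the unrecognized tail of "99-and-any-suffix" is simply left over. A small sketch (parse_prefix is a hypothetical wrapper, included only to expose the end pointer):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse a leading base-10 integer and report, via *rest, the final
 * string of unrecognized characters.  If no conversion is performed,
 * strtol() returns 0 and *rest is the original string. */
static long parse_prefix(const char *s, const char **rest)
{
	char *end;
	long val = strtol(s, &end, 10);

	*rest = end;
	return val;
}
```

parse_prefix("99-and-any-suffix", &rest) returns 99 and leaves rest pointing at "-and-any-suffix"; for "garbage" it returns 0 and rest points at the start of the input.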

atoi does NOT handle range errors, however; the behavior is undefined in
that case. In practice, I expect most implementations do some sort of
wrapping.

-Peff
-

From: Morten Welinder
Date: Monday, May 22, 2006 - 6:54 am

My copy (which is admittedly a draft because I am cheap) does not
restrict undefined behaviour to _range_ errors, but simply says
"Except for the behavior on error, they are equivalent to [the strtol call]"

M.
-

From: Jim Meyering
Date: Monday, May 22, 2006 - 3:27 am

git doesn't always detect write failures.  A write I/O error
(e.g., a hardware I/O error or simply a full disk)
doesn't provoke a nonzero exit status:

    $ ./git-cat-file -t HEAD > /dev/full && echo did not detect write failure
    did not detect write failure

This is perhaps more important than the other things
I've reported, since it can lead to porcelain being unable
to detect a real failure in the plumbing.

Here are two more:

    $ ./git-ls-tree HEAD > /dev/full && echo fail
    fail
    $ ./git-show > /dev/full && echo fail
    fail

If you were using gnulib, I'd suggest simply adding this line

    atexit (close_stdout);

near the beginning of each `main'.  Then you wouldn't have to
manually track down each and every place where a write to stdout
can occur -- not to mention the maintenance burden of keeping
things correct as the code evolves.
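Without gnulib, a rough equivalent is an atexit() handler that checks ferror() and the result of fclose() on stdout. This is a sketch under that assumption, not gnulib's actual close_stdout (which handles more cases); close_stream_checked and my_close_stdout are hypothetical names. It calls _Exit() because calling exit() again from inside an atexit() handler is undefined:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Close a stream, reporting failure if an earlier write failed (ferror)
 * or the final flush/close failed (fclose).  Returns 0 or EOF. */
static int close_stream_checked(FILE *fp)
{
	int prev_fail = ferror(fp);
	int close_fail = fclose(fp) != 0;

	return (prev_fail || close_fail) ? EOF : 0;
}

/* Hypothetical atexit() handler: turn an incomplete write to stdout
 * into a diagnostic and a failing exit status. */
void my_close_stdout(void)
{
	if (close_stream_checked(stdout) != 0) {
		fputs("fatal: write failure on standard output\n", stderr);
		_Exit(EXIT_FAILURE);
	}
}
```

Registered first thing in main with atexit(my_close_stdout), it runs after any handlers registered later, so a write to /dev/full would end with a nonzero exit status instead of a silent success.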
-
