Re: [PATCH] fmt-merge-msg: avoid open "-|" list form for Perl 5.6

From: Johannes Schindelin
Date: Monday, February 20, 2006 - 11:37 am

Hi,

I just had a failure when pulling because, for a few days now (to be exact,
since commit 1cb30387), git-fmt-merge-msg has used a syntax which is not
understood by Perl 5.6.

It is this:

	open $fh, '-|', 'git-symbolic-ref', 'HEAD' or die "$!";

I know that there was already some discussion on this list, but I don't 
remember if we decided on leaving 5.6 behind or not.

Does anybody remember?

Ciao,
Dscho

-

From: Eric Wong
Date: Monday, February 20, 2006 - 12:10 pm

This is just 5.8 shorthand for the following (which is 5.6-compatible,
and probably for earlier versions, too):

	my $pid = open my $fh, '-|';
	defined $pid or die "Unable to fork: $!\n";
	if ($pid == 0) {
		exec 'git-symbolic-ref', 'HEAD' or die "$!";
	}
	<continue with original code here>


IIRC, there was no clear decision.

I still have some Debian Woody machines/chroots with 5.6 around in some
places.  I don't use git on them; I may someday, though upgrading them to
Sarge is more likely.
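
For anyone who wants to try the 5.6-compatible form in isolation, here is a
minimal sketch. It assumes a platform with a real fork(); pipe_lines is a
hypothetical helper name, not something in git, and the error handling is
only one reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of git): run @cmd without a shell and
# return its output lines, using only the implicit-fork form of open().
sub pipe_lines {
    my @cmd = @_;
    my $pid = open(my $fh, '-|');   # forks; the child's STDOUT feeds $fh
    die "Unable to fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: exec only returns if it failed.
        exec { $cmd[0] } @cmd or die "Unable to exec $cmd[0]: $!\n";
    }
    # Parent: read everything the child printed.
    my @lines = <$fh>;
    # close() on a piped handle also reaps the child and reports its status.
    close $fh or die "$cmd[0] failed: " . ($! || "exit status $?") . "\n";
    return @lines;
}
```

Something like `my ($head) = pipe_lines('git-symbolic-ref', 'HEAD');` would
then stand in for the list-form open, with no shell involved at any point.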

-- 
Eric Wong
-

From: Andreas Ericsson
Date: Monday, February 20, 2006 - 2:01 pm

I think we agreed not to bother at all with Perl 5.4 and earlier, and 
not to bend over backwards to support 5.6. This seems like a simple fix 
though, so I'm sure Junio will accept a patch.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Junio C Hamano
Date: Monday, February 20, 2006 - 2:15 pm

Correct.  I wasn't being careful enough.

-

From: Junio C Hamano
Date: Monday, February 20, 2006 - 3:05 pm

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * Eric, thanks for the hint.  I have this four-patch series.
   Could people with perl 5.6 please check them?

 git-fmt-merge-msg.perl |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

615782c9609bf23be55b403e994d88c1047be996
diff --git a/git-fmt-merge-msg.perl b/git-fmt-merge-msg.perl
index c34ddc5..a77e94e 100755
--- a/git-fmt-merge-msg.perl
+++ b/git-fmt-merge-msg.perl
@@ -28,11 +28,12 @@ sub andjoin {
 }
 
 sub repoconfig {
-	my $fh;
 	my $val;
 	eval {
-		open $fh, '-|', 'git-repo-config', '--get', 'merge.summary'
-		    or die "$!";
+		my $pid = open(my $fh, '-|');
+		if (!$pid) {
+			exec('git-repo-config', '--get', 'merge.summary');
+		}
 		($val) = <$fh>;
 		close $fh;
 	};
@@ -41,25 +42,32 @@ sub repoconfig {
 
 sub current_branch {
 	my $fh;
-	open $fh, '-|', 'git-symbolic-ref', 'HEAD' or die "$!";
+	my $pid = open($fh, '-|');
+	die "$!" unless defined $pid;
+	if (!$pid) {
+	    exec('git-symbolic-ref', 'HEAD') or die "$!";
+	}
 	my ($bra) = <$fh>;
 	chomp($bra);
+	close $fh or die "$!";
 	$bra =~ s|^refs/heads/||;
 	if ($bra ne 'master') {
 		$bra = " into $bra";
 	} else {
 		$bra = "";
 	}
-
 	return $bra;
 }
 
 sub shortlog {
 	my ($tip, $limit) = @_;
 	my ($fh, @result);
-	open $fh, '-|', ('git-log', "--max-count=$limit", '--topo-order',
-			 '--pretty=oneline', $tip, '^HEAD')
-	    or die "$!";
+	my $pid = open($fh, '-|');
+	die "$!" unless defined $pid;
+	if (!$pid) {
+	    exec('git-log', "--max-count=$limit", '--topo-order',
+		 '--pretty=oneline', $tip, '^HEAD') or die "$!";
+	}
 	while (<$fh>) {
 		s/^[0-9a-f]{40}\s+//;
 		push @result, $_;
-- 
1.2.2.g5be4ea


-

From: Junio C Hamano
Date: Monday, February 20, 2006 - 3:19 pm

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * Fifth of the four patch series.  I cannot count ;-).

 git-cvsimport.perl |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

eb815c1bb8a40ae18d80e99f8547137ea05318bf
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 24f9834..b46469a 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -846,8 +846,12 @@ while(<CVS>) {
 			print "Drop $fn\n" if $opt_v;
 		} else {
 			print "".($init ? "New" : "Update")." $fn: $size bytes\n" if $opt_v;
-			open my $F, '-|', "git-hash-object -w $tmpname"
+			my $pid = open(my $F, '-|');
+			die $! unless defined $pid;
+			if (!$pid) {
+			    exec("git-hash-object", "-w", $tmpname)
 				or die "Cannot create object: $!\n";
+			}
 			my $sha = <$F>;
 			chomp $sha;
 			close $F;
-- 
1.2.2.g5be4ea


-

From: Alex Riesen
Date: Tuesday, February 21, 2006 - 10:30 am

Does not work here (ActiveState Build 811, Perl 5.8.6):

$ perl -e 'open(F, "-|")'
'-' is not recognized as an internal or external command,
operable program or batch file.
-

From: Sam Vilain
Date: Tuesday, February 21, 2006 - 1:36 pm

Portability, Ease of Coding, Few CPAN Module Dependencies.  Pick any two.

Sam.
-

From: Alex Riesen
Date: Tuesday, February 21, 2006 - 2:57 pm

Sometimes an upgrade is just out of the question. Besides, that'd mean an
upgrade to another operating system, because very important scripts
over here are just not portable to anything else but
    "ActiveState Perl on Windows (TM)"
I just have no choice.
-

From: Johannes Schindelin
Date: Tuesday, February 21, 2006 - 3:19 pm

Hi,


Maybe I am stating the obvious, but it seems that

	open (F, "git-blabla -option |");

would be more portable.

Alex, would this work on ActiveState?

Perl gurus, is the latter way to open a pipe considered awful or what?

Ciao,
Dscho

P.S.: Eric, we rely on fork() anyway. Most of git's programs just don't 
work without a fork().

-

From: Eric Wong
Date: Tuesday, February 21, 2006 - 3:35 pm

It's OK as long as all arguments are shell-safe (quoted/escaped

Yes, apparently there's some fork() emulation in some *doze places and
not others.

-- 
Eric Wong
-

From: Shawn Pearce
Date: Tuesday, February 21, 2006 - 3:38 pm

Yes but that gets broken up and processed according to your shell.
Which could be ugly if you try to include shell meta-characters.
On the other hand if the entire string passed to open is a constant
in the script then there's really no danger and it would be more

Which is why GIT requires Cygwin on Windows.  So why not use
the Cygwin perl when using GIT?  I think that uses Cygwin's fork
emulation to implement fork, rather than the ActiveState emulation
of fork.

Of course fork on Cygwin is painfully slow.  :-|

-- 
Shawn.
-

From: Martin Langhoff
Date: Tuesday, February 21, 2006 - 4:00 pm

And

    open (F, "git-blabla|", '-option', '$%!|');

would be portable AND safe ;-)

cheers,


martin
-

From: Sam Vilain
Date: Tuesday, February 21, 2006 - 3:38 pm

Sure, but perhaps IPC::Open2 or some other CPAN module has solved this 
problem already.

I guess what I'm saying is that if you want to limit the modules that 
Perl script uses, you end up either impacting on the portability of the 
script or rediscovering problems with early wheel designs.

Sam.
-

From: Alex Riesen
Date: Wednesday, February 22, 2006 - 9:35 am

IPC::Open2 works! Well "kind of": there are still strange segfaults regarding
stack sometimes. And I don't know yet whether and how the arguments are escaped

IPC::Open{2,3} seem to be installed on every system I have access to.
-

From: Johannes Schindelin
Date: Wednesday, February 22, 2006 - 12:44 pm

Hi,


I can confirm that the platforms I usually work on also provide it 
(Linux, Linux, old IRIX, old macosx, MinGW32).

Ciao,
Dscho

-

From: Sam Vilain
Date: Wednesday, February 22, 2006 - 12:51 pm

Checking in Module::CoreList, that module goes right back to the Perl 
5.0 release, so every normal Perl 5 distribution should have it.
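
A minimal sketch of reading a command's output through IPC::Open2 with a
list of arguments, so that no shell is involved. open2_lines is a
hypothetical helper name; passing undefined lexical handles assumes a
reasonably recent IPC::Open2 (older releases expect bareword or glob
handles), and nothing here has been checked on ActiveState.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: run @cmd via open2() and return its output lines.
# With more than one element in @cmd the command is not passed to a shell.
sub open2_lines {
    my @cmd = @_;
    my ($from_child, $to_child);
    # open2() raises an exception if the child cannot be started.
    my $pid = open2($from_child, $to_child, @cmd);
    close $to_child;              # we have nothing to send to the child
    my @lines = <$from_child>;
    close $from_child;
    waitpid($pid, 0);             # open2() does not reap the child for us
    die "$cmd[0] exited with status $?\n" if $?;
    return @lines;
}
```

Unlike a piped open, the caller has to waitpid() on the child itself, which
is why the helper does so before returning.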

Sam.
-

From: Junio C Hamano
Date: Wednesday, February 22, 2006 - 12:54 pm

Good digging, but IIRC this thread started because something
that _claims_ to be 5.8 does not grok open(F, '-|') correctly,
so...


-

From: Johannes Schindelin
Date: Wednesday, February 22, 2006 - 3:00 pm

Hi,


Note that there is a notable decrease in performance in my preliminary 
tests (about 10%).

Ciao,
Dscho

-

From: Junio C Hamano
Date: Wednesday, February 22, 2006 - 3:25 pm

Doesn't open(F, "| foo bar") or open(F, "foo bar |") with
careful shell quoting work?
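
As a rough illustration of what careful quoting means when the command
string is handed to a POSIX sh: wrap each argument in single quotes and
escape embedded single quotes. sh_quote and sh_read_all are hypothetical
helper names, and the sketch assumes the string really is interpreted by a
Bourne-style shell, which does not hold for cmd.exe.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one argument for a POSIX sh command line.
sub sh_quote {
    my ($arg) = @_;
    $arg =~ s/'/'\\''/g;    # ' becomes '\'' (close, escaped quote, reopen)
    return "'$arg'";
}

# Hypothetical helper: run a command through the shell using the old
# two-argument pipe open, quoting every word first.
sub sh_read_all {
    my $cmd = join ' ', map { sh_quote($_) } @_;
    open(my $fh, "$cmd |") or die "Cannot run $cmd: $!\n";
    my $out = do { local $/; <$fh> };   # slurp all output at once
    close $fh or die "$cmd failed: exit status $?\n";
    return $out;
}
```

A call such as `sh_read_all('git-hash-object', '-w', $tmpname)` would then
survive spaces, quotes and dollar signs in $tmpname, as long as a POSIX
shell is the one doing the unquoting.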

-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 1:00 am

I'll keep that in mind. But there are places where a safe pipe is unavoidable
(filenames. No amount of careful quoting will save you).
-

From: Junio C Hamano
Date: Thursday, February 23, 2006 - 1:45 am

Huh?

-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 2:35 am

Because you never know what the next interpreter uses for unquoting:
$SHELL, /bin/sh, cmd /c, or something else.
-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 2:41 am

And that stupid activestate thing actually doesn't use any. Just tried:

  perl -e '$,=" ";open(F, "sleep 1000 ; # @ARGV |") and print <F>'

It passed the whole string "1000 ; # @ARGV" to sleep from $PATH.
It failed to sleep at all, of course. The same code works perfectly on
almost any UNIX system.
-

From: Andreas Ericsson
Date: Thursday, February 23, 2006 - 2:48 am

Not to be unhelpful or anything, but activestate perl seems to be quite 
a lot of bother. Is it worth supporting it?

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 3:10 am

It's not activestate perl actually. It's only one platform it also
_has_ to support.
Is it worth supporting Windows?
-

From: Andreas Ericsson
Date: Thursday, February 23, 2006 - 6:29 am

With or without cygwin? With cygwin, I'd say "yes, unless it makes 
things terribly difficult to maintain and so long as we don't take 
performance hits on unices". Without cygwin, I'd say "What? It runs on 
windows?".

If we claim to support windows but do a poor job of it, no-one else will 
start working on a windows-port. If we don't claim to support windows 
but say that "it's known to work with cygwin, although be aware of these 
performance penalties...", eventually someone will come along with their 
shiny Visual Express and hack up support for it, even if some tools will 
be missing and others unnecessarily complicated.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 7:07 am

There is not much difference with or without cygwin. The penalties of
doing any kind of support for it will pile up (as they started to do
with pipes).
Someday we'll have to start dropping features on Windows or restrict them
beyond their usefulness. The fork emulation in cygwin isn't perfect,
signals do not work reliably (if at all), filesystem is slow and locked down,
and exec-attribute is NOT really useful even on NTFS (it is somehow related
to execute permission and open files. I still cannot figure out how exactly

That seems to be the case, except for shiny.
(I really don't know what one could possibly mean by that. It stinks, smears,
and sometimes bounces. Never saw it shining).
-

From: Andreas Ericsson
Date: Thursday, February 23, 2006 - 7:22 am

The logo has a little glint thing on it, like those things that go 
'ting' on a front tooth in commercials for toothpaste and particularly 
healthy chewing gum.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Linus Torvalds
Date: Thursday, February 23, 2006 - 10:13 am

One thing that would help a bit would be to avoid shell.

There are many portable interpreters out there, and I don't mean perl. And 
writing a small "specialized for git" one isn't even that hard. In fact, 
most of the shell (and bash) hackery we do now would be unnecessary if we 
just made a small "git interpreter" that ran "git scripts".

The fact that it would also help portability is just an added advantage.

		Linus
-

From: Junio C Hamano
Date: Thursday, February 23, 2006 - 12:32 pm

Before anybody mentions tcl ;-).

I agree with the above in principle, but I am afraid that is
only half of the solution to the problem Alex is having.

In the longer term, libified git with script language bindings
would make the way git things work together a lot better.  I've
always wanted to make merge-base a subroutine callable from
other things, so that I can say "git diff A...B" to mean "diff
up to B since B forked from A" ;-).

That way, we would eliminate the current common pattern of
piping rev-list output to diff-tree, or ls-files/diff-files
output to update-index --stdin.  These components live in the
single process, a calling "git script", and will talk with each
other internally.

But we do need to talk to non-git things.  git-grep needs a way
for ls-files to drive xargs/grep, for example.  diff --cc reads
from GNU diff output.  And for these external tools, the way
they expect the input to be fed to them or their output is taken
out is via UNIXy pipe.

And the breakage Alex wants to work around is that the platform
is not friendly to pipes, if you deny Cygwin.  So I suspect
avoiding shell would not help much.


-

From: Johannes Schindelin
Date: Thursday, February 23, 2006 - 12:38 pm

Hi,


Darn, I had my suggestion sent out: Java ;-)

Ciao,
Dscho
-

From: Linus Torvalds
Date: Thursday, February 23, 2006 - 12:54 pm

I do see the smileys, but the fact is, "perl" is a hell of a lot more 
portable than either, if we want to talk executing processes and pipelines 
etc. But even perl is clearly not portable enough, and has tons of version 
skew.

Java, afaik, has absolutely _zero_ support for creating a new process and 
piping its output to another one and doing things like safe argument 
expansion. Which is what almost all of the git scripts are all about.

		Linus
-

From: Johannes Schindelin
Date: Thursday, February 23, 2006 - 1:19 pm

Hi,


You are right, but for the wrong reason. Java is actually a wonderful 
thing to create new processes and talk between threads.

But Java is HUGE. No, it is rather HOOODGEEE.

And I don't know if something like Lua does any good. The problem is not 
so much the language. It is the fork().

AFAIAC, cygwin is pretty good at hiding Windows behind sortofa POSIX 
layer. <tongue-in-cheek>It hides it behind a POSIX layer *and* a 
performance hit.</tongue-in-cheek>

I would rather like to see how all the fork()ing and |'ing can be done 
with MinGW32.

Ciao,
Dscho

-

From: Linus Torvalds
Date: Thursday, February 23, 2006 - 12:51 pm

Well, I was thinking more of the "embeddable" ones - things that are so 
small that they can be compiled with the project. Things like Lua.

Now, Lua is not really very useful for this use case: our scripts are much 
more about combining other programs - piping the output from one to the 
other - than about any traditional scripting. Which, afaik, Lua isn't good 

Yeah, we should libify some of it, to make things easier. That said, I 
don't believe in the "big-picture" libification. The fact is, a lot of git 
really _is_ about piping things from one part to another, and library 
interfaces work horribly badly for that. You really want more of a 
"stream" interface, and that's just not something I see happening.

I think one of the strengths of git is that you can use it in a very 
traditional UNIX manner, and do your own pipelines. And that will 
obviously NEVER work well under Windows, if only because it's not the 
natural way to do things.

Again, libification does nothing for that thing.

What I'd suggest using an embedded interpreter for is literally just the 
common helper scripts. We'll never make 

	git-rev-list --header a..b -- tree | 
		grep -z '^author.*torvalds' |
		..

style interesting power-user pipelines work in windows, but we _can_ make 
the things like "git commit" work natively in windows without having to 
re-write it in C by just having an embedded interpreter.

And I very much mean _embedded_. Otherwise we'll just have all the same 

I was really thinking more of a simple shell-like script interpreter. 
Something that we can make portable, by virtue of it _not_ being real 
shell. For example, the "find | xargs" stuff we do is really not that hard 
to do portably even on windows using standard C, it's just that you can't 
do it THAT WAY portably without assuming that it's a full cygwin thing.

		Linus
-

From: Sam Vilain
Date: Thursday, February 23, 2006 - 1:31 pm

I like the term "Domain Specific Language" to refer to this sort of 
thing.  It even hints at using the right kind of tools to achieve it, too :)

Sam.
-

From: Linus Torvalds
Date: Thursday, February 23, 2006 - 11:43 pm

Just for fun, I wrote a first cut at a script engine for passing pipes 
around.

It's designed so that the "fork+exec with a pipe" should be easily 
replaced by "spawn with a socket" if that's what the target wants, but 
it also has some rather strange syntax, so I'm in no way claiming that 
this is a sane approach.

It was fun to write, though. You can already do some strange things with 
it, like writing a script like this

	set @ --since=2.months.ago Makefile
	exec git-rev-parse --default HEAD $@
		stdout arguments
	exec git-rev-list $arguments
		stdout revlist
	exec git-diff-tree --pretty --stdin
		stdin revlist
		stdout diff-tree-output
	exec less -S
		stdin diff-tree-output

which kind of shows the idea (it sets the "@" variable by hand, because 
the silly "git-script" thing doesn't set it itself).
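
For comparison, the plain fork+exec-with-a-pipe plumbing that such a script
replaces can be sketched in a few lines of Perl. This is not the git-script
engine from the patch below, only an illustration of chaining stages
without a shell; pipeline_lines is a hypothetical name, and the sketch
assumes a working fork() and a Perl new enough for the three-argument dup
form of open (assumed here to be 5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical helper: run a list of stages, each an array ref of
# [program, args...], connecting stdout of one to stdin of the next,
# and return the output lines of the last stage.
sub pipeline_lines {
    my @stages = @_;
    my $prev;                       # read end feeding the next stage
    my @pids;
    for my $stage (@stages) {
        pipe(my $r, my $w) or die "pipe: $!\n";
        my $pid = fork();
        die "fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: wire up stdin/stdout, then become the program.
            close $r;
            if ($prev) {
                open(STDIN, '<&', $prev) or die "dup stdin: $!\n";
            }
            open(STDOUT, '>&', $w) or die "dup stdout: $!\n";
            exec { $stage->[0] } @$stage or die "exec $stage->[0]: $!\n";
        }
        # Parent: keep only the read end of the newest pipe.
        close $w;
        close $prev if $prev;
        $prev = $r;
        push @pids, $pid;
    }
    my @lines = <$prev>;
    close $prev;
    waitpid($_, 0) for @pids;       # exit statuses are ignored in this sketch
    return @lines;
}
```

The last three stages of the script above would correspond roughly to
`pipeline_lines(['git-rev-list', @arguments], ['git-diff-tree', '--pretty',
'--stdin'])`, with the pager left out.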

I'm not sure this is worth pursuing (it really is a very strange kind of 
script syntax), but it was amusing to do. 

No docs - if you want to know how it works, you'll just have to read the 
equally strange sources.

		Linus

----
diff-tree 3e7dbcaae63278ccd413d93ecf9cba65a0d07021 (from d27d5b3c5b97ca30dfc5c448dc8cdae914131051)
Author: Linus Torvalds <torvalds@osdl.org>
Date:   Thu Feb 23 22:06:12 2006 -0800

    Add really strange script engine

diff --git a/Makefile b/Makefile
index 0c04882..247030b 100644
--- a/Makefile
+++ b/Makefile
@@ -164,7 +164,7 @@ PROGRAMS = \
 	git-upload-pack$X git-verify-pack$X git-write-tree$X \
 	git-update-ref$X git-symbolic-ref$X git-check-ref-format$X \
 	git-name-rev$X git-pack-redundant$X git-repo-config$X git-var$X \
-	git-describe$X git-merge-tree$X
+	git-describe$X git-merge-tree$X git-script$X
 
 # what 'all' will build and 'install' will install, in gitexecdir
 ALL_PROGRAMS = $(PROGRAMS) $(SIMPLE_PROGRAMS) $(SCRIPTS)
@@ -204,7 +204,7 @@ LIB_OBJS = \
 	quote.o read-cache.o refs.o run-command.o \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o ...
-

From: Alex Riesen
Date: Thursday, February 23, 2006 - 2:43 pm

I actually was dreaming about taking a vacation and rewriting at least
the most important scripts in C, but without cygwin. Implement the
needed subset of POSIX in compat/, work around fork.

That'd help me to present git to my colleagues without requiring them
to install cygwin, perl and python first. It is a real problem to
explain why a new tool is better than the old one if the problems start
right from installation, and it probably won't matter how bad the old
tool is (it is, they know that too, but it has windows, doors and a
mostly running man for busy-waiting cursor).

A gits own interpreter would be more than, of course.

-

From: Christopher Faylor
Date: Sunday, February 26, 2006 - 12:55 pm

If the speed of cygwin's fork is an issue then I'd previously suggested
using spawn*.  The spawn family of functions were designed to emulate
Windows functions of the same name.  They start a new process without

I'm not sure if you're mixing cygwin with windows here but if signals do
not work reliably in Cygwin then that is something that we'd like to
know about.  Signals *obviously* have to work fairly well for programs
like ssh, bash, and X to work, however.

Native Windows, OTOH, hardly has any signals at all and deals with

Again, it's not clear if you're talking about Windows or Cygwin but
under Cygwin, in the default configuration, the exec attribute means the
same thing to cygwin as it does to linux.

As always, if you have questions or problems with cygwin, you can ask in
the proper forum.  The available cygwin mailing lists are here:
http://cygwin.com/lists.html.

Would getting git into the cygwin distribution solve any problems with
git adoption on Windows?  This would get an automatic green light from
anyone who was interested, if so.  Someone would just have to send an
"ITP" (Intent To Package) to the cygwin-apps mailing list and provide a
package using the guidelines here: http://cygwin.com/setup.html .

cgf
--
Christopher Faylor			spammer? ->	aaaspam@sourceware.org
Cygwin Co-Project Leader
TimeSys, Inc.
-

From: Linus Torvalds
Date: Sunday, February 26, 2006 - 1:18 pm

I thought that cygwin didn't implement the posix_spawn*() family?

Anyway, we probably _can_ use posix_spawn() in various places, and 
especially if that helps windows performance, we should.

		Linus
-

From: Christopher Faylor
Date: Sunday, February 26, 2006 - 1:40 pm

Right.  It just implements the windows version of spawn.  I looked more
closely at the posix_spawn functions after you last suggested it and,
while it would be possible to implement this in cygwin, these functions
are a lot more heavyweight than the windows-like implementation of spawn
that is already in cygwin.  So, they would come with their own
performance penalty.

The cygwin/windows version of spawn is basically like an extended version
of exec*():

pid = spawnlp (P_NOWAIT, "/bin/ls", "ls", "-l", NULL);

will start "/bin/ls" and return a pid which can be used in waitpid.
There is still some overhead to this function but it basically is just a
wrapper around the Windows CreateProcess, which means that it doesn't
go through the annoying overhead of Cygwin's fork.

The posix_spawn stuff is in my todo list but the Windows spawn stuff
could be used now.

cgf
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 7:18 am

By the way, is argv worked around?
AFAIK, windows has only one argument, returned by GetCommandLine?
-

From: Christopher Faylor
Date: Thursday, March 2, 2006 - 8:22 am

Cygwin passes an argv list between cygwin processes and a quoted command
line to pure windows processes.

cgf
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 9:20 am

What for? They can't use it anyway.

$ notepad '"abc"'
-

From: Rutger Nijlunsing
Date: Sunday, February 26, 2006 - 4:17 pm

I don't know about native Windows speed, but comparing NutCracker with
Cygwin on a simple 'find . | wc -l' already gives a clue that looking
at Cygwin to benchmark NT file inspection IO will give a skewed
picture:

##### NutCracker
$ time find . | wc -l

real    0m 1.44s
user    0m 0.45s
sys     0m 0.98s
  25794

##### Cygwin
$ time c:\\cygwin\\bin\\find . | wc -l

real    0m 6.72s
user    0m 1.09s
sys     0m 5.59s
  25794

##### CMD.EXE + DIR /S
C:\PROJECT> c:\cygwin\bin\time cmd /c dir /s >NUL
0.01user 0.01system 0:05.70elapsed 0%CPU (0avgtext+0avgdata 6320maxresident)k
0inputs+0outputs (395major+0minor)pagefaults 0swaps

##### Cygwin 'find -ls' (NutCracker doesn't have a '-ls')
C:\PROJECT> c:\cygwin\bin\time c:\cygwin\bin\find -ls | wc -l
2.79user 7.81system 0:10.60elapsed 100%CPU (0avgtext+0avgdata 14480maxresident)k
  25794


Regards,
Rutger.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-

From: Christopher Faylor
Date: Sunday, February 26, 2006 - 6:18 pm

I'm lost.  What does this have to do with the exec attribute?

Or, were you just climbing aboard the "Cygwin sure is slow" bandwagon?

cgf
-

From: Rutger Nijlunsing
Date: Monday, February 27, 2006 - 11:30 am

I tried to get on the bandwagon 'NT file IO magnitudes slower => git
magnitudes slower', but missed the parade a week ago. Then another
parade showed up, but I managed to delete most of it with a
misfortunate shift-something in mutt... And then even messed up in
keeping the wrong paragraph... *hmpf*

However, the point I was trying to make was that git might be sped up
by a magnitude (although not all of the magnitudes in comparison to
Linux) by looking at why the file IO is this slow: Windows' file IO is
_not_ the only reason. Using a different/new/better fitted interface
to Cygwin or Win32 for a specific git task might help, although I have
no clue what or how.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-

From: Christopher Faylor
Date: Monday, February 27, 2006 - 11:34 am

I'm going to revisit Cygwin's file I/O soon to see if I can figure out
what's adding the overhead and see if it really is unavoidable.  It's
been a while since I've gone through that exercise so it should prove
instructive.

cgf
-

From: Andreas Ericsson
Date: Monday, February 27, 2006 - 2:19 am

Well, naturally. Cygwin is a userland implementation of a sane 
filesystem on top of a less sane one. File IO is bound to be slower when 
one FS is emulated on top of another. I think cygwin users are aware of 
this and simply accept the speed-for-sanity tradeoff. I know I would.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Rutger Nijlunsing
Date: Monday, February 27, 2006 - 11:45 am

MKS NutCracker tries to solve the same issues as Cygwin tries to
solve. But maybe less sane, I don't know. But a simple 'find' is
several times faster than a Cygwin 'find'. Yes, very
unscientific. Just as unscientific as 'git is slow on Windows,
therefore Windows IO is slow'.

I'm not saying Cygwin is bad (actually, I'm installing it on every
Windows PC I get my hands on ;), but using Cygwin for all file IO
instead of native Windows IO makes git a magnitude slower on Windows
than could-be. So a small portability layer with a function like
'given all filenames with all mtimes' might help, or we could look at
why Cygwin is slower in this case. Alas my Windows profiling skills
aren't that good...

Regards,
Rutger.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 6:40 am

By "slow filesystem" I actually meant the native filesystem access.
Cygwin does make it 6 times slower, that's right, and this can be
considered a disaster of course, but not as big as the windows api.
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 7:10 am

Christopher, I'm terribly sorry for the long delays,
but that is something I can't change at this moment.


The effort of porting git to spawn-like interface has already started,

That's not enough.

That makes the rest of installed system kind of useless in cygwin
environment. After interrupting a build process, which uses java
(don't ask) only make stops. The rest of the process runs happily
away.

Now, I know that windows has no signals at all and nothing which
even closely resembles them. I won't be pressing anyone to
implement them in windows, knowing that.
What I'd actually suggest is to drop their implementation entirely,
returning ENOSYS, so that programs are not fooled into believing
that the system will work as expected. It never will.
"Ctrl-C" in windows console is just a shortcut to TerminateProcess,

I'm talking about git and native windows interaction: I cannot use umask,
because I have to use stupid windows programs, and they always create
"executable" *.c and *.h, and I cannot blindly remove it with something
like "chmod -R -x", because it'd remove it also from executables. The
poor executables lose their _rights_ to be executed (why does cygwin use
windows permissions? They cannot correlate to unix attributes, can they?)
An .bat or .cmd without right to execute it is a pain in my build system
(and no, I'm not allowed to change that damn stupid build system).

Is there any way to tell cygwin that the files it hasn't seen or touched yet
are _not_executables_?
-

From: Christopher Faylor
Date: Thursday, March 2, 2006 - 8:00 am

Are you saying that typing CTRL-C doesn't work when you use "git pull"?
If so, give me a real bug report that I can look into.  I interrupt
"busy" processes on cygwin all of the time so I'm not going to spend a
few hours typing "git pull" on my system only to find out that you are
talking about an environment that uses ActiveState perl on Windows 95
using Netware.



Actually, Windows does understand CTRL-C and any native windows console
program should honor CTRL-C in a manner similar to UNIX, i.e., if the
program doesn't trap SIGINT with 'signal()', it will cause the program
to terminate.  There are also other mechanisms for a native windows
program to deal with CTRL-C so this really shouldn't be an issue for

You're not being clear again, but if you are actually promoting the
notion of cygwin not implementing signals then that is a really daft
idea.  Really.  Go to the Cygwin web site and look at all of the
packages which have been ported.  Now think about how they would work if
Cygwin didn't support signals.  bash wouldn't work, openssh, X wouldn't

Let me say it again since it isn't clear that you are getting it.  If
signals in a pure cygwin environment don't work then that is *a bug*.
If you are running pure windows programs in the mix with cygwin programs
then if *they* don't stop when you hit CTRL-C, that is undoubtedly a bug
in that pure windows program.

If you find that a pure windows program terminates when run from a
windows command prompt but keeps running when run by a cygwin program
then that is likely a cygwin problem that can be reported to the cygwin

I'd suggest that using git with native windows programs should probably
be considered "unsupported" since you seem to be having so much trouble


Please read the Cygwin user's guide for a discussion about how file
permissions are implemented.  And, then, when you are outraged about how
unclear that documentation is please send comments and improvements to
the cygwin mailing list.

I don't see why it is ...
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 9:10 am

It does. Almost always. It's the seldom cases when this does not


I am NOT reporting a problem. Everyone knows there are these problems,
it's just almost no one (including me) cares enough about getting anything
to work sanely on windows.

Please, stop assuming that every complaint of mine is a bug report about
cygwin. It is not. You can use my mails as you please, even as bug reports.
If you ask nicely, I can provide more details maybe. But I am not asking
YOU for anything, and not complaining to YOU about anything.

I _do_not_ like how Cygwin works around windows, but I respect the
effort and understand why it happens. Still, I'd prefer it die. I'll try to

In windows you have to do a hell of a lot of useless typing to write what you

That's right. They are not _ported_. I'm not interested in xterm which


Maybe. I wouldn't blame that poor windows programmer though: it's hard,

gui applications detach from cmd (not from cygwin console),


find . -name '*.[ch]' -o -name '*.[ch]pp' -o -name Makefile -o -name
'*.txt' -o ...ooh! damn it^C -print0| xargs -0 chmod -x
You are oversimplifying.
-

From: Andreas Ericsson
Date: Thursday, March 2, 2006 - 10:39 am

Ye gawds, Alex. If you complained this much to your employer you'd get 
to run whatever OS you want.

Alex Riesen wrote:

[ lots of things ]

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Alex Riesen
Date: Thursday, March 2, 2006 - 3:01 pm

I never stopped. I usually manage to convince them, it just hasn't
happened here yet.

-

From: Christopher Faylor
Date: Sunday, February 26, 2006 - 1:33 pm

Well, with Cygwin, you've at least got the ear of one of the Cygwin
maintainers, which should be worth something.

Even if I disappear, you can always send concerns to the Cygwin mailing
list.  Do the ActiveState folks respond to complaints about things as
basic as pipes not working in perl?

Cygwin's goal is to make Windows look as much like Linux as we can
manage, so, unless we're total incompetents (which has been hinted in
this mailing list from time to time), it has *got* to be better,
source-code-wise to target Windows-running-Cygwin than
just-plain-Windows.  However, as has been noted, that means that there
will be a speed tradeoff.

I think that, for most projects, the convenience of not having to
clutter the code with substantial accommodations for the windows/POSIX
mismatch usually offsets the annoyance of the speed penalty.  Maybe
that's not the case for git, however.

Anyway, we're willing, within the limits of available time, to help out
where git uncovers issues with Cygwin.  I just fixed some stuff in
dirent.h in the last Cygwin release, as a direct result of people noting
a problem here.  Basically, I don't want git to be a morass of #ifdef
__CYGWIN__'s and I'll do whatever I can to help.

We're always trying to tweak things to improve speed in Cygwin and are
open to intelligent suggestions about how we can make things better.
The dance between total linux compatibility and speed is one that we
struggle with all of the time and, sadly, over time, we've probably
sacrificed speed in the name of functionality.  That's probably because
it's easy to fix a problem like "close-on-exec doesn't work for sockets"
and feel good that you've fixed a bug even if you've just added a few
microseconds to fork/exec.

cgf
-

From: Eric Wong
Date: Friday, February 24, 2006 - 5:02 am

It seems that ActiveState has more problems with pipes than it does with fork.
If it supports redirects reasonably well, this avoids pipes entirely and
may be more stable as a result (but possibly slower):

# IO::File is standard in Perl 5.x, new_tmpfile
# returns an open filehandle to an already unlinked file

use IO::File;
my $out = IO::File->new_tmpfile;
my $pid = fork;
defined $pid or die $!;
if (!$pid) {
	# redirects STDOUT to $out file
	open STDOUT, '>&', $out or die $!;
	exec('foo','bar') or die "exec failed: $!";
}
waitpid $pid, 0;
seek $out, 0, 0;
while (<$out>) {
	...
}

Writing and reading from a tempfile are very fast for me in Linux, and probably
not much slower than pipes.  Of course I'm still assuming file descriptors stay
shared after a 'fork', which may be asking too much on Windows.  Using something
from File::Temp to get a temp filename would still work.
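
A rough sketch of that File::Temp variant, for the case where handles
shared across fork cannot be relied on. It is untested on ActiveState
Perl. The helper name run_to_tempfile is made up for illustration, and
handing the command to the shell with a '>' redirect assumes the
command words contain no shell metacharacters.

```perl
use strict;
use File::Temp qw(tempfile);

# Hypothetical helper: let the shell redirect the command's stdout
# into a named temp file, then read the captured lines back.
# Assumption: the command words need no shell quoting.
sub run_to_tempfile {
	my @cmd = @_;
	# UNLINK => 1 asks File::Temp to remove the file at program exit.
	my ($fh, $name) = tempfile(UNLINK => 1);
	close $fh;
	system(qq{@cmd > "$name"}) == 0 or die "command failed: $?";
	open my $in, '<', $name or die "open $name: $!";
	my @lines = <$in>;
	close $in;
	return @lines;
}
```

Unlike the pipe forms, nothing can be read here until the command has
finished, and the caller needs a writable temp directory.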

-- 
Eric Wong
-

From: Johannes Schindelin
Date: Friday, February 24, 2006 - 6:44 am

Hi,


Sorry, but no. Really no. Pipes have several advantages over temporary 
files:

- The second program can already work on the data before the first 
  finishes.
- Most simple temp file handling has security issues.
- You need write access.
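
To illustrate the first point, here is a minimal sketch that hands each
line to a callback as soon as the child writes it, using the
implicit-fork form of open that Eric posted earlier for 5.6. The name
each_line_from is made up for this example.

```perl
use strict;

# Hypothetical helper: run a command and pass each output line to a
# callback as it arrives, without waiting for the command to finish.
sub each_line_from {
	my ($callback, @cmd) = @_;
	# Two-argument '-|' forks; the child's STDOUT feeds $fh.
	my $pid = open my $fh, '-|';
	defined $pid or die "Unable to fork: $!\n";
	if ($pid == 0) {
		exec @cmd or die "exec $cmd[0]: $!\n";
	}
	while (my $line = <$fh>) {
		$callback->($line);
	}
	# close waits for the child and fails if it exited non-zero.
	close $fh or die "$cmd[0] exited with status $?\n";
}
```

No temporary file is created, so there is nothing to clean up or secure
and no write access is needed.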

Hth,
Dscho

-

From: Linus Torvalds
Date: Friday, February 24, 2006 - 9:14 am

This really is a _huge_ issue in general, although probably not a very 
big one in this case.

This is what I talked about when I said "streaming" data. Look at the 
difference between

	git whatchanged -s drivers/usb

and

	git log drivers/usb

in the kernel repo. They give almost the same output, but...

Notice how one starts _immediately_, while the other starts after a few 
seconds (or, if you have a slow machine, and an unpacked archive, after 
tens of seconds or longer).

And the reason is that "git log" uses "git-rev-list" with a path limiter, 
and currently that ends up having to walk basically the whole history in 
order to generate a minimal graph.

In contrast, "git-whatchanged" uses "git-diff-tree" to limit the output, 
and git-diff-tree doesn't care about "minimal graph" or crud like that: it 
just cares about discarding any local commits that aren't interesting. It 
doesn't need to worry about updating parent chains etc, so it can do it 
all incrementally - and can thus start output as soon as it gets anything 
at all.

Now, maybe you think that "a few seconds" isn't a big deal. Sure, it's 
actually fast as hell, considering what it is doing, and anybody should be 
really really impressed that we can do that at all.

But (a) it _is_ a huge deal. Responsiveness is really important. And 
worse: (b) it scales badly with repository size. Creating the whole 
data-set before starting to output it really doesn't scale.
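
The difference can be sketched in a few lines of Perl. This is only an
illustration of batch versus incremental filtering, not how
git-rev-list or git-diff-tree are implemented; the function names and
the iterator/callback interface are made up for the example.

```perl
use strict;

# Batch: consume the entire input first, then emit. Nothing comes out
# until the producer is exhausted.
sub filter_batch {
	my ($next, $keep, $emit) = @_;
	my @all;
	while (defined(my $item = $next->())) {
		push @all, $item;
	}
	$emit->($_) for grep { $keep->($_) } @all;
}

# Streaming: decide per item and emit at once, so the first result
# appears as soon as the first interesting item has been read.
sub filter_stream {
	my ($next, $keep, $emit) = @_;
	while (defined(my $item = $next->())) {
		$emit->($item) if $keep->($item);
	}
}
```

Both produce the same output in the same order; only the time to the
first line of output differs, and that gap grows with the input.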

Now, I have ways to make "git-rev-list" better. It doesn't really need to 
walk the _whole_ history for its path limiting before it can start 
outputting stuff: it really _could_ do things more incrementally. However, 
it's a real bitch sometimes to work with incremental data when you don't 
know everything, so it gets a lot more complicated. 

So my point isn't that "git log drivers/usb" will get less and less 
responsive over time. I can fix that - eventually. My point is that in 
order to make it more responsive, I need to make it ...
From: Eric Wong
Date: Tuesday, February 21, 2006 - 1:56 pm

Both "-|" and "|-" forms of open() use fork() internally.  IIRC, fork()
doesn't work too well on that platform.

-- 
Eric Wong
-

From: Alex Riesen
Date: Tuesday, February 21, 2006 - 3:04 pm

AFAICS, it does not exist. There is an emulation of it in that active-perl,
though, so this works:

    if ( !fork ) { something }

but not "too well" (you have to be careful not to spawn too many
processes, which is around 50. Perl will crash otherwise).
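
One way to stay under such a limit is to reap a child before starting
another. A minimal sketch follows; run_limited is a made-up name, the
cap is whatever the caller picks, and whether reaping is enough to keep
the emulated fork from crashing is an assumption I have not verified.

```perl
use strict;

# Hypothetical helper: run each code ref in its own child, never
# keeping more than $max children alive. Returns how many children
# exited with a non-zero status.
sub run_limited {
	my ($max, @jobs) = @_;
	my ($running, $failed) = (0, 0);
	for my $job (@jobs) {
		if ($running >= $max) {
			# Block until one child exits before forking again.
			wait() != -1 or die "no child to wait for\n";
			$failed++ if $?;
			$running--;
		}
		my $pid = fork;
		defined $pid or die "Unable to fork: $!\n";
		if ($pid == 0) {
			$job->();
			exit 0;
		}
		$running++;
	}
	# Reap whatever is still running.
	while (wait() != -1) {
		$failed++ if $?;
	}
	return $failed;
}
```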

-
