Re: [RFC] Convert builtin-mailinfo.c to use The Better String Library.

To: Git Mailing List <git@...>
Cc: Junio C Hamano <junkio@...>
Date: Tuesday, September 4, 2007 - 4:50 pm

Hi.

This is an attempt to use "The Better String Library"[1] in builtin-mailinfo.c.

The patch doesn't pass all the tests in the testsuite yet, but I thought I'd
send it out so people can decide if they like how the code looks.

I'm not sending a patch to add the library files at this time. I'll send
that patch when this patch is working.

The changes required to make it pass the tests shouldn't be very large.

/Lukas

[1] http://bstring.sourceforge.net/

---
 builtin-mailinfo.c |  795 ++++++++++++++++++++++++++--------------------------
 1 files changed, 392 insertions(+), 403 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index d7cb11d..2ddc15d 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -5,14 +5,14 @@
 #include "cache.h"
 #include "builtin.h"
 #include "utf8.h"
+#include "bstring/bstrlib.h"
 
 static FILE *cmitmsg, *patchfile, *fin, *fout;
 
 static int keep_subject;
-static const char *metainfo_charset;
-static char line[1000];
-static char name[1000];
-static char email[1000];
+static bstring metainfo_charset;
+static bstring name;
+static bstring email;
 
 static enum  {
 	TE_DONTCARE, TE_QP, TE_BASE64,
@@ -21,321 +21,291 @@ static enum  {
 	TYPE_TEXT, TYPE_OTHER,
 } message_type;
 
-static char charset[256];
+static bstring charset;
 static int patch_lines;
-static char **p_hdr_data, **s_hdr_data;
+static bstring *p_hdr_data, *s_hdr_data;
 
 #define MAX_HDR_PARSED 10
 #define MAX_BOUNDARIES 5
 
-static char *sanity_check(char *name, char *email)
+static bstring sanity_check(bstring name, bstring email)
 {
-	int len = strlen(name);
-	if (len < 3 || len > 60)
+	static struct tagbstring email_ind = bsStatic("<@>");
+	if (blength(name) < 3 || blength(name) > 60)
 		return email;
-	if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
+	if (binchr(name, 0, &email_ind) != BSTR_ERR)
 		return email;
 	return name;
 }
 
-static int bogus_from(char *line)
+static int bogus...
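For comparison, the check that the patch rewrites with bstrlib's binchr() can also be expressed on plain NUL-terminated strings with the standard strpbrk(), which returns the first character of its first argument that occurs in a given set. A minimal sketch, keeping the 3..60 length bounds from the diff; sanity_check_plain is a hypothetical name and not part of git:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C variant of sanity_check(): prefer the display name,
   but fall back to the address if the name is implausibly short or long,
   or contains any of '<', '@' or '>'. */
static const char *sanity_check_plain(const char *name, const char *email)
{
    size_t len;

    assert(name != NULL && email != NULL);
    len = strlen(name);
    if (len < 3 || len > 60)
        return email;
    /* strpbrk() returns a pointer to the first character of name that is
       in the set "<@>", or NULL if there is none (cf. binchr() in the diff). */
    if (strpbrk(name, "<@>") != NULL)
        return email;
    return name;
}
```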
To: Git Mailing List <git@...>
Cc: Junio C Hamano <junkio@...>
Date: Friday, September 7, 2007 - 6:47 am

Unfortunately, I haven't had any time in the last few days to code or read
mail. I'm assuming that there is no point in me finishing the patch and that git
will go with the strbuf solution?

/Lukas
-
To: Git Mailing List <git@...>
Date: Wednesday, September 5, 2007 - 11:27 am

On Tue, 2007-09-04 at 22:50 +0200, Lukas Sandstr
To: Lukas <lukass@...>
Cc: Git Mailing List <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, September 5, 2007 - 10:54 am

Please, no.  Let's not pull in a dependency for something as simple as a
string library.  How many distros have bstring packaged?
The right version?  Does it work on Windows?  We already have strbuf.c,
let's just consolidate the string manipulation code already in git under
that interface.

Kristian
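Kristian's suggestion amounts to a small growable buffer that tracks its own length and allocation. A minimal sketch of that idea in plain C; the sbuf type and its functions are hypothetical and are not git's actual strbuf interface, whose names and details differ:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable, NUL-terminated buffer (not git's strbuf). */
struct sbuf {
    char *buf;     /* NULL until the first append */
    size_t len;    /* bytes in use, excluding the trailing NUL */
    size_t alloc;  /* bytes allocated for buf */
};

static void sbuf_init(struct sbuf *sb)
{
    sb->buf = NULL;
    sb->len = 0;
    sb->alloc = 0;
}

/* Ensure room for len + extra bytes plus the NUL, doubling the allocation. */
static void sbuf_grow(struct sbuf *sb, size_t extra)
{
    size_t need = sb->len + extra + 1;
    size_t alloc = sb->alloc ? sb->alloc : 16;
    char *p;

    if (need <= sb->alloc)
        return;
    while (alloc < need)
        alloc *= 2;
    p = realloc(sb->buf, alloc);
    if (!p)
        abort();  /* sketch only: real code would report the error */
    sb->buf = p;
    sb->alloc = alloc;
}

static void sbuf_add(struct sbuf *sb, const char *data, size_t n)
{
    assert(sb != NULL && (data != NULL || n == 0));
    sbuf_grow(sb, n);
    if (n)
        memcpy(sb->buf + sb->len, data, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
}

static void sbuf_addstr(struct sbuf *sb, const char *s)
{
    sbuf_add(sb, s, strlen(s));
}

static void sbuf_release(struct sbuf *sb)
{
    free(sb->buf);
    sbuf_init(sb);
}
```

Because len is stored, appends never rescan the existing contents, and doubling the allocation keeps a long run of appends linear in the total number of bytes.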

-
To: Kristian <krh@...>
Cc: Lukas <lukass@...>, Git Mailing List <git@...>, Junio C Hamano <junkio@...>
Date: Wednesday, September 5, 2007 - 1:29 pm

Kristian H
To: Matthieu Moy <Matthieu.Moy@...>
Cc: Git <git@...>
Date: Thursday, September 6, 2007 - 12:48 am

[ snip ]

When I first looked at Git source code two things struck me as odd:
1. Pure C as opposed to C++. No idea why. Please don't talk about 
portability, it's BS.
2. Brute-force, direct string manipulation. It's both verbose and 
error-prone. This makes it hard to follow high-level code logic.

- Dmitry

-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 1:50 pm

*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot 
of substandard programmers use it, to the point where it's much much 
easier to generate total and utter crap with it. Quite frankly, even if 
the choice of C were to do *nothing* but keep the C++ programmers out, 
that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles 
Bader jokingly said "to piss you off", but it's actually true. I've come 
to the conclusion that any programmer that would prefer the project to be 
in C++ over C is likely a programmer that I really *would* prefer to piss 
off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using 
the "nice" library features of the language like STL and Boost and other 
total and utter crap, that may "help" you program, but causes:

 - infinite amounts of pain when they don't work (and anybody who tells me 
   that STL and especially Boost are stable and portable is just so full 
   of BS that it's not even funny)

 - inefficient abstracted programming models where two years down the road 
   you notice that some abstraction wasn't very efficient, but now all 
   your code depends on all the nice object models around it, and you 
   cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and 
portable C++ ends up to limit yourself to all the things that are 
basically available in C. And limiting your project to C means that people 
don't screw that up, and also means that you get a lot of programmers that 
do actually understand low-level issues and don't screw things up with any 
idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary 
objective, the "advantages" of C++ is just a huge mistake. The fact that 
we also piss off people who cannot see that is just a big...
To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 8:21 pm

As dinosaurs (who code exclusively in C) are becoming extinct, you
will soon find yourself alone with attitude like this.

Measuring the number of people who contributed to Git is an incorrect metric.
Obviously C++ developers can contribute C code. But assuming that they
prefer it that way is wrong.

I was coding in Assembly when there was no C.
Then in C before C++ was created.
Nowadays it's C++ and C#, and I have never looked back.
Bad developers will write bad code in any language. But penalizing
good developers for this illusory reason of repelling bad contributors
is nonsense.

Anyway I don't mean to start a religious C vs. C++ war. It's a matter
of beliefs and as such pointless.
I just wanted to get a sense of how many people share this "Git should
be in pure C" doctrine.
-- 
- Dmitry
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:21 am

Hi,



No, it's not.  As has been shown by some very good _arguments_.  Once you 
have facts to back up your claims, it is not any belief any longer.

Ciao,
Dscho
-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:32 pm

I honestly didn't. I didn't even think it's possible. In the
environment of mainstream commercial software development the last war
on this subj was over 8-10 years ago.
Even wars like "do we use exceptions/templates/stl" are pretty much
over. Nowadays it's "do we use Boost", or "do we use template
metaprogramming". But even more often it's Java/C# vs. C++.


Well I've heard *opinions* and anecdotal evidence. No facts though.
And it's not surprising. There could be no hard facts in such a
matter. It always boils down to "most of all, I want my software to be
X" where X is different for different people (fast,maintainable,quick
to market, scalable, beautiful, etc ... to name a few).
With different values of X any debate is pointless. And X is exactly
a matter of belief.

Anyway my curiosity is satisfied (thru the roof so to speak) and I
think it's enough on the subj. It has reminded me of good old times
though.

-- 
- Dmitry
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 7:25 pm

It is because the "environment of mainstream commercial software

Now that's a stupid argument to bring up. Commercial software

"Just to annoy mainstream commercial software developers" would be a
good reason.

-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 2:24 am

Anecdotal evidence _is_ hard facts.  That's what experience is all
about.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:47 am

As long as TeX, Emacs and vi are around, I would not worry too much
about dinosaurs in general.  But C++ is a cancerous dinosaur.  It has

The problem with C++ is that every C++ developer has his own style,
and reuse is an illusion within that style.  Take a look at classes
implementing matrix arithmetic: there are as many around as the day is
long, and all of them are incompatible with one another.

With regard to programming styles, C++ does not support multiple
inheritance.  For a single project grown from a single start, you can
get reasonable solutions.  But combining stuff is creating maintenance
messes.

With C, the situation is not dissimilar, but you spend less time

What nonsense.  Large parts of git already are shell scripts, so
obviously there is no such doctrine.  Just because C++ is not a sane
proposition does not mean that others might not work.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: <git@...>
Cc: David Kastrup <dak@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>
Date: Friday, September 7, 2007 - 3:41 am

On Friday 2007 September 07, David Kastrup wrote:

(Disclaimer: I'm certainly not joining the "C++ for git" chant; this reply is 

One could say the same about any API.  "Take a look at that C library libXYZ - 
it does exactly the same thing as libPQR but all the function calls and 

Multiple inheritance is the spawn of the devil, but C++ _does_ support it.

Forgetting about the terrible STL, to me there really is no difference between 
C and C++; you can be object oriented in C.  Take a look at the Linux kernel, 
it should be printed out, rolled up and used to beat the ideas into students 
learning C++/Java/C#.   Object oriented design is a choice, and if you really 
wanted you could do it in assembly.

I would imagine the reason people often turn up wanting to rewrite Linux and 
git in C++ is because they are so object oriented in nature already and it's 
natural to think "wouldn't this be even better if I wrote it in an object 
oriented language"?  Maybe, maybe not, but why bother?



Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
-
To: Andy Parkins <andyparkins@...>
Cc: <git@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>
Date: Friday, September 7, 2007 - 4:08 am

The difference is that you can pass structures from one library into
another with tolerable efficiency.  Because there are only basically 2

What about "With regard to programming styles" did you not understand?
I was not talking about a technical feature at class level, but about

Maintainability and extensibility certainly are valid arguments for
rewrites.  But C++ does not really shine in that regard.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 11:06 pm

On 7/9/2007, at 2:21, Dmitry Kakurin wrote
To: <git@...>
Date: Friday, September 7, 2007 - 4:36 am

I can appreciate that. I originally got into writing compilers because 
my game (Empire) ran too slowly and I thought the existing compilers 
could be dramatically improved.

And technically, yes, you can write code in C that is >= the speed of 
any other language (other than asm). But practically, this isn't 
necessarily so, for the following reasons:

1) You wind up having to implement the complex, dirty details of things 
yourself. The consequences of this are:

    a) you pick a simpler algorithm (which is likely less efficient - I 
run across bubble sorts all the time in code)

    b) once you implement, tune, and squeeze all the bugs out of those 
complex, dirty details, you're reluctant to change it. You're reluctant 
to try a different algorithm to see if it's faster. I've seen this 
effect a lot in my own code. (I translated a large body of my own C++ 
code that I'd spent months tuning to D, and quickly managed to get 
significantly more speed out of it, because it was much simpler to try 
out different algorithms/data structures.)

2) Garbage collection has an interesting and counterintuitive 
consequence. If you compare n malloc/free's with n gcnew/collections, 
the malloc/free will come out faster, and you conclude that gc is slow. 
But that misses one huge speed advantage of gc - you can do FAR fewer 
allocations! For example, I've done a lot of string manipulating 
programs in C. The basic problem is keeping track of who owns each 
string. This is done by making a copy of the string whenever in doubt.

But if you have gc, you don't worry about who owns the string. You just 
make another pointer to it. D takes this a step further with the concept 
of array slicing, where one creates windows on existing arrays, or 
windows on windows on windows, and no allocations are ever done. It's 
just pointer fiddling.
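In C, the closest analogue to the slicing Walter describes is a non-owning (pointer, length) view: taking a sub-view is pointer arithmetic, with no allocation and no copy. A minimal sketch with a hypothetical struct slice; unlike D with its garbage collector, nothing here prevents a view from outliving the buffer it points into, which is exactly the ownership question that C programs tend to answer by copying:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view into memory owned by someone else. */
struct slice {
    const char *ptr;
    size_t len;
};

static struct slice slice_of(const char *s)
{
    struct slice v;
    v.ptr = s;
    v.len = strlen(s);
    return v;
}

/* Sub-view [from, to) of v: pointer arithmetic only, no malloc and no copy. */
static struct slice slice_sub(struct slice v, size_t from, size_t to)
{
    struct slice r;
    assert(from <= to && to <= v.len);
    r.ptr = v.ptr + from;
    r.len = to - from;
    return r;
}

/* Compare a view with a NUL-terminated string. */
static int slice_eq(struct slice v, const char *s)
{
    return strlen(s) == v.len && memcmp(v.ptr, s, v.len) == 0;
}
```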

------
Walter Bright
http://www.digitalmars.com  C, C++, D programming language compilers
http://www.astoriaseminar.com  Extraordinary C++

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 7:52 am

On 7/9/2007, at 10:36, Walter Bright wrote
To: <git@...>
Date: Friday, September 7, 2007 - 3:25 pm

That may very well be true. I've never looked at the source code for 
git, so I'm not in any position to judge it. Nor do I suggest 
translating a debugged, working, 80,000 line project into another language.

My comments here are in more general terms.

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 5:41 am

I haven't seen this in the development of git, although to be fair, you
didn't mention the number of developers that were simultaneously working
on your project. If it was you alone, I can imagine you were reluctant to
change it just to see if something is faster.

Opensource projects with many contributors (git, linux) work differently,
since one or a few among the plethora of authors will almost always be
a true expert at the problem being solved.

The current pack-format and how it's read is one such example. It was
done once, by the combined efforts of Linus and Junio (this is all off
the top of my head and I cba to go looking up the details, so bear with
me if there are errors). Linus and Junio are both very good C-programmers,
but the handling of packfiles was not what you'd call their specialty.
Along came Nicolas Pitre, another excellent C programmer, who probably
has done some similar work before. He constructed a better algorithm,
eventually resulting in the ultimate performance win with a net gain
in both time and size (gj, Nicolas).

The point is that, given enough developers, *someone* is bound to
find an algorithm that works so well that it's no longer worth
investing time to even discuss if anything else would work better,
either because it moves the performance bottleneck to somewhere else
(where further speedups would no longer produce humanly measurable
improvements), or because the action seems instantaneous to the user
(further improvements simply aren't worth it, because no valuable
resource will be saved from it).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: <git@...>
Date: Friday, September 7, 2007 - 3:23 pm

On my project, one. But I've seen this problem repeatedly in other 
projects that had multiple developers. For example, I used to use 
version 1 of an assembler. It was itself written entirely in assembler. 
It ran *incredibly* slowly on large asm files. But it was written in 
assembler, which is very fast, so how could that be?

Turns out, the symbol table used internally was a linear one. A linear 
symbol table is easy to implement, but doesn't scale well at all. A 
linear symbol table was implemented because it was just harder to do 
more advanced symbol table algorithms in assembler. In this case, a 
higher level language re-implementation made the assembler much faster, 
even though that implementation was SLOWER in every detail. It was 

My point was that when I reimplemented it in D, the cost of changing the 
algorithms got much lower, so I was much more tempted to muck around 

That is a nice advantage. I don't think many projects can rely on having 

Sure, but I suggest that few projects reach this maximum. Case in point: 
ld, the gnu linker. It's terribly slow. To see how slow it is, compare 
it to optlink (the 15-year-old one that comes with D for Windows). So I 
don't believe there is anything inherent about linking that should make 
ld so slow. There's some huge leverage possible in speeding up ld 
(spreading out that saved time among all the gnu developers).

So while git may have reached a maximum in performance, I don't think 
this principle is applicable in general, even for very widely used open 
source projects that would profit greatly from improved performance.

------
Walter Bright
http://www.digitalmars.com  C, C++, D programming language compilers
http://www.astoriaseminar.com  Extraordinary C++

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Saturday, September 8, 2007 - 8:25 pm

Well, when the ease-of-coding vs the exec-speed of D vs C is that of
C vs asm, C will be dead fairly soon. However, since C is so ingrained
in every language designer's head, I find that unlikely to happen any

True that. I know a fair few projects that could have done with borrowing
one or two proper gurus, but even opensource programmers are selfish in

True again, but given what I said above holds, it would be madness to
move from the lingua franca of oss hacking to a less common one, as it

Interesting. I recently did a spot of work comparing various string-hashing
algorithms. Perhaps I should head over to the ld camp and see if I can help.
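The symbol-table anecdote above comes down to lookup cost: a linear table scans every entry, while a hash table only walks one bucket's chain on average. A minimal sketch of a chained hash table in C using the 32-bit FNV-1a string hash (one common choice, not necessarily one of the algorithms Andreas compared); the names and the fixed bucket count are assumptions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256  /* assumption: fixed power-of-two table size */

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

struct sym {
    const char *name;   /* not copied: the caller keeps the string alive */
    long value;
    struct sym *next;   /* next symbol in the same bucket */
};

struct symtab {
    struct sym *bucket[NBUCKETS];
};

static size_t bucket_of(const char *name)
{
    return fnv1a(name) & (NBUCKETS - 1);
}

/* Average cost: one hash plus a short chain walk, instead of a full scan. */
static struct sym *sym_lookup(const struct symtab *t, const char *name)
{
    struct sym *s;
    for (s = t->bucket[bucket_of(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Define or redefine a symbol; returns 0 on success, -1 if malloc fails. */
static int sym_define(struct symtab *t, const char *name, long value)
{
    struct sym *s = sym_lookup(t, name);
    if (!s) {
        size_t b = bucket_of(name);
        s = malloc(sizeof *s);
        if (!s)
            return -1;
        s->name = name;
        s->next = t->bucket[b];
        t->bucket[b] = s;
    }
    s->value = value;
    return 0;
}

static void symtab_free(struct symtab *t)
{
    size_t i;
    for (i = 0; i < NBUCKETS; i++) {
        struct sym *s = t->bucket[i];
        while (s) {
            struct sym *next = s->next;
            free(s);
            s = next;
        }
        t->bucket[i] = NULL;
    }
}
```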

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: <git@...>
Date: Monday, September 10, 2007 - 6:33 am

You can write "C" in D, and you'll get exactly the same (performance) 
results. After all, D and C share the same optimizer and code generator 
(for both implementations of D), and when the same intermediate code is 
presented, you'll get the same results.

That's why when people benchmark D against C, they deliberately do NOT 
write the D version in a C'ish manner, but refactor the code into what 
would be a more D'ish style.


I humbly suggest running a profiler over ld before spending time fixing 
the wrong thing <g>. I haven't looked at the ld source, but being 
experienced in similar projects I'd hazard a guess that there won't be a 
quick fix to ld's speed problems.

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Tuesday, September 11, 2007 - 4:17 am

I'm fairly sure of the same, but even small speedups in such a commonly used
tool are worth pursuing.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Sunday, September 9, 2007 - 4:22 am

Well, my good wishes go with you!  If ld.so would be affected as well,
you'd probably not help just developers.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:40 pm

Well, my first system was a Z80 computer with an editor/assembler in
ROM (4kb).  At one time I tried figuring out the size requirements of
symbols.  It was two bytes for each symbol.  Namely the value.  The
"symbol table" was located behind the source code.  Whenever this
marvel of technology encountered a label, it searched the source code
from the beginning for the definition of the label, keeping count of
all label definitions in between.  When it found the definition, the
count corresponded to the position in the symbol table.

So compilation times were O(n·s), with n the number of symbol uses and
s the size of the source code.

Implementing in a higher-level language would not have helped: memory
efficiency was what dictated this layout.  Given that the whole
available memory was perhaps 50kB, assembly language modules could not
get so large that scale issues were deadly.  But the assembly times
did get annoying sometimes.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Wincent Colaiuta <win@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:25 am

Wincent Colaiuta wrote:
> On 7/9/2007, at 2:21, Dmitry Kakurin wrote
To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 7:30 am

On 7/9/2007, at 8:25, Andreas Ericsson wrote
To: Andreas Ericsson <ae@...>
Cc: Wincent Colaiuta <win@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:56 am

Hi,


I have a buck here that says that you cannot hand-optimise assembly (on 
modern processors at least) as well as even gcc.

Ciao,
Dscho

-
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Andreas Ericsson <ae@...>, Wincent Colaiuta <win@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 12:09 pm

That assumes that the original task can even be expressed well in C.
Multiple precision arithmetic, for example, requires access to the
carry bit.  You can code around this, for example by writing something
like

unsigned a,b,carry;

[...]

carry = (a+b) < a;

but the problem is that those are ad-hoc idioms with a variety of
possibilities, and thus the compilers are not made to recognize them.
Another thing is mixed-precision multiplications and divisions: those
are _natural_ operations on a normal CPU, but have no representation
in C.

As a consequence, most high performance multiple-precision packages
contain assembly language in some form or other.
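To make the cost of that idiom concrete, here is a multi-word addition written in portable C: each limb needs two comparisons to recover the carry that an add-with-carry instruction provides directly. A minimal sketch; the limb type (unsigned), the least-significant-first limb order and the name mp_add are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* r = a + b over n limbs, least significant limb first; returns the carry out.
   r may alias a or b. Each step recovers the carry with the portable
   (sum < operand) comparison instead of a CPU carry flag. */
static unsigned mp_add(unsigned *r, const unsigned *a, const unsigned *b, size_t n)
{
    unsigned carry = 0;
    size_t i;

    assert(n == 0 || (r && a && b));
    for (i = 0; i < n; i++) {
        unsigned s = a[i] + b[i];   /* wraps modulo 2^bits */
        unsigned c1 = s < a[i];     /* carry out of a[i] + b[i] */
        unsigned t = s + carry;
        unsigned c2 = t < s;        /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;            /* at most one of c1, c2 can be 1 */
    }
    return carry;
}
```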

gcc's assembly language templates are excellent in that they actually
cooperate nicely with the optimizer, so the optimizer can do all the
address calculations and register assignments and opcode reorderings,
and then the actual operations that are not expressible in C can be
done by the programmer.

But anyway, I have worked as a graphics driver programmer for some
amount of time, and bit-stuffing memory-mapped areas with data was
still something where hand assembly was best.

I have also done BIOS terminal emulators, and being able to write
something like

ld b,whatever
myloop:
push bc
push hl
call nextchar
pop hl
pop bc
ld (hl),a
inc hl
djnz myloop

in order to suspend the terminal driver until the application comes up
with the next `whatever' output characters in an escape sequence is
_wagonloads_ more maintainable than using a state machine or whatever
else for distributing material delivered into the driver.

But this requires that nextchar can do something like

nextchar: ld (driverstack),sp
  ld sp,(appstack)
  ret

and the entrypoint, in contrast, does

outchar: ld (appstack),sp
  ld sp,(driverstack)
  ret

Cheap and expedient.  You just need to set up a small stack, and
presto: coroutines, at absolutely negligible cost.  I know that there
are some "portable" coroutine implementa...
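A rough C counterpart of this stack-switching trick uses the ucontext functions (getcontext, makecontext, swapcontext), which were removed from POSIX.1-2008 but are still provided by glibc. A minimal sketch under that assumption; the three-character argument length and all names are hypothetical, chosen to mirror the Z80 nextchar/outchar pair:

```c
#define _XOPEN_SOURCE 700   /* some platforms (e.g. macOS) require this for <ucontext.h> */
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t app_ctx, driver_ctx;    /* play the roles of appstack / driverstack */
static char driver_stack[64 * 1024];      /* the driver's own small stack */
static char current_char;                 /* character passed in by outchar() */
static char last_arg[8];                  /* most recently completed argument */
static int args_done;                     /* number of completed arguments */

/* Driver side: suspend until the application supplies the next character. */
static char nextchar(void)
{
    swapcontext(&driver_ctx, &app_ctx);
    return current_char;
}

/* Driver body: straight-line code reading a fixed-length argument, like the
   djnz loop above, with no explicit state machine. */
static void driver(void)
{
    for (;;) {
        char buf[sizeof last_arg];
        const int want = 3;               /* hypothetical argument length */
        int i;

        assert(want < (int)sizeof buf);
        for (i = 0; i < want; i++)
            buf[i] = nextchar();
        buf[want] = '\0';
        memcpy(last_arg, buf, (size_t)want + 1);
        args_done++;
    }
}

/* Application side: hand over one character and run the driver until it
   blocks in nextchar() again. */
static void outchar(char c)
{
    current_char = c;
    swapcontext(&app_ctx, &driver_ctx);
}

/* Create the driver coroutine on its own stack and run it up to its first
   nextchar(). Returns 0 on success, -1 on failure. */
static int driver_start(void)
{
    if (getcontext(&driver_ctx) == -1)
        return -1;
    driver_ctx.uc_stack.ss_sp = driver_stack;
    driver_ctx.uc_stack.ss_size = sizeof driver_stack;
    driver_ctx.uc_link = NULL;
    makecontext(&driver_ctx, driver, 0);
    return swapcontext(&app_ctx, &driver_ctx);
}
```

Each outchar() runs the driver until it blocks in nextchar() again, so the driver keeps its loop counter and partial buffer on its own stack instead of in an explicit state variable.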
To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Wincent Colaiuta <win@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 7:54 am

http://www.gelato.unsw.edu.au/archives/git/0504/1746.html

I win. Donate $1 to FSF next time you get the opportunity ;-)

Hand-optimized asm is faster because the optimizer in the compiler is a
general-purpose one that has to guess and make assumptions about the code
and its input to make the correct decisions. While it gets things right
in as many as 80% of the cases, there's still the 20% where it doesn't.
A human can, with sufficient research and effort, make the same optimizations
where they are correct but avoid the 20% erroneous ones.

If the compiler gets it wrong inside your innermost loop, it might be worth
shaving those extra 0.0001 seconds off of each iteration, because in the long
run, world-wide, it might save several weeks worth of CPU-time every day.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:33 am

On 7/9/2007, at 13:54, Andreas Ericsson wrote
To: Wincent Colaiuta <win@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>, Linus Torvalds <torvalds@...>
Date: Friday, September 7, 2007 - 9:58 am

Wincent Colaiuta wrote:
> On 7/9/2007, at 13:54, Andreas Ericsson wrote
To: Andreas Ericsson <ae@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>, Linus Torvalds <torvalds@...>
Date: Friday, September 7, 2007 - 10:13 am

On 7/9/2007, at 15:58, Andreas Ericsson wrote
To: Wincent Colaiuta <win@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>, Linus Torvalds <torvalds@...>
Date: Saturday, September 8, 2007 - 8:09 pm

Wincent Colaiuta wrote:
> On 7/9/2007, at 15:58, Andreas Ericsson wrote
To: Wincent Colaiuta <win@...>
Cc: Andreas Ericsson <ae@...>, Johannes Schindelin <Johannes.Schindelin@...>, Dmitry Kakurin <dmitry.kakurin@...>, Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:55 am

And this is of course exactly the kind of spot where you _would_ use
assembly in the real world. 99.99% of code is better written in C than
assembler, but there is that 0.01% where hand-coded assembler is a
better choice.

-- 
Karl Hasselstr
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 8:38 pm

Unlike you, I actually gave reasons for my dislike of C++, and pointed to 
examples of the kinds of failures that it leads to.

You, on the other hand, have given no sane reasons *for* using C++.

The fact is, git is better than the other SCM's. And good taste (and C) is 
one of the reasons for that.

It has nothing to do with dinosaurs. Good taste doesn't go out of style, 
and comparing C to assembler just shows that you don't have a friggin idea 
about what you're talking about.

			Linus
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:12 pm

To be very specific:
 - simple and clear core datastructures, with *very* lean and aggressive 
   code to manage them that takes the whole approach of "simplicity over 
   fancy" to the extreme.
 - a willingness to not abstract away the data structures and algorithms, 
   because those are the *whole*point* of core git. 

And if you want a fancier language, C++ is absolutely the worst one to 
choose. If you want real high-level, pick one that has true high-level 
features like garbage collection or a good system integration, rather than 
something that lacks both the sparseness and straightforwardness of C, 
*and* doesn't even have the high-level bindings to important concepts. 

IOW, C++ is in that inconvenient spot where it doesn't help make things 
simple enough to be truly usable for prototyping or simple GUI 
programming, and yet isn't the lean system programming language that C is 
that actively encourages you to use simple and direct constructs.

				Linus
-
To: <git@...>
Date: Friday, September 7, 2007 - 1:09 am

The D programming language is a different take than C++ has on growing 
C. I'm curious what your thoughts on that are (D has garbage collection, 
while still retaining the ability to directly manage memory). Can you 
enumerate what you feel are the important concepts?

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 5:41 am

Well, to me D has two significant drawbacks before it is "ready to use". The
first one is that it doesn't have bit-fields. I often deal with bit-fields
on structures that have a _lot_ of instances in my program, and the
bit-field is chosen for code readability _and_ structure size efficiency.
I know you claim that using masks manually often generates better
code. But in my case, speed does not matter _that_ much. I mean it does,
but not at this micro-level, as access to the bit-field is not in my
inner-loop.
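The trade-off Pierre describes can be seen by writing the same packed record twice, once with C bit-fields and once with explicit masks. A minimal sketch with hypothetical field names and widths; note that C leaves bit-field layout implementation-defined, so only the mask version has a bit layout that is portable across compilers:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-field version: the compiler generates the shifts and masks. */
struct entry_bf {
    unsigned kind     : 3;   /* 0..7 */
    unsigned priority : 4;   /* 0..15 */
    unsigned dirty    : 1;
};

/* Mask version: the same three fields packed by hand into one byte,
   with a layout that does not depend on the compiler. */
#define KIND_SHIFT 0
#define KIND_MASK  (0x7u << KIND_SHIFT)
#define PRIO_SHIFT 3
#define PRIO_MASK  (0xFu << PRIO_SHIFT)
#define DIRTY_MASK (1u << 7)

static uint8_t entry_pack(unsigned kind, unsigned prio, int dirty)
{
    assert(kind <= 7 && prio <= 15);
    return (uint8_t)((kind << KIND_SHIFT) | (prio << PRIO_SHIFT) |
                     (dirty ? DIRTY_MASK : 0u));
}

static unsigned entry_kind(uint8_t e)  { return (e & KIND_MASK) >> KIND_SHIFT; }
static unsigned entry_prio(uint8_t e)  { return (e & PRIO_MASK) >> PRIO_SHIFT; }
static int      entry_dirty(uint8_t e) { return (e & DIRTY_MASK) != 0; }
```

The bit-field form is shorter to write and read; the price is that the compiler, not the programmer, chooses where the bits go.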

  The second issue I have is that there is no way to do:
  import (C) "foo.h"

  And this is a big no-go (maybe not for git, but as a general issue)
because it impedes the use of external libraries with a C interface a
_lot_. E.g. I'd really like to use D with some GNU libc extensions,
but I can't because they have too many dependencies (some async getaddrinfo
interface that needs me to import all the signal events and so on
extensions in the libc, with bitfields, which sends us back to the first
point).


  I also have a third, but non-critical issue: I absolutely don't like
phobos :) Though I'm obviously free to choose another library. D
definitely has many many many real advances over C (like the .init, .size,
... and so on fields, known types, and whatever portability nightmare
C imposes on us). In fact I like to use D like I code in C, using
modules and functions, and very few classes, as few as I can. And even
(under- ?) using D like this, it is a real pleasure to work with. I'm
really eager to see gdc be more stable.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
To: <git@...>
Date: Friday, September 7, 2007 - 3:03 pm

I'm surprised this is such an important issue. Others have mentioned it, 
but regard it as a minor thing. Interestingly, the htod program (which 
converts C .h files to D import files) will convert bit fields to inline 

D does come with htod, which converts C .h files to D files. It's not 
possible to do a perfect job (because of macros), but it comes pretty 
darned close. The reason htod gets so close is because it is actually a 
real C compiler front end, not a perl or regex string processing hack.

It may require a little hand tweaking of the results, again
because C headers may include awful things like:
	#define BEGIN {
	#define print printf(

You're not the only one <g>. But I'll add that access to the standard C 
runtime library *is* a part of D, so at some level it can't be worse 
than C. There's also another runtime library available, Tango, which is 

There are a lot of people hard at work on D to make it more stable and 
increase the breadth and depth of tools available. I am fully aware that 
there may be non-technical issues to using D in a project like git, like 
availability of other D programmers, tradition, etc., but in this thread 
I'm concerned mainly with technical issues.

P.S. I'm also NOT suggesting that git be converted to D. Translating a 
working, debugged, 80,000 line codebase from one language to another is 
usually a fool's errand.

Thanks for taking the time to post your thoughts.

-----------
Walter Bright
http://www.digitalmars.com  C, C++, D programming language compilers
http://www.astoriaseminar.com  Extraordinary C++

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:41 pm

Well, htod does that, but it's very impractical to write them from
scratch, especially if you want to benefit from the fact that padding
and integer sizes are very well defined, so that e.g. structs can be
mapped onto a raw stream, avoiding deserialization and so on. And for
that, bit-fields are a really, really fast and simple way to describe things.

  I mean, take your classic example of the foreach loop. Your whole
point is that it's way shorter and safer. And now you are saying that,
instead of something like:

  struct my_struct {
    unsigned some_field : 2;
    unsigned has_this_property : 1;
    unsigned is_in_this_state  : 1;
    unsigned priority_level    : 2;
    ...
  }

  people should write (IIRC this works because ->some_field = 2 calls
->some_field(2) if the member does not exist; or maybe it's
set_some_field, but that's not very relevant anyway):

  struct my_struct {
    unsigned some_field() {
      return this->real_field >> 30;
    }

    void some_field(unsigned value) {
      /* clear the old bits before setting the new ones */
      this->real_field = (this->real_field & ~(3u << 30)) | ((value & 3) << 30);
    }

    ...

  private:
    unsigned real_field;
  }

  Please, this has to be a joke: there are 42 ways for people to write it
wrong (wrong shifts, wrong masks, and so on), and it's horribly obfuscated,
hence needs a lot of comments, whereas the bit-field is 90% self-documenting
and the syntax is _very_ clear; you cannot beat that. I
would be absolutely fine with it being syntactic sugar for some kind
of template call though.
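The contrast Pierre draws can be sketched in plain C: a bit-field declaration next to hand-written accessors for the same 2-bit field. The struct, macro and function names, the 32-bit word, and placing the field in the top two bits are all assumptions chosen for illustration. Note that the setter has to clear the old bits before OR-ing in the new value, which is exactly the kind of detail that is easy to get wrong by hand.

```c
#include <stdint.h>

/* Declarative version: the compiler generates the shifts and masks. */
struct flags_bf {
	unsigned some_field        : 2;
	unsigned has_this_property : 1;
	unsigned is_in_this_state  : 1;
	unsigned priority_level    : 2;
};

/* Hand-written equivalent for some_field, stored in the top two bits
 * of a 32-bit word (hypothetical layout). */
#define SOME_FIELD_SHIFT 30
#define SOME_FIELD_MASK  (UINT32_C(3) << SOME_FIELD_SHIFT)

static inline unsigned get_some_field(uint32_t word)
{
	return (word & SOME_FIELD_MASK) >> SOME_FIELD_SHIFT;
}

static inline uint32_t set_some_field(uint32_t word, unsigned value)
{
	/* Clear the old bits first; forgetting this leaves stale bits set. */
	word &= ~SOME_FIELD_MASK;
	return word | (((uint32_t)value & 3u) << SOME_FIELD_SHIFT);
}
```

Both versions store the same information; the bit-field one simply leaves the mask arithmetic to the compiler.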

  Not to mention that the usual C idiom:

  union {
    unsigned flags;
    struct {
      // many bitfields
    };
  };

  would need an explicit copy_flags(const my_struct foo) function to
work. Not pretty, not straightforward.
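For reference, the union idiom itself looks roughly like this in C11, which allows anonymous structs inside unions. The union, field and function names are hypothetical. The overlapping `flags` word lets code clear or copy every bit-field with a single store; which bit of `flags` each field occupies is implementation-defined, so this is only safe for in-memory use, not for data exchanged with other programs.

```c
/* Hypothetical flag set: the bit-fields can be accessed by name, while
 * `flags` exposes the whole word at once. */
union obj_flags {
	unsigned flags;
	struct {
		unsigned seen          : 1;
		unsigned uninteresting : 1;
		unsigned priority      : 2;
	};
};

/* Reset every bit-field with one store through the overlapping word. */
static inline void clear_all_flags(union obj_flags *f)
{
	f->flags = 0;
}

/* Copy all bit-fields at once; plain union assignment (*dst = *src)
 * copies them as well. */
static inline void copy_obj_flags(union obj_flags *dst,
				  const union obj_flags *src)
{
	dst->flags = src->flags;
}
```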

  Really, I feel this is a big lack, for a language that aims at
simplicity, conciseness _and_ correctness.

  OK, maybe I'm biased: I work with network protocols all day long, so
I often need bit-fields. But still, a lot of people deal with network
  La...
To: <git@...>
Date: Friday, September 7, 2007 - 4:40 pm

True. I haven't tried yet (nobody else seems to care about it as much as 

I should point out that inline functions are inlined, and there is no 

I'm not following this. To copy a union, you just copy it with the 
assignment operator:

	U a, b;
	a = b;	// copies every member of the union

You're right on both counts. It's because htod is built out of a fork of 
the Digital Mars C compiler. Something similar could be done with gcc, 
but I'm not the person to do it. I should also get off my lazy tail and 

GDC was just released for D 1.020, which is behind D 1.021, but 1.021 

And it's nice to hear your perspective, which is why I dropped by this 
thread.

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 4:56 pm

I know that, and that's why I said I was totally fine with the
bit-field notation being only syntactic sugar for a template thingy if
  That was the point indeed. But if you don't have bitfields, you can't
do the union. And if the bitfield is just syntactic sugar, it may be

  Sure, but it does not work properly on amd64 (and that's the
architecture I care about), it is not ready for the current gcc (4.2;
only 4.1 builds), and so on. It's not as stable as DMD is. It does not
lag much version-wise; it lags in maturity. But well, youth has a
cure: time :)

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
To: <git@...>
Date: Friday, September 7, 2007 - 6:54 pm

Yes, and the more people use it, the better it will get. These are all 
environmental problems, not technical limitations of the language.

-
To: Pierre Habouzit <madcoder@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 3:51 pm

Pierre Habouzit <madcoder@debian.org> writes:


And strictly speaking, C bitfields are completely useless for that
purpose, since the compiler is free to use whatever method it wants for
allocating bit fields.  So if you want to write a portable program,
you are back to making the masks yourself.

Where bit fields work reliably is when you are not interchanging data
with other applications, but just laying out your internals.
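The portable alternative David describes, explicit masks and shifts against a fixed wire layout, can be sketched in C using the first byte of an IPv4 header (RFC 791: version in the high nibble, header length in 32-bit words in the low nibble) plus a big-endian 16-bit read. The function names are illustrative only; the point is that the bit and byte positions are spelled out in code rather than left to the compiler.

```c
#include <stdint.h>

/* Pack version (high nibble) and IHL (low nibble) into one byte. */
static inline uint8_t ipv4_pack_vihl(unsigned version, unsigned ihl)
{
	return (uint8_t)(((version & 0x0Fu) << 4) | (ihl & 0x0Fu));
}

static inline unsigned ipv4_version(uint8_t b)
{
	return b >> 4;
}

static inline unsigned ipv4_ihl(uint8_t b)
{
	return b & 0x0Fu;
}

/* Read a 16-bit big-endian (network order) value independently of
 * host byte order and struct padding. */
static inline uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

A typical header byte is 0x45: version 4, header length 5 words.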

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: David Kastrup <dak@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 3:59 pm

The point is that (1) D is not C, and (2) we all know that Linux, e.g.,
does that in many places, relying on the fact that it knows how the
supported compilers (gcc, icc, tcc, maybe some others) do their packing.

  The discussion is about D. D solves the infamous problem with longs
not having the same size everywhere; I don't see why it couldn't solve

  Thank you for the _C_ lesson.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:31 pm

In my opinion there is basically one area where C has botched things up
seriously as far as being useful as a general-purpose language goes, and
that is conflating pointers and arrays, and allowing pointer
arithmetic.  The consequences are absolutely awful with regard to
compilers being able to optimize, and it is pretty much the primary
reason that Fortran is still widely used for numerical work.

C has no usable two-dimensional (never mind higher dimensions) array
concept that would allow passing multidimensional arrays of
runtime-determined size into functions.  Period.

Add to that the pointer aliasing problems affecting compilers, and C
is useless for serious portable readable numerical work.

Fortran libraries like BLAS and LAPACK are ubiquitous after decades
because the language can deal with multi-dimensional arrays sensibly,
and already could in the sixties.

C99 helps a bit.  But messing around with restrict pointers and the
like means that wringing equal performance out of some trivial piece of
code (or permitting the compiler to do so without having to take
aliasing into account) is a lot of work and leads to ugly and
inscrutable code.
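To make "C99 helps a bit" concrete, here is a minimal sketch of a matrix-vector product using C99 variably modified parameters and `restrict`. The function name is illustrative. The dimensions travel as separate arguments, so `a[i][j]` indexes a runtime-sized 2-D array without manual `i * cols + j` arithmetic, and `restrict` promises the compiler that the arrays do not overlap. It also shows the syntax David calls ugly: the qualifiers have to be spelled out on every parameter.

```c
#include <stddef.h>

/* y = a * x for a rows-by-cols matrix a.
 * `restrict` tells the compiler y does not alias a or x, which is the
 * guarantee Fortran gets for free and which enables vectorization. */
void matvec(size_t rows, size_t cols,
	    const double a[restrict rows][cols],
	    const double x[restrict cols],
	    double y[restrict rows])
{
	for (size_t i = 0; i < rows; i++) {
		double sum = 0.0;

		for (size_t j = 0; j < cols; j++)
			sum += a[i][j] * x[j];
		y[i] = sum;
	}
}
```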

That's the one thing that has seriously hampered C: the lack of a true
array type of its own, decoupled from pointers.  It does not need to
carry its dimensions with it or other
hide-the-implementation-from-the-programmer niceties: C is, after all,
a low-level language, and Fortran did not suffer from not having array
dimensions packed into the arrays as well.

But that's water down the drawbridge.  This single major deficiency is
not anything that would hamper git development.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: <git@...>
Date: Friday, September 7, 2007 - 4:49 pm

I agree. It's one of those things that probably sounded like a good idea 
at the time. The consequences were not foreseen. All languages have a 
few of these (C++ has the infamous use of < > for template arguments).

-
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:40 am

A design is perfect not when there is no longer anything you can add
to it, but when there is no longer anything you can take away.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: David Kastrup <dak@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 7:36 am

On 7/9/2007, at 9:40, David Kastrup wrote:
To: <git@...>
Date: Friday, September 7, 2007 - 4:15 am

I like to phrase that a slightly different way: anyone can make 
something complicated, but it takes genius to make something simple.

A very big goal for D is to make what should be simple code, simple. It 
turns out that what's simple for a computer is complex for a human. So 
to design a language that is simple for programmers is (unfortunately) a 
rather complex problem. Or perhaps I'm just not smart enough <g>.

A canonical example is that of a loop. Consider a simple C loop over an 
array:

void foo(int array[10])
{
     for (int i = 0; i < 10; i++)
     {   int value = array[i];
         /* ... do something ... */
     }
}

It's simple, but it has a lot of problems:

1) i should be size_t, not int
2) array is not checked for overflow
3) 10 may not be the actual array dimension
4) may be more efficient to step through the array with pointers, rather 
than indices
5) type of array may change, but the type of value may not get updated
6) crashes if array is NULL
7) only works with arrays and pointers
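For comparison, the usual C mitigation for items 1, 3 and 6 is to pass the length explicitly as a `size_t` and compute it with a `sizeof`-based macro at the call site. This is a sketch of that common idiom, with a hypothetical function name; it does not fix item 3 completely, since `ARRAY_SIZE` silently gives a wrong answer if handed a pointer instead of a true array, and items 5 and 7 remain.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Sum n ints starting at array. The length comes from the caller
 * instead of a hard-coded 10, the index is size_t, and a NULL array
 * is treated as empty. */
long long sum_ints(const int *array, size_t n)
{
	long long total = 0;

	if (!array)
		return 0;
	for (size_t i = 0; i < n; i++)
		total += array[i];
	return total;
}
```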

Since this thread is talking about C++, let's look at the C++ version:

void foo(std::vector<int> array)
{
   for (std::vector<int>::const_iterator
        i = array.begin();
        i != array.end();
        i++)
   {
     int value = *i;
     // ... do something ...
   }
}

It has fewer latent bugs, but still:

1) type of array may change, but the type of value may not get updated
2) too darned much typing
3) it's more complicated, not simpler

Frankly, I don't want to write loops that way. I want to write them like 
this:

void foo(int[] array)
{
   foreach (value; array)
   {
     // ... do something ...
   }
}

As a programmer, I'm specifying exactly what I want to happen without 
much extra puffery. It's less typing, simpler, and more resistant to bugs.

1) correct loop index type is selected based on the type of array
2) arrays carry with them their dimension, so foreach is guaranteed to 
step through the loop the correct number ...
To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 4:26 am

Wrong.  size_t is for holding the size of memory objects in bytes, not
in terms of indices.  For indices, the best variable is of the same
type as the declared index maximum size, so here it is typeof(10),



No.  It is a beginners' and advanced users' mistake to think using
pointers for access is a good idea.  Trivial optimizations are what a
compiler is best at, not the user.  Using pointer manipulation will
more often than not break loop unrolling, loop reversal, strength





Most of those are toy concerns.  They prevent problems that don't
actually occur much in practice.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: <git@...>
Date: Friday, September 7, 2007 - 5:14 am

The easiest way to show the error is to consider the code being ported to a 
typical 64-bit C compiler. An int is still 32 bits, yet the array can have 
more elements than a 32-bit int can index. You're right in that what we want to be able to do 
is typeof(array dimension), but there is no way to do that automatically 
in C, which is my point. If the array dimension changes, you have to 
carefully check to make sure every loop dependency on the type is 
updated, too.

size_t will always work, however, making it a better choice than int, at 

Because the 10 array dimension is not statically checked in C. I could 
pass it a pointer to 3 ints without the compiler complaining. This makes 
it a potential maintenance problem. Also, the maintenance programmer may 
change the array dimension in the function signature, but overlook 

Array buffer overflow errors are commonplace in C, because array 
dimensions are not automatically checked at either compile or run time. 
This is an expensive problem. Some C APIs try to deal with this by 
passing a second argument for arrays giving the dimension (snprintf, for 
example), but this tends to be sporadic, not conventional. It being 

C compilers vary widely in the optimizations they'll do for simple 
loops. I see often enough attempts by programmers to take such matters 
into their own hands. I agree with you on that - and suggest the 

Let's say our fearless maintenance programmer decides to make it an 
array of longs, not an array of ints. He overlooks changing the type of 
value in the loop. Suddenly, things subtly break because of overflows. 
Or maybe he changed the int to an unsigned, now the divides in the loop 
give different answers. Etc. There really isn't any compiler/language 

I consider an array that is NULL to have no members, so instead of 

C has structs, too, as well as more complicated user defined 
collections. Essentially, you cannot (simply) write generic algorithms 
in C, because you cannot (simply) generically express iteration. Of 
course, yo...
To: Linus Torvalds <torvalds@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:40 pm

Not to mention try finding two C++ compilers that support the same 
language features.  C is a known quantity. C++ depends on whose compiler 
you use and what class libraries you use.  Trying to make those things 
work crossplatform is not an easy task.  (Harder than it is in C at 
least.)

A number of years ago, a programmer who will not be named (and is not me), 
tried to port Perl to C++.  It was a disaster.  He found that every 
compiler handled something differently.

If you stuck to one compiler, it might work.  But trying to get GCC to 
work like MS C++ or Borland C++ or whatever is just asking for pain.

-- 
Refrigerator Rule #1: If you don't remember when you bought it, Don't eat it.
-
To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:08 pm

As I said, it's a matter of beliefs. As such, any reasoning and
arguing will be endless and pointless, as for any other religious

I'll give you reasons why to use C++ for Git (not why C++ is better
for any project in general, as that again would be pointless):

1. Good String class will make code much more readable (and
significantly shorter)
2. Good Buffer class - same reason
3. Smart pointers and smart handles to manage memory and
file/socket/lock handles.
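Item 1 does not strictly require C++: a growable, length-tracking string buffer is a few dozen lines of C. The sketch below uses hypothetical names and is not meant to mirror any existing library's API; it omits size-overflow checks for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal growable string buffer. Invariant: when buf != NULL it is
 * NUL-terminated and len < alloc. Zero-initialize before first use. */
struct sbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Append n bytes from data; returns 0 on success, -1 if out of memory. */
int sbuf_add(struct sbuf *sb, const char *data, size_t n)
{
	if (sb->len + n + 1 > sb->alloc) {
		size_t want = sb->alloc ? sb->alloc : 16;
		char *p;

		while (want < sb->len + n + 1)
			want *= 2;	/* geometric growth keeps appends amortized O(1) */
		p = realloc(sb->buf, want);
		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
	return 0;
}

/* Append a NUL-terminated string. */
int sbuf_adds(struct sbuf *sb, const char *s)
{
	return sbuf_add(sb, s, strlen(s));
}

/* Free the storage and reset to the empty state. */
void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}
```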

As it is right now, it's too hard to see the high-level logic thru

IMHO Git has a brilliant high-level design (object database, using
hashes, simple and accessible storage for data and metadata). Kudos to
you!
The implementation: a mixture of C and shell scripts, command line

I don't see myself comparing assembler to C anywhere.
I was pointing out that I've been programming in different languages
(many more actually) and observed bad developers writing bad code in
all of them. So this quality "bad developer" is actually
language-agnostic :-).
-- 
- Dmitry
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:50 am

But all of those are incompatible with each other and require major
headaches and/or interface code to get them to work together.  And
they might use different interface styles anyway.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:27 pm

Total BS. The string/memory management is not at all relevant. Look at the 

The only really important part is the *design*. The fact that some of it 
is in a "prototyping language" is exactly because it wasn't the core 
parts, and it's slowly getting replaced. C++ would in *no* way have been 
able to replace the shell scripts or perl parts.


You made a very clear "assembler -> C -> C++/C#" progression in your 
life, comparing my staying with C as a "dinosaur", as if it was some 
inescapable evolution towards a better/more modern language.

With zero basis for it, since in many ways C is much superior to C++ (and 
even more so C#) in both its portability and in its availability of 

You can write bad code in any language. However, some languages, and 
especially some *mental* baggages that go with them are bad.

The very fact that you come in as a newbie, point to some absolutely 
*trivial* patches, and use that as an argument for a language that the 
original author doesn't like, is a sign of you being a person who should 
be disabused of any idiotic notions as soon as possible.

The things that actually *matter* for core git code are things like writing 
your own object allocator to make the footprint be as small as possible in 
order to be able to keep track of object flags for a million objects 
efficiently. It's writing a parser for the tree objects that is basically 
fairly optimal, because there *is* no abstraction. Absolutely all of it is 
at the raw memory byte level.
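The kind of custom object allocator Linus mentions can be illustrated with a minimal bump allocator. This is a sketch under stated assumptions, not git's actual code: the struct layout, the block size and all names are hypothetical. Objects are carved out of large zeroed blocks, so a million objects cost roughly a thousand `calloc` calls instead of one allocation (and its bookkeeping overhead) per object, and the flag bits are packed into bit-fields next to the 20-byte SHA-1.

```c
#include <stdlib.h>

/* Hypothetical compact per-object record. */
struct obj {
	unsigned char sha1[20];
	unsigned flags  : 8;
	unsigned type   : 3;
	unsigned parsed : 1;
};

#define OBJS_PER_BLOCK 1024

/* Blocks are intentionally never freed: objects live until the process
 * exits, which is the trade-off that makes allocation this cheap. */
struct obj_pool {
	struct obj *block;	/* current block */
	size_t used;		/* objects handed out from the current block */
	size_t total;		/* objects handed out overall */
};

/* Return a zeroed object, or NULL if a new block cannot be allocated. */
struct obj *obj_alloc(struct obj_pool *pool)
{
	if (!pool->block || pool->used == OBJS_PER_BLOCK) {
		pool->block = calloc(OBJS_PER_BLOCK, sizeof(struct obj));
		if (!pool->block)
			return NULL;
		pool->used = 0;
	}
	pool->total++;
	return &pool->block[pool->used++];
}
```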

Can those kinds of things be written in other languages than C? Sure. But 
they can *not* be written by people who think the "high-level" 
capabilities of C++ string handling somehow matter.

The fact is, that is *exactly* the kinds of things that C excels at. Not 
just as a language, but as a required *mentality*. One of the great 
strengths of C is that it doesn't make you think of your program as 
anything high-level. It's what makes you apparently prefer other 
languages, but the thing...
To: Linus Torvalds <torvalds@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:26 am

Hi,


There is an important additional point: a language like C _holds_ you to a 
certain degree of diligence.

In my day-job I have to code in other languages, which make it "easy" to 
code.  As a result, the code I have to work with is sloppy, ugly and 
buggy.  By applying the same principles I am _forced_ to use in C, with 
Git, I produce better code.

Ciao,
Dscho

-
To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 11:09 pm

Not only have I looked at the code, I've also debugged it quite a bit.
Granted most of my problems had to do with handling paths on Windows
(i.e. string manipulations).

... and explain where I'm coming from:
My goal is to *use* Git. When something does not work *for me* I want
to be able to fix it (and contribute the fix) in the *shortest time
possible* and with *minimal effort*. For me, it's a diversion from
my main activities.
The fact that Git is written in C does not really contribute to that goal.
C++ is the only alternative that works with the existing C codebase.
So while C++ may not be the best choice "academically speaking", it's
pretty much the only practical choice.

"Democracy is the worst form of government except for all those others
that have been tried." - Winston Churchill

Now, I realize that I'm a very infrequent contributor to Git, but I
want my opinion to be heard.
People who carry the main weight of developing and maintaining Git
should make the call.
-- 
- Dmitry
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:28 am

Hi,


We are a happy little meritocracy here.  Once you proved that you're not 
full of shit (some seem to try the opposite, you know who you are), you 
can go all caps.  Before that, you'll have to earn the right to be
heard.

Ciao,
Dscho


-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:52 am

Sorry, but for fixing things in C, I can look and work locally.  For
fixing things in C++, I first need to understand the class
hierarchies used in the project.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:31 am

Coupled with what you said in an earlier mail, namely
---%<---%<---

Considering C appeared in 1972, and C++ appeared in 1985, you have been
writing C code for 13 years. And you're telling me that git being written
in C prevents you from contributing?

If you want to do something useful in C++ for git, make it easy for C++

They already have, but every now and then someone comes along and suggest
a complete rewrite in some other language. So far we've had Java (there's
always one...), Python and now C++.

It happens to all projects, sooner or later. The funny thing is that all those
people that want their favourite software to be rewritten in their favourite
programming language always want someone else to rewrite it for them.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-
To: Andreas Ericsson <ae@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:17 pm

Since this "complete rewrite" was mentioned in multiple emails I'd
like to rectify that:
What I'm offering (for Git) is to use C++ as a "better C".
Don't change any existing *working* code, but start introducing simple
C++ constructs in the new code.
Git is simple enough to not require any high-level abstractions. But
some utility classes could make code much simpler.

And BTW, I don't even like C++ that much :-), I just like it much
better than C.  I've been saying that C++ is a legacy language for
quite some time now. But we will use it for many years to come because
the size of this legacy code is huge, so there will be plenty of C++
developers available (to contribute to Git :-).
And C++ is the only way to move forward with the existing C codebase.
-- 
- Dmitry
-
To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Linus Torvalds <torvalds@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 8:29 pm

There are far too many highly valuable contributors that have spoken
against C++ for me to believe that C++ and C will ever co-exist in the
official git repo. Good thing utility classes can be developed on top
of the existing C-code, but in a separate repo, and packed into a
library. That way, you get some hacking ground for your beloved C++
coders, while the current git contributors can keep contributing in the

The C code base is a lot larger and C++ will drop dead pretty fast if it's

Complete and utter BS. It can also stay in C, or get language bindings for
Python/Perl/PHP/LUA(?)/whatever, or both.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-