[RFC] Convert builtin-mailinfo.c to use The Better String Library.

To: Git Mailing List <git@...>
Cc: Junio C Hamano <junkio@...>
Date: Tuesday, September 4, 2007 - 4:50 pm

Hi.

This is an attempt to use "The Better String Library"[1] in builtin-mailinfo.c.

The patch doesn't pass all the tests in the test suite yet, but I thought I'd
send it out so people can decide if they like how the code looks.

I'm not sending a patch to add the library files at this time. I'll send
that patch when this patch is working.

The changes required to make it pass the tests shouldn't be very large.

/Lukas

[1] http://bstring.sourceforge.net/

---
builtin-mailinfo.c | 795 ++++++++++++++++++++++++++--------------------------
1 files changed, 392 insertions(+), 403 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index d7cb11d..2ddc15d 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -5,14 +5,14 @@
#include "cache.h"
#include "builtin.h"
#include "utf8.h"
+#include "bstring/bstrlib.h"

static FILE *cmitmsg, *patchfile, *fin, *fout;

static int keep_subject;
-static const char *metainfo_charset;
-static char line[1000];
-static char name[1000];
-static char email[1000];
+static bstring metainfo_charset;
+static bstring name;
+static bstring email;

static enum {
TE_DONTCARE, TE_QP, TE_BASE64,
@@ -21,321 +21,291 @@ static enum {
TYPE_TEXT, TYPE_OTHER,
} message_type;

-static char charset[256];
+static bstring charset;
static int patch_lines;
-static char **p_hdr_data, **s_hdr_data;
+static bstring *p_hdr_data, *s_hdr_data;

#define MAX_HDR_PARSED 10
#define MAX_BOUNDARIES 5

-static char *sanity_check(char *name, char *email)
+static bstring sanity_check(bstring name, bstring email)
{
- int len = strlen(name);
- if (len < 3 || len > 60)
+ static struct tagbstring email_ind = bsStatic("<@>");
+ if (blength(name) < 3 || blength(name) > 60)
return email;
- if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
+ if (binchr(name, 0, &email_ind) != BSTR_ERR)
return email;
return name;
}

-static int bogus_from(char *line)
+static int bogus...
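For comparison, the check that sanity_check() performs can also be written against the standard C library alone. The following is only a sketch, not part of the patch: it replaces the three strchr() calls of the original with a single strpbrk() call, which plays roughly the role that binchr() with the "<@>" set plays in the bstring version. The const-qualified signature is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Prefer the display name unless it looks implausible: too short, too
 * long, or containing characters that suggest an e-mail address.
 */
static const char *sanity_check(const char *name, const char *email)
{
	size_t len;

	assert(name && email);
	len = strlen(name);
	if (len < 3 || len > 60)
		return email;
	/* strpbrk() finds the first byte of name that is in the set "<@>" */
	if (strpbrk(name, "<@>"))
		return email;
	return name;
}
```

A name such as "Lukas" is returned unchanged, while "Al" (too short) or "a@b" (contains '@') fall back to the e-mail address.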

To: Git Mailing List <git@...>
Cc: Junio C Hamano <junkio@...>
Date: Friday, September 7, 2007 - 6:47 am

Unfortunately, I haven't had any time in the last few days to code, or to read
mail. I'm assuming that there is no point in me finishing the patch and that git
will go with the strbuf solution?

/Lukas
-

To: Git Mailing List <git@...>
Date: Wednesday, September 5, 2007 - 11:27 am

On Tue, 2007-09-04 at 22:50 +0200, Lukas Sandstr

To: Lukas <lukass@...>
Cc: Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Wednesday, September 5, 2007 - 10:54 am

Please, no. Let's not pull in a dependency for something as simple as a
string library. How many distros have bstring packaged?
The right version? Does it work on Windows? We already have strbuf.c,
let's just consolidate the string manipulation code already in git under
that interface.
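To make concrete how small such an in-tree string interface can be, here is a minimal sketch of a growable buffer. The type and function names (sbuf, sbuf_add, sbuf_addstr, sbuf_release) are hypothetical and chosen for the example; this is NOT git's actual strbuf API, only an illustration of the general shape.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; NOT git's actual strbuf API. */
struct sbuf {
	char *buf;	/* NUL-terminated once something has been added */
	size_t len;	/* bytes in use, excluding the NUL */
	size_t alloc;	/* bytes allocated */
};

/* Append n bytes, growing the buffer as needed. Returns 0, or -1 on OOM. */
static int sbuf_add(struct sbuf *sb, const char *data, size_t n)
{
	assert(sb && data);
	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, want);
		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
	return 0;
}

/* Append a NUL-terminated string. */
static int sbuf_addstr(struct sbuf *sb, const char *s)
{
	return sbuf_add(sb, s, strlen(s));
}

/* Free the storage and return the buffer to its empty state. */
static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}
```

A caller starts from a zeroed struct, appends pieces with sbuf_addstr(), reads the result from the buf member, and finishes with sbuf_release().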

Kristian

-

To: Kristian <krh@...>
Cc: Junio C Hamano <junkio@...>, Lukas <lukass@...>, Git Mailing List <git@...>
Date: Wednesday, September 5, 2007 - 1:29 pm

Kristian H

To: Matthieu Moy <Matthieu.Moy@...>
Cc: Git <git@...>
Date: Thursday, September 6, 2007 - 12:48 am

[ snip ]

When I first looked at Git source code two things struck me as odd:
1. Pure C as opposed to C++. No idea why. Please don't talk about
portability, it's BS.
2. Brute-force, direct string manipulation. It's both verbose and
error-prone. This makes it hard to follow high-level code logic.

- Dmitry

-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 1:50 pm

*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot
of substandard programmers use it, to the point where it's much much
easier to generate total and utter crap with it. Quite frankly, even if
the choice of C were to do *nothing* but keep the C++ programmers out,
that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles
Bader jokingly said "to piss you off", but it's actually true. I've come
to the conclusion that any programmer that would prefer the project to be
in C++ over C is likely a programmer that I really *would* prefer to piss
off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using
the "nice" library features of the language like STL and Boost and other
total and utter crap, that may "help" you program, but causes:

- infinite amounts of pain when they don't work (and anybody who tells me
that STL and especially Boost are stable and portable is just so full
of BS that it's not even funny)

- inefficient abstracted programming models where two years down the road
you notice that some abstraction wasn't very efficient, but now all
your code depends on all the nice object models around it, and you
cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and
portable C++ ends up to limit yourself to all the things that are
basically available in C. And limiting your project to C means that people
don't screw that up, and also means that you get a lot of programmers that
do actually understand low-level issues and don't screw things up with any
idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary
objective, the "advantages" of C++ is just a huge mistake. The fact that
we also piss off people who cannot see that is just a big...

To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 8:21 pm

As dinosaurs (who code exclusively in C) are becoming extinct, you
will soon find yourself alone with an attitude like this.

Measuring the number of people who contributed to Git is an incorrect metric.
Obviously C++ developers can contribute C code. But assuming that they
prefer it that way is wrong.

I was coding in Assembly when there was no C.
Then in C before C++ was created.
Nowadays it's C++ and C#, and I have never looked back.
Bad developers will write bad code in any language. But penalizing
good developers for this illusive reason of repelling bad contributors
is nonsense.

Anyway I don't mean to start a religious C vs. C++ war. It's a matter
of beliefs and as such pointless.
I just wanted to get a sense of how many people share this "Git should
be in pure C" doctrine.
--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:21 am

Hi,

No, it's not. As has been shown by some very good _arguments_. Once you
have facts to back up your claims, it is not any belief any longer.

Ciao,
Dscho
-

To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:32 pm

I honestly didn't. I didn't even think it's possible. In the
environment of mainstream commercial software development the last war
on this subj was over 8-10 years ago.
Even wars like "do we use exceptions/templates/stl" are pretty much
over. Nowadays it's "do we use Boost", or "do we use template
metaprogramming". But even more often it's Java/C# vs. C++.

Well I've heard *opinions* and anecdotal evidence. No facts though.
And it's not surprising. There could be no hard facts in such a
matter. It always boils down to "most of all, I want my software to be
X" where X is different for different people (fast,maintainable,quick
to market, scalable, beautiful, etc ... to name a few).
With different values of X any debate is pointless. And X is exactly
a matter of beliefs.

Anyway my curiosity is satisfied (thru the roof so to speak) and I
think it's enough on the subj. It has reminded me of good old times
though.

--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 7:25 pm

It is because the "environment of mainstream commercial software

Now that's a stupid argument to bring up. Commercial software

"Just to annoy mainstream commercial software developers" would be a
good reason.

-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 2:24 am

Anecdotal evidence _is_ hard facts. That's what experience is all
about.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:47 am

As long as TeX, Emacs and vi are around, I would not worry too much
about dinosaurs in general. But C++ is a cancerous dinosaur. It has

The problem with C++ is that every C++ developer has his own style,
and reuse is an illusion within that style. Take a look at classes
implementing matrix arithmetic: there are as many around as the day is
long, and all of them are incompatible with one another.

With regard to programming styles, C++ does not support multiple
inheritance. For a single project grown from a single start, you can
get reasonable solutions. But combining stuff is creating maintenance
messes.

With C, the situation is not dissimilar, but you spent less time

What nonsense. Large parts of git already are shell scripts, so
obviously there is no such doctrine. Just because C++ is not a sane
proposition does not mean that others might not work.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: <git@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, David Kastrup <dak@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>
Date: Friday, September 7, 2007 - 3:41 am

On Friday 2007 September 07, David Kastrup wrote:

(Disclaimer: I'm certainly not joining the "C++ for git" chant; this reply is

One could say the same about any API. "Take a look at that C library libXYZ -
it does exactly the same thing as libPQR but all the function calls and

Multiple inheritance is the spawn of the devil, but C++ _does_ support it.

Forgetting about the terrible STL, to me there really is no difference between
C and C++; you can be object oriented in C. Take a look at the Linux kernel,
it should be printed out, rolled up and used to beat the ideas into students
learning C++/Java/C#. Object oriented design is a choice, and if you really
wanted you could do it in assembly.

I would imagine the reason people often turn up wanting to rewrite Linux and
git in C++ is because they are so object oriented in nature already and it's
natural to think "wouldn't this be even better if I wrote it in an object
oriented language"? Maybe, maybe not, but why bother?

Andy

--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
-

To: Andy Parkins <andyparkins@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, <git@...>
Date: Friday, September 7, 2007 - 4:08 am

The difference is that you can pass structures from one library into
another with tolerable efficiency. Because there are only basically 2

What about "With regard to programming styles" did you not understand?
I was not talking about a technical feature at class level, but about

Maintainability and extensibility certainly are valid arguments for
rewrites. But C++ does not really shine in that regard.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 11:06 pm

El 7/9/2007, a las 2:21, Dmitry Kakurin escribi

To: <git@...>
Date: Friday, September 7, 2007 - 4:36 am

I can appreciate that. I originally got into writing compilers because
my game (Empire) ran too slowly and I thought the existing compilers
could be dramatically improved.

And technically, yes, you can write code in C that is >= the speed of
any other language (other than asm). But practically, this isn't
necessarily so, for the following reasons:

1) You wind up having to implement the complex, dirty details of things
yourself. The consequences of this are:

a) you pick a simpler algorithm (which is likely less efficient - I
run across bubble sorts all the time in code)

b) once you implement, tune, and squeeze all the bugs out of those
complex, dirty details, you're reluctant to change it. You're reluctant
to try a different algorithm to see if it's faster. I've seen this
effect a lot in my own code. (I translated a large body of my own C++
code that I'd spent months tuning to D, and quickly managed to get
significantly more speed out of it, because it was much simpler to try
out different algorithms/data structures.)

2) Garbage collection has an interesting and counterintuitive
consequence. If you compare n malloc/free's with n gcnew/collections,
the malloc/free will come out faster, and you conclude that gc is slow.
But that misses one huge speed advantage of gc - you can do FAR fewer
allocations! For example, I've done a lot of string manipulating
programs in C. The basic problem is keeping track of who owns each
string. This is done by, when in doubt, making a copy of the string.

But if you have gc, you don't worry about who owns the string. You just
make another pointer to it. D takes this a step further with the concept
of array slicing, where one creates windows on existing arrays, or
windows on windows on windows, and no allocations are ever done. It's
just pointer fiddling.
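The "windows on windows" idea can be approximated in C, minus the garbage collector. The following is a minimal sketch, assuming a hypothetical slice type invented for this example (it is neither D's implementation nor anything in git). A slice is a pointer plus a length, so taking a sub-slice copies no bytes. Without gc, the caller has to keep the underlying buffer alive for as long as any window on it is in use, which is exactly the ownership question raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: a non-owning window on a buffer. */
struct slice {
	const char *ptr;
	size_t len;
};

/* Window over a whole NUL-terminated string; no copy is made. */
static struct slice slice_from_cstr(const char *s)
{
	struct slice r;

	r.ptr = s;
	r.len = strlen(s);
	return r;
}

/* Window [from, to) on an existing window: just pointer arithmetic. */
static struct slice slice_sub(struct slice s, size_t from, size_t to)
{
	struct slice r;

	assert(from <= to && to <= s.len);
	r.ptr = s.ptr + from;
	r.len = to - from;
	return r;
}

/* Compare a window's contents with a NUL-terminated string. */
static int slice_eq_cstr(struct slice s, const char *t)
{
	return s.len == strlen(t) && memcmp(s.ptr, t, s.len) == 0;
}
```

For "hello world", slice_sub(whole, 6, 11) is a window on "world" that points into the original storage, and a further slice_sub() on that window again allocates nothing.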

------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 7:52 am

El 7/9/2007, a las 10:36, Walter Bright escribi

To: <git@...>
Date: Friday, September 7, 2007 - 3:25 pm

That may very well be true. I've never looked at the source code for
git, so I'm not in any position to judge it. Nor do I suggest
translating a debugged, working, 80,000 line project into another language.

My comments here are in more general terms.

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 5:41 am

I haven't seen this in the development of git, although to be fair, you
didn't mention the number of developers that were simultaneously working
on your project. If it was you alone, I can imagine you were reluctant to
change it just to see if something is faster.

Opensource projects with many contributors (git, linux) work differently,
since one or a few among the plethora of authors will almost always be
a true expert at the problem being solved.

The current pack-format and how it's read is one such example. It was
done once, by the combined efforts of Linus and Junio (this is all off
the top of my head and I cba to go looking up the details, so bear with
me if there are errors). Linus and Junio are both very good C-programmers,
but the handling of packfiles was not what you'd call their specialty.
Along came Nicolas Pitre, another excellent C programmer, who probably
has done some similar work before. He constructed a better algorithm,
eventually resulting in the ultimate performance win with a net gain
in both time and size (gj, Nicolas).

The point is that, given enough developers, *someone* is bound to
find an algorithm that works so well that it's no longer worth
investing time to even discuss if anything else would work better,
either because it moves the performance bottleneck to somewhere else
(where further speedups would no longer produce humanly measurable
improvements), or because the action seems instantaneous to the user
(further improvements simply aren't worth it, because no valuable
resource will be saved from it).

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: <git@...>
Date: Friday, September 7, 2007 - 3:23 pm

On my project, one. But I've seen this problem repeatedly in other
projects that had multiple developers. For example, I used to use
version 1 of an assembler. It was itself written entirely in assembler.
It ran *incredibly* slowly on large asm files. But it was written in
assembler, which is very fast, so how could that be?

Turns out, the symbol table used internally was a linear one. A linear
symbol table is easy to implement, but doesn't scale well at all. A
linear symbol table was implemented because it was just harder to do
more advanced symbol table algorithms in assembler. In this case, a
higher level language re-implementation made the assembler much faster,
even though that implementation was SLOWER in every detail. It was
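To illustrate the kind of "more advanced symbol table" meant here, the following sketch shows a chained hash table in C. It is an assumption-laden example, not the code of any assembler mentioned in this thread: the names (symtab, symtab_find, symtab_set), the bucket count and the FNV-1a hash are all choices made for the illustration. Lookups cost expected constant time, where a linear table has to scan every symbol.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024	/* arbitrary size for the sketch */

struct symbol {
	char *name;
	long value;
	struct symbol *next;	/* chain of symbols sharing a bucket */
};

struct symtab {
	struct symbol *buckets[NBUCKETS];
};

/* 32-bit FNV-1a string hash; any reasonable hash would do here. */
static uint32_t hash_name(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Expected O(1): only the symbols in one bucket are compared. */
static struct symbol *symtab_find(struct symtab *t, const char *name)
{
	struct symbol *sym = t->buckets[hash_name(name) % NBUCKETS];

	for (; sym; sym = sym->next)
		if (!strcmp(sym->name, name))
			return sym;
	return NULL;
}

/* Insert or update a symbol; returns NULL only if allocation fails. */
static struct symbol *symtab_set(struct symtab *t, const char *name, long value)
{
	struct symbol *sym;

	assert(t && name);
	sym = symtab_find(t, name);
	if (!sym) {
		uint32_t b = hash_name(name) % NBUCKETS;
		size_t n = strlen(name) + 1;

		sym = malloc(sizeof(*sym));
		if (!sym)
			return NULL;
		sym->name = malloc(n);
		if (!sym->name) {
			free(sym);
			return NULL;
		}
		memcpy(sym->name, name, n);
		sym->next = t->buckets[b];
		t->buckets[b] = sym;
	}
	sym->value = value;
	return sym;
}
```

The sketch never frees its symbols, which is tolerable for a short-lived tool but would need a teardown function in anything long-running.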

My point was that when I reimplemented it in D, the cost of changing the
algorithms got much lower, so I was much more tempted to muck around

That is a nice advantage. I don't think many projects can rely on having

Sure, but I suggest that few projects reach this maximum. Case in point:
ld, the gnu linker. It's terribly slow. To see how slow it is, compare
it to optlink (the 15 years old one that comes with D for Windows). So I
don't believe there is anything inherent about linking that should make
ld so slow. There's some huge leverage possible in speeding up ld
(spreading out that saved time among all the gnu developers).

So while git may have reached a maximum in performance, I don't think
this principle is applicable in general, even for very widely used open
source projects that would profit greatly from improved performance.

------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Saturday, September 8, 2007 - 8:25 pm

Well, when the ease-of-coding vs the exec-speed of D vs C is that of
C vs asm, C will be dead fairly soon. However, since C is so ingrained
in every language designer's head, I find that unlikely to happen any

True that. I know a fair few projects that could have done with borrowing
one or two proper gurus, but even opensource programmers are selfish in

True again, but given what I said above holds, it would be madness to
move from the lingua franca of oss hacking to a less common one, as it

Interesting. I recently did a spot of work comparing various string-hashing
algorithms. Perhaps I should head over to the ld camp and see if I can help.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: <git@...>
Date: Monday, September 10, 2007 - 6:33 am

You can write "C" in D, and you'll get exactly the same (performance)
results. After all, D and C share the same optimizer and code generator
(for both implementations of D), and when the same intermediate code is
presented, you'll get the same results.

That's why when people benchmark D against C, they deliberately do NOT
write the D version in a C'ish manner, but refactor the code into what
would be a more D'ish style.

I humbly suggest running a profiler over ld before spending time fixing
the wrong thing <g>. I haven't looked at the ld source, but being
experienced in similar projects I'd hazard a guess that there won't be a
quick fix to ld's speed problems.

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Tuesday, September 11, 2007 - 4:17 am

I'm fairly sure of the same, but even small speedups in such a commonly used
tool are worth pursuing.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Andreas Ericsson <ae@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Sunday, September 9, 2007 - 4:22 am

Well, my good wishes go with you! If ld.so would be affected as well,
you'd probably not help just developers.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:40 pm

Well, my first system was a Z80 computer with an editor/assembler in
ROM (4kb). At one time I tried figuring out the size requirements of
symbols. It was two bytes for each symbol. Namely the value. The
"symbol table" was located behind the source code. Whenever this
marvel of technology encountered a label, it searched the source code
from the beginning for the definition of the label, keeping count of
all label definitions in between. When it found the definition, the
count corresponded to the position in the symbol table.

So compilation times were O(ns), with n the number of symbol uses and
s the size of the source code.

Implementing in a higher language would not have helped: memory
efficiency was what dictated this layout. Given that the whole
available memory was perhaps 50kB, assembly language modules could not
get so large that scale issues were deadly. But the assembly times
did get annoying sometimes.
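The lookup scheme described above can be sketched in C as follows. This is a reconstruction for illustration only, not the ROM's actual code: the source is assumed to be an array of lines, and a line whose first token ends in ':' is assumed to define a label. The returned count is the label's position in the table of two-byte values, and every lookup rescans the source from the top, which is where the O(ns) behaviour comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the label defined on this line, or 0 if it defines none. */
static size_t label_def_len(const char *line)
{
	size_t k = strcspn(line, " \t:");

	return line[k] == ':' ? k : 0;
}

/*
 * Return the index of `label` in the value table, or -1 if the label is
 * never defined. No names are stored anywhere: the index is simply the
 * number of label definitions that precede the matching one.
 */
static long label_index(const char *const *lines, size_t nlines,
			const char *label)
{
	size_t i, want;
	long count = 0;

	assert(lines && label);
	want = strlen(label);
	for (i = 0; i < nlines; i++) {
		size_t k = label_def_len(lines[i]);

		if (!k)
			continue;	/* not a label definition */
		if (k == want && !memcmp(lines[i], label, k))
			return count;
		count++;
	}
	return -1;
}
```

The trade is the one described in the message: no memory is spent on names, and each symbol use pays for a scan of the source.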

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Wincent Colaiuta <win@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:25 am

Wincent Colaiuta wrote:

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 7:30 am

El 7/9/2007, a las 8:25, Andreas Ericsson escribi

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>, Wincent Colaiuta <win@...>
Date: Friday, September 7, 2007 - 6:56 am

Hi,

I have a buck here that says that you cannot hand-optimise assembly (on
modern processors at least) as good as even gcc.

Ciao,
Dscho

-

To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Andreas Ericsson <ae@...>, Git <git@...>, Wincent Colaiuta <win@...>
Date: Friday, September 7, 2007 - 12:09 pm

That assumes that the original task can even be expressed well in C.
Multiple precision arithmetic, for example, requires access to the
carry bit. You can code around this, for example by writing something
like

unsigned a,b,carry;

[...]

carry = (a+b) < a;

but the problem is that those are ad-hoc idioms with a variety of
possibilities, and thus the compilers are not made to recognize them.
Another thing is mixed-precision multiplications and divisions: those
are _natural_ operations on a normal CPU, but have no representation
in C.

As a consequence, most high performance multiple-precision packages
contain assembly language in some form or other.
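As an illustration of the ad-hoc idiom, here is how the `(a+b) < a` carry test extends to adding two multi-limb numbers in portable C. This is a sketch with a hypothetical function name (mp_add), not code from any particular multiple-precision package. Each limb needs two comparisons to recover what a CPU's add-with-carry instruction delivers directly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Add two n-limb numbers (least significant limb first) and store the
 * sum in r. Returns the carry out of the top limb (0 or 1).
 */
static unsigned mp_add(unsigned *r, const unsigned *a, const unsigned *b,
		       size_t n)
{
	unsigned carry = 0;
	size_t i;

	assert(r && a && b);
	for (i = 0; i < n; i++) {
		unsigned s = a[i] + b[i];	/* unsigned arithmetic wraps */
		unsigned c1 = s < a[i];		/* carry out of a + b */
		unsigned t = s + carry;
		unsigned c2 = t < s;		/* carry out of adding carry-in */

		r[i] = t;
		carry = c1 | c2;		/* both cannot be set at once */
	}
	return carry;
}
```

The two carries can never both be set: if a[i] + b[i] wrapped, the wrapped sum is at most the maximum value minus one, so adding a carry-in of 1 cannot wrap again.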

gcc's assembly language templates are excellent in that they actually
cooperate nicely with the optimizer, so the optimizer can do all the
address calculations and register assignments and opcode reorderings,
and then the actual operations that are not expressible in C can be
done by the programmer.

But anyway, I have worked as a graphics driver programmer for some
amount of time, and bit-stuffing memory-mapped areas with data was
still something where hand assembly was best.

I have also done BIOS terminal emulators, and being able to write
something like

        ld b,whatever
myloop:
        push bc
        push hl
        call nextchar
        pop hl
        pop bc
        ld (hl),a
        inc hl
        djnz myloop

in order to suspend the terminal driver until the application comes up
with the next `whatever' output characters in an escape sequence is
_wagonloads_ more maintainable than using a state machine or whatever
else for distributing material delivered into the driver.

But this requires that nextchar can do something like

nextchar: ld (driverstack),sp
          ld sp,(appstack)
          ret

and the entrypoint, in contrast, does

outchar: ld (appstack),sp
         ld sp,(driverstack)
         ret

Cheap and expedient. You just need to set up a small stack, and
presto: coroutines, at absolutely negligible cost. I know that there
are some "portable" coroutine implementa...

To: Johannes Schindelin <Johannes.Schindelin@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>, Wincent Colaiuta <win@...>
Date: Friday, September 7, 2007 - 7:54 am

http://www.gelato.unsw.edu.au/archives/git/0504/1746.html

I win. Donate $1 to FSF next time you get the opportunity ;-)

Hand-optimized asm is faster because the optimizer in the compiler is a
general-purpose one that has to guess and make assumptions about the code
and its input to make the correct decisions. While it gets things right
in as many as 80% of the cases, there's still the 20% where it doesn't.
A human can, with sufficient research and effort, make the same optimizations
where they are correct but avoid the 20% erroneous ones.

If the compiler gets it wrong inside your innermost loop, it might be worth
shaving those extra 0.0001 seconds off of each iteration, because in the long
run, world-wide, it might save several weeks worth of CPU-time every day.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:33 am

El 7/9/2007, a las 13:54, Andreas Ericsson escribi

To: Wincent Colaiuta <win@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 9:58 am

Wincent Colaiuta wrote:

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 10:13 am

El 7/9/2007, a las 15:58, Andreas Ericsson escribi

To: Wincent Colaiuta <win@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 8:09 pm

Wincent Colaiuta wrote:

To: Wincent Colaiuta <win@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Johannes Schindelin <Johannes.Schindelin@...>, Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Andreas Ericsson <ae@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:55 am

And this is of course exactly the kind of spot where you _would_ use
assembly in the real world. 99.99% of code is better written in C than
assembler, but there is that 0.01% where hand-coded assembler is a
better choice.

--
Karl Hasselstr

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 8:38 pm

Unlike you, I actually gave reasons for my dislike of C++, and pointed to
examples of the kinds of failures that it leads to.

You, on the other hand, have given no sane reasons *for* using C++.

The fact is, git is better than the other SCM's. And good taste (and C) is
one of the reasons for that.

It has nothing to do with dinosaurs. Good taste doesn't go out of style,
and comparing C to assembler just shows that you don't have a friggin idea
about what you're talking about.

Linus
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:12 pm

To be very specific:
- simple and clear core datastructures, with *very* lean and aggressive
code to manage them that takes the whole approach of "simplicity over
fancy" to the extreme.
- a willingness to not abstract away the data structures and algorithms,
because those are the *whole*point* of core git.

And if you want a fancier language, C++ is absolutely the worst one to
choose. If you want real high-level, pick one that has true high-level
features like garbage collection or a good system integration, rather than
something that lacks both the sparseness and straightforwardness of C,
*and* doesn't even have the high-level bindings to important concepts.

IOW, C++ is in that inconvenient spot where it doesn't help make things
simple enough to be truly usable for prototyping or simple GUI
programming, and yet isn't the lean system programming language that C is
that actively encourages you to use simple and direct constructs.

Linus
-

To: <git@...>
Date: Friday, September 7, 2007 - 1:09 am

The D programming language is a different take than C++ has on growing
C. I'm curious what your thoughts on that are (D has garbage collection,
while still retaining the ability to directly manage memory). Can you
enumerate what you feel are the important concepts?

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 5:41 am

Well, to me D has two significant drawbacks to be "ready to use". The
first one is that it doesn't have bit-fields. I often deal with bit-fields
on structures that have a _lot_ of instances in my program, and the
bit-field is chosen for code readability _and_ structure size efficiency.
I know you claim that using masks manually often generates better
code. But in my case, speed does not matter _that_ much. I mean it does,
but not at this micro-level, as access to the bit-field is not in my
inner loop.

The second issue I have is that there is no way to do:
import (C) "foo.h"

And this is a big no-go (maybe not for git, but as a general issue)
because it impedes the use of external libraries with a C interface a
_lot_. E.g. I'd really like to use it with some GNU libc extensions,
but I can't because it has too many dependencies (some async getaddrinfo
interface, which needs me to import all the signal events and so on
extensions in the libc, with bit-fields, which sends us back to the first
point).

I also have a third, but non-critical issue: I absolutely don't like
phobos :) Though I'm obviously free to choose another library. D has
definitely many many many real advances over C (like the .init, .size,
... and so on fields, known types, and whatever portability nightmare
C imposes on us). In fact I like to use D like I code in C, using
modules and functions, and very few classes, as few as I can. And even
(under- ?) using D like this, it is a real pleasure to work with. I'm
really eager to see gdc be more stable.

-- 
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
OOO http://www.madism.org

To: <git@...>
Date: Friday, September 7, 2007 - 3:03 pm

I'm surprised this is such an important issue. Others have mentioned it,
but regard it as a minor thing. Interestingly, the htod program (which
converts C .h files to D import files) will convert bit fields to inline

D does come with htod, which converts C .h files to D files. It's not
possible to do a perfect job (because of macros), but it comes pretty
darned close. The reason htod gets so close is because it is actually a
real C compiler front end, not a perl or regex string processing hack.

Because it (may) require a little hand tweaking of the results (again,
because C headers may include awful things like:
#define BEGIN {
#define print printf(

You're not the only one <g>. But I'll add that access to the standard C
runtime library *is* a part of D, so at some level it can't be worse
than C. There's also another runtime library available, Tango, which is

There are a lot of people hard at work on D to make it more stable and
increase the breadth and depth of tools available. I am fully aware that
there may be non-technical issues to using D in a project like git, like
availability of other D programmers, tradition, etc., but in this thread
I'm concerned mainly with technical issues.

P.S. I'm also NOT suggesting that git be converted to D. Translating a
working, debugged, 80,000 line codebase from one language to another is
usually a fool's errand.

Thanks for taking the time to post your thoughts.

-----------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:41 pm

Well htod does that, but it's very impractical to write them from
scratch. Especially if you want to benefit from the fact that padding
and integer sizes are very well defined to map e.g. structs onto a raw
stream, avoiding deserialization and so on. And for that bit-fields are
a really really fast and simple way to describe things.

I mean, take your classical example of the foreach loop. Your whole
point is that it's way shorter, and safer. And now you are saying that
instead of sth like:

struct my_struct {
unsigned some_field : 2;
unsigned has_this_property : 1;
unsigned is_in_this_state : 1;
unsigned priority_level : 2;
...
}

people should write (IIRC it works since ->some_field = 2 calls
->some_field(2) if the member does not exist, or maybe it's
set_some_field, it's not very relevant anyway):

struct my_struct {
unsigned some_field() {
return this->real_field >> 30;
}

void some_field(unsigned value) {
this->real_field |= (value & 3) << 30;
}

...

private:
unsigned real_field;
}

Please, it has to be a joke: there are 42 ways for people to write it
wrong (wrong shifts, wrong masks, and so on), it's horribly obfuscated,
hence needs a lot of comments, whereas the bitfield is 90% self
documented, and the syntax is _very_ clear, you cannot beat that. I
would be absolutely fine with it being syntactical sugar for some kind
of template call though.
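
A minimal C sketch of the hand-written accessors under discussion, with the mask and shift spelled out. The names SOME_FIELD_SHIFT, SOME_FIELD_MASK, get_some_field and set_some_field are invented for illustration; the 2-bit field in bits 31..30 follows the quoted example. Note that a setter has to clear the old bits before OR-ing in the new ones, a step the quoted setter leaves out, which arguably supports the point about how easy this is to get wrong.

```c
#include <stdint.h>

/* Hypothetical layout: a 2-bit field stored in bits 31..30 of a 32-bit word. */
#define SOME_FIELD_SHIFT 30
#define SOME_FIELD_MASK  (UINT32_C(3) << SOME_FIELD_SHIFT)

/* Extract the 2-bit field from the packed word. */
static uint32_t get_some_field(uint32_t real_field)
{
    return (real_field & SOME_FIELD_MASK) >> SOME_FIELD_SHIFT;
}

/* Return a copy of the packed word with the field replaced by value. */
static uint32_t set_some_field(uint32_t real_field, uint32_t value)
{
    real_field &= ~SOME_FIELD_MASK;                 /* clear the old bits first */
    real_field |= (value & 3) << SOME_FIELD_SHIFT;  /* then store the new ones */
    return real_field;
}
```

Every other bit of the word is left untouched by the setter, which is the property the plain OR version does not have once the field already holds a non-zero value.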

Not to mention that the usual C idiom:

union {
unsigned flags;
struct {
// many bitfields
};
};

Would need an explicit copy_flags(const my_struct foo) function to
work. Not pretty, not straightforward.

Really, I feel this is a big lack, for a language that aims at
simplicity, conciseness _and_ correctness.

OK, maybe I'm biased, I work with network protocols all day long, so
I often need bitfields, but still, a lot of people deal with network

La...

To: <git@...>
Date: Friday, September 7, 2007 - 4:40 pm

True. I haven't tried yet (nobody else seems to care about it as much as

I should point out that inline functions are inlined, and there is no

I'm not following this. To copy a union, you just copy it with the
assignment operator:

U a, b;

You're right on both counts. It's because htod is built out of a fork of
the Digital Mars C compiler. Something similar could be done with gcc,
but I'm not the person to do it. I should also get off my lazy tail and

GDC was just released for D 1.020, which is behind D 1.021, but 1.021

And it's nice to hear your perspective, which is why I dropped by this
thread.

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 4:56 pm

I know that, and that's why I said I was totally fine with the
bitfield notation to be only syntactic sugar on a template thingy if

That was the point indeed. But if you don't have bitfields, you can't
do the union. And if the bitfield is just syntactic sugar, it may be

Sure, but it does not work properly on amd64 (and it's the
architecture I care about) and is not ready for the current gcc (4.2,
only 4.1 builds) and so on. It's not as stable as DMD is. It does not
lag too much version-wise, it lags in maturity. But well, youth has a
cure: time :)

--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org

To: <git@...>
Date: Friday, September 7, 2007 - 6:54 pm

Yes, and the more people use it, the better it will get. These are all
environmental problems, not technical limitations of the language.

-

To: Pierre Habouzit <madcoder@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 3:51 pm

Pierre Habouzit <madcoder@debian.org> writes:

And strictly speaking, C bitfields are completely useless for that
purpose since the compiler is free to use whatever method he wants for
allocating bit fields. So if you want to write a portable program,
you are back to making the masks yourself.

Where bit fields work reliably is when you are not interchanging data
with other applications, but just laying out your internals.
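
For the interchange case, "making the masks yourself" could look roughly like the sketch below. The one-byte layout (bit positions, two reserved low bits) and the names my_header, pack_header and unpack_header are assumptions made up for this example; the field names are borrowed from the bitfield struct quoted earlier in the thread.

```c
#include <stdint.h>

/* In-memory form: plain integers, no bitfields, so the compiler's
 * bitfield allocation rules never reach the wire format. */
struct my_header {
    unsigned some_field;        /* 0..3 */
    unsigned has_this_property; /* 0..1 */
    unsigned is_in_this_state;  /* 0..1 */
    unsigned priority_level;    /* 0..3 */
};

/* Assumed wire layout of the byte: bits 7-6 some_field, bit 5
 * has_this_property, bit 4 is_in_this_state, bits 3-2 priority_level,
 * bits 1-0 reserved (written as zero). */
static uint8_t pack_header(const struct my_header *h)
{
    return (uint8_t)(((h->some_field & 3u) << 6) |
                     ((h->has_this_property & 1u) << 5) |
                     ((h->is_in_this_state & 1u) << 4) |
                     ((h->priority_level & 3u) << 2));
}

/* Inverse of pack_header; the reserved bits are ignored. */
static struct my_header unpack_header(uint8_t byte)
{
    struct my_header h;
    h.some_field        = (byte >> 6) & 3u;
    h.has_this_property = (byte >> 5) & 1u;
    h.is_in_this_state  = (byte >> 4) & 1u;
    h.priority_level    = (byte >> 2) & 3u;
    return h;
}
```

The layout lives in the shifts and masks rather than in the struct declaration, so it is the same on every compiler, at the price of exactly the verbosity complained about above.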

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: David Kastrup <dak@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 3:59 pm

The point is (1) D is not C, (2) we all know that linux e.g. does that
in many places using the fact that it knows how the supported compilers
(gcc icc tcc maybe some other) do their packing.

The discussion is about D. D solves the infamous problem with longs
not having the same size everywhere, I don't see why it couldn't solve

Thank you for the _C_ lesson.

--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:31 pm

In my opinion there is basically one area which C has botched up
seriously in order to be useful as a general purpose language, and
that is conflating pointers and arrays, and allowing pointer
arithmetic. The consequences are absolutely awful with regard to
compilers being able to optimize, and it is pretty much the primary
reason that Fortran is still quite in use for numerical work.

C has no usable two-dimensional (never mind higher dimensions) array
concept that would allow passing multidimensional arrays of
runtime-determined size into functions. Period.

Add to that the pointer aliasing problems affecting compilers, and C
is useless for serious portable readable numerical work.

Fortran libraries like blas and lapack are ubiquitous after decades
because the language can deal with multiple-dimension arrays sensibly,
and could do so in the sixties already.

C99 helps a bit. But messing around with restrict pointers and
similar means that to wring equal performance out of some trivial code
piece (or permitting the compiler to do so without having to take
aliasing into account) is a lot of work and leads to ugly and
inscrutable code.
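
For reference, a small sketch of what the C99 additions mentioned here look like in practice: a variably modified array parameter for a matrix of run-time size, plus restrict to promise the compiler that the arrays do not alias. The function name matvec is made up for this example, and the sketch assumes a compiler with variable-length array support (optional as of C11).

```c
#include <stddef.h>

/* y = A * x for a rows-by-cols matrix whose dimensions are only known at
 * run time. The dimensions must be declared before the array parameters
 * that use them. */
static void matvec(size_t rows, size_t cols,
                   double a[restrict rows][cols],
                   const double x[restrict cols],
                   double y[restrict rows])
{
    for (size_t i = 0; i < rows; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < cols; j++)
            sum += a[i][j] * x[j];   /* ordinary two-dimensional indexing */
        y[i] = sum;
    }
}
```

Whether this counts as "usable" is a judgment call: the restrict promises are not checked by the compiler, and the caller is still responsible for passing dimensions that match the actual array.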

That's the one thing that has seriously hampered C: the lack of a true
array type on its own, decoupled from pointers. It does not need to
carry its dimensions with it or other
hide-the-implementation-from-the-programmer niceties: C is, after all,
a low-level language, and Fortran did not suffer from not having array
dimensions packed into the arrays as well.

But that's water down the drawbridge. This single major deficiency is
not anything that would hamper git development.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: <git@...>
Date: Friday, September 7, 2007 - 4:49 pm

I agree. It's one of those things that probably sounded like a good idea
at the time. The consequences were not foreseen. All languages have a
few of these (C++ has the infamous use of < > for template arguments).

-

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 3:40 am

A design is perfect not when there is no longer anything you can add
to it, but if there is no longer anything you can take away.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: David Kastrup <dak@...>
Cc: Walter Bright <boost@...>, <git@...>
Date: Friday, September 7, 2007 - 7:36 am

On 7/9/2007, at 9:40, David Kastrup wrote:

To: <git@...>
Date: Friday, September 7, 2007 - 4:15 am

I like to phrase that a slightly different way: anyone can make
something complicated, but it takes genius to make something simple.

A very big goal for D is to make what should be simple code, simple. It
turns out that what's simple for a computer is complex for a human. So
to design a language that is simple for programmers is (unfortunately) a
rather complex problem. Or perhaps I'm just not smart enough <g>.

A canonical example is that of a loop. Consider a simple C loop over an
array:

void foo(int array[10])
{
for (int i = 0; i < 10; i++)
{ int value = array[i];
... do something ...
}
}

It's simple, but it has a lot of problems:

1) i should be size_t, not int
2) array is not checked for overflow
3) 10 may not be the actual array dimension
4) may be more efficient to step through the array with pointers, rather
than indices
5) type of array may change, but the type of value may not get updated
6) crashes if array is NULL
7) only works with arrays and pointers

Since this thread is talking about C++, let's look at the C++ version:

void foo(std::vector<int> array)
{
for (std::vector<int>::const_iterator
i = array.begin();
i != array.end();
i++)
{
int value = *i;
... do something ...
}
}

It has fewer latent bugs, but still:

1) type of array may change, but the type of value may not get updated
2) too darned much typing
3) it's more complicated, not simpler

Frankly, I don't want to write loops that way. I want to write them like
this:

void foo(int[] array)
{
foreach (value; array)
{
... do something ...
}
}

As a programmer, I'm specifying exactly what I want to happen without
much extra puffery. It's less typing, simpler, and more resistant to bugs.

1) correct loop index type is selected based on the type of array
2) arrays carry with them their dimension, so foreach is guaranteed to
step through the loop the correct number ...

To: Walter Bright <boost@...>
Cc: <git@...>
Date: Friday, September 7, 2007 - 4:26 am

Wrong. size_t is for holding the size of memory objects in bytes, not
in terms of indices. For indices, the best variable is of the same
type as the declared index maximum size, so here it is typeof(10),

No. It is a beginners' and advanced users' mistake to think using
pointers for access is a good idea. Trivial optimizations are what a
compiler is best at, not the user. Using pointer manipulation will
more often than not break loop unrolling, loop reversal, strength

Most of those are toy concerns. They prevent problems that don't
actually occur much in practice.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: <git@...>
Date: Friday, September 7, 2007 - 5:14 am

The easiest way to show the error is consider the code being ported to a
typical 64 bit C compiler. int's are still 32 bits, yet the array can be
larger than 32 bits. You're right in that what we want to be able to do
is typeof(array dimension), but there is no way to do that automatically
in C, which is my point. If the array dimension changes, you have to
carefully check to make sure every loop dependency on the type is
updated, too.
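
A minimal C sketch of the usual workaround for the points above: pass the element count explicitly and index with size_t. The names sum_ints and ARRAY_LEN are invented for illustration and are not taken from any code discussed in this thread.

```c
#include <stddef.h>

/* Sum len elements. The caller supplies the real element count instead of
 * relying on a dimension written in the parameter list, which C ignores. */
static long sum_ints(const int *array, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)   /* size_t can index any object size */
        total += array[i];
    return total;
}

/* Element count of a true array. Gives a wrong answer if applied to a
 * pointer, including an "array" function parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
```

This still leaves it to the programmer to pass a length that matches the array, which is the maintenance burden being described.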

size_t will always work, however, making it a better choice than int, at

Because the 10 array dimension is not statically checked in C. I could
pass it a pointer to 3 ints without the compiler complaining. This makes
it a potential maintenance problem. Also, the maintenance programmer may
change the array dimension in the function signature, but overlook

Array buffer overflow errors are commonplace in C, because array
dimensions are not automatically checked at either compile or run time.
This is an expensive problem. Some C APIs try to deal with this by
passing a second argument for arrays giving the dimension (snprintf, for
example), but this tends to be sporadic, not conventional. It being

C compilers vary widely in the optimizations they'll do for simple
loops. I see often enough attempts by programmers to take such matters
into their own hands. I agree with you on that - and suggest the

Let's say our fearless maintenance programmer decides to make it an
array of longs, not an array of ints. He overlooks changing the type of
value in the loop. Suddenly, things subtly break because of overflows.
Or maybe he changed the int to an unsigned, now the divides in the loop
give different answers. Etc. There really isn't any compiler/language

I consider an array that is NULL to have no members, so instead of

C has structs, too, as well as more complicated user defined
collections. Essentially, you cannot (simply) write generic algorithms
in C, because you cannot (simply) generically express iteration. Of
course, yo...

To: Linus Torvalds <torvalds@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:40 pm

Not to mention try finding two C++ compilers that support the same
language features. C is a known quantity. C++ depends on whose compiler
you use and what class libraries you use. Trying to make those things
work crossplatform is not an easy task. (Harder than it is in C at
least.)

A number of years ago, a programmer who will not be named (and is not me),
tried to port Perl to C++. It was a disaster. He found that every
compiler handled something differently.

If you stuck to one compiler, it might work. But trying to get GCC to
work like MS C++ or Borland C++ or whatever is just asking for pain.

--
Refrigerator Rule #1: If you don't remember when you bought it, Don't eat it.
-

To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:08 pm

As I said, it's a matter of beliefs. As such, any reasoning and
arguing will be endless and pointless, as for any other religious

I'll give you reasons why to use C++ for Git (not why C++ is better
for any project in general, as that again would be pointless):

1. Good String class will make code much more readable (and
significantly shorter)
2. Good Buffer class - same reason
3. Smart pointers and smart handles to manage memory and
file/socket/lock handles.

As it is right now, it's too hard to see the high-level logic thru

IMHO Git has a brilliant high-level design (object database, using
hashes, simple and accessible storage for data and metadata). Kudos to
you!
The implementation: a mixture of C and shell scripts, command line

I don't see myself comparing assembler to C anywhere.
I was pointing out that I've been programming in different languages
(many more actually) and observed bad developers writing bad code in
all of them. So this quality "bad developer" is actually
language-agnostic :-).
--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:50 am

But all of those are incompatible with another and require major
headaches and/or interface code to get to run with one another. And
then might use different interface styles, anyway.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 9:27 pm

Total BS. The string/memory management is not at all relevant. Look at the

The only really important part is the *design*. The fact that some of it
is in a "prototyping language" is exactly because it wasn't the core
parts, and it's slowly getting replaced. C++ would in *no* way have been
able to replace the shell scripts or perl parts.

You made a very clear "assembler -> C -> C++/C#" progression in your
life, comparing my staying with C as a "dinosaur", as if it was some
inescapable evolution towards a better/more modern language.

With zero basis for it, since in many ways C is much superior to C++ (and
even more so C#) in both its portability and in its availability of

You can write bad code in any language. However, some languages, and
especially some *mental* baggages that go with them are bad.

The very fact that you come in as a newbie, point to some absolutely
*trivial* patches, and use that as an argument for a language that the
original author doesn't like, is a sign of you being a person who should
be disabused of any idiotic notions as soon as possible.

The things that actually *matter* for core git code are things like writing
your own object allocator to make the footprint be as small as possible in
order to be able to keep track of object flags for a million objects
efficiently. It's writing a parser for the tree objects that is basically
fairly optimal, because there *is* no abstraction. Absolutely all of it is
at the raw memory byte level.
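
To give a rough idea of what such an allocator can look like, here is a small block-allocator sketch in C. It is not git's actual code: struct node, struct node_pool, alloc_node and the block size of 1024 are all assumptions made up for this example.

```c
#include <stdlib.h>

#define NODES_PER_BLOCK 1024

/* Hypothetical per-object record: a few flag bits plus a 20-byte hash. */
struct node {
    unsigned flags;
    unsigned char sha1[20];
};

struct node_pool {
    struct node *block;  /* current block, NULL until first use */
    size_t used;         /* nodes already handed out from that block */
};

/* Hand out zeroed nodes carved from large blocks. There is one calloc per
 * NODES_PER_BLOCK objects, so no per-object malloc overhead. Nodes are
 * never freed individually; blocks live until the process exits. */
static struct node *alloc_node(struct node_pool *pool)
{
    if (!pool->block || pool->used == NODES_PER_BLOCK) {
        pool->block = calloc(NODES_PER_BLOCK, sizeof(struct node));
        if (!pool->block)
            return NULL;
        pool->used = 0;
    }
    return &pool->block[pool->used++];
}
```

Consecutive allocations are adjacent in memory, so the cost per object is sizeof(struct node) and nothing more, which is the kind of footprint control the paragraph above is about.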

Can those kinds of things be written in other languages than C? Sure. But
they can *not* be written by people who think the "high-level"
capabilities of C++ string handling somehow matter.

The fact is, that is *exactly* the kinds of things that C excels at. Not
just as a language, but as a required *mentality*. One of the great
strengths of C is that it doesn't make you think of your program as
anything high-level. It's what makes you apparently prefer other
languages, but the thing...

To: Linus Torvalds <torvalds@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:26 am

Hi,

There is an important additional point: a language like C _holds_ you to a
certain degree of diligence.

In my day-job I have to code in other languages, which make it "easy" to
code. As a result, the code I have to work with is sloppy, ugly and
buggy. By applying the same principles I am _forced_ to use in C, with
Git, I produce better code.

Ciao,
Dscho

-

To: Linus Torvalds <torvalds@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 11:09 pm

Not only have I looked at the code, I've also debugged it quite a bit.
Granted most of my problems had to do with handling paths on Windows
(i.e. string manipulations).

... and explain where I'm coming from:
My goal is to *use* Git. When something does not work *for me* I want
to be able to fix it (and contribute the fix) in the *shortest time
possible* and with *minimal effort*, since for me it's a diversion from
my main activities.
The fact that Git is written in C does not really contribute to that goal.
Suggesting C++ is the only alternative given the existing C codebase.
So while C++ may not be the best choice "academically speaking" it's
pretty much the only practical choice.

"Democracy is the worst form of government except for all those others
that have been tried." - Winston Churchill

Now, I realize that I'm a very infrequent contributor to Git, but I
want my opinion to be heard.
People who carry the main weight of developing and maintaining Git
should make the call.
--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:28 am

Hi,

We are a happy little meritocracy here. Once you proved that you're not
full of shit (some seem to try the opposite, you know who you are), you
can go all caps. Before that, you'll have to show that you have earned
the right to be heard first.

Ciao,
Dscho

-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:52 am

Sorry, but for fixing things in C, I can look and work locally. For
fixing things in C++, I first need to understand the class
hierarchies used in the project.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:31 am

Coupled with what you said in an earlier mail, namely
---%<---%<---

Considering C appeared in 1972, and C++ appeared in 1985, you have been
writing C code for 13 years. And you're telling me that git being written
in C prevents you from contributing?

If you want to do something useful in C++ for git, make it easy for C++

They already have, but every now and then someone comes along and suggests
a complete rewrite in some other language. So far we've had Java (there's
always one...), Python and now C++.

It happens to all projects, sooner or later. The funny thing is that all those
people that want their favourite software to be rewritten in their favourite
programming language always want someone else to rewrite it for them.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Andreas Ericsson <ae@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:17 pm

Since this "complete rewrite" was mentioned in multiple emails I'd
like to rectify that:
What I'm offering (for Git) is to use C++ as a "better C".
Don't change any existing *working* code, but start introducing simple
C++ constructs in the new code.
Git is simple enough to not require any high-level abstractions. But
some utility classes could make code much simpler.

And BTW, I don't even like C++ that much :-), I just like it much
better than C. I've been saying that C++ is a legacy language for
quite some time now. But we will use it for many years to come because
the size of this legacy code is huge, so there will be plenty of C++
developers available (to contribute to Git :-).
And C++ is the only way to move forward with the existing C codebase.
--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 8:29 pm

There are far too many highly valuable contributors that have spoken
against C++ for me to believe that C++ and C will ever co-exist in the
official git repo. Good thing utility classes can be developed on top
of the existing C-code, but in a separate repo, and packed into a
library. That way, you get some hacking ground for your beloved C++
coders while the current git contributors can keep contributing in the

The C code base is a lot larger and C++ will drop dead pretty fast if it's

Complete and utter BS. It can also stay in C, or get language bindings for
Python/Perl/PHP/LUA(?)/whatever, or both.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Andreas Ericsson <ae@...>, Git <git@...>
Date: Friday, September 7, 2007 - 6:28 pm

You are aware that the Linux kernel was kept compilable under g++ for
a while in its history? You'll need more than vague words to erase
the memories from that experiment...

Just compiling under C++, with no source changes, is likely to impact
performance and compile time rather badly, not to mention portability
(you need the C++ runtime, for one thing).

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: David Kastrup <dak@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Andreas Ericsson <ae@...>, Git <git@...>
Date: Friday, September 7, 2007 - 8:37 pm

This in fact is a very specific statement. Would you care to back it
up with facts?

--
- Dmitry
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Andreas Ericsson <ae@...>, Git <git@...>
Date: Saturday, September 8, 2007 - 2:25 am

Read up on the Linux kernel history in the archives.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 2:15 am

I consider string manipulation to be one of the places where C++ is a
total disaster. It's way too easy for idiots to do something like this:

a = b + "/share/" + c + serial_num;

where you can have absolutely no idea how many memory allocations are
done, due to type coercions, overloaded operators (good God, you can
overload the comma operator in C++!!!), and then when something like
that ends up in an inner loop, the result is a disaster from a
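
For contrast, a C sketch of the same concatenation in which the single allocation is visible at the call site. The helper name make_share_path and its parameters are made up for illustration; only standard snprintf and malloc are used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build "<prefix>/share/<name><serial>" in one heap buffer. The first
 * snprintf call only measures; the single malloc is the only allocation.
 * The caller owns the result and must free() it. */
static char *make_share_path(const char *prefix, const char *name, int serial)
{
    int len = snprintf(NULL, 0, "%s/share/%s%d", prefix, name, serial);
    if (len < 0)
        return NULL;

    char *path = malloc((size_t)len + 1);
    if (!path)
        return NULL;

    snprintf(path, (size_t)len + 1, "%s/share/%s%d", prefix, name, serial);
    return path;
}
```

It is longer than the one-line C++ expression, but anyone reading a loop that calls it can see that each iteration costs one malloc and one free.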

Yes, and if you contribute something the shortest time possible, and
it ends up being crap, who gets to rewrite it and fix it? I've seen
too many C++ programs which get this kind of crap added, and it's not
noticed right away (because C++ is really good at hiding such
performance killers so they are not visible), and then later on, it's

And if git were written in C++, it's precisely the infrequent
contributors (who are in a hurry, who only care about the quick hack
to get them going, and not about the long-term maintainability and
performance of the package) that are in the position to do the
most damage...

- Ted
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Linus Torvalds <torvalds@...>, Git <git@...>
Date: Friday, September 7, 2007 - 1:48 am

That's just it -- Git's goal isn't to make it as easy as possible for
Git _users_ to fix it (though that is a nice thing to have). Git's
goal is to be a very good, very fast SCM. Bugs should be found and
fixed, but that can most effectively be done by the people who are
already knowledgeable about Git's codebase (i.e. its developers), not
its users.

Dave.
-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 1:03 am

Just to piss you off.

-Miles

--
Love is a snowmobile racing across the tundra. Suddenly it flips over,
pinning you underneath. At night the ice weasels come. --Nietzsche
-

To: Miles Bader <miles@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 8:08 am

Hi,

Hehe.

FWIW I strongly disagree that it's BS. As others have stated, the reasons
are easily found, and they are no weak arguments.

Ciao,
Dscho

-

To: Dmitry Kakurin <dmitry.kakurin@...>
Cc: Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 12:59 am

Git's creator (Linus) codes in C, not C++. He has at various times
stated reasons why he does not use C++. I'm sure one can find such
messages with a bit of searching on mailing lists that he frequents.
He has his reasons. I also happen to agree with at least some
of them. :)

Git evolved from that initial prototype that Linus created. I'm not
sure how much code survives from that initial few versions that
Linus managed before Junio took over, but nobody wanted to rewrite
things that already work so it just stayed in C.
"If it works, don't fix it."

C works. We (now) have 83,215 lines of it. It's not going away
anytime soon in Git. It is also a relatively simple language that
a large number of open source programmers know. This makes it easy
for them to get involved in the project. Instead of say Haskell,
which has a smaller community. Or Tcl/Tk as we recently found out
in the Git User Survey. :-\

--
Shawn.
-

To: Shawn O. Pearce <spearce@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 5:12 am

This is important. Git contains code from more than 300 people. I'm
guessing you could cut that number by 2/3 if it had been written in C++.

Git is cheating a bit though. Its primary audience was (and is) the
various integrators working on the Linux kernel, all of whom are fairly
competent C programmers.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Shawn O. Pearce <spearce@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 5:52 am

C++ is a language without design discipline. Its set of features and
syntactic elements is incontingent (for example, its templates started
as a ripoff of Ada generics which would have been ok except for the
completely braindead idea of taking the Ada angle bracket restriction
syntax along with it), and it is the task of each programmer to choose
a sane and manageable subset and style, and implement using that. As
a consequence, every C++ programmer writes his own personal dialect of
C++, and we have about 20 different incompatible implementations of
multidimensional numeric arrays, making a complete mockery of the
"code reuse" mantra: C++ _projects_ can't actually usefully achieve
"multiple inheritance" on a design/meta level: once you start with one
non-trivial design, fitting other separately evolved components with a
different style causes retrofitting nightmares.

So going to C++ means cutting the number of people who find
themselves comfortable with the actual design and layout down to maybe
10% of those who would actually feel ok with the actual _algorithms_
employed.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

To: Andreas Ericsson <ae@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Shawn O. Pearce <spearce@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 5:35 am

Do we still have a huge overlap with the kernel people? I had
an impression that patches from the kernel folks, with notable
exception from a handful (you know who you are), have petered
out rapidly after the first several weeks.
-

To: Junio C Hamano <gitster@...>
Cc: Dmitry Kakurin <dmitry.kakurin@...>, Matthieu Moy <Matthieu.Moy@...>, Shawn O. Pearce <spearce@...>, Git <git@...>
Date: Thursday, September 6, 2007 - 6:21 am

True, but the point I was trying to make is that because git is written
in C, for an audience who are extremely at home with that particular
language, it quickly attracted contributors.

git log --pretty=short | sed -n 's/^Author: \([^<]*\)<.*$/\1/p' | \
sort | uniq | wc -l

reports 355 unique lines, although some authors are mentioned twice
(Theodore Tso vs Theodore Ts'o). Cross-matching the kernel authors
with the git authors shows that git and linux have 111 developers
in common, again reporting some of them twice. A quick visual scan
shows the figure to be 106, assuming no two authors have the same
name (including email addresses produced more unique contributors as
people change email more often than they change name).

It's not unreasonable to say that git got at least 106 C-programmers
"for free" included in their userbase round about the same second
Linus went public with his intentions of managing the linux kernel
in git, all of which are obviously comfortable enough with C to
poke around in the kernel.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-

To: Matthieu Moy <Matthieu.Moy@...>
Cc: Junio C Hamano <junkio@...>, Lukas <lukass@...>, Kristian <krh@...>, Git Mailing List <git@...>
Date: Wednesday, September 5, 2007 - 10:30 pm

From what I've seen (by perusing the bstring website), bstring is kind
of ugly though....

-Miles

--
"Suppose we've chosen the wrong god. Every time we go to church we're
just making him madder and madder." -- Homer Simpson
-

To: Lukas <lukass@...>
Cc: Junio C Hamano <junkio@...>, Git Mailing List <git@...>
Date: Tuesday, September 4, 2007 - 5:38 pm

Lukas Sandstr

To: Alex Riesen <raa.lkml@...>
Cc: Junio C Hamano <junkio@...>, Lukas <lukass@...>, Git Mailing List <git@...>
Date: Tuesday, September 4, 2007 - 7:01 pm

Well I honestly believe that putting strbufs/bstrings in mailinfo.c
adds no value. I was going to give it a try to see how strbufs
performed, but it's just useless.

The main problem mailinfo has, according to Junio, is that it may
sometimes truncate some things in buffers at 1000 octets, without dying
loudly. That is bad.

_but_ there is no point in using arbitrarily long string buffers to
parse a mail. Remember, a mail goes through SMTP, and SMTP is supposed
to limit its lines at 512 characters (without use of extensions at
least). Not to mention that an email address cannot be more than 64+256
chars long (or sth around that). So using variable-length buffers is
just a waste.
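
Keeping fixed-size buffers while no longer truncating silently could look roughly like this sketch. It assumes lines are read with fgets into a fixed buffer; the helper name read_line_checked and its return convention are made up for this example and are not part of mailinfo.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into a fixed buffer.
 * Returns 1 for a complete line, 0 at end of input, and -1 if the line was
 * longer than the buffer, so the caller can die loudly instead of carrying
 * on with truncated data. */
static int read_line_checked(FILE *in, char *buf, size_t size)
{
    if (!fgets(buf, (int)size, in))
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;               /* the whole line fit, newline included */

    /* No newline seen: peek one character to tell the two cases apart. */
    int c = fgetc(in);
    if (c == EOF)
        return 1;               /* last line of the input, no newline */
    ungetc(c, in);
    return -1;                  /* the rest of the line is still unread */
}
```

A caller would treat -1 as a fatal error, which keeps the 1000-octet buffers and removes the silent truncation.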

string buffers are not really (IMHO) supposed to help in parsing
tasks, and when you need to do some serious parsing, either do it by
hand or use lex, but nothing in between makes sense to me.

OTOH, string buffers can be used in the many places where git has its
own implementations of always slightly different kinds of buffers (at
least 4 different ones by my current count, and growing). I have more
patches pending here beyond the one I already sent, and well, here is
the diffstat:

$ git diff --stat origin/master.. ^strbuf*
archive-tar.c | 67 ++++++++++++------------------------------------
builtin-apply.c | 29 ++++++---------------
builtin-blame.c | 34 ++++++++-----------------
builtin-commit-tree.c | 59 +++++++++---------------------------------
builtin-rerere.c | 53 +++++++++++---------------------------
cache-tree.c | 57 ++++++++++++++---------------------------
diff.c | 25 ++++++------------
fast-import.c | 38 +++++++++++----------------
mktree.c | 26 ++++++-------------
9 files changed, 116 insertions(+), 272 deletions(-)

I mean, there is not even a need to show the diff to understand what
the gain is. And that was possible because strbufs are straightforward,
and give you the kind of control...
