Hi.

This is an attempt to use "The Better String Library"[1] in
builtin-mailinfo.c.

The patch doesn't pass all the tests in the test suite yet, but I thought
I'd send it out so people can decide if they like how the code looks.

I'm not sending a patch to add the library files at this time. I'll send
that patch when this patch is working. The changes required to make it
pass the tests shouldn't be very large.

/Lukas

[1] http://bstring.sourceforge.net/

---
 builtin-mailinfo.c | 795 ++++++++++++++++++++++++++--------------------------
 1 files changed, 392 insertions(+), 403 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index d7cb11d..2ddc15d 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -5,14 +5,14 @@
 #include "cache.h"
 #include "builtin.h"
 #include "utf8.h"
+#include "bstring/bstrlib.h"
 
 static FILE *cmitmsg, *patchfile, *fin, *fout;
 
 static int keep_subject;
-static const char *metainfo_charset;
-static char line[1000];
-static char name[1000];
-static char email[1000];
+static bstring metainfo_charset;
+static bstring name;
+static bstring email;
 
 static enum {
 	TE_DONTCARE, TE_QP, TE_BASE64,
@@ -21,321 +21,291 @@ static enum {
 	TYPE_TEXT, TYPE_OTHER,
 } message_type;
 
-static char charset[256];
+static bstring charset;
 static int patch_lines;
-static char **p_hdr_data, **s_hdr_data;
+static bstring *p_hdr_data, *s_hdr_data;
 
 #define MAX_HDR_PARSED 10
 #define MAX_BOUNDARIES 5
 
-static char *sanity_check(char *name, char *email)
+static bstring sanity_check(bstring name, bstring email)
 {
-	int len = strlen(name);
-	if (len < 3 || len > 60)
+	static struct tagbstring email_ind = bsStatic("<@>");
+	if (blength(name) < 3 || blength(name) > 60)
 		return email;
-	if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
+	if (binchr(name, 0, &email_ind) != BSTR_ERR)
 		return email;
 	return name;
 }
 
-static int bogus_from(char *line)
+static int bogus...
Unfortunately, I haven't had any time in the last few days to code, nor to read mail. I'm assuming that there is no point in me finishing the patch and that git will go with the strbuf solution? /Lukas
On Tue, 2007-09-04 at 22:50 +0200, Lukas Sandstr
Please, no. Let's not pull in a dependency for something as simple as a string library. How many distros have bstring packaged? The right version? Does it work on Windows? We already have strbuf.c, let's just consolidate the string manipulation code already in git under that interface. Kristian
Kristian H
[ snip ] When I first looked at Git source code two things struck me as odd: 1. Pure C as opposed to C++. No idea why. Please don't talk about portability, it's BS. 2. Brute-force, direct string manipulation. It's both verbose and error-prone. This makes it hard to follow high-level code logic. - Dmitry -
*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot of substandard programmers use it, to the point where it's much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles Bader jokingly said "to piss you off", but it's actually true. I've come to the conclusion that any programmer that would prefer the project to be in C++ over C is likely a programmer that I really *would* prefer to piss off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using the "nice" library features of the language like STL and Boost and other total and utter crap, that may "help" you program, but causes:

 - infinite amounts of pain when they don't work (and anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny)

 - inefficient abstracted programming models where two years down the road you notice that some abstraction wasn't very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C. And limiting your project to C means that people don't screw that up, and also means that you get a lot of programmers that do actually understand low-level issues and don't screw things up with any idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary objective, the "advantages" of C++ is just a huge mistake. The fact that we also piss off people who cannot see that is just a big...
As dinosaurs (who code exclusively in C) are becoming extinct, you will soon find yourself alone with an attitude like this. Measuring the number of people who contributed to Git is an incorrect metric. Obviously C++ developers can contribute C code. But assuming that they prefer it that way is wrong. I was coding in Assembly when there was no C. Then in C before C++ was created. Nowadays it's C++ and C#, and I have never looked back. Bad developers will write bad code in any language. But penalizing good developers for this elusive reason of repelling bad contributors is nonsense. Anyway I don't mean to start a religious C vs. C++ war. It's a matter of beliefs and as such pointless. I just wanted to get a sense of how many people share this "Git should be in pure C" doctrine. -- - Dmitry
Hi, No, it's not. As has been shown by some very good _arguments_. Once you have facts to back up your claims, it is not any belief any longer. Ciao, Dscho -
I honestly didn't. I didn't even think it's possible. In the environment of mainstream commercial software development the last war on this subject was over 8-10 years ago. Even wars like "do we use exceptions/templates/stl" are pretty much over. Nowadays it's "do we use Boost", or "do we use template metaprogramming". But even more often it's Java/C# vs. C++. Well I've heard *opinions* and anecdotal evidence. No facts though. And it's not surprising. There could be no hard facts in such a matter. It always boils down to "most of all, I want my software to be X" where X is different for different people (fast, maintainable, quick to market, scalable, beautiful, etc ... to name a few). With different values of X any debate is pointless. And X is exactly a matter of beliefs. Anyway my curiosity is satisfied (through the roof so to speak) and I think it's enough on the subject. It has reminded me of good old times though. -- - Dmitry
It is because the "environment of mainstream commercial software Now that's a stupid argument to bring up. Commercial software "Just to annoy mainstream commercial software developers" would be a good reason. -
Anecdotal evidence _is_ hard facts. That's what experience is all about. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
As long as TeX, Emacs and vi are around, I would not worry too much about dinosaurs in general. But C++ is a cancerous dinosaur. It has The problem with C++ is that every C++ developer has his own style, and reuse is an illusion within that style. Take a look at classes implementing matrix arithmetic: there are as many around as the day is long, and all of them are incompatible with one another. With regard to programming styles, C++ does not support multiple inheritance. For a single project grown from a single start, you can get reasonable solutions. But combining stuff is creating maintenance messes. With C, the situation is not dissimilar, but you spent less time What nonsense. Large parts of git already are shell scripts, so obviously there is no such doctrine. Just because C++ is not a sane proposition does not mean that others might not work. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
On Friday 2007 September 07, David Kastrup wrote: (Disclaimer: I'm certainly not joining the "C++ for git" chant; this reply is One could say the same about any API. "Take a look at that C library libXYZ - it does exactly the same thing as libPQR but all the function calls and Multiple inheritance is the spawn of the devil, but C++ _does_ support it. Forgetting about the terrible STL, to me there really is no difference between C and C++; you can be object oriented in C. Take a look at the Linux kernel, it should be printed out, rolled up and used to beat the ideas into students learning C++/Java/C#. Object oriented design is a choice, and if you really wanted you could do it in assembly. I would imagine the reason people often turn up wanting to rewrite Linux and git in C++ is because they are so object oriented in nature already and it's natural to think "wouldn't this be even better if I wrote it in an object oriented language"? Maybe, maybe not, but why bother? Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com -
The difference is that you can pass structures from one library into another with tolerable efficiency. Because there are only basically 2 What about "With regard to programming styles" did you not understand? I was not talking about a technical feature at class level, but about Maintainability and extensibility certainly are valid arguments for rewrites. But C++ does not really shine in that regard. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
On 7/9/2007, at 2:21, Dmitry Kakurin wrote:
I can appreciate that. I originally got into writing compilers because
my game (Empire) ran too slowly and I thought the existing compilers
could be dramatically improved.
And technically, yes, you can write code in C that is >= the speed of
any other language (other than asm). But practically, this isn't
necessarily so, for the following reasons:
1) You wind up having to implement the complex, dirty details of things
yourself. The consequences of this are:
a) you pick a simpler algorithm (which is likely less efficient - I
run across bubble sorts all the time in code)
b) once you implement, tune, and squeeze all the bugs out of those
complex, dirty details, you're reluctant to change it. You're reluctant
to try a different algorithm to see if it's faster. I've seen this
effect a lot in my own code. (I translated a large body of my own C++
code that I'd spent months tuning to D, and quickly managed to get
significantly more speed out of it, because it was much simpler to try
out different algorithms/data structures.)
2) Garbage collection has an interesting and counterintuitive
consequence. If you compare n malloc/free's with n gcnew/collections,
the malloc/free will come out faster, and you conclude that gc is slow.
But that misses one huge speed advantage of gc - you can do FAR fewer
allocations! For example, I've done a lot of string manipulating
programs in C. The basic problem is keeping track of who owns each
string. This is usually handled by making a copy of the string when in doubt.
But if you have gc, you don't worry about who owns the string. You just
make another pointer to it. D takes this a step further with the concept
of array slicing, where one creates windows on existing arrays, or
windows on windows on windows, and no allocations are ever done. It's
just pointer fiddling.
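
A minimal sketch, in C, of the "window on an existing array" idea described above. The `struct slice` type and its helper functions are hypothetical names made up for this illustration; they are not taken from D, git, or any library. Note that without a garbage collector the ownership question does not go away: a slice is only valid for as long as the buffer it points into.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a window onto memory owned by someone else. */
struct slice {
    const char *ptr;  /* first byte of the window (not NUL-terminated) */
    size_t len;       /* number of bytes in the window */
};

/* Wrap a whole C string; no bytes are copied. */
static struct slice slice_from_cstr(const char *s)
{
    struct slice sl = { s, strlen(s) };
    return sl;
}

/* Window [start, end) of an existing slice; only pointer arithmetic. */
static struct slice slice_sub(struct slice s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    struct slice sub = { s.ptr + start, end - start };
    return sub;
}

/* Compare a slice against a C string without building a temporary copy. */
static int slice_equals(struct slice s, const char *lit)
{
    size_t n = strlen(lit);
    return s.len == n && memcmp(s.ptr, lit, n) == 0;
}
```

Taking a window on a window (for example, first dropping a header prefix and then cutting out a name) allocates nothing; both results point into the original line.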
------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++
On 7/9/2007, at 10:36, Walter Bright wrote:
That may very well be true. I've never looked at the source code for git, so I'm not in any position to judge it. Nor do I suggest translating a debugged, working, 80,000 line project into another language. My comments here are in more general terms. -
I haven't seen this in the development of git, although to be fair, you didn't mention the number of developers that were simultaneously working on your project. If it was you alone, I can imagine you were reluctant to change it just to see if something is faster. Opensource projects with many contributors (git, linux) work differently, since one or a few among the plethora of authors will almost always be a true expert at the problem being solved. The current pack-format and how it's read is one such example. It was done once, by the combined efforts of Linus and Junio (this is all off the top of my head and I cba to go looking up the details, so bear with me if there are errors). Linus and Junio are both very good C-programmers, but the handling of packfiles was not what you'd call their specialty. Along came Nicolas Pitre, another excellent C programmer, who probably has done some similar work before. He constructed a better algorithm, eventually resulting in the ultimate performance win with a net gain in both time and size (gj, Nicolas). The point is that, given enough developers, *someone* is bound to find an algorithm that works so well that it's no longer worth investing time to even discuss if anything else would work better, either because it moves the performance bottleneck to somewhere else (where further speedups would no longer produce humanly measurable improvements), or because the action seems instantaneous to the user (further improvements simply aren't worth it, because no valuable resource will be saved from it). -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231
On my project, one. But I've seen this problem repeatedly in other projects that had multiple developers. For example, I used to use version 1 of an assembler. It was itself written entirely in assembler. It ran *incredibly* slowly on large asm files. But it was written in assembler, which is very fast, so how could that be? Turns out, the symbol table used internally was a linear one. A linear symbol table is easy to implement, but doesn't scale well at all. A linear symbol table was implemented because it was just harder to do more advanced symbol table algorithms in assembler. In this case, a higher level language re-implementation made the assembler much faster, even though that implementation was SLOWER in every detail. It was
My point was that when I reimplemented it in D, the cost of changing the algorithms got much lower, so I was much more tempted to muck around
That is a nice advantage. I don't think many projects can rely on having
Sure, but I suggest that few projects reach this maximum. Case in point: ld, the gnu linker. It's terribly slow. To see how slow it is, compare it to optlink (the 15-year-old one that comes with D for Windows). So I don't believe there is anything inherent about linking that should make ld so slow. There's some huge leverage possible in speeding up ld (spreading out that saved time among all the gnu developers). So while git may have reached a maximum in performance, I don't think this principle is applicable in general, even for very widely used open source projects that would profit greatly from improved performance.
------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++
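
To make the linear-versus-better symbol table point concrete, here is a small C sketch that puts a linear lookup next to a chained hash lookup over the same symbols. Everything in it (the `struct symbol` layout, the pool and bucket sizes, the function names) is an assumption made for the illustration and is not taken from the assembler being described; the hash is the 32-bit FNV-1a function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 1024   /* arbitrary power of two chosen for the sketch */
#define MAXSYMS  4096   /* fixed pool keeps the example free of malloc */

struct symbol {
    const char *name;      /* caller keeps the string alive */
    long value;
    struct symbol *next;   /* chain of symbols sharing a bucket */
};

static struct symbol pool[MAXSYMS];
static size_t pool_used;
static struct symbol *buckets[NBUCKETS];

/* 32-bit FNV-1a string hash. */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Linear lookup: every query walks all symbols defined so far, O(n). */
static struct symbol *lookup_linear(const char *name)
{
    for (size_t i = 0; i < pool_used; i++)
        if (strcmp(pool[i].name, name) == 0)
            return &pool[i];
    return NULL;
}

/* Hashed lookup: walks only one bucket's chain, O(1) on average. */
static struct symbol *lookup_hashed(const char *name)
{
    struct symbol *s = buckets[hash_name(name) & (NBUCKETS - 1)];
    while (s && strcmp(s->name, name) != 0)
        s = s->next;
    return s;
}

/* Define a symbol; returns NULL when the fixed pool is full. */
static struct symbol *define_symbol(const char *name, long value)
{
    assert(name != NULL);
    if (pool_used == MAXSYMS)
        return NULL;
    struct symbol *s = &pool[pool_used++];
    uint32_t b = hash_name(name) & (NBUCKETS - 1);
    s->name = name;
    s->value = value;
    s->next = buckets[b];   /* push onto the front of the bucket chain */
    buckets[b] = s;
    return s;
}
```

Both lookups return the same answer; the difference only shows up as the number of symbols grows, which matches the "ran incredibly slowly on large asm files" observation.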
Well, when the ease-of-coding vs the exec-speed of D vs C is that of C vs asm, C will be dead fairly soon. However, since C is so ingrained in every language designer's head, I find that unlikely to happen any True that. I know a fair few projects that could have done with borrowing one or two proper gurus, but even opensource programmers are selfish in True again, but given what I said above holds, it would be madness to move from the lingua franca of oss hacking to a less common one, as it Interesting. I recently did a spot of work comparing various string-hashing algorithms. Perhaps I should head over to the ld camp and see if I can help. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
You can write "C" in D, and you'll get exactly the same (performance) results. After all, D and C share the same optimizer and code generator (for both implementations of D), and when the same intermediate code is presented, you'll get the same results. That's why when people benchmark D against C, they deliberately do NOT write the D version in a C'ish manner, but refactor the code into what would be a more D'ish style. I humbly suggest running a profiler over ld before spending time fixing the wrong thing <g>. I haven't looked at the ld source, but being experienced in similar projects I'd hazard a guess that there won't be a quick fix to ld's speed problems. -
I'm fairly sure of the same, but even small speedups in such a commonly used tool are worth pursuing. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Well, my good wishes go with you! If ld.so would be affected as well, you'd probably not help just developers. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Well, my first system was a Z80 computer with an editor/assembler in ROM (4kb). At one time I tried figuring out the size requirements of symbols. It was two bytes for each symbol. Namely the value. The "symbol table" was located behind the source code. Whenever this marvel of technology encountered a label, it searched the source code from the beginning for the definition of the label, keeping count of all label definitions in between. When it found the definition, the count corresponded to the position in the symbol table. So compilation times were O(ns), with n the number of symbol uses and s the size of the source code. Implementing in a higher language would not have helped: memory efficiency was what dictated this layout. Given that the whole available memory was perhaps 50kB, assembly language modules could not get so large that scale issues were deadly. But the assembly times did get annoying sometimes. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
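
The lookup scheme described in this message can be sketched in C roughly as follows. The source representation (an array of text lines, with a label definition being a name followed by ':') and the function names are assumptions made up for the example; the real ROM assembler's format is not given in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical source format: a line defines a label when it starts with
 * the label name followed by ':'. Returns the length of the label name,
 * or 0 if the line defines no label.
 */
static size_t label_def_len(const char *line)
{
    const char *colon = strchr(line, ':');
    return colon ? (size_t)(colon - line) : 0;
}

/*
 * Rescan the source from the top, counting label definitions, until the
 * one for `label` is found. The count is the slot in the table of values.
 * Every symbol use pays for a scan of the source, hence O(n*s) overall.
 */
static long symbol_slot(const char *const *src, size_t nlines,
                        const char *label)
{
    assert(src != NULL && label != NULL);
    size_t want = strlen(label);
    long slot = 0;

    for (size_t i = 0; i < nlines; i++) {
        size_t len = label_def_len(src[i]);
        if (len == 0)
            continue;       /* not a label definition */
        if (len == want && memcmp(src[i], label, len) == 0)
            return slot;    /* slot == number of definitions before it */
        slot++;
    }
    return -1;              /* label never defined */
}
```

No names are stored in the table at all, which is where the two bytes per symbol come from: the source text itself serves as the index.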
Wincent Colaiuta wrote:
> On 7/9/2007, at 2:21, Dmitry Kakurin wrote:
On 7/9/2007, at 8:25, Andreas Ericsson wrote:
Hi, I have a buck here that says that you cannot hand-optimise assembly (on modern processors at least) as well as even gcc. Ciao, Dscho
That assumes that the original task can even be expressed well in C. Multiple precision arithmetic, for example, requires access to the carry bit. You can code around this, for example by writing something like

  unsigned a,b,carry;
  [...]
  carry = (a+b) < a;

but the problem is that those are ad-hoc idioms with a variety of possibilities, and thus the compilers are not made to recognize them. Another thing is mixed-precision multiplications and divisions: those are _natural_ operations on a normal CPU, but have no representation in C.

As a consequence, most high performance multiple-precision packages contain assembly language in some form or other. gcc's assembly language templates are excellent in that they actually cooperate nicely with the optimizer, so the optimizer can do all the address calculations and register assignments and opcode reorderings, and then the actual operations that are not expressible in C can be done by the programmer.

But anyway, I have worked as a graphics driver programmer for some amount of time, and bit-stuffing memory-mapped areas with data was still something where hand assembly was best. I have also done BIOS terminal emulators, and being able to write something like

	ld b,whatever
myloop:
	push bc
	push hl
	call nextchar
	pop hl
	pop bc
	ld (hl),a
	inc hl
	djnz myloop

in order to suspend the terminal driver until the application comes up with the next `whatever' output characters in an escape sequence is _wagonloads_ more maintainable than using a state machine or whatever else for distributing material delivered into the driver. But this requires that nextchar can do something like

nextchar:
	ld (driverstack),sp
	ld sp,(appstack)
	ret

and the entrypoint, in contrast, does

outchar:
	ld (appstack),sp
	ld sp,(driverstack)
	ret

Cheap and expedient. You just need to set up a small stack, and presto: coroutines, at absolutely negligible cost. I know that there are some "portable" coroutine implementa...
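
As an illustration of the carry idiom quoted in this message, here is a small C sketch of a multi-limb addition that recovers the carry purely from unsigned wrap-around. The function name `mp_add` and the 32-bit, least-significant-limb-first layout are assumptions for the example and are not taken from any particular multiple-precision package.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add two n-limb numbers (least significant limb first) into `sum` and
 * return the carry out of the top limb. C has no carry flag, so each
 * carry is detected with the "result is smaller than an operand" test.
 */
static uint32_t mp_add(uint32_t *sum, const uint32_t *a, const uint32_t *b,
                       size_t n)
{
    assert(sum != NULL && a != NULL && b != NULL);
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] + carry;  /* wraps only if a[i] is all ones and carry is 1 */
        uint32_t c1 = t < carry;    /* carry out of the first addition */
        uint32_t s = t + b[i];
        uint32_t c2 = s < t;        /* carry out of the second addition */
        sum[i] = s;
        carry = c1 | c2;            /* the two cannot both be set */
    }
    return carry;
}
```

On a CPU with an add-with-carry instruction the whole loop body is a single instruction per limb, which is the gap between C and assembly that the message is pointing at.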
http://www.gelato.unsw.edu.au/archives/git/0504/1746.html I win. Donate $1 to FSF next time you get the opportunity ;-) Hand-optimized asm is faster because the optimizer in the compiler is a general-purpose one that has to guess and make assumptions about the code and its input to make the correct decisions. While it gets things right in as many as 80% of the cases, there's still the 20% where it doesn't. A human can, with sufficient research and effort, make the same optimizations where they are correct but avoid the 20% erroneous ones. If the compiler gets it wrong inside your innermost loop, it might be worth shaving those extra 0.0001 seconds off of each iteration, because in the long run, world-wide, it might save several weeks worth of CPU-time every day. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
On 7/9/2007, at 13:54, Andreas Ericsson wrote:
Wincent Colaiuta wrote:
> On 7/9/2007, at 13:54, Andreas Ericsson wrote:
On 7/9/2007, at 15:58, Andreas Ericsson wrote:
Wincent Colaiuta wrote:
> On 7/9/2007, at 15:58, Andreas Ericsson wrote:
And this is of course exactly the kind of spot where you _would_ use assembly in the real world. 99.99% of code is better written in C than assembler, but there is that 0.01% where hand-coded assembler is a better choice. -- Karl Hasselstr
Unlike you, I actually gave reasons for my dislike of C++, and pointed to examples of the kinds of failures that it leads to. You, on the other hand, have given no sane reasons *for* using C++. The fact is, git is better than the other SCM's. And good taste (and C) is one of the reasons for that. It has nothing to do with dinosaurs. Good taste doesn't go out of style, and comparing C to assembler just shows that you don't have a friggin idea about what you're talking about. Linus -
To be very specific:

 - simple and clear core datastructures, with *very* lean and aggressive code to manage them that takes the whole approach of "simplicity over fancy" to the extreme.

 - a willingness to not abstract away the data structures and algorithms, because those are the *whole*point* of core git.

And if you want a fancier language, C++ is absolutely the worst one to choose. If you want real high-level, pick one that has true high-level features like garbage collection or a good system integration, rather than something that lacks both the sparseness and straightforwardness of C, *and* doesn't even have the high-level bindings to important concepts.

IOW, C++ is in that inconvenient spot where it doesn't help make things simple enough to be truly usable for prototyping or simple GUI programming, and yet isn't the lean system programming language that C is that actively encourages you to use simple and direct constructs.

Linus
The D programming language is a different take than C++ has on growing C. I'm curious what your thoughts on that are (D has garbage collection, while still retaining the ability to directly manage memory). Can you enumerate what you feel are the important concepts? -
Well, to me D has two significant drawbacks that keep it from being "ready to use".

The first one is that it doesn't have bit-fields. I often deal with bit-fields on structures that have a _lot_ of instances in my program, and the bit-field is chosen for code readability _and_ structure size efficiency. I know you claim that using masks manually often generates better code. But in my case, speed does not matter _that_ much. I mean it does, but not at this micro-level, as access to the bit-field is not in my inner loop.

The second issue I have is that there is no way to do:

import (C) "foo.h"

And this is a big no-go (maybe not for git, but as a general issue) because it impedes the use of external libraries with a C interface a _lot_. E.g. I'd really like to use it to use some GNU libc extensions, but I can't because it has too many dependencies (some async getaddrinfo interface, that needs me to import all the signal events and so on extensions in the libc, with bitfields, which sends us back to the first point).

I also have a third, but non-critical issue: I absolutely don't like phobos :) Though I'm obviously free to choose another library.

D definitely has many, many, many real advances over C (like the .init, .size, ... and so on fields, known types, and whatever portability nightmares C imposes on us). In fact I like to use D like I code in C, using modules and functions, and very few classes, as few as I can. And even (under-?) using D like this, it is a real pleasure to work with. I'm really eager to see gdc be more stable.

--
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
I'm surprised this is such an important issue. Others have mentioned it,
but regard it as a minor thing. Interestingly, the htod program (which
converts C .h files to D import files) will convert bit fields to inline
D does come with htod, which converts C .h files to D files. It's not
possible to do a perfect job (because of macros), but it comes pretty
darned close. The reason htod gets so close is because it is actually a
real C compiler front end, not a perl or regex string processing hack.
Because it (may) require a little hand tweaking of the results (again,
because C headers may include awful things like:
#define BEGIN {
#define print printf(
You're not the only one <g>. But I'll add that access to the standard C
runtime library *is* a part of D, so at some level it can't be worse
than C. There's also another runtime library available, Tango, which is
There are a lot of people hard at work on D to make it more stable and
increase the breadth and depth of tools available. I am fully aware that
there may be non-technical issues to using D in a project like git, like
availability of other D programmers, tradition, etc., but in this thread
I'm concerned mainly with technical issues.
P.S. I'm also NOT suggesting that git be converted to D. Translating a
working, debugged, 80,000 line codebase from one language to another is
usually a fool's errand.
Thanks for taking the time to post your thoughts.
-----------
Walter Bright
http://www.digitalmars.com C, C++, D programming language compilers
http://www.astoriaseminar.com Extraordinary C++
Well htod does that, but it's very impractical to write them from
scratch. Especially if you want to benefit from the fact that padding
and integer sizes are very well defined to map e.g. structs onto a raw
stream, avoiding deserialization and so on. And for that bit-fields are
a really really fast and simple way to describe things.
I mean, take your classical example of the foreach loop. Your whole
point is that it's way shorter, and safer. And now you are saying that
instead of something like:
struct my_struct {
    unsigned some_field : 2;
    unsigned has_this_property : 1;
    unsigned is_in_this_state : 1;
    unsigned priority_level : 2;
    ...
}
people should write (IIRC it works since ->some_field = 2 calls
->some_field(2) if the member does not exist, or maybe it's
set_some_field, it's not very relevant anyway):
struct my_struct {
    unsigned some_field() {
        return this->real_field >> 30;
    }
    void some_field(unsigned value) {
        this->real_field |= (value & 3) << 30;
    }
    ...
private:
    unsigned real_field;
}
Please, it has to be a joke: there are 42 ways for people to write it
wrong (wrong shifts, wrong masks, and so on), it's horribly obfuscated,
hence needs a lot of comments, whereas the bitfield is 90% self
documented, and the syntax is _very_ clear, you cannot beat that. I
would be absolutely fine with it being syntactical sugar for some kind
of template call though.
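
For comparison, here is a hedged C sketch that puts the bit-field declaration next to one hand-written mask-and-shift equivalent for `some_field`. The `SOME_FIELD_*` macros and the accessor names are hypothetical, chosen for the illustration, as is the choice of the top two bits of a 32-bit word. Unlike the setter quoted above, which only ORs the new value in, this one clears the old bits first; that difference looks like an instance of the "42 ways to write it wrong" argument.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-field version: the compiler picks the layout and does the masking. */
struct flags_bf {
    unsigned some_field : 2;
    unsigned has_this_property : 1;
    unsigned is_in_this_state : 1;
    unsigned priority_level : 2;
};

/* Hand-written version: layout is explicit (top two bits of the word). */
#define SOME_FIELD_SHIFT 30
#define SOME_FIELD_MASK  (UINT32_C(3) << SOME_FIELD_SHIFT)

static uint32_t get_some_field(uint32_t word)
{
    return (word & SOME_FIELD_MASK) >> SOME_FIELD_SHIFT;
}

static uint32_t set_some_field(uint32_t word, uint32_t value)
{
    assert(value <= 3);
    /* Clear the old bits first; OR-ing alone can never turn a bit off. */
    return (word & ~SOME_FIELD_MASK) | (value << SOME_FIELD_SHIFT);
}
```

With the bit-field, `f.some_field = 1;` after `f.some_field = 3;` simply stores 1; the manual version only behaves the same way because of the explicit clearing step.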
Not to mention that the usual C idiom:
union {
    unsigned flags;
    struct {
        // many bitfields
    };
};
Would need an explicit copy_flags(const my_struct foo) function to
work. Not pretty, not straightforward.
Really, I feel this is a big lack, for a language that aims at
simplicity, conciseness _and_ correctness.
OK, maybe I'm biased, I work with network protocols all day long, so
I often need bitfields, but still, a lot of people deal with network
La...

True. I haven't tried yet (nobody else seems to care about it as much as
I should point out that inline functions are inlined, and there is no
I'm not following this. To copy a union, you just copy it with the
assignment operator:
U a, b;
You're right on both counts. It's because htod is built out of a fork of
the Digital Mars C compiler. Something similar could be done with gcc,
but I'm not the person to do it. I should also get off my lazy tail and
GDC was just released for D 1.020, which is behind D 1.021, but 1.021
And it's nice to hear your perspective, which is why I dropped by this
thread.
I know that, and that's why I said I was totally fine with the bitfield notation to be only syntactic sugar on a template thingy if
That was the point indeed. But if you don't have bitfields, you can't do the union. And if the bitfield is just syntactic sugar, it may be
Sure, but it does not work properly on amd64 (and it's the architecture I care about) and is not ready for the current gcc (4.2, only 4.1 builds) and so on. It's not as stable as DMD is. It does not lag too much version-wise, it lags in maturity. But well, youth has a cure: time :)

--
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
Yes, and the more people use it, the better it will get. These are all environmental problems, not technical limitations of the language. -
Pierre Habouzit <madcoder@debian.org> writes: And strictly speaking, C bitfields are completely useless for that purpose since the compiler is free to use whatever method he wants for allocating bit fields. So if you want to write a portable program, you are back to making the masks yourself. Where bit fields work reliably is when you are not interchanging data with other applications, but just laying out your internals. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
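
A short C sketch of what "making the masks yourself" might look like for data that is exchanged with other programs. The one-byte wire layout and the names `header_pack` and `header_unpack` are hypothetical; only the field names are borrowed from the earlier example in the thread. Because every bit position is spelled out, the result does not depend on how a given compiler allocates bit-fields.

```c
#include <assert.h>
#include <stdint.h>

/* In-memory form; one plain unsigned per field, no bit-fields involved. */
struct header {
    unsigned some_field;        /* 0..3 */
    unsigned has_this_property; /* 0..1 */
    unsigned is_in_this_state;  /* 0..1 */
    unsigned priority_level;    /* 0..3 */
};

/*
 * Hypothetical wire layout of one byte, most significant bit first:
 *   some_field(2) has_this_property(1) is_in_this_state(1)
 *   priority_level(2) unused(2)
 */
static uint8_t header_pack(const struct header *h)
{
    assert(h->some_field <= 3 && h->priority_level <= 3);
    assert(h->has_this_property <= 1 && h->is_in_this_state <= 1);
    return (uint8_t)((h->some_field << 6) |
                     (h->has_this_property << 5) |
                     (h->is_in_this_state << 4) |
                     (h->priority_level << 2));
}

static struct header header_unpack(uint8_t byte)
{
    struct header h;
    h.some_field        = (byte >> 6) & 3u;
    h.has_this_property = (byte >> 5) & 1u;
    h.is_in_this_state  = (byte >> 4) & 1u;
    h.priority_level    = (byte >> 2) & 3u;
    return h;
}
```

Inside the program the unpacked struct can still be laid out however is convenient, bit-fields included; only the bytes that leave the process need the explicit shifts.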
The point is (1) D is not C, (2) we all know that linux e.g. does that in many places using the fact that it knows how the supported compilers (gcc icc tcc maybe some other) do their packing. The discussion is about D. D solves the infamous problem with longs not having the same size everywhere, I don't see why it couldn't solve
Thank you for the _C_ lesson.

--
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
In my opinion there is basically one area which C has botched up seriously in order to be useful as a general purpose language, and that is conflating pointers and arrays, and allowing pointer arithmetic. The consequences are absolutely awful with regard to compilers being able to optimize, and it is pretty much the primary reason that Fortran is still quite in use for numerical work. C has no usable two-dimensional (never mind higher dimensions) array concept that would allow passing multidimensional arrays of runtime-determined size into functions. Period. Add to that the pointer aliasing problems affecting compilers, and C is useless for serious portable readable numerical work. Fortran libraries like blas and lapack are ubiquitous after decades because the language can deal with multiple-dimension arrays sensibly, and could do so in the sixties already. C99 helps a bit. But messing around with restrict pointers and similar means that to wring equal performance out of some trivial code piece (or permitting the compiler to do so without having to take aliasing into account) is a lot of work and leads to ugly and inscrutable code. That's the one thing that has seriously hampered C: the lack of a true array type on its own, decoupled from pointers. It does not need to carry its dimensions with it or other hide-the-implementation-from-the-programmer niceties: C is, after all, a low-level language, and Fortran did not suffer from not having array dimensions packed into the arrays as well. But that's water down the drawbridge. This single major deficiency is not anything that would hamper git development. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
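
For reference, this is roughly what the "C99 helps a bit" remark can look like in practice: a sketch of a matrix-vector product that takes a two-dimensional array of run-time size as a C99 variably modified parameter, with `restrict` on the vectors. The function name `matvec` is made up for the example. It assumes a compiler that supports variable length array types, which C11 made an optional feature (GCC and Clang provide it).

```c
#include <assert.h>
#include <stddef.h>

/*
 * y = A * x for a rows-by-cols matrix whose dimensions are only known at
 * run time. `double a[rows][cols]` is a variably modified parameter type,
 * so a[i][j] works without manual index arithmetic. `restrict` promises
 * the compiler that x and y are not aliases of each other.
 */
static void matvec(size_t rows, size_t cols,
                   double a[rows][cols],
                   const double *restrict x,
                   double *restrict y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < rows; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < cols; j++)
            acc += a[i][j] * x[j];
        y[i] = acc;
    }
}
```

The dimensions still travel as separate arguments and the aliasing promise is the programmer's responsibility, which is in line with the complaint that getting Fortran-like behaviour out of C takes extra work.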
I agree. It's one of those things that probably sounded like a good idea at the time. The consequences were not foreseen. All languages have a few of these (C++ has the infamous use of < > for template arguments). -
A design is perfect not when there is no longer anything you can add to it, but if there is no longer anything you can take away. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
On 7/9/2007, at 9:40, David Kastrup wrote:
I like to phrase that a slightly different way: anyone can make
something complicated, but it takes genius to make something simple.
A very big goal for D is to make what should be simple code, simple. It
turns out that what's simple for a computer is complex for a human. So
to design a language that is simple for programmers is (unfortunately) a
rather complex problem. Or perhaps I'm just not smart enough <g>.
A canonical example is that of a loop. Consider a simple C loop over an
array:
void foo(int array[10])
{
    for (int i = 0; i < 10; i++)
    {   int value = array[i];
        ... do something ...
    }
}
It's simple, but it has a lot of problems:
1) i should be size_t, not int
2) array is not checked for overflow
3) 10 may not be the actual array dimension
4) may be more efficient to step through the array with pointers, rather
than indices
5) type of array may change, but the type of value may not get updated
6) crashes if array is NULL
7) only works with arrays and pointers
Since this thread is talking about C++, let's look at the C++ version:
void foo(std::vector<int> array)
{
    for (std::vector<int>::const_iterator
             i = array.begin();
         i != array.end();
         i++)
    {
        int value = *i;
        ... do something ...
    }
}
It has fewer latent bugs, but still:
1) type of array may change, but the type of value may not get updated
2) too darned much typing
3) it's more complicated, not simpler
Frankly, I don't want to write loops that way. I want to write them like
this:
void foo(int[] array)
{
    foreach (value; array)
    {
        ... do something ...
    }
}
As a programmer, I'm specifying exactly what I want to happen without
much extra puffery. It's less typing, simpler, and more resistant to bugs.
1) correct loop index type is selected based on the type of array
2) arrays carry with them their dimension, so foreach is guaranteed to
step through the loop the correct number ...

Wrong. size_t is for holding the size of memory objects in bytes, not in terms of indices. For indices, the best variable is of the same type as the declared index maximum size, so here it is typeof(10),

No. It is a beginners' and advanced users' mistake to think using pointers for access is a good idea. Trivial optimizations are what a compiler is best at, not the user. Using pointer manipulation will more often than not break loop unrolling, loop reversal, strength

Most of those are toy concerns. They prevent problems that don't actually occur much in practice.

-- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
The easiest way to show the error is to consider the code being ported to a typical 64 bit C compiler. ints are still 32 bits, yet the array can be larger than 32 bits. You're right in that what we want to be able to do is typeof(array dimension), but there is no way to do that automatically in C, which is my point. If the array dimension changes, you have to carefully check to make sure every loop dependency on the type is updated, too. size_t will always work, however, making it a better choice than int, at Because the 10 array dimension is not statically checked in C. I could pass it a pointer to 3 ints without the compiler complaining. This makes it a potential maintenance problem. Also, the maintenance programmer may change the array dimension in the function signature, but overlook Array buffer overflow errors are commonplace in C, because array dimensions are not automatically checked at either compile or run time. This is an expensive problem. Some C APIs try to deal with this by passing a second argument for arrays giving the dimension (snprintf, for example), but this tends to be sporadic, not conventional. It being C compilers vary widely in the optimizations they'll do for simple loops. I often enough see attempts by programmers to take such matters into their own hands. I agree with you on that - and suggest the Let's say our fearless maintenance programmer decides to make it an array of longs, not an array of ints. He overlooks changing the type of value in the loop. Suddenly, things subtly break because of overflows. Or maybe he changed the int to an unsigned, now the divides in the loop give different answers. Etc. There really isn't any compiler/language I consider an array that is NULL to have no members, so instead of C has structs, too, as well as more complicated user defined collections. Essentially, you cannot (simply) write generic algorithms in C, because you cannot (simply) generically express iteration. Of course, yo...
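The point about the unchecked dimension can be sketched in C. In a parameter list, `int array[10]` is adjusted to `int *array`, so the 10 is documentation only; the two prototypes below declare the same function type and compile together. The names sum_checked and ARRAY_LEN are hypothetical helpers introduced for this illustration, following the "pass the dimension as a second argument" convention and the "NULL means no members" reading mentioned in the message. This is one possible workaround, not a fix for the language issue being discussed.

```c
#include <stddef.h>

/* Both lines declare the same function: the [10] in a parameter is
 * adjusted to a plain pointer, so the compiler accepts a pointer to 3 ints. */
void foo(int array[10]);
void foo(int *array);

/* Element count of a true array. Only valid where the array type is
 * still visible; on a pointer (or an array parameter) it gives a wrong answer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sum "count" ints. The caller supplies the dimension explicitly, the
 * index type is size_t, and a NULL array is treated as having no members. */
long sum_checked(const int *array, size_t count)
{
    long total = 0;

    if (array == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        total += array[i];
    return total;
}
```

At a call site where the array is still an array, `sum_checked(values, ARRAY_LEN(values))` keeps the count in step with the declaration. The element type remains hard-coded, which is the maintenance problem the message describes.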
Not to mention try finding two C++ compilers that support the same language features. C is a known quantity. C++ depends on whose compiler you use and what class libraries you use. Trying to make those things work cross-platform is not an easy task. (Harder than it is in C at least.) A number of years ago, a programmer who will not be named (and is not me), tried to port Perl to C++. It was a disaster. He found that every compiler handled something differently. If you stuck to one compiler, it might work. But trying to get GCC to work like MS C++ or Borland C++ or whatever is just asking for pain. -- Refrigerator Rule #1: If you don't remember when you bought it, Don't eat it. -
As I said, it's a matter of beliefs. As such, any reasoning and arguing will be endless and pointless, as for any other religious I'll give you reasons why to use C++ for Git (not why C++ is better for any project in general, as that again would be pointless): 1. Good String class will make code much more readable (and significantly shorter) 2. Good Buffer class - same reason 3. Smart pointers and smart handles to manage memory and file/socket/lock handles. As it is right now, it's too hard to see the high-level logic through IMHO Git has a brilliant high-level design (object database, using hashes, simple and accessible storage for data and metadata). Kudos to you! The implementation: a mixture of C and shell scripts, command line I don't see myself comparing assembler to C anywhere. I was pointing out that I've been programming in different languages (many more actually) and observed bad developers writing bad code in all of them. So this quality "bad developer" is actually language-agnostic :-). -- - Dmitry -
But all of those are incompatible with one another and require major headaches and/or interface code to get to run with one another. And then might use different interface styles, anyway. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Total BS. The string/memory management is not at all relevant. Look at the The only really important part is the *design*. The fact that some of it is in a "prototyping language" is exactly because it wasn't the core parts, and it's slowly getting replaced. C++ would in *no* way have been able to replace the shell scripts or perl parts. You made a very clear "assembler -> C -> C++/C#" progression in your life, comparing my staying with C as a "dinosaur", as if it was some inescapable evolution towards a better/more modern language. With zero basis for it, since in many ways C is much superior to C++ (and even more so C#) in both its portability and in its availability of You can write bad code in any language. However, some languages, and especially some *mental* baggages that go with them are bad. The very fact that you come in as a newbie, point to some absolutely *trivial* patches, and use that as an argument for a language that the original author doesn't like, is a sign of you being a person who should be disabused of any idiotic notions as soon as possible. The things that actually *matter* for core git code are things like writing your own object allocator to make the footprint be as small as possible in order to be able to keep track of object flags for a million objects efficiently. It's writing a parser for the tree objects that is basically fairly optimal, because there *is* no abstraction. Absolutely all of it is at the raw memory byte level. Can those kinds of things be written in other languages than C? Sure. But they can *not* be written by people who think the "high-level" capabilities of C++ string handling somehow matter. The fact is, that is *exactly* the kinds of things that C excels at. Not just as a language, but as a required *mentality*. One of the great strengths of C is that it doesn't make you think of your program as anything high-level. It's what makes you apparently prefer other languages, but the thing...
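To make "a parser for the tree objects ... at the raw memory byte level" concrete, here is a hedged sketch in C. It assumes the classic on-disk tree entry layout (ASCII octal mode, a space, the entry name, a NUL byte, then 20 raw SHA-1 bytes). The names struct tree_entry, parse_tree_entry and HASH_RAWSZ are hypothetical and chosen for this illustration; this is not git's actual implementation, only an example of parsing in place with no allocation and no copying.

```c
#include <stddef.h>

#define HASH_RAWSZ 20   /* raw SHA-1 size; an assumption of this sketch */

struct tree_entry {
    unsigned int mode;          /* file mode, parsed from octal digits */
    const char *name;           /* points into the buffer, NUL-terminated */
    const unsigned char *hash;  /* points into the buffer, HASH_RAWSZ bytes */
};

/* Parse one entry "<octal mode> <name>\0<raw hash>" starting at buf.
 * Returns the number of bytes consumed, or 0 if the entry is malformed
 * or truncated. Nothing is copied: "out" only points into buf. */
size_t parse_tree_entry(const unsigned char *buf, size_t len,
                        struct tree_entry *out)
{
    size_t i = 0;
    unsigned int mode = 0;

    /* Octal mode digits. */
    while (i < len && buf[i] >= '0' && buf[i] <= '7') {
        mode = (mode << 3) | (unsigned int)(buf[i] - '0');
        i++;
    }
    if (i == 0 || i >= len || buf[i] != ' ')
        return 0;
    i++;                                /* skip the space */

    /* Entry name, up to the terminating NUL. */
    size_t name_start = i;
    while (i < len && buf[i] != '\0')
        i++;
    if (i == name_start || i >= len)    /* empty name or missing NUL */
        return 0;
    i++;                                /* skip the NUL */

    /* The raw hash must fit in what is left of the buffer. */
    if (len - i < HASH_RAWSZ)
        return 0;

    out->mode = mode;
    out->name = (const char *)(buf + name_start);
    out->hash = buf + i;
    return i + HASH_RAWSZ;
}
```

Walking a whole tree would then be a loop that advances the buffer pointer by the returned count until it reaches the end or gets 0 back.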
Hi, There is an important additional point: a language like C _holds_ you to a certain degree of diligence. In my day-job I have to code in other languages, which make it "easy" to code. As a result, the code I have to work with is sloppy, ugly and buggy. By applying the same principles I am _forced_ to use in C, with Git, I produce better code. Ciao, Dscho -
Not only have I looked at the code, I've also debugged it quite a bit. Granted most of my problems had to do with handling paths on Windows (i.e. string manipulations). ... and explain where I'm coming from: My goal is to *use* Git. When something does not work *for me* I want to be able to fix it (and contribute the fix) in the *shortest time possible* and with *minimal effort*, since for me it's a diversion from my main activities. The fact that Git is written in C does not really contribute to that goal. The suggestion to use C++ is the only alternative given the existing C codebase. So while C++ may not be the best choice "academically speaking" it's pretty much the only practical choice. "Democracy is the worst form of government except for all those others that have been tried." - Winston Churchill Now, I realize that I'm a very infrequent contributor to Git, but I want my opinion to be heard. People who carry the main weight of developing and maintaining Git should make the call. -- - Dmitry -
Hi, We are a happy little meritocracy here. Once you have proved that you're not full of shit (some seem to try the opposite, you know who you are), you can go all caps. Before that, you'll have to show that you have earned the right to be heard first. Ciao, Dscho -
Sorry, but for fixing things in C, I can look and work locally. For fixing things in C++, I first need to understand the class hierarchies used in the project. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum -
Coupled with what you said in an earlier mail, namely ---%<---%<--- Considering C appeared in 1972, and C++ appeared in 1985, you have been writing C code for 13 years. And you're telling me that git being written in C prevents you from contributing? If you want to do something useful in C++ for git, make it easy for C++ They already have, but every now and then someone comes along and suggests a complete rewrite in some other language. So far we've had Java (there's always one...), Python and now C++. It happens to all projects, sooner or later. The funny thing is that all those people that want their favourite software to be rewritten in their favourite programming language always want someone else to rewrite it for them. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Since this "complete rewrite" was mentioned in multiple emails I'd like to rectify that: What I'm offering (for Git) is to use C++ as a "better C". Don't change any existing *working* code, but start introducing simple C++ constructs in the new code. Git is simple enough to not require any high-level abstractions. But some utility classes could make code much simpler. And BTW, I don't even like C++ that much :-), I just like it much better than C. I've been saying that C++ is a legacy language for quite some time now. But we will use it for many years to come because the size of this legacy code is huge, so there will be plenty of C++ developers available (to contribute to Git :-). And C++ is the only way to move forward with the existing C codebase. -- - Dmitry -
There are far too many highly valuable contributors that have spoken against C++ for me to believe that C++ and C will ever co-exist in the official git repo. Good thing utility classes can be developed on top of the existing C-code, but in a separate repo, and packed into a library. That way, you get some hacking ground for your beloved C++ coders while the current git contributors can keep contributing in the The C code base is a lot larger and C++ will drop dead pretty fast if it's Complete and utter BS. It can also stay in C, or get language bindings for Python/Perl/PHP/LUA(?)/whatever, or both. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
