During the GitTogether we were kicking around the idea of a ground-up implementation of a Git library. This may be easier than trying to grind down git.git into a library, as we aren't tied to any of the current global state baggage or the current die() based error handling. I've started an _extremely_ rough draft. The code compiles into a libgit.a but it doesn't even implement what it describes in the API, let alone a working Git implementation. Really what I'm trying to incite here is some discussion on what the API looks like. API Docs: http://www.spearce.org/projects/scm/libgit2/apidocs/html/modules.html Source Code Clone URL: http://www.spearce.org/projects/scm/libgit2/libgit2.git -- Shawn. --
This 404's for me - Pieter --
Nevermind, it's only a clone url.. I'd expected a gitweb or so ;) Sorry for the noise, --
--wRRV7LY7NUeQGEoC
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
I know this isn't actually helping a lot to define the real APIs, but we
should really not repeat current git mistakes and have a really uniform
APIs, meaning that first we must decide:
* proper namespacing (e.g. OBJ_* looks like failure to me, it's a way
too common prefix);
* proper public "stuff" naming (I e.g. realy like types names -- not
struct or enum tags, that I don't really care -- ending with _t as
it helps navigating source.
* ...
And write that down _first_. It's not a lot of work, but it must be
done. Working on a library really asks us to create something coherent
for our users.
Second, if we want this to be a successful stuff, we all agree we must
let git be able to use it medium term. That means that when git-core is
experimenting with new interfaces, it will probably need to hook into
some more internal aspects of the library. This is a problem to solve
elegantly, linking git against a static library won't please a lot of
vendors and linux distributions, and exporting "private" symbols is a
sure way to see them being abused.
Last but not least, I believe parts of git-core are currently easy to
just take. For example, any code *I* wrote, I hereby give permission to
relicense it in any of the following licenses: BSD-like, MIT-like,
WTFPL.
For example, on parse-options.c, git blame yields:
git blame -C -C -M parse-options.c|cut -d\( -f2|cut -d2 -f1|sort|uniq -c
16 Alex Riesen
6 Jeff King
47 Johannes Schindelin
12 Junio C Hamano
19 Michele Ballabio
1 Nanako Shiraishi
1 Olivier Marin
395 Pierre Habouzit
Okay, arguably parse-options.c in libgit quite doesn't makes sense
(though it can help bringing some kind of uniformity to other git
ecosystem tools built on libgit but that's not the point I'm trying to
make), I'm sure this kind of pattern where it's likely to be ...How about this? Private symbols are a problem. On some systems we can use link editing to strip them out of the .so, but that isn't always going to work everywhere. I've outlined the idea of using double underscore to name private functions, and we can link-edit out '*__*' if the Yea. We could try to do that. I don't know how far it will get us, but if we have to "steal" code we can rip a good part from JGit. Its BSD-like, but has that "icky Java smell" to it. :-) Before worrying about where we get implementation bits from I'm more interested in trying to get a consistent view of what our namespace looks like, and what our calling conventions are, so we have some sort of benchmark to measure APIs against as we add them to the implementation. -- Shawn. --
Well, I propose the following: we set-up on GNU-ld + gcc enabled systems all what is needed to use symbol visibility, which isn't that intrusive, and also rather easy given your GIT_EXPORT macro definition. This way, people who care about portability across all libgit2 supported platforms will have to align on the lowest common denominator, which will not have any kind of private stuff available, so we're safe. And GCC/GNU-ld enabled platforms cover most of the popular platforms (namely linux and *BSD, I'm not sure about Macos dynlibs). Even win32 has kind of what you need to do visibility I think. IOW prehistoric systems will be have to cope with that because of Linux (yeah, this is kind of deliciously backwards ;p). No, my worry was rather wrt git core itself, I really think we _must_ make it link against libgit2 if we want libgit2 to stay current, but git core will _very likely_ need the private stuff, and it _will_ be a problem. I mean we cannot seriously so-name a library and show its guts at the same time, and I'm unsure how to fix that problem. _that_ was my I'd say we should do both at the same time. Asking people if they would agree to relicense code can be done in parallel. We could extract a list of source files that we may need (my extraction included stuff that is very unlikely to be useful like test-*.c that aren't useful, and some that are already BSD I think), and see who it yields. It should be possible to do a matrix source-file x people and see on a per-file basis what they think. If someone gives me the list of files we should consider (I'm not sure about a good list right now) I could do the matrix at some fixed sha1 =66rom git.git using git blame -C -M -M -w, and ask people see where it leads us ? --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Yet more updates to the API: http://www.spearce.org/projects/scm/libgit2/apidocs/html/modules.html In particular I've started to define what revision machinary might look like, based somewhat on JGit's (public) RevWalk API. The guts of how to make it work of course aren't yet defined. My goal here is to have the git_revp_t be non-thread safe, but also to contain the entire object pool, so git_revp_free() will dispose of any and all git_commit_t's which were created from it. This "fixes" the "we leak everything" behavior and allows applications to have multiple pools on different threads if it needs to. Thus the core git_revp_t isn't thread-safe and doesn't get weighed down by locking if the application wants multiple threads. Yes, agreed. Only I don't know how to do it myself. I know its Hmmph. I agree git-core needs to link to libgit2. I disagree it needs private bits. If we do the library right git-core can use the public API. And where it cannot its either not something that is "right" for libgit2 (e.g. its parseopts and we aren't committed yet to including option parsing) or its highly experimental and we shouldn't put it into the library (and thus also git-core) until its more frozen. That said, I don't think its criminal to have git-core include and link to a static libgit2, especially if git-core's usage of the library is ahead of what the library itself is able to expose at Off the top of my head some really important ones: diff-delta.c object.c patch-delta.c refs.c revision.c sha1_file.c sha1_name.c They form a pretty large part of the guts of what most people want from a git library. Slightly less important, but still fairly core: builtin-fetch-pack.c builtin-send-pack.c connect.c remote.c transport.c Is most of the client side of the git:// transport, something people want. -- Shawn. --
Note that on Linux (and also BSDs I think) many pthread locking functions have stubs in the glibc and cost 0 until you use the libpthread. So unless we need to use locking inside the tight-loop, it's virtually free to write a thread-safe library (a function call is basically 10 times cheaper than an xchg-based lock). (Though for git it would suck as we get libpthread in our depends through some of the other dependencies. Another way is to use function pointers for the locking, and have a function to make the object store thread safe by setting the pointers to functions actually doing locking, and to let it point to functions doing nothing else. It looks unrealistic to me to let people deal with the locking, especially if we mean this library to _also_ be used in language bindings, hence used in sloppily written scripts. But of course, if locking calls are used in the tight loop that would I will, basically, you need to build everything with -fvisibilty=3Dhidden in your CFLAGS, and mark the prototypes of symbols you want to export with __attribute__((visibility("default"))) (that you can set into your EXPORT_GIT macro when you're building with __GNUC__). You don't even need a linker script (unless we're going to do some Sure, I didn't mean we have to put it in libgit2, it's highly UI related and other tool may want to use something else, and it makes no sense for many languages that have their library already (python, perl do e.g.). Okay I'll let people mention what they would like to see too, and I'll work from that then. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Ugh. You could talk me into it if you promise never typedef structures (or pointer to structures) with such symbols, I guess. --
I should write that one down in CONVENTIONS.
IMHO:
typedef uint32_t uid_t; /* sane */
typedef enum {...} status_t; /* sane */
typedef struct foo_t foo_t; /* sane */
typedef struct {...} foo_t; /* borderline insane */
typedef char* str_t; /* totally nuts */
typedef char**** str_pppp_t; /* totally nuts */
Hiding the fact that scalar types like a uid_t are 32 bits on
this system is reasonable. Heck, uid_t is already in POSIX,
we shouldn't fight that sort of idea. It at least improves
documentation somewhat.
Hiding the fact that some scalar type is an enum, so you don't have
to type "enum blah" everywhere is also reasonable. Its slightly
better than #define some magic constants and passing an int
everywhere. Its a reasonable balance between reducing keystrokes
and keeping the code semi-self-documenting.
Hiding the fact that an opaque struct (or union) you cannot ever
see the members of is a struct or union is good API design. You can
later change the major class from struct to union or back, or totally
redefine it, but the caller never needs to know what is going on.
Hiding a pointer is wrong. Callers should know they are getting a
pointer, or are being asked to supply a pointer-to-a-pointer. So the
"FILE*" stdio functions are sane, because we don't know what is under
a FILE type but we do know when we are dealing with a pointer to one.
My original proposal didn't stick _t onto the end of everything,
because I didn't think it was really necessary. I'm fine with it
either way. It may be better to include the _t suffix, it seems
to be somewhat common in libraries.
--
Shawn.
--
FWIW I've read what you say about types, while this is good design to
make things abstract, accessors are slower _and_ disallow many
optimizations as it's a function call and that it may clobber all your
pointers values.
For types that _will_ be in the tight loops, we must make the types
explicit or it'll bite us hard performance-wise. I'm thinking what is
"struct object" or "struct commit" in git.git. It's likely that we will
loose a *lot* of those types are opaque.
struct object in git has not changed since 2006.06. struct commit hasn't
since 2005.04 if you ignore { unsigned int indegree; void *util; } that
if I'm correct are annotations, and is a problem we (I think) have to
address differently anyways (I gave my proposal on this, I'm eager to
hear about what other think on the subject). So if in git.git that _is_
a moving target we have had a 2 year old implementation for those types,
it's that they're pretty well like this.
It's IMNSHO on the matter that core structures of git _will_ have to be
made explicit. I'm thinking objects and their "subtypes" (commits,
trees, blobs). Maybe a couple of things on the same vein.
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Accessors are very nifty for one thing though; With a debugging flag, you can use an accessor-function, while without that debugging flag you can use a macro instead of a function. In other words, you use the compiler as a sort of sanity-checker that you're only accessing the variables through the proper macros. This method introduces a bit of extra code (50% of which is always dead) for each struct it's used on, but it makes debugging large-ish pieces of software relatively simple, since access to all object types The last sentence doesn't parse. I assume you mean "if those types are..", in which case it'll be solved by using accessor-macros and forward-declaring I agree. "git_commit", "git_tree", "git_blob" and "git_tag" can almost certainly be set in stone straight away. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
This was a typo, indeed s/of/if/ --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
True, accessors slow things down. But I'm not sure that the accessors at the application level are going to be a huge problem. Where the CPU time really matters is inside the tight loops of the library, where we can expose the struct to ourselves, because if the layout changes we'd be relinking the library anyway with the updated object code. I would rather stick with accessors right now. We could in the future expose the structs and convert the accessors to macros or inline functions in a future version of the ABI if performance is really shown to be a problem here from the *application*. Remember we are mostly talking about applications that are happy to fork+exec git right now. A little accessor function call is *still* Eh, I disagree here. In git.git today "struct commit" exposes its buffer with the canonical commit encoding. Having that visible wrecks what Nico and I were thinking about doing with pack v4 and encoding commits in a non-canonical format when stored in packs. Ditto with trees. Because git.git code goes against that canonical buffer we cannot easily insert pack v4 and test the improvements we want to make. The refactoring required is one of the reasons we haven't done pack v4 yet. _IF_ we really are going through this effort of building a different API and shifting to its use in git.git I want to make sure we at least initially leave the door open to make changes without rewriting everything *again*. Accessor functions can usually be inlined or macro'd away. But they cannot be magically inserted by the compiler if they aren't there in the first place. This isn't Python... :-) -- Shawn. --
Err... isn't that backwards? Surely you want to store stuff in the
canonical format so you're forced to do as few translations as
possible? Or are you trying to speed up packing by skipping the
canonicalization part? If so, that would slow down reading (or
Well, if macro usage is adhered to one wouldn't have to worry,
since the macro can just be rewritten with a function later (if,
for example, translation or some such happens to be required).
Older code linking to a newer library would work (assuming the
size of the commit object doesn't change anyway), but newer code
linking to an older library would not. Otoh, they wouldn't even
build unless they used the wrong header files, so there is nothing
What I meant was this (I'm a tad drunk, so read the spirit, not
the letter):
in "foo-api.h":
--%<--%<--
#ifdef BUILDING_FOR_DEPLOYING
#include "git_foo_decls.h"
# define git_foo_get_buf(git_foo) (git_foo->buf)
#else
#include "git_foo_fwd_decls.h"
extern const char *git_foo_get_buf(git_foo *foo);
#endif
--%<--%<--
foo.c
--%<--%<--
#include "foo-api.h"
#include "git_foo_decls.h"
#ifndef BUILDING_FOR_DEPLOYING
const char git_foo_get_buf(git_foo *foo)
{
return foo->buf;
}
/* other accessors go here */
#endif
/* rest of git_foo manipulators go here */
--%<--%<--
It's almost certainly not worth it for libgit2 though, as git@vger
provides a good review system.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
No. We suspect that canonical format is harder to decompress and parse during revision traversal. Other encodings in the pack file may produce much faster runtime performance, and reduce page faults (due to smaller pack sizes). We hardly ever use the canonical format for actual output; most output rips the canonical format apart and then formats the data the way it was requested. If we have the data *already* parsed in Wrong; we're trying to speed up reading. Packing may go slower, especially during the first conversion of v2->v4 for any given repository, but packing is infrequent so the minor (if any) drop You are assuming too much magic. If the older ABI used a macro and the newer one (which supports pack v4) organized struct commit differently and the user upgrades libgit2.so the older applications just broke, horribly. We know we want to do pack v4 in the near future. Or at least experiment with it and see if it works. If it does, we don't want to have to cause a major ABI breakage across all those newly installed libgit2s... yikes. I'm really in favor of accessor functions for the first version of the library. They can always be converted to macros once someone shows that their git visualizer program saves 10 ms on a 8,000 ms render operation by avoiding accessor functions. I'd rather spend our brain cycles optimizing the runtime and the in-core data so we spend less time in our tight revision traversal loops. Seriously. We make at least 10 or 11 function calls *per commit* that comes out of get_revision(). If the formatting application is really suffering from its 4 or 5 accessor function calls in order to get that returned data, we probably should also be looking at how we can avoid function cals in the library. Oh, and even with 4 or 5 accessor functions per commit in the application that is *still* better than the 10 or so calls the application probably makes today scraping "git log --format=raw" off a pipe and segment it into the different ...
I'll have to look into the pack v4 stuff, as I can't get what you're saying to make sense to me. Not canonicalizing the data when storing it means you'll have to have conversion routines from all the various encodings, unless you first canonicalize it and then encode it in the way you need it in the pack way. Never using canonical format means it has no potential for combinatorial explosion, and every converter Naturally. Re-sizing objects that weren't previously protected by accessor functions will always break the ABI unless the layout of the previously existing items in the object doesn't change, which That's not necessary, although it requires a bit more thought when I agree that for early versions they should definitely be functions, but since we've been talking about "final design" earlier in the All comps where I use git today are multi-core systems, so I'm very much in favour of parallell processing. Otoh, I also don't think it makes sense to jump through hoops to make sure one can, fe, read and parse all tags in multiple threads simultaneously. In other words, I don't think we need to bother adding thread-safety for stuff that really only should happen if the application is poorly designed (unless it's straightforward to do so, ofcourse). -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
Yea, optimizing C is a bitch. I'm in favor of accessors *IN THE APPLICATION*. Within the library's own C code, I think we should expose the struct, and use its members where it makes sense to. Especially in the really tight loops where we don't want to introduce more overhead. My rationale here is that we can change the struct at any time, Yes, but I'm arguing they should be opaque to the application, and visible to the library. Today the application is suffering from massive fork+exec overhead. I really don't give a damn if the application's compiler has to deal with a function call to read from a private member of an opaque type. Its still thousands of CPU instructions less per operation. Come back to me a year after libgit2 has been widely deployed on Linux distros and we have multiple applications linking to it. Lets talk then about the harmful performance problems caused by making these types opaque to the application. About that time we'll also be talking about how great pack v4 is and why its a good thing those types were opaque, as we didn't have to break the ABI Sure, but in the library only. -- Shawn. --
The problem is not the function call, it's really cheap on modern CPUs. The problem is that across a function call the compiler cannot make some kind of optimizations and _that_ isn't cheap. For example, pointers whose value have been loaded from the store into a register have to be loaded again. A function call trashes all the Well I fear it'll be more than a few percent harm, and I can already see Linus argue against the switch to libgit2 for git-core because it lost 2% performances ;P --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
First........... is there really a need to re-license it? Depends. Sure, I gave permission to copy some of my code for JGIT because 1) JGIT is Java code in which I have little interest, 2) the different license was justified by the nature of the JGIT project, and 3) although no license convey this I asked for the C version of git to remain the authoritative reference and that any improvements done to JGIT first be usable in the C version under the GPL. Of course a library might need a different license than the GPL to be widely useful from a linkage stand point, but the code within that library does not need to be miles away from the GPL. What I personally care about is for improvements to my code to always be contributed back, which pretty much discards BSD-like licenses. My favorite license for a library is the GPL with the gcc exception, i.e. what libraries coming with gcc are using. They're GPL but with an exception allowing them to be linked with anything. And because everything on a Linux system, including proprietary applications, is likely linked against those gcc libs, then there is nothing that would prevent libgit to be linked against anything as well. But the library code itself has GPL protection. For reference, here's the exception text: In addition to the permissions in the GNU General Public License, the Free Software Foundation gives you unlimited permission to link the compiled version of this file into combinations with other programs, and to distribute those combinations without any restriction coming from the use of this file. (The General Public License restrictions do apply in other respects; for example, they cover modification of the file, and distribution when not linked into a combine executable.) Nicolas --
at the very least you should go from GPLv2 to LGPLv2 for the library. while it can be argued that this really shouldn't be nessasary, the water is muddy enough that it would be a very good thing to do this. I don't see any need to switch to a BSD/MIT/etc license for a library, the <shrug>, I don't see why this is needed with the LGPL, but I'm not a lawyer. David Lang --
The LGPL also asks that proprietary applications provides necessary object files so you can link it against an alternative implementation of the LGPL library if you so wish. With dynamic libraries this is rather moot but I think that's the main difference. Nicolas --
Some people want to be able to link the library into an application that they redistribute binaries of, but not sources to. Those folks have also volunteered to help write the library. If they put their code where their mouth is, then I think they should be able to use their code the way they want to. That said, I think the license choice that makes the most sense here is probably LGPL or GPL+gcc exception, like you note below. Well, we cannot do a GPL->LGPL switch on code without author permission for that sort of re-licensing. That said, I think many authors of git.git code would be more comfortable with a GPL->LGPL change, where they wouldn't be OK with a GPL->BSD/MIT change. There may be some folks though who still I'm happy with either the LGPL or the GPL+exception above. If I read these correctly the GPL+exception allows one to distribute static executables without source or object files, so long as the library source wasn't modified. I'd almost prefer just using the standard LGPL then, static linking isn't very common anymore. -- Shawn. --
because of this paragraph. Like Pierre I also prefer a BSD style license, and JGit is under that, as it offers quite a bit of freedom for the consumer of the code. I'm also not too worried about not getting changes back. If someone forks away from the base project and doesn't contribute back, that's their problem. So long as the base project has sufficient momentum under it making changes and improving things, everyone else will want to pull and either face merge-hell once in a while, or send changes back upstream to avoid merge-hell. But I doubt Git regulars share our views on this, and I think most of the major contributors to git.git have stated multiple times that they prefer a GPL style license on their code. I want those people to contribute to libgit2 (assuming the project moves past the pie-in-the-sky theory stage), so I want the license to be something they will be comfortable with. -- Shawn. --
FWIW, I dislike the LGPL for many reasons, and I prefer 100x times a GPL with GCC kind of exceptions. But if other people hate it, I wont be be a problem and refuse it. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Hmm yeah, GPL has the viral link issue, and there is a real use of embedding the libgit in many projects. There is a use for it: front-ends to git from editors, for closed-source projects of various sorts, ... All of those right now must do a reimplementation of what they need. As soon as they need some kind of performance, exec()ing git-* commands doesn't fly. Git is currently mostly "GPLv2 or later". A BSDish license was mentioned, because it's the most permissive one and that nobody cared that much, though a LGPL/GPL-with-GCC-exception would probably fly. Many of the people needing a library for libgit are probably reading the list, I'll let them comment. The kind of license you propose would totally suite my needs, and I think, most of the one discussed at GitTogether'08 (except for the eclipse people disliking GPL'ed stuff, but anyways there was the issue of C code being non pure java anyways, so maybe Shawn can comment on that bit, I don't recall the exact specifics I must reckon). OT: FWIW I prefer BSDish licenses (even the MIT actually) for libraries because I believe that computing is overall better if everyone can use the right tool for the task, and I don't want to prevent people from using good stuff (I hope I write good stuff ;P) because of the license. And I don't care about people don't giving back to me, those are not the kind of people who would have given back if it was GPL'ed anyways. But I understand this is a completely personal view, and I'm not even trying to persuade you :) --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
I do care. I think the BSD license is too permissive. There are really nifty pieces of code in Git that I would be really sorry to see go Eclipse is Java and that issue is already solved with JGIT which doesn't Everybody can and does link against glibc on Linux which is LGPL. So Sure, and that's where we differ. I let you use my code for free, as long as you give me back your improvements to it. This way everybody stays honnest. I think this is Linus' view as well which he often resume as "tit for tat". Nicolas --
I agree with Nicolas, for what it's worth. Although I realize my small contributions to the git code can be trivially rewritten, I think that For dynamic libraries, yes, as you can replace them in-flight if you need to. For static libraries, it's a different matter. I think the gcc exception is rather special, as parts of gcc is the dynamic linker initialization code, which has to be shipped in object form and included in all executables produced with gcc. Perhaps we'd be better off asking one of EFF's lawyers what each license means, exactly. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
As an implementor of a git GUI, I don't really care what license libgit2 will be. GitX is currently GPLv2, though that might change to LGPL to allow it to ship/link with non-GPL libraries. GPL and LGPL would both suit me. As a more concrete comment, is there anything you would like to hear from GUI developers during the development of libgit2? I'm not sure I can contribute much in terms of code, but if you need any constructive comments, I can help with that. - Pieter --
We'd like to hear feedback on the API. Look at the operations your application does with git-core. Can you express that with the libgit2 API? If not, why not? Is it just that the docs are unclear? Is the API missing? What would you like to invoke to get the data you need? ... You are the end-user of the library, so it needs to suit you. Ok, you aren't the only end-user, but you and other developers like you... :-) -- Shawn. --
I will be the end-user of the library because if we want libgit2 to be anywhere close to successful, you should be able to port C-git to it. I understand that the apidocs/ is a very early work-in-progress, but still, it bothers me that it is unclear to me what lifetime rules are in effect on the in-core objects. For example, in C-git, commit objects are not just parsed but are modified in place as history is traversed (e.g. their flags smudged and their parents simplified). You have "flags" field in commit, which implies to me that the design shares this same "modified by processing in-place" assumption. It is great for processing efficiency as long as you are a "run once and let exit(3) clean-up" type of program, but is quite problematic otherwise. commit.flags that C-git uses for traversal marker purposes, together with "who are parents and children of this commit", should probably be kept inside traversal module, if you want to make this truly reusable. By the way, I hate git_result_t. That should be "int", the most natural integral type on the platform. --
I don't think it's impossible to have something efficient without this kind of hacks. You just need to dissociate the objects from their annotations, though use some kind of allocator that allow numbering of the objects, and use that number as a lookup in an array of annotations. It will require pool allocators for the annotations, but that should I concur. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Interesting approach. I don't know why I didn't think of that one. You'll still need to be able to toss parts of the git graph though. If you just pin everything in memory under a single global object table you'll run server processes out of memory as they chug through int it is then. -- Shawn. --
Sure, but for that you just need to reinject the numbers into some kind of free list (hint a bitmap) to reuse old slots. Of course this is some kind of take-once never-release approach _BUT_ one can do better and "defrag" this at times. E.g. for a server if we take your idea, there are some times we probably *know* nobody has kept a reference to one of the pointers and we can reorganize some pointers around to free chunks of data not allocated. An escape way is to use mmap + madvise for those pools, the former to allocate the memory and the latter to drop large unused ranges when needed (remapping with MAP_FIXED is also supposed to work but fails on Macos it seems). Even win32 has what it takes to do so (just skim through the code of jemalloc, that is used for mozilla, it's quite portable on POSIX + Windows). I was thinking that one should create and register pools of annotations as such, define the size of the annotation (IOW the size of cell), and let deal with that. I imagine the stuff as some kind of allocator that would allocate the first time e.g. 4k of objects, and if you need more 8k, and if need more 16k and so on exponentially. Mapping an integer to a cell in this kind of ropes is _really_ efficient: you need to know the first bit set (__builtin_clz is of help with gcc, it can be emulated with proper asm for many non-gcc platforms also) to give you the number of the rope component that you intend to address, you clear that bit, the resulting number is the index in that rope component of the cell you're interested in. On most machines such an operation would be a few cycles, which should be fairly little wrt the operation you would do during a traversal. Maintaining the bitmap is important, so that we can give back large chunks of physical memory back to the system, and many malloc implementations would likely use such kind of implementations, we would just do our (which is arguably heavy) but gives us control on the cell number, which is just what we ...
Its not efficient to keep this data inside of the "traversal module"
instance (aka what I called git_revp_t). You really want it inside
of the commit itself (aka git_commit_t).
My thought here is that git_commit_t's are scoped within a given
git_revp_t that was used when they were parsed. That is:
git_revp_t *pool_a = git_revp_alloc(db, NULL);
git_revp_t *pool_b = git_revp_alloc(db, NULL);
git_oid_t id;
git_commit_t *commit_a, *commit_b, *commit_c;
git_oid_mkstr(&id, "3c223b36af9cace4f802a855fbb588b1dccf0648");
commit_a = git_commit_parse(pool_a, &id);
commit_b = git_commit_parse(pool_b, &id);
commit_c = git_commit_parse(pool_a, &id);
if (commit_a == commit_b)
die("the world just exploded");
else
printf("this was correct behavior\n");
if (commit_a == commit_c)
printf("this was correct behavior\n");
else
die("the hash table is broken");
To completely different git_revp_t's on the same database yeild
different commit pointers, but successive calls to parse the same
commit in the same pool yield the same pointer.
Certain operations on the pool can cause it to alter its state
in a way that cannot be reversed (e.g. rewrite parents). In such
cases the caller should free the pool and alloc a new one in order
to issue new traversals against the same object database, but with
Yea, I'm torn on git_result_t myself. Some library APIs use their
own result type, but as a typedef off int.
I'm tempted to stick with int for the result type, but I don't
want readers to confuse our result type of 0 == success, <0 ==
failure with some case where we return a signed integral value as
a result of a computation.
I'm also debating the error handling. Do we return the error
code as the return value from the function, or do we stick it into
some sort of thread-global like classic "errno", or do we ask the
application to pass in a structure to us?
E.g.:
Return code:
git_result_t r = git_foo_bar(...);
if (r < 0)
...Passing a structure pointer for errors is adding overhead to the API in all cases, please don't do that. Both the negative code and errno style are lightweight in the common "no error" case. The errno style is probably more handy for those functions returning a pointer which should be NULL in the error case. Nicolas --
Sometimes, but yea, its not that often. I think we've already settled that it shall be "int". I'm updating some stuff like that (and dropping _t) as I write I'm sticking with return a negative code for now, to the extent that some functions which return a pointer but also have many common failure modes (e.g. git_odb_open) use an output parameter as their first arg, so the error code can be returned as the result of the function. -- Shawn. --
Actually, the pointer-returning functions can encode error cases into a "negative" pointer. See include/linux/err.h for example. void *ptr = git_alloc_foo(...); if (IS_ERR(ptr)) die("git_alloc_foo failed: %s", git_strerr(PTR_ERR(ptr))); Nicolas --
Oh, good point. We could also stagger the errors so they are always odd, and never return an odd-alignment pointer from a successful function. Thus IS_ERR can be written as: #define IS_ERR(ptr) (((intptr_t)(ptr)) & 1) which is quite cheap, and given the (probably required anyway) aligned allocation policy means we still have 2^31 possible error codes available. -- Shawn. --
Hi,
Unfortunately, errno would not be thread-safe, unless you can guarantee
Oh boy, both solutions are ugly as hell. Although the &1 method does not
limit the memory space as much (except if you plan to work in
space-contrained environments, where you do not want to be forced to
word-align structs).
The only pointer game I would remotely consider clean is if you had
const char *errors[] = {
...
};
inline int is_error(void *ptr) {
return ptr >= errors && ptr < errors + ARRAY_SIZE(errors);
}
although that would be less performant.
Ciao,
Dscho
--
Well, TLS afaict is implemented on arches that have GNU ld, or on win32 Well, you can't return _sanely_ an error through a pointer. The &1 method is broken as soon as you return a char* (there is an alignment requirement for malloc, not for any pointer out there), hence shall not be used, as it would not be the sole way to test for error. Another option, that is _theorically_ not portable, but is ttbomk on all the platforms we intend to support (IOW POSIX-ish and windows), is to use "small" values of the pointers for errors. [NULL .. (void *)(PAGE_SIZE = - 1)[ cannot exist, which gives us probably always 512 different errors, and the test is ((uintptr_t)ptr < (PAGE_SIZE)) which is cheap. It's butt ugly, but encoding errors into pointers is butt ugly in the first place. Another option that is what I would prefer, would be for the use of errno where it makes sense. E.g. if you want a function that fetches an object, this is somehow what read(3) would look like on our store, more or less, and errno's are enough. For the other functions where errno cannot be used, I'm pretty sure we will always pass a handle to some kind of libgit2 stuff, like the "repository" we're working on. The _easiest_ way is to put the "last error" into that structure and use that. I mean, if we want libgit2 to be useful for _everyone_ we *WILL* have to pass a repository context around. I see almost no way around it. And there, NULL means error, and if you want to know about the specific error, git_repo_errno(&ctx) / git_repo_strerr(&ctx) is just easy. Note: What is important is to be able to check for errors _fast_, I don't think printing out the error and knowing which error it was would be in the fast path, so it's less useful to have this information immediately. _My_ taste (but again, like the _t I would use what is there, and won't make a fuss about it at all) is to see function return -1 or NULL for errors, and "abuse" errno for system-like functions, or put the last error into the ...
Or use "negative" pointers. Again, please have a look at include/linux/err.h. The pointer range from 0xffffffff (or 0xffffffffffffffff on 64-bit machines) down to the range you want is for errors, and the top of the address range is almost certain to never be valid in user space either. Nicolas --
Sure, I'm just not sure there isn't an arch where a page would be 512 Indeed. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Hi, How certain may I be of that assumption? Because an assumption it is. Ciao, Dscho --
This reminds me that Shawn earlier in an unrelated thread asked me if I
can relicense builtin-blame.c for JGIT; your reasoning fully matches that
Although I'd be Ok with either GPL + gcc exception on whatever core-ish
(i.e. what will be necessary for libgit2; "blame" would not count) pieces
I have in C-git codebase, "can be linked with anything" allows a gaping
hole to the library, which I'm a bit hesitant to swallow without thinking.
E.g. our read_object() may look like this:
void *read_object(const object_name_t sha1,
enum object_type *type,
size_t *size)
{
...
}
but an extension a closed-source person may sell you back may do:
+typedef void *read_object_fn(const object_name_t,
+ enum object_type *,
+ size_t *);
+read_object_fn read_object_custom = NULL;
void *read_object(const object_name_t sha1,
enum object_type *type,
size_t *size)
{
+ if (read_object_custom != NULL)
+ return read_object_custom(sha1, type, size);
...
}
I.e. use the supplied custom function to do proprietary magic, such as
reading the object lazily from elsewhere over the network. And we will
never get that magic bit back.
Although no license asks this, my wish is that if somebody built on top of
what I wrote to make the world a better place, I'd like the same access to
that additional code so that I too can enjoy the improved world. Because
almost all of my code in git.git are under GPLv2, in reality I do not have
any access to your software as long as you do not distribute your
additional code that made the world a better place, which is a bit sad.
--
Well I wasn't thinking about anything else than what is needed for the libgit2. I love BSDish for libraries, though like GPL for the actual Well, for one "we're" not supposed to accept any patch that does that, and I don't expect that the people who end up maintaining libgit2 will become rogue. Though if such bits of APIs do exist one day, then well, I see no license except the GPL that can prevent you from that. My idea of trying to be able to reuse git.git code is not a necessity, a new implementation from scratch is likely to be possible. Though we all know that if the core git contributors don't contribute and eventually use libgit2 this will not fly. That's why we must think about it. I assume given your answer that if libgit2 is BSD you may not be as motivated to contribute code to it as you are to git.git, and this IMHO would be a big no-go, like shawn said in another mail. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Someday I'm going to come back to you and ask for "blame" in libgit2. Its an important function to be able to execute for an end-user. Look at "git gui blame", its a major feature of the GUI. If your blame implementation will never be available except under the GPL then either it should be clean-room rewritten under the library's license, or maybe there is a "libgitblame" that is GPL and can be optionally linked with libgit2 and a GPL'd application As a maintainer I'd never accept such a patch. I'd ask for the code under read_object_custom, or toss the patch on the floor. But that doesn't stop them from distributing the patched sources like above, keeping the fun bits in the closed source portion of the executable they distribute. Maybe I just think too highly of the other guy, but I'd hope that anyone patching libgit2 like above would try to avoid it, because IMHO, its a flaw of the GPL. GitHub anyone? Heck, even Google uses a lot of GPL'd software internally (yes, we have Linux desktops and servers) but not all of the software we distribute internally goes external, so not all of our patches are published. *sigh* I've actually stayed awake at night sometimes wondering what the world would be like if the GPL virual clause forced the source code for a website to be opened, or forced you to publish your code even if you never distribute binaries beyond "you" (where "you" is some mega corp in many countries with many employees). -- Shawn. --
There is such license, and it is called AGPLv3, Affero GPL[1]. And of course Google prohibits (or did prohibit) using it for projects hosted at Google Code... wonder why... ;-) [1] http://en.wikipedia.org/wiki/AGPL -- Jakub Narebski Poland ShadeHawk on #git --
the issue that I see is that libgit2 will be (on most systems) a shared library. what's to stop someone from taking the libgit2 code, adding the magic proprietary piece, and selling a new libgit2 library binary 'just replace your existing shared library with this new one and all your git related programs gain this feature' they would only face merge issues if they need to keep up to date with you, and git makes it pretty easy to maintain a fork if you only have to do one-way merging (rere) David Lang --
True. The only thing that prevents that is the normal GPL. The LGPL and GPL+"gcc exception" allow this sort of mean behavior. I doubt there's enough of a market for that; replacing a library is something of a pain and if the feature really is interesting or useful someone will write a clean-room re-implementation and submit In other words, we're too good for our own good. ;-) -- Shawn. --
how would the LGPL of GPL+gcc extention allow this? if they modify the code in the library and then distribute the modified library wouldn't they be required to distribute the changes to that library? they could use the LGPL or GPL+exception library with their propriatary program, but I don't see how they could get away with modifying the library. David Lang --
See junio's example. It's rather easy to add hooks into the library to implement a feature outside of it. It's even possible to do it while preserving the ABI fully IMHO (by being a strict superset of it). The patch would be so trivial, that I see no reason why they wouldn't provide it. Though the real implementation of the feature that would be delegated through it would be in their closed source stuff. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
But at that point it's a matter of public perception. It would clearly be against the spirit of the license even though the license itself couldn't prevent such tortuous practices. Those people doing such things would clearly be identified as bad guys and get bad press, and libgit contributors could even attempt law suits based on the derived work angle. That might be just enough to prevent such things to happen. OTOH they would certainly come out clean if the license was BSD since the spirit of that license explicitly allows closing up the whole library and adding extra features. Nicolas --
I think it's pretty clear most contributor from git.git would be driven away if the license isn't GPLish. So I was basing my answer on a supposition it would be. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Sure. My point is the difference in guiltiness. Even if a LGPL license doesn't prevent all abuses, it at least allows for public denunciations at the very least. Nicolas --
My take on the consensus for the license part of the discussion is that libgit2 should be under the "GPL gcc library" license. BTW, I can't actually find a copy of that license; the only thing I can locate in the GCC SVN tree is a copy of the LGPL. -- Shawn. --
The exception is usually found at the top of files constituting libgcc.a. One example is gcc/config/arm/ieee754-df.S. ;-) Nicolas --
Headers updated. Its now GPL+gcc library exception. Not that the 5 lines of useful code there really needs copyright, but hey, whatever. -- Shawn. --
I'm sorry - why is that better than LGPL? Wouldn't it be better to use a license that people have heard of rather than one that can't be looked up or it's implications easily researched? What is this affording the library that offsets the headaches of everyone trying to figure out if they can use it or not? Scott --
I guess my main concern is that if a company wanted to direct resources at supporting Git in something (say, an editor or GUI or whatnot), and that company is of _any_ size, they are going to have to get their legal department to review this strange and almost totally unused license - only knowing that it's barely different than GPL and they know GPL will not fly. LGPL will likely be known to them and a policy may already be in place. Think about trying to incorporate this into something proprietary, Shawn - how much of a pain is it going to be to get that license reviewed in Google? However, LGPL I'm sure there is already a reviewed policy. Now, since that may be a pain, time that Shawn could have been spending being paid to work on the library is lost because they can't use it, or it takes weeks/months to review it. That's my concern. I personally would rather see it BSD or something more permissive so that no human has to waste even a second of their valuable time figuring out if they can work with it or not, but I understand that many people here are much more protective of their code. I simply think that LGPL is a much more widely used and understood compromise that affords nearly the same protectionism. --
Apparently BSD won't fly, as you have already seen on the list. If we did put the library under a BSD license we'd lose some core contributors. Or they at least wouldn't improve the library code, even if git.git linked to it in the future. I don't want to lose these folks. IANAL, but from what I can tell the main difference between LGPL and GPL+"gcc library exception" is that the LGPL requires that the end-user must be able to relink the derived executable with their own replacement library. The GPL+"gcc library exception" makes no such requirement. If you read the exception clause it practically makes the library even easier to use commerically than the BSD license does, however modifications to the library sources must still be distributed. Isn't that actually somewhat close to the Mozilla Public License? -- Shawn. --
The gcc exception license should have been reviewed by anyone who has ever build anything proprietary out of gcc. GPL+link exception is a very common license. It's most common use is for runtime libraries for various programming languages. Lawyers I know are significantly less fearful of the GPL+exception than the LGPL. The exception basically says that if you use it in a certain way, then none of the GPL applies. David --
Indeed. And gcc is a *very* popular compiler on Linux distributions. A lot of commerical software is built under it, and distributed in binary-only form. Those companies have already done the legal review they felt necessary before shipping (and selling) that software. -- Shawn. --
Its license. GPL even with GCC exception would not allow you to do that. Though they could propose a fork of the library patched, with the patch distributed. The downside would be that their code would not be binary compatible with the "true" libgit2, so they would probably have to change the name to avoid namespace clashes, or overwrite the "real" library. But yes, it's theoretically feasible. I'm not sure it would be worth the hassle, and if they respect the license (if they don't they can already do that with the current git anyway) then the fact that someone would want to do something like that would be known fact, probably not avoided, but known. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
the proposal had been for the library to be BSD, not GPL. and I don't see any reason why such a thing couldn't be done with a BSE library. --
~~ Brian Gernhardt --
As it's the git-lib, all public functions should almost certainly be *_t types are reserved by POSIX for future implementations, so that's a no-go (although I doubt POSIX will ever make types named git_*_t). Apart from that, please consider reading Ulrich Drepper's musings on library design at http://people.redhat.com/drepper/goodpractice.pdf It's pretty short but brings up nearly all the crucial points one really don't want to forget. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
I think I've read that before, but I'll skim over it again. Thanks for the link. -- Shawn. --
Essentially, anything that ends with "_t" ;-) http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_02.html#tag_02_02_02 --
The ones I know of off the top of my head are: * Anything ending with _t (posix) pthread_t, size_t, socklen_t, ... * str/mem/? function name prefix NOT followed by an underscore (C lib) strlen, strchr, memcmp, memcpy, ... (strbuf_ stuff doesn't break this, as it contains an underscore in the name - the underscore following it doesn't have to be immediate). * Double underscore prefix (compiler / C lib) __WORDSIZE, __cplusplus, ... It's also a good idea to stay away from single underscore prefix for generic-ish names, as some compilers/architectures abuse it extensively. Prefixing all public functions and macros of the library with 'git_' (lower-case for functions and function-like macros, upper-case for the rest) will probably see us through safe. Exceptions can probably be made for already completed API's, such as the arg-parsing stuff I swear by it at work, where I'm "the library guy". I'll make sure to review stuff and can chip in code or thoughts to make the library fly with a minimum amount of problems and maintenance burden. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
Hi, I do not know if I want to trust a person that has shown a certain eagerness to ignore good library design by breaking the well-established dont-change-apis-on-minor-versions idiom, and instead of listening to users that have problems as a consequence rather ignore them. Instead, let's build on the knowledge of people we have learnt to trust, on this list. Thank you, Dscho --
We should take the opportunity a make it more portable. Instead of using
the posix api directly we should warp it in "git_" APIs. And be carefull
with certain APIs like fork or fork+exec and instead provided a more
generic solution: for fork one that would use the best solution in the
given platform, either by forking or threading; and for fork+exec a
generic create_process/run_command.
Here's an example, for the 'read' API, on how we can simply do this
without worries for the posix crowd:
ssize_t git_read(git_fildes_t fildes, void* buf, size_t bufsize);
Were git_fildes_t would be an int for posix and an HANDLE for win32.
For the posix case git_read can be simply inlined and we get zero overhead:
static inline ssize_t git_read(git_fildes_t fildes, void *buf,
size_t bufsize)
{
return read(fildes, buf, bufsize);
}
And for the win32 case it would be much more easier to implement the
equivalent, something like:
ssize_t git_read(git_fildes_t fildes, void *buf, size_t bufsize)
{
DWORD rd;
if (!ReadFile(fildes, buf, bufsize, &rd, NULL)) {
//translate win32 error to errno
return -1;
}
return rd;
}
Of course, there is also the issue of using the c runtime on win32, but
that problem can be easily solved outside git, provided that we don't
use a 'fileno' like API.
Bruno Santos
--
... Yes, already on my mind. _IF_ this carries further I'll be involved with its development, and I'll make certain its reasonably portable like you are asking. There are only a handful of things we really need to from the OS and I think we can wrap most of them up into little inline stubs on POSIX (so POSIX folks have no impact) and on Win32 we can have small stubs (or again inline) so its pretty native on Win32. Yes, I have considered say NPR or APR; I'm not going there. Both are nice packages but there's also downsides to bringing them into libgit2. IMHO its just easier to wrap the handful of things we really need. -- Shawn. --
Having looked briefly at the code, I've got a couple of comments:
* GIT_EXTERN() does nothing. Ever. It's noise and should be removed.
Instead it would be better to have GIT_PRIVATE(), which could
set visibility to "internal" or "hidden", meaning the symbol it's
attached to can be used for lookups when creating a shared library
but won't be usable from programs linking to that shared library
(visibility-attributes have zero effect on static libraries). At
least on all archs anyone really cares about.
* Prefixing the files themselves with git_ is useless and only leads
to developer frustration. I imagine we'd be installing any header
files in a git/ directory anyway, so we're gaining absolutely
nothing with the git_ prefix on source-files.
Apart from that, it seems you've been designing a lot rather than
trying to use the API to actually do something. It would, imo, be
a lot better to start development with adding functionality shared
between all programs and then expand further on that, such as
incorporating all functions needed for manipulating tags into the
library and then modify existing code to use the library to get
tag-ish things done. That would also mean that the library would
quickly get used by core git, as once a certain part of it is
complete patches can be fitted to the library rather than to the
current non-libish dying() functions.
I also think it's quite alright to not strive *too* hard to make
all functions thread-safe, as very few of them will actually need
that. It's unlikely that a user program will spawn one thread to
write a lot of tags while another is trying to parse them, for
example.
By adding an init routine that determines the workdir and the
gitdir, one could start using the library straight away.
int git_init(const char *db, const char *worktree)
{
if (git_set_db_dir(db))
return -1;
git_set_worktree((worktree))
return -1;
return 0;
}
and already you have a some few small helpers ...I feel the same way. But I was also under the impression that the brilliant engineers who work for Microsoft decided that on their platform special annotations have to be inserted on functions that a DLL wants to export to applications. Hence any cross-platform library that I have seen annotates their exported functions this way, with the macro being empty on POSIX and expanding to some magic keyword on Microsoft's OS. I think it I can see why you said this; needing GIT_PRIVATE() is a lot more rare than needing GIT_EXTERN(). Only a handful of cross-module, but private, functions are likely to exist, so it makes sense to Yes, I realized that this morning. I plan on changing that mess around so we have "include/git/oid.h" and library and application code can use "#include <git/oid.h>". Library modules should just I wanted to get a solid idea of what our API conventions should be, before we started writing a lot of code around them. Part of the problem with the git.git code is we don't have conventions that are really suited for use in a shared library (assuming we even have Tags are mostly pointless. Its a tiny part of the code that isn't that interesting to most people. And it requires object database access anyway if you want to talk about parsing or reading a tag. There's almost no point in a git library that can't read the on Oh really? Maybe true for tags, just because they are such an unimportant part of the git suite compared to everything else. But right now I'm running a production system using a threaded server process that is operating on Git repositories. Fortunately threads suck less on Java than they do on POSIX, and we have a 100% pure Java library available for Git. It would be nice if a library created in the late part of 2008 recognized that threads exist, aren't going to disappear tomorrow, and that consumers of libraries actually may need to run the library within a threaded process. Or are you one of those developers who think ...
Hi, Exactly. This is the "good" old __declspec(dllexport) for you. It is a pain in the butt, but that is what you have to go for if libgit2 is supposed to be any more portable than ligit.a. Ciao, Dscho --
Not only there is the windows thing, but the *best* way to design a library is to make _everything_ by default and only show what you mean to. GIT_EXTERN allow us to do just that with __attribute__((visibility("default"))) on GNU-ld/GCC. Of course we can achieve the same using nothing on public symbols and __attribute__((visibility("hidden"))) on the private ones, but it's way more cumbersome and there's always a chance to forget one. It's bad design IMHO. FWIW GIT_EXTERN(...) prototype(arguments); isn't unredable to me, everyone does that, it doesn't break tags, it's _good_. And you can ask doxygen to substitute this GIT_EXTERN(x) with x so that it doesn't shows up in the documentation at all (it can include these kind of partially preprocessed headers in the documentation) for the people who are really annoyed with them. But it's definitely a necessary evil. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
I noticed when I fetched the latest head today that it's already done. I fail to understand why headers need to be in a separate path so that oid.c can't just '#include "oid.h"'. With the risk of nitpicking you to death, put public headers in a separate dir (I'd suggest public/%.h in Make-speak, but I have no strong preference) and keep private headers next to %.c. Always #include the public header file from the private one (that should Right. I guess I'm too firm a believer in system evolution by constant refactoring (with fluctuating api's, yes) rather than thinking initial True, but designing top-down means you'll need to write one more API to get the first stuff working, so you'll always be using the new code you write immediately and for something real. IMO, that No, I'm one of those developers who think that if implementing a function as thread-safe means it'll take 50 times longer than just writing something that works, the right decision is to go with the faster way to get the job done and then expand on it later when the need arises. Reading my original post, I realize I should have made that more clear. Sorry for making your gall rise unnecessarily. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
Just a random question: is there a reason why you have put all the .h in a separate includes/ directory instead of relying on the install target to put the include files at the right place ? To me it makes it much harder to hack on the files as one is always required to switch between both directories... --
I agree with this, but as I guess Shawn will do roughly 45 times more work on it than me (according to current commit-count in git.git), I'll live with it. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
I don't, modifying the public includes may break the ABI and the API. I believe it to be a good practice to put them in a separate directory so that people modifying them will know this particular header is public. Yes you can name your private headers differently, but it's not really the same, it doesn't make editing public headers hard, and it has to. People modifying them _have_ to thing "err why am I modifying this specific header in the first place" before doing anything in it. --=20 =C2=B7O=C2=B7 Pierre Habouzit =C2=B7=C2=B7O madcoder@debia= n.org OOO http://www.madism.org
Well, I suggested putting "src/public/public_header.h" quite early on, with private headers next to the source. AFAIU, the private and public headers both are now located in the same directory, and that directory is separate from the .c files. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 --
Currently there are only public headers, and the public headers are all under include/git/. Private headers are going to be under src/ so they are isolated from the public headers. But I haven't had a chance to touch libgit2 in over a week. :-\ I've simply got too many projects going on at once. This is one I really want to work on though, so I'm going to try and make time for it next week. But I'm also in the middle of a major overhaul of Gerrit, so it can run on non-Google infrastructure and thus is usable by pretty much anyone. -- Shawn. --
