During the GitTogether we were kicking around the idea of a ground-up
implementation of a Git library. This may be easier than trying
to grind down git.git into a library, as we aren't tied to any
of the current global state baggage or the current die() based
error handling.I've started an _extremely_ rough draft. The code compiles into a
libgit.a but it doesn't even implement what it describes in the API,
let alone a working Git implementation. Really what I'm trying to
incite here is some discussion on what the API looks like.API Docs:
http://www.spearce.org/projects/scm/libgit2/apidocs/html/modules.htmlSource Code Clone URL:
http://www.spearce.org/projects/scm/libgit2/libgit2.git--
Shawn.
--
Just a random question: is there a reason why you have put all the .h in
a separate includes/ directory instead of relying on the install target
to put the include files at the right place ?To me it makes it much harder to hack on the files as one is always
required to switch between both directories...
--
I agree with this, but as I guess Shawn will do roughly 45 times more
work on it than me (according to current commit-count in git.git), I'll
live with it.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
I don't, modifying the public includes may break the ABI and the API.
I believe it to be a good practice to put them in a separate directory
so that people modifying them will know this particular header is
public. Yes you can name your private headers differently, but it's not
really the same, it doesn't make editing public headers hard, and it has
to. People modifying them _have_ to thing "err why am I modifying this
specific header in the first place" before doing anything in it.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Well, I suggested putting "src/public/public_header.h" quite early on,
with private headers next to the source. AFAIU, the private and public
headers both are now located in the same directory, and that directory is
separate from the .c files.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
Currently there are only public headers, and the public headers are
all under include/git/. Private headers are going to be under src/
so they are isolated from the public headers.But I haven't had a chance to touch libgit2 in over a week. :-\
I've simply got too many projects going on at once. This is one
I really want to work on though, so I'm going to try and make time
for it next week. But I'm also in the middle of a major overhaul
of Gerrit, so it can run on non-Google infrastructure and thus is
usable by pretty much anyone.--
Shawn.
--
Having looked briefly at the code, I've got a couple of comments:
* GIT_EXTERN() does nothing. Ever. It's noise and should be removed.
Instead it would be better to have GIT_PRIVATE(), which could
set visibility to "internal" or "hidden", meaning the symbol it's
attached to can be used for lookups when creating a shared library
but won't be usable from programs linking to that shared library
(visibility-attributes have zero effect on static libraries). At
least on all archs anyone really cares about.
* Prefixing the files themselves with git_ is useless and only leads
to developer frustration. I imagine we'd be installing any header
files in a git/ directory anyway, so we're gaining absolutely
nothing with the git_ prefix on source-files.Apart from that, it seems you've been designing a lot rather than
trying to use the API to actually do something. It would, imo, be
a lot better to start development with adding functionality shared
between all programs and then expand further on that, such as
incorporating all functions needed for manipulating tags into the
library and then modify existing code to use the library to get
tag-ish things done. That would also mean that the library would
quickly get used by core git, as once a certain part of it is
complete patches can be fitted to the library rather than to the
current non-libish dying() functions.I also think it's quite alright to not strive *too* hard to make
all functions thread-safe, as very few of them will actually need
that. It's unlikely that a user program will spawn one thread to
write a lot of tags while another is trying to parse them, for
example.By adding an init routine that determines the workdir and the
gitdir, one could start using the library straight away.int git_init(const char *db, const char *worktree)
{
if (git_set_db_dir(db))
return -1;
git_set_worktree((worktree))
return -1;return 0;
}and already you have a some few small helpers that a...
I feel the same way.
But I was also under the impression that the brilliant engineers
who work for Microsoft decided that on their platform special
annotations have to be inserted on functions that a DLL wants to
export to applications.Hence any cross-platform library that I have seen annotates their
exported functions this way, with the macro being empty on POSIX
and expanding to some magic keyword on Microsoft's OS. I think itI can see why you said this; needing GIT_PRIVATE() is a lot more
rare than needing GIT_EXTERN(). Only a handful of cross-module,
but private, functions are likely to exist, so it makes sense toYes, I realized that this morning. I plan on changing that mess
around so we have "include/git/oid.h" and library and application
code can use "#include <git/oid.h>". Library modules should justI wanted to get a solid idea of what our API conventions should be,
before we started writing a lot of code around them. Part of the
problem with the git.git code is we don't have conventions that are
really suited for use in a shared library (assuming we even haveTags are mostly pointless. Its a tiny part of the code that isn't
that interesting to most people. And it requires object database
access anyway if you want to talk about parsing or reading a tag.
There's almost no point in a git library that can't read the onOh really?
Maybe true for tags, just because they are such an unimportant part
of the git suite compared to everything else.But right now I'm running a production system using a threaded server
process that is operating on Git repositories. Fortunately threads
suck less on Java than they do on POSIX, and we have a 100% pure
Java library available for Git.It would be nice if a library created in the late part of 2008
recognized that threads exist, aren't going to disappear tomorrow,
and that consumers of libraries actually may need to run the library
within a threaded process.Or are you one of those developers who thi...
I noticed when I fetched the latest head today that it's already
done. I fail to understand why headers need to be in a separate
path so that oid.c can't just '#include "oid.h"'.With the risk of nitpicking you to death, put public headers in a
separate dir (I'd suggest public/%.h in Make-speak, but I have no
strong preference) and keep private headers next to %.c. Always
#include the public header file from the private one (that shouldRight. I guess I'm too firm a believer in system evolution by constant
refactoring (with fluctuating api's, yes) rather than thinking initialTrue, but designing top-down means you'll need to write one more
API to get the first stuff working, so you'll always be using the
new code you write immediately and for something real. IMO, thatNo, I'm one of those developers who think that if implementing a
function as thread-safe means it'll take 50 times longer than just
writing something that works, the right decision is to go with the
faster way to get the job done and then expand on it later when the
need arises. Reading my original post, I realize I should have made
that more clear. Sorry for making your gall rise unnecessarily.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
Not only there is the windows thing, but the *best* way to design a
library is to make _everything_ by default and only show what you mean
to.GIT_EXTERN allow us to do just that with
__attribute__((visibility("default"))) on GNU-ld/GCC.Of course we can achieve the same using nothing on public symbols and
__attribute__((visibility("hidden"))) on the private ones, but it's way
more cumbersome and there's always a chance to forget one. It's bad
design IMHO.FWIW GIT_EXTERN(...) prototype(arguments); isn't unredable to me,
everyone does that, it doesn't break tags, it's _good_. And you can ask
doxygen to substitute this GIT_EXTERN(x) with x so that it doesn't shows
up in the documentation at all (it can include these kind of partially
preprocessed headers in the documentation) for the people who are really
annoyed with them.But it's definitely a necessary evil.
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Hi,
Exactly. This is the "good" old __declspec(dllexport) for you. It is a
pain in the butt, but that is what you have to go for if libgit2 is
supposed to be any more portable than ligit.a.Ciao,
Dscho
--
We should take the opportunity a make it more portable. Instead of using
the posix api directly we should warp it in "git_" APIs. And be carefull
with certain APIs like fork or fork+exec and instead provided a more
generic solution: for fork one that would use the best solution in the
given platform, either by forking or threading; and for fork+exec a
generic create_process/run_command.Here's an example, for the 'read' API, on how we can simply do this
without worries for the posix crowd:ssize_t git_read(git_fildes_t fildes, void* buf, size_t bufsize);
Were git_fildes_t would be an int for posix and an HANDLE for win32.
For the posix case git_read can be simply inlined and we get zero overhead:static inline ssize_t git_read(git_fildes_t fildes, void *buf,
size_t bufsize)
{
return read(fildes, buf, bufsize);
}And for the win32 case it would be much more easier to implement the
equivalent, something like:ssize_t git_read(git_fildes_t fildes, void *buf, size_t bufsize)
{
DWORD rd;if (!ReadFile(fildes, buf, bufsize, &rd, NULL)) {
//translate win32 error to errno
return -1;
}
return rd;
}Of course, there is also the issue of using the c runtime on win32, but
that problem can be easily solved outside git, provided that we don't
use a 'fileno' like API.Bruno Santos
--
...
Yes, already on my mind. _IF_ this carries further I'll be involved
with its development, and I'll make certain its reasonably portable
like you are asking.There are only a handful of things we really need to from the OS
and I think we can wrap most of them up into little inline stubs
on POSIX (so POSIX folks have no impact) and on Win32 we can have
small stubs (or again inline) so its pretty native on Win32.Yes, I have considered say NPR or APR; I'm not going there.
Both are nice packages but there's also downsides to bringing them
into libgit2. IMHO its just easier to wrap the handful of things
we really need.--
Shawn.
--
--wRRV7LY7NUeQGEoC
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printableI know this isn't actually helping a lot to define the real APIs, but we
should really not repeat current git mistakes and have a really uniform
APIs, meaning that first we must decide:
* proper namespacing (e.g. OBJ_* looks like failure to me, it's a way
too common prefix);* proper public "stuff" naming (I e.g. realy like types names -- not
struct or enum tags, that I don't really care -- ending with _t as
it helps navigating source.* ...
And write that down _first_. It's not a lot of work, but it must be
done. Working on a library really asks us to create something coherent
for our users.Second, if we want this to be a successful stuff, we all agree we must
let git be able to use it medium term. That means that when git-core is
experimenting with new interfaces, it will probably need to hook into
some more internal aspects of the library. This is a problem to solve
elegantly, linking git against a static library won't please a lot of
vendors and linux distributions, and exporting "private" symbols is a
sure way to see them being abused.Last but not least, I believe parts of git-core are currently easy to
just take. For example, any code *I* wrote, I hereby give permission to
relicense it in any of the following licenses: BSD-like, MIT-like,
WTFPL.For example, on parse-options.c, git blame yields:
git blame -C -C -M parse-options.c|cut -d\( -f2|cut -d2 -f1|sort|uniq -c
16 Alex Riesen
6 Jeff King
47 Johannes Schindelin
12 Junio C Hamano
19 Michele Ballabio
1 Nanako Shiraishi
1 Olivier Marin
395 Pierre HabouzitOkay, arguably parse-options.c in libgit quite doesn't makes sense
(though it can help bringing some kind of uniformity to other git
ecosystem tools built on libgit but that's not the point I'm trying to
make), I'm sure this kind of pattern where it's likely to be easy ...
As it's the git-lib, all public functions should almost certainly be
*_t types are reserved by POSIX for future implementations, so that's
a no-go (although I doubt POSIX will ever make types named git_*_t).Apart from that, please consider reading Ulrich Drepper's musings on
library design at http://people.redhat.com/drepper/goodpractice.pdfIt's pretty short but brings up nearly all the crucial points one really
don't want to forget.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
Hi,
I do not know if I want to trust a person that has shown a certain
eagerness to ignore good library design by breaking the well-established
dont-change-apis-on-minor-versions idiom, and instead of listening to
users that have problems as a consequence rather ignore them.Instead, let's build on the knowledge of people we have learnt to trust,
on this list.Thank you,
Dscho--
I think I've read that before, but I'll skim over it again.
Thanks for the link.--
Shawn.
--
The ones I know of off the top of my head are:
* Anything ending with _t (posix)
pthread_t, size_t, socklen_t, ...
* str/mem/? function name prefix NOT followed by an underscore (C lib)
strlen, strchr, memcmp, memcpy, ...
(strbuf_ stuff doesn't break this, as it contains an underscore in the
name - the underscore following it doesn't have to be immediate).
* Double underscore prefix (compiler / C lib)
__WORDSIZE, __cplusplus, ...It's also a good idea to stay away from single underscore prefix for
generic-ish names, as some compilers/architectures abuse it extensively.Prefixing all public functions and macros of the library with 'git_'
(lower-case for functions and function-like macros, upper-case for
the rest) will probably see us through safe. Exceptions can probably
be made for already completed API's, such as the arg-parsing stuffI swear by it at work, where I'm "the library guy". I'll make sure to
review stuff and can chip in code or thoughts to make the library fly
with a minimum amount of problems and maintenance burden.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
Essentially, anything that ends with "_t" ;-)
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_02.ht...
--
~~ Brian Gernhardt
--
First........... is there really a need to re-license it?
Depends. Sure, I gave permission to copy some of my code for JGIT
because 1) JGIT is Java code in which I have little interest, 2) the
different license was justified by the nature of the JGIT project, and
3) although no license convey this I asked for the C version of git to
remain the authoritative reference and that any improvements done to JGIT
first be usable in the C version under the GPL.Of course a library might need a different license than the GPL to be
widely useful from a linkage stand point, but the code within that
library does not need to be miles away from the GPL. What I personally
care about is for improvements to my code to always be contributed back,
which pretty much discards BSD-like licenses.My favorite license for a library is the GPL with the gcc exception,
i.e. what libraries coming with gcc are using. They're GPL but with an
exception allowing them to be linked with anything. And because
everything on a Linux system, including proprietary applications, is
likely linked against those gcc libs, then there is nothing that would
prevent libgit to be linked against anything as well. But the library
code itself has GPL protection.For reference, here's the exception text:
In addition to the permissions in the GNU General Public License, the
Free Software Foundation gives you unlimited permission to link the
compiled version of this file into combinations with other programs,
and to distribute those combinations without any restriction coming
from the use of this file. (The General Public License restrictions
do apply in other respects; for example, they cover modification of
the file, and distribution when not linked into a combine
executable.)Nicolas
--
This reminds me that Shawn earlier in an unrelated thread asked me if I
can relicense builtin-blame.c for JGIT; your reasoning fully matches thatAlthough I'd be Ok with either GPL + gcc exception on whatever core-ish
(i.e. what will be necessary for libgit2; "blame" would not count) pieces
I have in C-git codebase, "can be linked with anything" allows a gaping
hole to the library, which I'm a bit hesitant to swallow without thinking.E.g. our read_object() may look like this:
void *read_object(const object_name_t sha1,
enum object_type *type,
size_t *size)
{
...
}but an extension a closed-source person may sell you back may do:
+typedef void *read_object_fn(const object_name_t,
+ enum object_type *,
+ size_t *);
+read_object_fn read_object_custom = NULL;
void *read_object(const object_name_t sha1,
enum object_type *type,
size_t *size)
{
+ if (read_object_custom != NULL)
+ return read_object_custom(sha1, type, size);
...
}I.e. use the supplied custom function to do proprietary magic, such as
reading the object lazily from elsewhere over the network. And we will
never get that magic bit back.Although no license asks this, my wish is that if somebody built on top of
what I wrote to make the world a better place, I'd like the same access to
that additional code so that I too can enjoy the improved world. Because
almost all of my code in git.git are under GPLv2, in reality I do not have
any access to your software as long as you do not distribute your
additional code that made the world a better place, which is a bit sad.
--
Someday I'm going to come back to you and ask for "blame" in libgit2.
Its an important function to be able to execute for an end-user.
Look at "git gui blame", its a major feature of the GUI.If your blame implementation will never be available except under
the GPL then either it should be clean-room rewritten under the
library's license, or maybe there is a "libgitblame" that is GPL
and can be optionally linked with libgit2 and a GPL'd applicationAs a maintainer I'd never accept such a patch. I'd ask for the
code under read_object_custom, or toss the patch on the floor.
But that doesn't stop them from distributing the patched sources
like above, keeping the fun bits in the closed source portion of
the executable they distribute.Maybe I just think too highly of the other guy, but I'd hope that
anyone patching libgit2 like above would try to avoid it, becauseIMHO, its a flaw of the GPL. GitHub anyone? Heck, even Google uses
a lot of GPL'd software internally (yes, we have Linux desktops and
servers) but not all of the software we distribute internally goes
external, so not all of our patches are published. *sigh*I've actually stayed awake at night sometimes wondering what the
world would be like if the GPL virual clause forced the source code
for a website to be opened, or forced you to publish your code
even if you never distribute binaries beyond "you" (where "you"
is some mega corp in many countries with many employees).--
Shawn.
--
the issue that I see is that libgit2 will be (on most systems) a shared
library.what's to stop someone from taking the libgit2 code, adding the magic
proprietary piece, and selling a new libgit2 library binary 'just replace
your existing shared library with this new one and all your git related
programs gain this feature'they would only face merge issues if they need to keep up to date with
you, and git makes it pretty easy to maintain a fork if you only have to
do one-way merging (rere)David Lang
--
Its license. GPL even with GCC exception would not allow you to do that.
Though they could propose a fork of the library patched, with the patch
distributed. The downside would be that their code would not be binary
compatible with the "true" libgit2, so they would probably have to
change the name to avoid namespace clashes, or overwrite the "real"
library.But yes, it's theoretically feasible. I'm not sure it would be worth the
hassle, and if they respect the license (if they don't they can already
do that with the current git anyway) then the fact that someone would
want to do something like that would be known fact, probably not
avoided, but known.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
the proposal had been for the library to be BSD, not GPL. and I don't see
any reason why such a thing couldn't be done with a BSE library.--
True. The only thing that prevents that is the normal GPL. The
LGPL and GPL+"gcc exception" allow this sort of mean behavior.
I doubt there's enough of a market for that; replacing a library
is something of a pain and if the feature really is interesting or
useful someone will write a clean-room re-implementation and submitIn other words, we're too good for our own good. ;-)
--
Shawn.
--
how would the LGPL of GPL+gcc extention allow this? if they modify the
code in the library and then distribute the modified library wouldn't they
be required to distribute the changes to that library?they could use the LGPL or GPL+exception library with their propriatary
program, but I don't see how they could get away with modifying the
library.David Lang
--
See junio's example. It's rather easy to add hooks into the library to
implement a feature outside of it. It's even possible to do it while
preserving the ABI fully IMHO (by being a strict superset of it).The patch would be so trivial, that I see no reason why they wouldn't
provide it. Though the real implementation of the feature that would be
delegated through it would be in their closed source stuff.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
But at that point it's a matter of public perception. It would clearly
be against the spirit of the license even though the license itself
couldn't prevent such tortuous practices. Those people doing such
things would clearly be identified as bad guys and get bad press, and
libgit contributors could even attempt law suits based on the derived
work angle. That might be just enough to prevent such things to happen.OTOH they would certainly come out clean if the license was BSD since
the spirit of that license explicitly allows closing up the whole
library and adding extra features.Nicolas
--
My take on the consensus for the license part of the discussion is
that libgit2 should be under the "GPL gcc library" license.BTW, I can't actually find a copy of that license; the only thing
I can locate in the GCC SVN tree is a copy of the LGPL.--
Shawn.
--
The exception is usually found at the top of files constituting
libgcc.a. One example is gcc/config/arm/ieee754-df.S. ;-)Nicolas
--
Headers updated. Its now GPL+gcc library exception.
Not that the 5 lines of useful code there really needs copyright,
but hey, whatever.--
Shawn.
--
I'm sorry - why is that better than LGPL? Wouldn't it be better to
use a license that people have heard of rather than one that can't be
looked up or it's implications easily researched? What is this
affording the library that offsets the headaches of everyone trying to
figure out if they can use it or not?Scott
--
I guess my main concern is that if a company wanted to direct
resources at supporting Git in something (say, an editor or GUI or
whatnot), and that company is of _any_ size, they are going to have to
get their legal department to review this strange and almost totally
unused license - only knowing that it's barely different than GPL and
they know GPL will not fly. LGPL will likely be known to them and a
policy may already be in place.Think about trying to incorporate this into something proprietary,
Shawn - how much of a pain is it going to be to get that license
reviewed in Google? However, LGPL I'm sure there is already a
reviewed policy. Now, since that may be a pain, time that Shawn could
have been spending being paid to work on the library is lost because
they can't use it, or it takes weeks/months to review it. That's my
concern.I personally would rather see it BSD or something more permissive so
that no human has to waste even a second of their valuable time
figuring out if they can work with it or not, but I understand that
many people here are much more protective of their code. I simply
think that LGPL is a much more widely used and understood compromise
that affords nearly the same protectionism.--
The gcc exception license should have been reviewed by anyone who has
ever build anything proprietary out of gcc.GPL+link exception is a very common license. It's most common use is
for runtime libraries for various programming languages.Lawyers I know are significantly less fearful of the GPL+exception
than the LGPL. The exception basically says that if you use it in a
certain way, then none of the GPL applies.David
--
Indeed. And gcc is a *very* popular compiler on Linux distributions.
A lot of commerical software is built under it, and distributed in
binary-only form. Those companies have already done the legal review
they felt necessary before shipping (and selling) that software.--
Shawn.
--
Apparently BSD won't fly, as you have already seen on the list.
If we did put the library under a BSD license we'd lose some core
contributors. Or they at least wouldn't improve the library code,
even if git.git linked to it in the future. I don't want to lose
these folks.IANAL, but from what I can tell the main difference between LGPL
and GPL+"gcc library exception" is that the LGPL requires that
the end-user must be able to relink the derived executable with
their own replacement library. The GPL+"gcc library exception"
makes no such requirement.If you read the exception clause it practically makes the library
even easier to use commerically than the BSD license does, however
modifications to the library sources must still be distributed.Isn't that actually somewhat close to the Mozilla Public License?
--
Shawn.
--
I think it's pretty clear most contributor from git.git would be driven
away if the license isn't GPLish. So I was basing my answer on a
supposition it would be.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Sure. My point is the difference in guiltiness. Even if a LGPL license
doesn't prevent all abuses, it at least allows for public denunciations
at the very least.Nicolas
--
There is such license, and it is called AGPLv3, Affero GPL[1].
And of course Google prohibits (or did prohibit) using it for projects
hosted at Google Code... wonder why... ;-)[1] http://en.wikipedia.org/wiki/AGPL
--
Jakub Narebski
Poland
ShadeHawk on #git
--
Well I wasn't thinking about anything else than what is needed for the
libgit2. I love BSDish for libraries, though like GPL for the actualWell, for one "we're" not supposed to accept any patch that does that,
and I don't expect that the people who end up maintaining libgit2 will
become rogue. Though if such bits of APIs do exist one day, then well, I
see no license except the GPL that can prevent you from that.My idea of trying to be able to reuse git.git code is not a necessity,
a new implementation from scratch is likely to be possible. Though we
all know that if the core git contributors don't contribute and
eventually use libgit2 this will not fly. That's why we must think about
it.I assume given your answer that if libgit2 is BSD you may not be as
motivated to contribute code to it as you are to git.git, and this IMHO
would be a big no-go, like shawn said in another mail.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Hmm yeah, GPL has the viral link issue, and there is a real use of
embedding the libgit in many projects. There is a use for it: front-ends
to git from editors, for closed-source projects of various sorts, ...
All of those right now must do a reimplementation of what they need. As
soon as they need some kind of performance, exec()ing git-* commands
doesn't fly.Git is currently mostly "GPLv2 or later". A BSDish license was
mentioned, because it's the most permissive one and that nobody cared
that much, though a LGPL/GPL-with-GCC-exception would probably fly.Many of the people needing a library for libgit are probably reading the
list, I'll let them comment. The kind of license you propose would
totally suite my needs, and I think, most of the one discussed at
GitTogether'08 (except for the eclipse people disliking GPL'ed stuff,
but anyways there was the issue of C code being non pure java anyways,
so maybe Shawn can comment on that bit, I don't recall the exact
specifics I must reckon).OT: FWIW I prefer BSDish licenses (even the MIT actually) for libraries
because I believe that computing is overall better if everyone can use
the right tool for the task, and I don't want to prevent people from
using good stuff (I hope I write good stuff ;P) because of the license.
And I don't care about people don't giving back to me, those are not the
kind of people who would have given back if it was GPL'ed anyways.But I understand this is a completely personal view, and I'm not even
trying to persuade you :)
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
As an implementor of a git GUI, I don't really care what license
libgit2 will be. GitX is currently GPLv2, though that might change
to LGPL to allow it to ship/link with non-GPL libraries. GPL and LGPL
would both suit me.As a more concrete comment, is there anything you would like to hear
from GUI developers during the development of libgit2? I'm not sure I
can contribute much in terms of code, but if you need any constructive
comments, I can help with that.- Pieter
--
We'd like to hear feedback on the API. Look at the operations
your application does with git-core.Can you express that with the libgit2 API?
If not, why not?
Is it just that the docs are unclear?
Is the API missing?
What would you like to invoke to get the data you need?
...You are the end-user of the library, so it needs to suit you. Ok,
you aren't the only end-user, but you and other developers like
you... :-)--
Shawn.
--
I will be the end-user of the library because if we want libgit2 to be
anywhere close to successful, you should be able to port C-git to it.I understand that the apidocs/ is a very early work-in-progress, but
still, it bothers me that it is unclear to me what lifetime rules are in
effect on the in-core objects. For example, in C-git, commit objects are
not just parsed but are modified in place as history is traversed
(e.g. their flags smudged and their parents simplified). You have "flags"
field in commit, which implies to me that the design shares this same
"modified by processing in-place" assumption. It is great for processing
efficiency as long as you are a "run once and let exit(3) clean-up" type
of program, but is quite problematic otherwise. commit.flags that C-git
uses for traversal marker purposes, together with "who are parents and
children of this commit", should probably be kept inside traversal module,
if you want to make this truly reusable.By the way, I hate git_result_t. That should be "int", the most natural
integral type on the platform.
--
Its not efficient to keep this data inside of the "traversal module"
instance (aka what I called git_revp_t). You really want it inside
of the commit itself (aka git_commit_t).My thought here is that git_commit_t's are scoped within a given
git_revp_t that was used when they were parsed. That is:git_revp_t *pool_a = git_revp_alloc(db, NULL);
git_revp_t *pool_b = git_revp_alloc(db, NULL);
git_oid_t id;
git_commit_t *commit_a, *commit_b, *commit_c;git_oid_mkstr(&id, "3c223b36af9cace4f802a855fbb588b1dccf0648");
commit_a = git_commit_parse(pool_a, &id);
commit_b = git_commit_parse(pool_b, &id);
commit_c = git_commit_parse(pool_a, &id);if (commit_a == commit_b)
die("the world just exploded");
else
printf("this was correct behavior\n");if (commit_a == commit_c)
printf("this was correct behavior\n");
else
die("the hash table is broken");To completely different git_revp_t's on the same database yeild
different commit pointers, but successive calls to parse the same
commit in the same pool yield the same pointer.Certain operations on the pool can cause it to alter its state
in a way that cannot be reversed (e.g. rewrite parents). In such
cases the caller should free the pool and alloc a new one in order
to issue new traversals against the same object database, but withYea, I'm torn on git_result_t myself. Some library APIs use their
own result type, but as a typedef off int.I'm tempted to stick with int for the result type, but I don't
want readers to confuse our result type of 0 == success, <0 ==
failure with some case where we return a signed integral value as
a result of a computation.I'm also debating the error handling. Do we return the error
code as the return value from the function, or do we stick it into
some sort of thread-global like classic "errno", or do we ask the
application to pass in a structure to us?E.g.:
Return code:
git_result_t r = git_foo_bar(...);
i...
Passing a structure pointer for errors is adding overhead to the API in
all cases, please don't do that.
Both the negative code and errno style are lightweight in the common "no
error" case. The errno style is probably more handy for those functions
returning a pointer which should be NULL in the error case.Nicolas
--
Sometimes, but yea, its not that often. I think we've already
settled that it shall be "int".I'm updating some stuff like that (and dropping _t) as I write
I'm sticking with return a negative code for now, to the extent
that some functions which return a pointer but also have many
common failure modes (e.g. git_odb_open) use an output parameter
as their first arg, so the error code can be returned as the result
of the function.--
Shawn.
--
Actually, the pointer-returning functions can encode error cases into a
"negative" pointer. See include/linux/err.h for example.void *ptr = git_alloc_foo(...);
if (IS_ERR(ptr))
die("git_alloc_foo failed: %s", git_strerr(PTR_ERR(ptr)));Nicolas
--
Oh, good point. We could also stagger the errors so they are
always odd, and never return an odd-alignment pointer from a
successful function. Thus IS_ERR can be written as:#define IS_ERR(ptr) (((intptr_t)(ptr)) & 1)
which is quite cheap, and given the (probably required anyway)
aligned allocation policy means we still have 2^31 possible
error codes available.--
Shawn.
--
Hi,
Unfortunately, errno would not be thread-safe, unless you can guarantee
Oh boy, both solutions are ugly as hell. Although the &1 method does not
limit the memory space as much (except if you plan to work in
space-contrained environments, where you do not want to be forced to
word-align structs).The only pointer game I would remotely consider clean is if you had
const char *errors[] = {
...
};inline int is_error(void *ptr) {
return ptr >= errors && ptr < errors + ARRAY_SIZE(errors);
}although that would be less performant.
Ciao,
Dscho--
Well, TLS afaict is implemented on arches that have GNU ld, or on win32
Well, you can't return _sanely_ an error through a pointer. The &1
method is broken as soon as you return a char* (there is an alignment
requirement for malloc, not for any pointer out there), hence shall not
be used, as it would not be the sole way to test for error.Another option, that is _theorically_ not portable, but is ttbomk on all
the platforms we intend to support (IOW POSIX-ish and windows), is to
use "small" values of the pointers for errors. [NULL .. (void *)(PAGE_SIZE =
- 1)[
cannot exist, which gives us probably always 512 different errors, and
the test is ((uintptr_t)ptr < (PAGE_SIZE)) which is cheap. It's butt
ugly, but encoding errors into pointers is butt ugly in the first place.Another option that is what I would prefer, would be for the use of
errno where it makes sense. E.g. if you want a function that fetches an
object, this is somehow what read(3) would look like on our store, more
or less, and errno's are enough. For the other functions where errno
cannot be used, I'm pretty sure we will always pass a handle to some
kind of libgit2 stuff, like the "repository" we're working on. The
_easiest_ way is to put the "last error" into that structure and use
that. I mean, if we want libgit2 to be useful for _everyone_ we *WILL*
have to pass a repository context around. I see almost no way around it.
And there, NULL means error, and if you want to know about the specific
error, git_repo_errno(&ctx) / git_repo_strerr(&ctx) is just easy.Note: What is important is to be able to check for errors _fast_, I
don't think printing out the error and knowing which error it was would
be in the fast path, so it's less useful to have this information
immediately._My_ taste (but again, like the _t I would use what is there, and won't
make a fuss about it at all) is to see function return -1 or NULL for
errors, and "abuse" errno for system-like functions, or put the last
error in...
Or use "negative" pointers. Again, please have a look at
include/linux/err.h. The pointer range from 0xffffffff (or
0xffffffffffffffff on 64-bit machines) down to the range you want is for
errors, and the top of the address range is almost certain to never be
valid in user space either.Nicolas
--
Hi,
How certain may I be of that assumption? Because an assumption it is.
Ciao,
Dscho--
Sure, I'm just not sure there isn't an arch where a page would be 512
Indeed.
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
I don't think it's impossible to have something efficient without this
kind of hacks. You just need to dissociate the objects from their
annotations, though use some kind of allocator that allow numbering of
the objects, and use that number as a lookup in an array of annotations.
It will require pool allocators for the annotations, but that shouldI concur.
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Interesting approach. I don't know why I didn't think of that one.
You'll still need to be able to toss parts of the git graph though.
If you just pin everything in memory under a single global object
table you'll run server processes out of memory as they chug throughint it is then.
--
Shawn.
--
Sure, but for that you just need to reinject the numbers into some kind
of free list (hint a bitmap) to reuse old slots. Of course this is some
kind of take-once never-release approach _BUT_ one can do better and
"defrag" this at times.E.g. for a server if we take your idea, there are some times we probably
*know* nobody has kept a reference to one of the pointers and we can
reorganize some pointers around to free chunks of data not allocated.An escape way is to use mmap + madvise for those pools, the former to
allocate the memory and the latter to drop large unused ranges when
needed (remapping with MAP_FIXED is also supposed to work but fails on
Macos it seems). Even win32 has what it takes to do so (just skim
through the code of jemalloc, that is used for mozilla, it's quite
portable on POSIX + Windows).I was thinking that one should create and register pools of annotations
as such, define the size of the annotation (IOW the size of cell), and
let deal with that. I imagine the stuff as some kind of allocator that
would allocate the first time e.g. 4k of objects, and if you need more
8k, and if need more 16k and so on exponentially. Mapping an integer to
a cell in this kind of ropes is _really_ efficient: you need to know the
first bit set (__builtin_clz is of help with gcc, it can be emulated
with proper asm for many non-gcc platforms also) to give you the number
of the rope component that you intend to address, you clear that bit,
the resulting number is the index in that rope component of the cell
you're interested in. On most machines such an operation would be a few
cycles, which should be fairly little wrt the operation you would do
during a traversal.Maintaining the bitmap is important, so that we can give back large
chunks of physical memory back to the system, and many malloc
implementations would likely use such kind of implementations, we would
just do our (which is arguably heavy) but gives us control on the cell
number, which is just what we need.Of ...
I do care. I think the BSD license is too permissive. There are really
nifty pieces of code in Git that I would be really sorry to see goEclipse is Java and that issue is already solved with JGIT which doesn't
Everybody can and does link against glibc on Linux which is LGPL. So
Sure, and that's where we differ. I let you use my code for free, as
long as you give me back your improvements to it. This way everybody
stays honnest. I think this is Linus' view as well which he often
resume as "tit for tat".Nicolas
--
I agree with Nicolas, for what it's worth. Although I realize my small
contributions to the git code can be trivially rewritten, I think thatFor dynamic libraries, yes, as you can replace them in-flight if you need
to. For static libraries, it's a different matter. I think the gcc exception
is rather special, as parts of gcc is the dynamic linker initialization code,
which has to be shipped in object form and included in all executables
produced with gcc. Perhaps we'd be better off asking one of EFF's lawyers
what each license means, exactly.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
at the very least you should go from GPLv2 to LGPLv2 for the library.
while it can be argued that this really shouldn't be nessasary, the water
is muddy enough that it would be a very good thing to do this.I don't see any need to switch to a BSD/MIT/etc license for a library, the
<shrug>, I don't see why this is needed with the LGPL, but I'm not a
lawyer.David Lang
--
The LGPL also asks that proprietary applications provides necessary
object files so you can link it against an alternative implementation of
the LGPL library if you so wish. With dynamic libraries this is rather
moot but I think that's the main difference.Nicolas
--
Some people want to be able to link the library into an application
that they redistribute binaries of, but not sources to. Those folks
have also volunteered to help write the library. If they put their
code where their mouth is, then I think they should be able to use
their code the way they want to.That said, I think the license choice that makes the most sense
here is probably LGPL or GPL+gcc exception, like you note below.Well, we cannot do a GPL->LGPL switch on code without author
permission for that sort of re-licensing.That said, I think many authors of git.git code would be more
comfortable with a GPL->LGPL change, where they wouldn't be OK with
a GPL->BSD/MIT change. There may be some folks though who stillI'm happy with either the LGPL or the GPL+exception above. If I
read these correctly the GPL+exception allows one to distribute
static executables without source or object files, so long as the
library source wasn't modified. I'd almost prefer just using the
standard LGPL then, static linking isn't very common anymore.--
Shawn.
--
FWIW, I dislike the LGPL for many reasons, and I prefer 100x times a GPL
with GCC kind of exceptions. But if other people hate it, I wont be
be a problem and refuse it.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
because of this paragraph.
Like Pierre I also prefer a BSD style license, and JGit is under
that, as it offers quite a bit of freedom for the consumer of
the code.I'm also not too worried about not getting changes back. If someone
forks away from the base project and doesn't contribute back,
that's their problem. So long as the base project has sufficient
momentum under it making changes and improving things, everyone
else will want to pull and either face merge-hell once in a while,
or send changes back upstream to avoid merge-hell.But I doubt Git regulars share our views on this, and I think most
of the major contributors to git.git have stated multiple times
that they prefer a GPL style license on their code. I want those
people to contribute to libgit2 (assuming the project moves past the
pie-in-the-sky theory stage), so I want the license to be something
they will be comfortable with.--
Shawn.
--
How about this?
Private symbols are a problem. On some systems we can use link
editing to strip them out of the .so, but that isn't always going to
work everywhere. I've outlined the idea of using double underscore
to name private functions, and we can link-edit out '*__*' if theYea. We could try to do that. I don't know how far it will get us,
but if we have to "steal" code we can rip a good part from JGit.
Its BSD-like, but has that "icky Java smell" to it. :-)Before worrying about where we get implementation bits from I'm
more interested in trying to get a consistent view of what our
namespace looks like, and what our calling conventions are, so we
have some sort of benchmark to measure APIs against as we add them
to the implementation.--
Shawn.
--
FWIW I've read what you say about types, while this is good design to
make things abstract, accessors are slower _and_ disallow many
optimizations as it's a function call and that it may clobber all your
pointers values.For types that _will_ be in the tight loops, we must make the types
explicit or it'll bite us hard performance-wise. I'm thinking what is
"struct object" or "struct commit" in git.git. It's likely that we will
loose a *lot* of those types are opaque.struct object in git has not changed since 2006.06. struct commit hasn't
since 2005.04 if you ignore { unsigned int indegree; void *util; } that
if I'm correct are annotations, and is a problem we (I think) have to
address differently anyways (I gave my proposal on this, I'm eager to
hear about what other think on the subject). So if in git.git that _is_
a moving target we have had a 2 year old implementation for those types,
it's that they're pretty well like this.It's IMNSHO on the matter that core structures of git _will_ have to be
made explicit. I'm thinking objects and their "subtypes" (commits,
trees, blobs). Maybe a couple of things on the same vein.--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Yea, optimizing C is a bitch.
I'm in favor of accessors *IN THE APPLICATION*.
Within the library's own C code, I think we should expose the struct,
and use its members where it makes sense to. Especially in the
really tight loops where we don't want to introduce more overhead.My rationale here is that we can change the struct at any time,
Yes, but I'm arguing they should be opaque to the application, and
visible to the library. Today the application is suffering from
massive fork+exec overhead. I really don't give a damn if the
application's compiler has to deal with a function call to read
from a private member of an opaque type. Its still thousands of
CPU instructions less per operation.Come back to me a year after libgit2 has been widely deployed on
Linux distros and we have multiple applications linking to it.
Lets talk then about the harmful performance problems caused by
making these types opaque to the application. About that time
we'll also be talking about how great pack v4 is and why its a good
thing those types were opaque, as we didn't have to break the ABISure, but in the library only.
--
Shawn.
--
The problem is not the function call, it's really cheap on modern CPUs.
The problem is that across a function call the compiler cannot make
some kind of optimizations and _that_ isn't cheap.For example, pointers whose value have been loaded from the store into a
register have to be loaded again. A function call trashes all theWell I fear it'll be more than a few percent harm, and I can already see
Linus argue against the switch to libgit2 for git-core because it lost
2% performances ;P--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Accessors are very nifty for one thing though; With a debugging flag,
you can use an accessor-function, while without that debugging flag you
can use a macro instead of a function. In other words, you use the
compiler as a sort of sanity-checker that you're only accessing the
variables through the proper macros.This method introduces a bit of extra code (50% of which is always
dead) for each struct it's used on, but it makes debugging large-ish
pieces of software relatively simple, since access to all object typesThe last sentence doesn't parse. I assume you mean "if those types are..",
in which case it'll be solved by using accessor-macros and forward-declaringI agree. "git_commit", "git_tree", "git_blob" and "git_tag" can almost
certainly be set in stone straight away.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
True, accessors slow things down. But I'm not sure that the
accessors at the application level are going to be a huge problem.Where the CPU time really matters is inside the tight loops of the
library, where we can expose the struct to ourselves, because if
the layout changes we'd be relinking the library anyway with the
updated object code.I would rather stick with accessors right now. We could in the
future expose the structs and convert the accessors to macros or
inline functions in a future version of the ABI if performance is
really shown to be a problem here from the *application*.Remember we are mostly talking about applications that are happy to
fork+exec git right now. A little accessor function call is *still*Eh, I disagree here. In git.git today "struct commit" exposes its
buffer with the canonical commit encoding. Having that visible
wrecks what Nico and I were thinking about doing with pack v4 and
encoding commits in a non-canonical format when stored in packs.
Ditto with trees.Because git.git code goes against that canonical buffer we cannot
easily insert pack v4 and test the improvements we want to make.
The refactoring required is one of the reasons we haven't done pack
v4 yet. _IF_ we really are going through this effort of building
a different API and shifting to its use in git.git I want to make
sure we at least initially leave the door open to make changes
without rewriting everything *again*.Accessor functions can usually be inlined or macro'd away. But
they cannot be magically inserted by the compiler if they aren't
there in the first place. This isn't Python... :-)--
Shawn.
--
Err... isn't that backwards? Surely you want to store stuff in the
canonical format so you're forced to do as few translations as
possible? Or are you trying to speed up packing by skipping the
canonicalization part? If so, that would slow down reading (orWell, if macro usage is adhered to one wouldn't have to worry,
since the macro can just be rewritten with a function later (if,
for example, translation or some such happens to be required).
Older code linking to a newer library would work (assuming the
size of the commit object doesn't change anyway), but newer code
linking to an older library would not. Otoh, they wouldn't even
build unless they used the wrong header files, so there is nothingWhat I meant was this (I'm a tad drunk, so read the spirit, not
the letter):in "foo-api.h":
--%<--%<--
#ifdef BUILDING_FOR_DEPLOYING
#include "git_foo_decls.h"
# define git_foo_get_buf(git_foo) (git_foo->buf)
#else
#include "git_foo_fwd_decls.h"
extern const char *git_foo_get_buf(git_foo *foo);
#endif
--%<--%<--foo.c
--%<--%<--
#include "foo-api.h"
#include "git_foo_decls.h"
#ifndef BUILDING_FOR_DEPLOYING
const char git_foo_get_buf(git_foo *foo)
{
return foo->buf;
}/* other accessors go here */
#endif/* rest of git_foo manipulators go here */
--%<--%<--It's almost certainly not worth it for libgit2 though, as git@vger
provides a good review system.--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
No. We suspect that canonical format is harder to decompress and
parse during revision traversal. Other encodings in the pack file
may produce much faster runtime performance, and reduce page faults
(due to smaller pack sizes).We hardly ever use the canonical format for actual output; most
output rips the canonical format apart and then formats the data
the way it was requested. If we have the data *already* parsed inWrong; we're trying to speed up reading. Packing may go slower,
especially during the first conversion of v2->v4 for any given
repository, but packing is infrequent so the minor (if any) dropYou are assuming too much magic. If the older ABI used a macro
and the newer one (which supports pack v4) organized struct commit
differently and the user upgrades libgit2.so the older applications
just broke, horribly.We know we want to do pack v4 in the near future. Or at least
experiment with it and see if it works. If it does, we don't
want to have to cause a major ABI breakage across all those newly
installed libgit2s... yikes.I'm really in favor of accessor functions for the first version of
the library. They can always be converted to macros once someone
shows that their git visualizer program saves 10 ms on a 8,000 ms
render operation by avoiding accessor functions. I'd rather spend
our brain cycles optimizing the runtime and the in-core data so
we spend less time in our tight revision traversal loops.Seriously. We make at least 10 or 11 function calls *per commit*
that comes out of get_revision(). If the formatting application is
really suffering from its 4 or 5 accessor function calls in order
to get that returned data, we probably should also be looking at
how we can avoid function cals in the library.Oh, and even with 4 or 5 accessor functions per commit in the
application that is *still* better than the 10 or so calls the
application probably makes today scraping "git log --format=raw"
off a pipe and segment it into the different ...
I'll have to look into the pack v4 stuff, as I can't get what you're
saying to make sense to me. Not canonicalizing the data when storing
it means you'll have to have conversion routines from all the various
encodings, unless you first canonicalize it and then encode it in the
way you need it in the pack way. Never using canonical format means
it has no potential for combinatorial explosion, and every converterNaturally. Re-sizing objects that weren't previously protected by
accessor functions will always break the ABI unless the layout of
the previously existing items in the object doesn't change, whichThat's not necessary, although it requires a bit more thought when
I agree that for early versions they should definitely be functions,
but since we've been talking about "final design" earlier in theAll comps where I use git today are multi-core systems, so I'm very
much in favour of parallell processing. Otoh, I also don't think it
makes sense to jump through hoops to make sure one can, fe, read and
parse all tags in multiple threads simultaneously. In other words, I
don't think we need to bother adding thread-safety for stuff that
really only should happen if the application is poorly designed
(unless it's straightforward to do so, ofcourse).--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
--
This was a typo, indeed s/of/if/
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Ugh.
You could talk me into it if you promise never typedef structures (or
pointer to structures) with such symbols, I guess.--
I should write that one down in CONVENTIONS.
IMHO:
typedef uint32_t uid_t; /* sane */
typedef enum {...} status_t; /* sane */
typedef struct foo_t foo_t; /* sane */typedef struct {...} foo_t; /* borderline insane */
typedef char* str_t; /* totally nuts */
typedef char**** str_pppp_t; /* totally nuts */Hiding the fact that scalar types like a uid_t are 32 bits on
this system is reasonable. Heck, uid_t is already in POSIX,
we shouldn't fight that sort of idea. It at least improves
documentation somewhat.Hiding the fact that some scalar type is an enum, so you don't have
to type "enum blah" everywhere is also reasonable. Its slightly
better than #define some magic constants and passing an int
everywhere. Its a reasonable balance between reducing keystrokes
and keeping the code semi-self-documenting.Hiding the fact that an opaque struct (or union) you cannot ever
see the members of is a struct or union is good API design. You can
later change the major class from struct to union or back, or totally
redefine it, but the caller never needs to know what is going on.Hiding a pointer is wrong. Callers should know they are getting a
pointer, or are being asked to supply a pointer-to-a-pointer. So the
"FILE*" stdio functions are sane, because we don't know what is under
a FILE type but we do know when we are dealing with a pointer to one.My original proposal didn't stick _t onto the end of everything,
because I didn't think it was really necessary. I'm fine with it
either way. It may be better to include the _t suffix, it seems
to be somewhat common in libraries.--
Shawn.
--
Well, I propose the following: we set-up on GNU-ld + gcc enabled systems
all what is needed to use symbol visibility, which isn't that intrusive,
and also rather easy given your GIT_EXPORT macro definition.This way, people who care about portability across all libgit2 supported
platforms will have to align on the lowest common denominator, which
will not have any kind of private stuff available, so we're safe. And
GCC/GNU-ld enabled platforms cover most of the popular platforms (namely
linux and *BSD, I'm not sure about Macos dynlibs). Even win32 has kind
of what you need to do visibility I think.IOW prehistoric systems will be have to cope with that because of Linux
(yeah, this is kind of deliciously backwards ;p).No, my worry was rather wrt git core itself, I really think we _must_
make it link against libgit2 if we want libgit2 to stay current, but git
core will _very likely_ need the private stuff, and it _will_ be a
problem. I mean we cannot seriously so-name a library and show its guts
at the same time, and I'm unsure how to fix that problem. _that_ was myI'd say we should do both at the same time. Asking people if they would
agree to relicense code can be done in parallel. We could extract a list
of source files that we may need (my extraction included stuff that is
very unlikely to be useful like test-*.c that aren't useful, and some
that are already BSD I think), and see who it yields. It should be
possible to do a matrix source-file x people and see on a per-file basis
what they think.If someone gives me the list of files we should consider (I'm not sure
about a good list right now) I could do the matrix at some fixed sha1
=66rom git.git using git blame -C -M -M -w, and ask people see where it
leads us ?--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
Yet more updates to the API:
http://www.spearce.org/projects/scm/libgit2/apidocs/html/modules.html
In particular I've started to define what revision machinary might
look like, based somewhat on JGit's (public) RevWalk API. The guts
of how to make it work of course aren't yet defined.My goal here is to have the git_revp_t be non-thread safe, but
also to contain the entire object pool, so git_revp_free() will
dispose of any and all git_commit_t's which were created from it.This "fixes" the "we leak everything" behavior and allows
applications to have multiple pools on different threads if
it needs to. Thus the core git_revp_t isn't thread-safe and
doesn't get weighed down by locking if the application wants
multiple threads.Yes, agreed. Only I don't know how to do it myself. I know its
Hmmph. I agree git-core needs to link to libgit2.
I disagree it needs private bits. If we do the library right
git-core can use the public API. And where it cannot its either
not something that is "right" for libgit2 (e.g. its parseopts and
we aren't committed yet to including option parsing) or its highly
experimental and we shouldn't put it into the library (and thus
also git-core) until its more frozen.That said, I don't think its criminal to have git-core include and
link to a static libgit2, especially if git-core's usage of the
library is ahead of what the library itself is able to expose atOff the top of my head some really important ones:
diff-delta.c
object.c
patch-delta.c
refs.c
revision.c
sha1_file.c
sha1_name.cThey form a pretty large part of the guts of what most people want
from a git library.Slightly less important, but still fairly core:
builtin-fetch-pack.c
builtin-send-pack.c
connect.c
remote.c
transport.cIs most of the client side of the git:// transport, something people want.
--
Shawn.
--
Note that on Linux (and also BSDs I think) many pthread locking
functions have stubs in the glibc and cost 0 until you use the
libpthread. So unless we need to use locking inside the tight-loop, it's
virtually free to write a thread-safe library (a function call is
basically 10 times cheaper than an xchg-based lock).(Though for git it would suck as we get libpthread in our depends
through some of the other dependencies.Another way is to use function pointers for the locking, and have a
function to make the object store thread safe by setting the pointers to
functions actually doing locking, and to let it point to functions doing
nothing else.It looks unrealistic to me to let people deal with the locking,
especially if we mean this library to _also_ be used in language
bindings, hence used in sloppily written scripts.But of course, if locking calls are used in the tight loop that would
I will, basically, you need to build everything with -fvisibilty=3Dhidden
in your CFLAGS, and mark the prototypes of symbols you want to export
with __attribute__((visibility("default"))) (that you can set into your
EXPORT_GIT macro when you're building with __GNUC__).You don't even need a linker script (unless we're going to do some
Sure, I didn't mean we have to put it in libgit2, it's highly UI related
and other tool may want to use something else, and it makes no sense for
many languages that have their library already (python, perl do e.g.).Okay I'll let people mention what they would like to see too, and I'll
work from that then.
--=20
=C2=B7O=C2=B7 Pierre Habouzit
=C2=B7=C2=B7O madcoder@debia=
n.org
OOO http://www.madism.org
This 404's for me
- Pieter
--
Nevermind, it's only a clone url.. I'd expected a gitweb or so ;)
Sorry for the noise,
--
