Timothy Witham announced an OSDL special interest group, the Binary Testing SIG, for discussing ways to ensure and verify that the Linux kernel provides forward binary compatibility. Timothy explains, "this is a real problem for large end users deploying Linux in that they like to be able to run/roll forward the same version of an application for 5 or so years. They can do this with their legacy operating systems and we need to be able to do this with Linux."
Jeff Garzik noted that maintaining userland compatibility of the application binary interface (ABI) "has always been a strongly held value in Linux". Robert Love went on to add:
"Yah. With the exception of maybe changing something in /proc (which has been rare, and hopefully will never happen with /sys) the kernel-to-user ABI is really stable. I'd venture, in fact, to say that this effort is very important but does not affect the kernel at all. Current 'fault' lies in things e.g. like the C++ ABI, which is constantly fluctuating (rightly so, to fix bugs, but still)."
From: Timothy D. Witham [email blocked]
To: Linux Kernel ML [email blocked]
Subject: Announcing Binary Compatibility/Testing
Date: Wed, 13 Oct 2004 15:16:53 -0700

Announcing Binary Compatibility/Testing

In talking to end users, distributions, OSS developers and large scale ISV's one issue kept popping up. And that is the fact that binaries keep breaking.

This is a real problem for large end users deploying Linux in that they like to be able to run/roll forward the same version of an application for 5 or so years. They can do this with their legacy operating systems and we need to be able to do this with Linux.

One of the big problems is that these ISV's release and test on a cycle that is measured in calendar quarters and of course the OSS cycle is measured in days. The idea is to move testing of these binary applications upstream to match the OSS development cycle.

For this purpose I've started a mailing list to discuss how to accomplish this. I've got slides for anybody who is interested. (PDF.)

http://lists.osdl.org/mailman/listinfo/binary_sig
http://groups.osdl.org/sig (Follow binary testing for slides)

Let the flaming start. :-)

Tim
--
Timothy D. Witham - Chief Technology Officer - [email blocked]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton OR, 97005
(503)-626-2455 x11 (office)    (503)-702-2871 (cell)
(503)-626-2436 (fax)
From: Timothy D. Witham [email blocked]
Subject: Re: Announcing Binary Compatibility/Testing
Date: Wed, 13 Oct 2004 15:38:48 -0700

On Wed, 2004-10-13 at 15:16 -0700, Timothy D. Witham wrote:
> http://groups.osdl.org/sig (Follow binary testing for slides)

http://groups.osdl.org/sigs not sig - sorry

Tim
From: Jeff Garzik [email blocked]
Subject: Re: Announcing Binary Compatibility/Testing
Date: Wed, 13 Oct 2004 18:39:51 -0400

Timothy D. Witham wrote:
> Announcing Binary Compatibility/Testing
[...]
> Let the flaming start. :-)

Userland ABI compatibility has always been a strongly held value in Linux, I don't think we would flame any efforts to support that...

Jeff
From: Robert Love [email blocked]
Subject: Re: Announcing Binary Compatibility/Testing
Date: Wed, 13 Oct 2004 19:24:15 -0400

On Wed, 2004-10-13 at 18:39 -0400, Jeff Garzik wrote:
> Userland ABI compatibility has always been a strongly held value in
> Linux, I don't think we would flame any efforts to support that...

Yah. With the exception of maybe changing something in /proc (which has been rare, and hopefully will never happen with /sys) the kernel-to-user ABI is really stable.

I'd venture, in fact, to say that this effort is very important but does not affect the kernel at all. Current "fault" lies in things e.g. like the C++ ABI, which is constantly fluctuating (rightly so, to fix bugs, but still).

Any other incompatibility lies in libraries, but we have library versioning. There is nothing wrong with newer libs breaking compatibility so long as they have a different soname. Vendors just need to ship compat libs and ISV's need to make sure they request the right lib and don't touch internals.

Robert Love
From: Timothy D. Witham [email blocked]
Subject: Re: Announcing Binary Compatibility/Testing
Date: Wed, 13 Oct 2004 16:36:32 -0700

On Wed, 2004-10-13 at 19:24 -0400, Robert Love wrote:
> On Wed, 2004-10-13 at 18:39 -0400, Jeff Garzik wrote:
>
> > Userland ABI compatibility has always been a strongly held value in
> > Linux, I don't think we would flame any efforts to support that...
>
> Yah. With the exception of maybe changing something in /proc (which has
> been rare, and hopefully will never happen with /sys) the kernel-to-user
> ABI is really stable.
>

I would tend to agree with that statement.

> I'd venture, in fact, to say that this effort is very important but does
> not affect the kernel at all. Current "fault" lies in things e.g. like
> the C++ ABI, which is constantly fluctuating (rightly so, to fix bugs,
> but still).
>
> Any other incompatibility lies in libraries, but we have library
> versioning. There is nothing wrong with newer libs breaking
> compatibility so long as they have a different soname. Vendors just
> need to ship compat libs and ISV's need to make sure they request the
> right lib and don't touch internals.
>

Part of the problem is knowing which things to request. I've envisioned a database that has the matrix of tests and packages so that people like ISV's and system integrators will be able to look up what has been tested and passed. I think that this database is the crucial portion of the new development.

I also expect that part of this process will be the finding that an ISV used the API in a way that could of got them in trouble and that the new version closes that hole. In this case it would be a bug on the ISVs side but it would be known long before it got deployed and the ISV could schedule the development and testing of the patch to their software as part of their normal deployment schedule.

> Robert Love
>

Tim
NPTL threads
Didn't the (relatively) recent change to NPTL threads cause a lot of headaches for Enterprise customers? I know that we are still battling with getting solid policies and processes in place to deal with that here.
Sybase and Websphere MQ (IBM) don't have an install that will work straight out of the box with NPTL threads, so you have to use the LD_ASSUME_KERNEL environment variable. But if you use this environment variable and install / update RPMs, you risk corrupting your RPM database. We eventually found the solution to this on an Oracle site, but I haven't found RPM_FORCE_NPTL mentioned in too many places.
It's this kind of change within a supposedly stable version of the kernel that can and does scare enterprises away from Linux.
Re: NPTL threads
NPTL is more POSIX compliant than what there was before. In fact, programs using the old LinuxThreads library can still work under NPTL. Sure, some programs will fail, but NPTL is a step ahead, not behind, so if some program was using old non-POSIX-compliant libraries and relying on forbidden rules, that's their problem... Besides, the RPM problem was a Red Hat one, involving Berkeley DB and rpm, and they fixed it. Personally I went from a non-NPTL to an NPTL libc (without changing the rest of the userspace) and nothing broke (KDE uses lots of threads and could easily have broken, etc.)
Not just NPTL
ISVs need more than "tough luck - it changed". Dealing with glibc versions is bad enough. Build a package against Solaris 2.6, and you can be pretty sure it'll run on Solaris 10 - that's more than a 5-year gap.
Can't say the same the other way around - if you depend on GNOME libs, they weren't there in Sol 2.6, of course.
OS-level stuff should stay the same: not just the API (even that isn't guaranteed with Linux) but the ABI. Otherwise vendors have to say "this package (RPM, tarball, whatever) works with RHEL 2.x; download this one for RHEL 3.x, this one for SuSE, etc."
PITA, at best.
Not just NPTL
Actually, if you stick to the _public_ APIs and link dynamically, you're guaranteed your app still works. I still have glibc 2.1-linked binaries around here. The big problem in the glibc 2.3/NPTL transition was that lots of proprietary binaries (and some OSS ones like Wine) relied on undocumented internal APIs.
In addition to that, you can also have multiple libc versions installed. A few years ago I still had some a.out binaries linked against libc4 around, which worked nicely on a recent system as long as you installed the old libraries.
Ah, no, no you're not guarant
Ah, no, no, you're not guaranteed that at all. NPTL changed signal semantics *significantly*. Yes, the new behaviour is more POSIX compliant, but so what? Software before was not written to work on some mythical POSIX system with millions of users; it was written to work on Linux. You cannot make such huge changes which break things so much and say "well, it's your own fault for writing buggy software" when there was hardly any way to avoid writing "buggy" (i.e. Linux-specific rather than POSIX-compliant) software before.
In the case of Wine, before NPTL, Linux threading was so awful that it had no choice but to rely on non-POSIX behaviour. There is simply no other way to do it. Many other programs were in the same boat: Linux threading was bad so they hacked around it, threading changed and their hacks broke. Screwed if you do, screwed if you don't...
Crap excuses like that are definitely scaring enterprises (and end users) away from Linux. Testing syscall compatibility is meaningless while compatibility is placed behind standards compliance as a goal.
Perfect solution
Enterprises: Go fuck yourselves. We don't need to worry about you. If you are a user and don't need NPTL --- STAY WITH THE FUCKING LINUXTHREADS --- We don't obligate you to change !@!#$@#$ Linux 2.6 *WORKS* with the old LT.
That's a great attitude
Linux: Ready for the Enterprise. Or not.
Re: Ah, no, no you're not guarant
The signal semantics did *not* change if you use signals and threads correctly. That's why the glibc developers said nothing changed (see man pthread_kill/pthread_sigmask).
Right, but sometimes "compatible" is "bug compatible"
Especially in the case of LinuxThreads, it sounds like many useful applications had to rely on specific peculiarities of the thread implementation, instead of simply the published API. I personally don't know, in this case, whether Wine really had to do this just to work (i.e. worked around the API out of actual need) or did it out of expediency.
At any rate, it often happens that an application unintentionally relies on some corner condition of the OS and its libraries. It's all well and good to say "Yeah, you violated the API and/or ABI, so it's your problem." That's fine to say for a package currently under development: the developers can fix their bug and move on. For "dusty old binaries," there is some expectation that the undocumented and unspecified/underspecified behaviors of the OS don't change radically from release to release. In essence, the OS is "bug compatible," insofar as it doesn't needlessly turn innocuous application bugs into show-stoppers.
This is different from "bug-compatible" in the sense that you can't fix bugs in the OS if some apps happen to rely on them. That's a tougher argument to make, and it really depends on the nature of the bugs, and the nature of the apps. (Tongue-in-cheek: For instance, virus software relies on the bugs that make the OS susceptible to viruses. Should we leave those bugs open so that virus software's still relevant?)
For a really interesting example of "unintentionally relying on unspecified or underspecified behavior", consider how bash broke when the kernel changed fork() to run the child before the parent. The API leaves the execution order unspecified. Bash apparently relied on the parent running first, at least occasionally.
Ignore the PITA - focus on the cost
Each of those packages that you describe an ISV as having to maintain has an associated cost. Building the package, testing it, getting it certified by the vendor: those all cost.
Now you come to a security update, and you've got to test not just with your Linux package, but with every vendor distribution package. That incurs even more cost. This is why a lot of vendors will only offer official support for one or two flavours of Linux.
are you sure that wasn't a redhat problem?
Redhat ships a lot of non standard stuff in their kernels and I'm pretty certain they shipped NPTL stuff before it was included into mainline. That's going to cause issues if you upgrade to the stock kernel and suddenly major features are missing.
That's really how it's supposed to work. People use RedHat kernels with the understanding that any extra features are RedHat specific and not necessarily going to stay a long time. People who are willing to do the extra work, like Oracle, will code for the advanced features.
Then after RedHat has tested a feature a long time and people like it, it gets included in the stock kernel and you know it's going to be supported.
redhat y/n
Red Hat does do some non-mainline stuff (not as much as SuSE, but they definitely do 'some'). However, the NPTL shift was a more generic problem. The problems weren't related to Red Hat's implementation of NPTL vs the standard one.
Today, NPTL is part of mainli
Today, NPTL is part of mainline kernels. Oracle still requires LD_ASSUME_KERNEL and some other hacks to work. As does Sybase. As does Websphere MQ. So I hardly see how we can blame Red Hat here.
To take that a step further, the Kernel development team have decided that we have to trust our distro vendors for stability. So which way do we have to look at this to balance both of those views? It's Red Hat's fault, but we have to rely on Red Hat because the Kernel team are not focussed on that stability?
Re: Today, NPTL is part of mainli
NPTL IS NOT IN THE KERNEL. IT'S A FUCKING LIBRARY.
not blaming redhat at all
RedHat included it before the stock kernel did. That's great. That's what they're supposed to do. Open Source. Ra ra ra.
Now RedHat can say "Oracle works best on RHAS."
Of course, when someone upgrades RHAS (and voids their support contract) they are also going to run into trouble with kernel features disappearing. And so they blame Linux.
It's not fair that they blame Linux. If they want to blame RedHat, that's OK by me. RedHat gets paid to take the blame and make the customer happy. Hopefully RedHat will educate them in a nice way and everyone makes a lot of cash.
I don't blame RedHat. But the way Linux is distributed is a lot more complicated than the way Windows or Solaris is distributed. If people don't understand how it works, then it's my job to pass the buck. ;)
NPTL threads
NPTL did not alter any existing kernel interfaces!
LD_ASSUME_KERNEL directs the userspace *library* code to switch to using the older LinuxThreads implementation which the *kernel* still supports.
Statically linked binaries do still work since they snapshot the libraries. Dynamically linked binaries would still work if they aggressively required version numbers for all libraries (and shipped said libraries).
This is not to say that binary compatibility could not be done better, simply that, as Robert Love noted, it does not really require changing the working practice of the kernel developers.
You can't "aggressively requi
You can't "aggressively require" non-NPTL versions of glibc. The glibc team either wrongly believed it was compatible or didn't care; either way, the only method to get the behaviour the app expects is to utilise a "please unbreak me" environment variable which did not exist before NPTL was created. That's not an acceptable solution, especially as the type of breakage you get if you don't specify this variable can be extremely subtle (deadlocks, races etc.) and only people with good knowledge of threading and debugging can figure it out, i.e. not end users.
LD_ASSUME_KERNEL
the only method to get the behaviour the app expects is to utilise a "please unbreak me" environment variable which did not exist before NPTL was created
I'm pretty sure this environment variable existed before NPTL. Here is a page by Ulrich Drepper on its effect: http://people.redhat.com/drepper/assumekernel.html
From a quick Google for LD_ASSUME_KERNEL though, you will find references to it at least as far back as 2001 (when Redhat 7.1 was released, I think), far before NPTL.
"static linking"
When I upgraded from glibc 2.2 to 2.3, the statically linked executables crashed. The problem is that NSS (Name Service Switch, what you configure in /etc/nsswitch.conf) is implemented by dynamically loading shared objects, so every program using e.g. gethostbyname(), getservent() or getpwent(), i.e. using TCP/IP or the user database, is effectively dynamic. But the binary interface of these shared objects changed between the glibc revisions. Shared executables automatically use the new libc, which of course works with the new modules and is itself compatible with the old one, but static executables contain the old libc, which crashes. (The NSS modules actually contain the glibc version in their filenames, but obviously the executable I used tried to load them via the unversioned symlinks.)
As you can see, statically linked executables are actually _less_ future proof, if they make use of the mentioned interfaces.
Static linking seems to be deprecated, glibc is the Linux API.
As you can see, statically linked executables are actually _less_ future proof, if they make use of the mentioned interfaces.
No, it does not prove that. It only demonstrates that you effectively cannot make statically linked executables with glibc! The mentioned interfaces are frequently used; in fact, I suspect some of them can get implicitly used by other library services.
All this means that glibc is now effectively part of the Linux API, and its binary compatibility must be taken as seriously as that of the kernel. Or maybe even more seriously, since most applications use just the library-level services, not the kernel directly.