Published on KernelTrap (http://kerneltrap.org)

Linux: Unified x86 Architecture

By Jeremy
Created Jul 23 2007 - 20:28

Thomas Gleixner described an effort to create a unified x86 architecture tree, "the core idea behind our project is simple to describe: we introduce a new arch/x86/ and include/asm-x86/ file hierarchy that includes all the existing 32-bit and 64-bit x86 code and allows the building of either a 32-bit (i386) kernel or a 64-bit (x86_64) kernel." Andi Kleen expressed some concern, "I think it's a bad idea because it means we can never get rid of any old junk. IMNSHO arch/x86_64 is significantly cleaner and simpler in many ways than arch/i386 and I would like to preserve that. Also in general arch/x86_64 is much easier to hack than arch/i386 because it's easier to regression test and in general has to care about much less junk. And I don't know of any way to ever fix that for i386 besides splitting the old stuff off completely." Additional concerns about legacy issues were countered by Linus Torvalds, "there really isn't that much legacy crud. There are things like random quirks, but every time I hear the (theoretical) argument about how much time and effort we save by having it duplicated somewhere else, I think about all the time we definitely waste by fixing the same bug twice (and worry about the cases where we don't)." Among the justifications for a unified architecture, Thomas noted:

"We believe that the whole x86 CPU family is very much related and should be supported in a single architecture tree. All 64-bit CPUs implement the ability to execute pure 32-bit kernels, and will probably do so for the next couple of decades. So it's not like it will ever be possible to get rid of our legacies: for example even the latest 64-bit CPUs implement the legacy 'A20 line' feature that was already a weird outdated hack in the days of 16-bit 8086 CPUs."


From:	Thomas Gleixner [email blocked]
To:	LKML [email blocked]
Subject: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 00:32:59 +0200

We are pleased to announce a project we've been working on for some 
time: the unified x86 architecture tree, or "arch/x86" - and we'd like 
to solicit feedback about it.

What is this about?
-------------------

The topic of sharing more x86 code has been discussed on LKML a number 
of times. Various approaches were discussed and we decided to advance 
the discussion by implementing a full solution that brings the 
transition to a shared tree to completion.

Warning: our approach is quite a bit more extreme than what has been
suggested before.

The core idea behind our project is simple to describe: we introduce a 
new arch/x86/ and include/asm-x86/ file hierarchy that includes all the 
existing 32-bit and 64-bit x86 code and allows the building of either a 
32-bit (i386) kernel or a 64-bit (x86_64) kernel.

In this initial implementation the old arch/i386 and arch/x86_64 trees 
are removed _immediately_, in the same commit, and all future x86 
development goes on in the new, shared tree. So the transition right now 
is one atomic operation.

As a next step we plan to generate a gradual, fully bisectable, fully
working switchover from the current code to the fully populated
arch/x86 tree. It will result in about 1000-2000 commits. We are
releasing our current solution because it already represents the final
arch/x86 source tree 100%, and we first wanted to make sure that the
new architecture layout works fine and folks are happy before we go
and do the (even more complex) fine-grained work.

A git tree is available from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git

One (large!) combo patch is available at:

   http://kernel.org/pub/linux/kernel/people/tglx/linux-x86.2.6.22-git-ede13d.combo.patch.bz2

the patch is against this upstream -git head:

  commit ede13d81b4dda409a6d271b34b8e2ec9383e255d

It makes little sense to apply this patch to anything else because
these architectures are such a fast-moving target.


Why do we want to do this?
--------------------------

We believe that the whole x86 CPU family is very much related and should 
be supported in a single architecture tree. All 64-bit CPUs implement 
the ability to execute pure 32-bit kernels, and will probably do so for 
the next couple of decades. So it's not like it will ever be possible to 
get rid of our legacies: for example even the latest 64-bit CPUs 
implement the legacy "A20 line" feature that was already a weird 
outdated hack in the days of 16-bit 8086 CPUs.

So what can we do? We should learn to live better with our legacies, and 
we should avoid reinventing the wheel in 64-bit code. Today there's 
already some limited code sharing between the i386 and x86_64 trees, but 
it's quite non-obvious: it's either done via a placeholder file that 
#include's an out-of-arch file, or a Makefile rule that uses an
out-of-arch file. These 'cross-tree' file uses are not visible at the 
target point, and people sometimes break "the other arch" if they modify 
the file, because they are not aware of there being code sharing.

Furthermore, the separate i386 and x86_64 trees often cause kernel 
writers to (unconsciously) think in either 32-bit or in 64-bit terms, 
instead of thinking about the two things in one way. It has also happened
numerous times that code originally copied from arch/i386 got
64-bit-only additions in arch/x86_64, and when a bug was fixed in the
32-bit code, that fix was not applied to the x86_64 tree. Or some
function was cleaned up and improved in the x86_64 tree, but that
improvement was not easily adaptable to arch/i386, because the x86_64
code had already become 64-bit only.

All in all: the two architecture trees are "way too far apart" from each
other, which causes the source code to diverge not only physically but 
structurally as well. The whole setup works _against_ sharing code, 
instead of working _for_ sharing code. This causes double maintenance 
even for functionality that is conceptually the same for the 32-bit and 
the 64-bit tree. (such as support for standard PC platform architecture 
devices)


How did we do it?
-----------------

As an initial matter, we made it painstakingly sure that the resulting 
.o files in a 32-bit build are bit for bit equal. (at least in terms of
.text; small .data differences exist due to the namespace changes.)
We also made sure that you can pick up _any_ existing 32-bit .config,
stick it into our new arch/x86 tree and build a fully working 32-bit
kernel. The same goal applies to any existing 64-bit .config.

The shared x86 tree is _not_ "merge the 32-bit legacy code into the 
x86_64 tree". It is _not_ "create an x86_64 tree that can run on modern
32-bit hardware too, leave arch/i386 around for ancient stuff". It is a
unification between equals: all legacy code and all new code are unified
into one shared tree. Nothing is left behind.

A key component of our change is that only a very small portion of the 
conversion was done 'manually' - the overwhelming majority of file 
movement happened in an automated fashion. We did this to reduce the 
chance of human error in the process of moving and rearranging more
than 1000 files.

We also made the change fully bisectable: you can bisect 'across' the
big arch/x86 commit and the .config's will be picked up correctly. The
git-history of all previously existing files has also been preserved
via the use of git's file-movement feature. The more fine-grained,
multiple-commit approach that we are ready to do also provides a
fully bisectable, history-preserving solution.

How is the new arch/x86 and include/asm-x86 namespace laid out? Our
foremost concern was to enable a 100% smooth transition to the new,
shared architecture, while still enabling much more fine-grained future
unification of the source code. To do this we consciously aimed for the
strictest possible unification strategy: we only 'unified' those source
files that are already bit for bit equal between the two architectures
today. For all other files we used the following rule: if a file came
from arch/i386/foo/bar.c, it gets moved to arch/x86/foo/bar_32.c; if it
came from arch/x86_64/foo/bar.c, it gets moved to arch/x86/foo/bar_64.c.
We also generated an arch/x86/foo/bar.c that simply #include's one of
those two files (depending on whether we do a 32-bit or a 64-bit build).
If a file existed in only one of the architectures, it's moved to
arch/x86/foo/bar.c straight away. (Take a look at our git repository to
see how this works out in practice.)
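A minimal Python sketch of the placement rule described above, for illustration only; it is not the authors' migration script. It assumes each old tree is available as a mapping from a path relative to arch/i386/ or arch/x86_64/ (for example `kernel/setup.c`) to that file's bytes, and it reports where each file would land under arch/x86/. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def with_width_tag(rel: str, tag: str) -> str:
    """Insert a tag before the extension: ('foo/bar.c', '_32') -> 'foo/bar_32.c'."""
    p = PurePosixPath(rel)
    return str(p.with_name(f"{p.stem}{tag}{p.suffix}"))


def plan_moves(i386: dict[str, bytes], x86_64: dict[str, bytes]) -> dict[str, str]:
    """Map each destination path under arch/x86/ to the origin of its content.

    Hypothetical sketch of the rule in the announcement:
      * same path, bit-for-bit identical -> one shared file ("shared")
      * same path, different contents    -> bar_32.c, bar_64.c and a bar.c "stub"
      * path present in only one tree    -> moved unchanged
    """
    plan: dict[str, str] = {}
    for rel in sorted(set(i386) | set(x86_64)):
        in_32, in_64 = rel in i386, rel in x86_64
        if in_32 and in_64 and i386[rel] == x86_64[rel]:
            plan[rel] = "shared"
        elif in_32 and in_64:
            plan[with_width_tag(rel, "_32")] = f"arch/i386/{rel}"
            plan[with_width_tag(rel, "_64")] = f"arch/x86_64/{rel}"
            plan[rel] = "stub"  # generated file that #include's one variant
        elif in_32:
            plan[rel] = f"arch/i386/{rel}"
        else:
            plan[rel] = f"arch/x86_64/{rel}"
    return plan
```

For example, if `kernel/b.c` exists in both trees with different contents, the sketch yields `kernel/b_32.c`, `kernel/b_64.c` and a `kernel/b.c` stub; a file without an extension, such as `Makefile`, would become `Makefile_32` and `Makefile_64` under the same rule.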

Include files are handled similarly in include/asm-x86, with the
exception of include files that are exported to user-space: for those, 
to preserve the source code API towards user-space, Kbuild creates 
symlinks from the _32.h or _64.h files to the .h file, instead of a stub 
.h file. In the future the number of such symlinks can be reduced 
significantly by unifying the files.
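The header rule distinguishes kernel-internal stubs from user-visible symlinks. The hypothetical Python helper below sketches that decision for one header; the `CONFIG_X86_32` symbol in the generated stub is an assumption about how a 32-bit build would be detected, not something the announcement specifies.

```python
def header_entry(name: str, exported: bool, bits: int) -> tuple[str, str]:
    """Decide what include/asm-x86/<name> becomes for a 32- or 64-bit build.

    Hypothetical sketch: an exported (user-visible) header becomes a symlink
    to the matching _32.h/_64.h variant, so user space keeps including the
    plain name; a kernel-internal header gets a stub that lets the
    preprocessor pick the variant (CONFIG_X86_32 is an assumed symbol).
    """
    if not name.endswith(".h"):
        raise ValueError(f"not a header: {name}")
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    stem = name[:-2]
    if exported:
        return ("symlink", f"{stem}_{bits}.h")
    stub = (
        "#ifdef CONFIG_X86_32\n"
        f'# include "{stem}_32.h"\n'
        "#else\n"
        f'# include "{stem}_64.h"\n'
        "#endif\n"
    )
    # The stub text is the same for both builds; only the symlink depends on bits.
    return ("stub", stub)
```

The design point this illustrates is that a stub is build-independent text, while a symlink has to be created per build, which is why reducing the number of symlinks depends on actually unifying the files.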

The arch/i386 and arch/x86_64 trees are removed completely, except for a 
small Kconfig and kbuild stub to ease bisection and to ease the import 
(and export) of .configs into the new, shared x86 architecture.

And we are serious about no-compromises compatibility of the transition: 
we built and booted an arch/x86 kernel on a real i386 DX CPU. (Yes, an 
old 33 MHz one. No, the CPU did not melt, Linux booted up fine!) We also 
built and booted a 64-bit kernel on a quad-core 64-bit CPU from the 
shared tree. (and on a number of other x86 systems.)


What will happen to arch/x86 in the future?
-------------------------------------------

Future, fine-grained unification is the main idea behind the new layout: 
'unifying' a 32-bit and a 64-bit source code file will be a matter of 
creating a single .c file from a _32.c and _64.c file. Those patches 
will be easy to review and will be straightforward to create. We chose 
not to do any of those unifications in this initial work yet, even if 
they were easy to do, to be able to guarantee the bit-for-bit 
equivalency of the new tree to the old trees.


When should this go upstream?
-----------------------------

We actually think that the sooner we get this over with, the better.

Once the precise method is agreed upon, the best time for the
transition is when other larger-scale changes are typically done:
right at the end of a merge window, when most of the architecture
flux has already flowed into the tree (so that the transition does not
cause hiccups in merge activities), but when we still have the maximum
amount of time left to fix up any effects of the unification.

This tree cannot really be carried in -mm or in other devel trees
due to its size and intrusiveness.

This is also true for the fine-grained solution, which should be done
in one go as well. We do not believe that a "Chinese 5 year transition
plan" is something useful. When we switch in one go we have two
advantages:

1) it is a single synchronization point for folks with patches against
   that code

2) it is more likely that people tackle the unification of file_32.c
   and file_64.c which are in the same source directory than the
   unification of arch/i386/..../file.c and arch/x86_64/..../file.c

As usual, comments and suggestions are welcome!

	Thomas, Ingo


From:	Jeff Garzik [email blocked]
To:	Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Fri, 20 Jul 2007 18:38:39 -0400

I agree with Andi... it's quite nice to be able to leave some
arch/i386 stuff, and not carry it over to arch/x86-64.

	Jeff

From:	Ingo Molnar [email blocked]
To:	Jeff Garzik [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 00:40:33 +0200

* Jeff Garzik [email blocked] wrote:

> I agree with Andi... it's quite nice to be able to leave some
> arch/i386 stuff, and not carry it over to arch/x86-64.

we can leave those few items in arch/x86 just as much. No need to keep
around a legacy tree for that.

	Ingo

From:	Jeff Garzik [email blocked]
To:	Ingo Molnar [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Fri, 20 Jul 2007 18:42:42 -0400

Ingo Molnar wrote:
> * Jeff Garzik [email blocked] wrote:
>
>> I agree with Andi... it's quite nice to be able to leave some
>> arch/i386 stuff, and not carry it over to arch/x86-64.
>
> we can leave those few items in arch/x86 just as much. No need to keep
> around a legacy tree for that.

By extension it makes doing that sort of thing, in general, more difficult.
Which is IMO not desirable.

	Jeff

From:	Linus Torvalds [email blocked]
To:	Jeff Garzik [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Fri, 20 Jul 2007 15:51:51 -0700 (PDT)

On Fri, 20 Jul 2007, Jeff Garzik wrote:
> Ingo Molnar wrote:
> > * Jeff Garzik [email blocked] wrote:
> >
> > > I agree with Andi... it's quite nice to be able to leave some arch/i386
> > > stuff, and not carry it over to arch/x86-64.
> >
> > we can leave those few items in arch/x86 just as much. No need to keep
> > around a legacy tree for that.
>
> By extension it makes doing that sort of thing, in general, more difficult.
> Which is IMO not desirable.

I think it's *much* harder to carry legacy things around in an old tree
that almost nobody even uses any more (probably not true yet, but for
most of the main developers, I bet it will be true in a year). Especially
one that just duplicates 99% of the stuff.

There really isn't that much legacy crud. There are things like random
quirks, but every time I hear the (theoretical) argument about how much
time and effort we save by having it duplicated somewhere else, I think
about all the time we definitely waste by fixing the same bug twice (and
worry about the cases where we don't).

		Linus

From:	Andi Kleen [email blocked]
To:	Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 07:37:58 +0200

On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.

Well you know my position on this. I think it's a bad idea because
it means we can never get rid of any old junk. IMNSHO arch/x86_64
is significantly cleaner and simpler in many ways than arch/i386 and I would
like to preserve that. Also in general arch/x86_64 is much easier to hack
than arch/i386 because it's easier to regression test and in general
has to care about much less junk. And I don't
know of any way to ever fix that for i386 besides splitting the old
stuff off completely.

Besides radical file movements like this are bad anyways. They cause
a big break in patchkits and forward/backwards porting that doesn't
really help anybody.

> This causes double maintenance
> even for functionality that is conceptually the same for the 32-bit and
> the 64-bit tree. (such as support for standard PC platform architecture
> devices)

It's not really the same platform: one is PC hardware going back forever
with zillions of bugs, the other is modern PC platforms which much less
bugs and quirks

To see it otherwise it's more a junkification of arch/x86_64 than
a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386
at all.

> How did we do it?
> -----------------
>
> As an initial matter, we made it painstakingly sure that the resulting
> .o files in a 32-bit build are bit for bit equal.

You got not a single line less code duplication then, so i don't really
see the point of this.

-Andi

From:	Thomas Gleixner [email blocked]
To:	Andi Kleen [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 10:15:50 +0200

On Sat, 2007-07-21 at 07:37 +0200, Andi Kleen wrote:
> On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> > We are pleased to announce a project we've been working on for some
> > time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> > to solicit feedback about it.
>
> Well you know my position on this. I think it's a bad idea because
> it means we can never get rid of any old junk. IMNSHO arch/x86_64
> is significantly cleaner and simpler in many ways than arch/i386 and I would
> like to preserve that. Also in general arch/x86_64 is much easier to hack
> than arch/i386 because it's easier to regression test and in general
> has to care about much less junk. And I don't
> know of any way to ever fix that for i386 besides splitting the old
> stuff off completely.

I disagree of course. I worked on both trees quite intensive over the
last years and I broke x86_64 more than once when hacking on i386 and
vice versa.

Your "junk" argument is nothing else than a strawman which you beat on
every time when this discussion comes up.

> Besides radical file movements like this are bad anyways. They cause
> a big break in patchkits and forward/backwards porting that doesn't
> really help anybody.

Interestingly enough the folks with the big patch kits (Virtualization)
would be quite happy about that move.

> > This causes double maintenance
> > even for functionality that is conceptually the same for the 32-bit and
> > the 64-bit tree. (such as support for standard PC platform architecture
> > devices)
>
> It's not really the same platform: one is PC hardware going back forever
> with zillions of bugs, the other is modern PC platforms which much less
> bugs and quirks

It _IS_ the same platform. x86_64 is PC hardware with zillions of bugs
as well. And it is not modern at all. It is nothing else than a 64 bit
version of the legacy x86.

> To see it otherwise it's more a junkification of arch/x86_64 than
> a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386
> at all.

We went for a 1 : 1 replacement without merging anything which is not
obvious in the first place (identical files and files, which are just
including some other file). That way we were able to do a binary
compatible migration.

The clean up is the next step and there are enough folks out there
willing to help on this.

> > As an initial matter, we made it painstakingly sure that the resulting
> > .o files in a 32-bit build are bit for bit equal.
>
> You got not a single line less code duplication then, so i don't really
> see the point of this.

Really ? The script detected 15 identical files with a simple cmp. It
also unified another 10 by simply looking at the only line in there
"include <the other arch/file>"

And there is more of that, when you take the time and look closely at
the _32.[ch] _64.[ch] files which are created by the merge.

	tglx
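Thomas mentions that the script also unified files whose only content was an include of the other architecture's copy. A hypothetical Python check for such placeholder files might look like the following; the regular expression, the "single non-blank line" criterion and the loose `endswith` comparison in `is_redirect_to` are all assumptions, not details of the actual script.

```python
import re
from typing import Optional

# A single preprocessor include, in either <...> or "..." form.
INCLUDE_RE = re.compile(r'#\s*include\s*[<"]([^>"]+)[>"]')


def include_only_target(source: str) -> Optional[str]:
    """Return the included path if the file consists of one #include line.

    Blank lines are ignored; any other content (code, comments, guards)
    means the file is not a pure placeholder, and None is returned.
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if len(lines) != 1:
        return None
    match = INCLUDE_RE.fullmatch(lines[0])
    return match.group(1) if match else None


def is_redirect_to(source: str, counterpart: str) -> bool:
    """True if the placeholder's include path ends with the counterpart path."""
    target = include_only_target(source)
    return target is not None and target.endswith(counterpart)
```

A file for which `is_redirect_to` holds carries no code of its own, so under this heuristic it could be dropped in favour of the single file it includes.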

From:	David Woodhouse [email blocked]
To:	Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 11:37:46 +0100

On Sat, 2007-07-21 at 00:32 +0200, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.

Oooh, shiny. We've been talking about how useful this would be for
years.

Experience with the unification of PowerPC shows that it's definitely
the right thing to do -- it reduces the number of gratuitous
differences between 32-bit and 64-bit code, and makes it far more
easier to ensure that bug-fixes and new features get added to both at
the same time.

> When should this go upstream?
> -----------------------------
>
> We actually think that the sooner we get this over with, the better.

I'm inclined to agree.

--
dwmw2

From:	Matt Mackall [email blocked]
To:	Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sat, 21 Jul 2007 17:25:09 -0500

On Sat, Jul 21, 2007 at 12:32:59AM +0200, Thomas Gleixner wrote:
> How is the new arch/x86 and include/asm-x86 namespace laid out? Our
> foremost concern was to enable a 100% smooth transition to the new,
> shared architecture, while still enabling much more fine-grained future
> unification of the source code. To do this we consciously aimed for the
> strictest possible unification strategy: we only 'unified' those source
> files that are already bit for bit equal between the two architectures
> today. For all other files we used the following rule: if a file came
> from arch/i386/foo/bar.c, it gets moved to arch/x86/foo/bar_32.c; if it
> came from arch/x86_64/foo/bar.c, it gets moved to arch/x86/foo/bar_64.c.
> We also generated an arch/x86/foo/bar.c that simply #include's one of
> those two files (depending on whether we do a 32-bit or a 64-bit build).
> If a file existed in only one of the architectures, it's moved to
> arch/x86/foo/bar.c straight away. (Take a look at our git repository to
> see how this works out in practice.)

Can we see some stats on:

How many files were auto-merged?
How many files got 32.c and 64.c extensions?
How many existed only in one arch?

--
Mathematics is the supreme nostalgia of our time.

From:	Thomas Gleixner [email blocked]
To:	Chris Wright [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sun, 22 Jul 2007 09:50:46 +0200

On Sat, 2007-07-21 at 16:51 -0700, Chris Wright wrote:
> * Matt Mackall wrote:
> > Can we see some stats on:
> >
> > How many files were auto-merged?
> > How many files got 32.c and 64.c extensions?
> > How many existed only in one arch?
>
> It's mostly about file movement first.
>
> 918 files changed, 4745 insertions(+), 2836 deletions(-)

Hmm, did you forget to make distclean ?

Numbers from the script:

include/asm-i386       240 files
include/asm-x86_64     169 files
------------------------------
                       409 files

include/asm-x86        389 files

arch/i386              335 files
arch/x86_64            141 files
------------------------------
                       476 files

arch/x86               484 files

The increase here is due to migration helper files which only include
the (_32.x or the _64.x) variant.

Makefile helpers         9 files
Kconfig helpers          1 file
Source helpers           4 files
------------------------------
                        14 files

Summary:

vanilla              22657 files
vanilla->x86         22649 files
------------------------------

include/x86 has 125 _32 and 125 _64 files
arch/x86 has 55 _32 and 55 _64 files

25 files were auto-merged

Looking at include/asm-x86/*_[32/64].h there are offhand ~ 50 of the
125 which differ only minimal (white space damage, comment changes,
...), where the unification is a no brainer.

	tglx
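The "no brainer" candidates Thomas describes, _32/_64 pairs that differ only in whitespace and comments, could be flagged with a normalizing comparison such as the Python sketch below. It is a heuristic under stated assumptions, not the script used for these numbers: comments are stripped with naive regular expressions (comment markers inside string literals are not handled) and whitespace is collapsed within each line, so any pair it reports still needs human review before merging.

```python
import re
from collections.abc import Iterator
from pathlib import Path

BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # /* ... */, may span lines
LINE_COMMENT = re.compile(r"//[^\n]*")               # // up to end of line


def normalize(c_source: str) -> str:
    """Strip comments, collapse whitespace within lines, drop blank lines."""
    text = BLOCK_COMMENT.sub(" ", c_source)
    text = LINE_COMMENT.sub("", text)
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def trivially_unifiable(variant_32: str, variant_64: str) -> bool:
    """True if the _32 and _64 files differ only in comments and whitespace."""
    return normalize(variant_32) == normalize(variant_64)


def candidates(directory: Path) -> Iterator[str]:
    """Yield 'foo.h' for every foo_32.h/foo_64.h pair that passes the check."""
    for f32 in sorted(directory.glob("*_32.h")):
        f64 = f32.with_name(f32.name[: -len("_32.h")] + "_64.h")
        if not f64.exists():
            continue
        a = f32.read_text(encoding="utf-8", errors="replace")
        b = f64.read_text(encoding="utf-8", errors="replace")
        if trivially_unifiable(a, b):
            yield f32.name[: -len("_32.h")] + ".h"
```

Run against a directory laid out like include/asm-x86 (the path is only an example), `candidates` would list the headers whose two variants could be merged mechanically.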

From:	Matt Mackall [email blocked]
To:	Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date:	Sun, 22 Jul 2007 07:02:13 -0500

On Sun, Jul 22, 2007 at 09:50:46AM +0200, Thomas Gleixner wrote:
>
> Numbers from the script:
>

That looks more promising than I would have expected.

For what it's worth, I was originally fairly disgusted by the _32/64.c
thing, but the idea grows on me.

--
Mathematics is the supreme nostalgia of our time.




Source URL:
http://kerneltrap.org/node/13984