Thomas Gleixner described an effort to create a unified x86 architecture tree: "the core idea behind our project is simple to describe: we introduce a new arch/x86/ and include/asm-x86/ file hierarchy that includes all the existing 32-bit and 64-bit x86 code and allows the building of either a 32-bit (i386) kernel or a 64-bit (x86_64) kernel." Andi Kleen objected: "I think it's a bad idea because it means we can never get rid of any old junk. IMNSHO arch/x86_64 is significantly cleaner and simpler in many ways than arch/i386 and I would like to preserve that. Also in general arch/x86_64 is much easier to hack than arch/i386 because it's easier to regression test and in general has to care about much less junk. And I don't know of any way to ever fix that for i386 besides splitting the old stuff off completely." Further concerns about carrying legacy code were countered by Linus Torvalds: "there really isn't that much legacy crud. There are things like random quirks, but every time I hear the (theoretical) argument about how much time and effort we save by having it duplicated somewhere else, I think about all the time we definitely waste by fixing the same bug twice (and worry about the cases where we don't)." Among the justifications for a unified architecture, Thomas noted:
"We believe that the whole x86 CPU family is very much related and should be supported in a single architecture tree. All 64-bit CPUs implement the ability to execute pure 32-bit kernels, and will probably do so for the next couple of decades. So it's not like it will ever be possible to get rid of our legacies: for example even the latest 64-bit CPUs implement the legacy 'A20 line' feature that was already a weird outdated hack in the days of 16-bit x8086 CPUs."
From: Thomas Gleixner [email blocked]
To: LKML [email blocked]
Subject: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 00:32:59 +0200

We are pleased to announce a project we've been working on for some time: the unified x86 architecture tree, or "arch/x86" - and we'd like to solicit feedback about it.

What is this about?
-------------------

The topic of sharing more x86 code has been discussed on LKML a number of times. Various approaches were discussed and we decided to advance the discussion by implementing a full solution that brings the transition to a shared tree to completion.

Warning: our approach is quite a bit more extreme than what has been suggested before.

The core idea behind our project is simple to describe: we introduce a new arch/x86/ and include/asm-x86/ file hierarchy that includes all the existing 32-bit and 64-bit x86 code and allows the building of either a 32-bit (i386) kernel or a 64-bit (x86_64) kernel.

In this initial implementation the old arch/i386 and arch/x86_64 trees are removed _immediately_, in the same commit, and all future x86 development goes on in the new, shared tree. So the transition right now is one atomic operation.

As a next step we plan to generate a gradual, fully bisectable, fully working switchover from the current code to the fully populated arch/x86 tree. It will result in about 1000-2000 commits.

We are releasing our current solution because it 100% represents the finally resulting arch/x86 source tree already, and we first wanted to make sure that the new architecture layout works fine and folks are happy before we go and do the (even more complex) fine-grained work.

A git tree is available from:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git

One (large!) combo patch is available at:

  http://kernel.org/pub/linux/kernel/people/tglx/linux-x86.2.6.22-git-ede13d.combo.patch.bz2 [1]

the patch is against this upstream -git head:

  commit ede13d81b4dda409a6d271b34b8e2ec9383e255d

It makes little sense to apply this patch to anything else because these architectures are such a fast-moving target.

Why do we want to do this?
--------------------------

We believe that the whole x86 CPU family is very much related and should be supported in a single architecture tree. All 64-bit CPUs implement the ability to execute pure 32-bit kernels, and will probably do so for the next couple of decades. So it's not like it will ever be possible to get rid of our legacies: for example even the latest 64-bit CPUs implement the legacy "A20 line" feature that was already a weird outdated hack in the days of 16-bit x8086 CPUs.

So what can we do? We should learn to live better with our legacies, and we should avoid reinventing the wheel in 64-bit code.

Today there's already some limited code sharing between the i386 and x86_64 trees, but it's quite non-obvious: it's either done via a placeholder file that #include's an out-of-arch file, or a Makefile rule that uses an out-of-arch file. These 'cross-tree' file uses are not visible at the target point, and people sometimes break "the other arch" if they modify the file, because they are not aware of there being code sharing.

Furthermore, the separate i386 and x86_64 trees often cause kernel writers to (unconsciously) think in either 32-bit or in 64-bit terms, instead of thinking about the two things in one way.

It also happened numerous times that code that was originally copied from arch/i386 gets 64-bit-only additions in arch/x86_64, and if a bug is fixed in the 32-bit code, that fix is not applied to the x86_64 tree. Or some function is cleaned up and improved in the x86_64 tree, but that improvement is not easily adaptable to arch/i386, because the x86_64 code became 64-bit only already.

All in one: the two architecture trees are "way too far apart" from each other, which causes the source code to diverge not only physically but structurally as well. The whole setup works _against_ sharing code, instead of working _for_ sharing code. This causes double maintenance even for functionality that is conceptually the same for the 32-bit and the 64-bit tree. (such as support for standard PC platform architecture devices)

How did we do it?
-----------------

As an initial matter, we made it painstakingly sure that the resulting .o files in a 32-bit build are bit for bit equal. (at least in terms of .text, small .data differences are there due to the namespace changes). We also made it sure that you can pick up _any_ existing 32-bit .config, stick it into our new arch/x86 tree and build a fully working 32-bit kernel. Same is the goal for any existing 64-bit .config as well.

The shared x86 tree is _not_ "merge the 32-bit legacy code into the x86_64 tree". It is _not_ "create a x86_64 tree that can run on modern 32-bit hardware too, leave arch/i386 around for ancient stuff". It is a unification between equals, all legacies and all new code is unified into one shared tree. Nothing is left behind.

A key component of our change is that only a very small portion of the conversion was done 'manually' - the overwhelming majority of file movement happened in an automated fashion. We did this to reduce the chance of human errors in the process of moving and rearranging more than a 1000 files.

We also made the change fully bisectable: you can bisect 'across' the big arch/x86 commit and the .config's will be picked up correctly. The git-history of all previously existing files has also been preserved via the use of git's file-movement feature.

The more fine-grained multiple commits approach which we are ready to do is also providing a fully bisectable and history preserving solution.

How is the new arch/x86 and include/asm-x86 namespace layed out?
----------------------------------------------------------------

Our foremost concern was to enable a 100% smooth transition to the new, shared architecture, while still enabling much more fine-grained future unification of the source code. To do this we consciously aimed for the strictest possible unification strategy: we only 'unified' those source files that are already bit for bit equal between the two architectures today. For all other files we used the following rule: if a file came from arch/i386/foo/bar.c, it gets moved to arch/x86/foo/bar_32.c, if it came from arch/x86_64/foo/bar.c it gets moved to arch/x86/foo/bar_64.c. We also generated arch/x86/foo/bar.c that simply #include's those two files (depending on whether we do a 32-bit or a 64-bit built). If a file only existed in only one of the architectures, it's moved to arch/x86/foo/bar.c straight away. (take a look at our git repository to see how this works out in practice.)

Include files are handles similarly in include/asm-x86, with the exception of include files that are exported to user-space: for those, to preserve the source code API towards user-space, Kbuild creates symlinks from the _32.h or _64.h files to the .h file, instead of a stub .h file. In the future the number of such symlinks can be reduced significantly by unifying the files.

The arch/i386 and arch/x86_64 trees are removed completely, except for a small Kconfig and kbuild stub to ease bisection and to ease the import (and export) of .configs into the new, shared x86 architecture.

And we are serious about no-compromises compatibility of the transition: we built and booted an arch/x86 kernel on a real i386 DX CPU. (Yes, an old 33 MHz one. No, the CPU did not melt, Linux booted up fine!) We also built and booted a 64-bit kernel on a quad-core 64-bit CPU from the shared tree. (and on a number of other x86 systems.)

What will happen to arch/x86 in the future?
-------------------------------------------

Future, fine-grained unification is the main idea behind the new layout: 'unifying' a 32-bit and a 64-bit source code file will be a matter of creating a single .c file from a _32.c and _64.c file. Those patches will be easy to review and will be straightforward to create. We chose not to do any of those unifications in this initial work yet, even if they were easy to do, to be able to guarantee the bit-for-bit equivalency of the new tree to the old trees.

When should this go upstream?
-----------------------------

We actually think that the sooner we get over with this, the better. Once the precise method is agreed upon, the best period of the transition is when other larger-scale changes are done typically: right at the end of a merge window, when most of the architecture flux has flown into the tree already (so that the transition does not cause hickup in merge activities), but when we still have a maximum amount of time left to fix up any effects of the unification.

This tree cannot really be carried in -mm or in other devel trees due to its size and intrusiveness. This is also true for the fine-grained solution, which should be done in one go as well. We do not believe that a "Chinese 5 year transition plan" is something useful.

When we switch in one go we have two advantages:

 1) it is a single synchronization point for folks with patches against that code

 2) it is more likely that people tackle the unification of file_32.c and file_64.c which are in the same source directory than the unification of arch/i386/..../file.c and arch/x86_64/..../file.c

As usual, comments and suggestions are welcome!

Thomas, Ingo
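The file-placement rule from the announcement's layout section can be sketched as a small Python function. This is a hypothetical reconstruction, not the script Thomas and Ingo used: the names `plan`, `variant_name` and `stub_source` are invented for illustration, "identical" is taken to mean byte-for-byte equal contents, and the generated stub assumes the 32-bit build is selected by a `CONFIG_X86_32` symbol.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def variant_name(rel: str, bits: str) -> str:
    """'foo/bar.c' -> 'foo/bar_32.c' (or 'foo/bar_64.c')."""
    p = PurePosixPath(rel)
    return str(p.with_name(f"{p.stem}_{bits}{p.suffix}"))


def stub_source(rel: str) -> str:
    """Hypothetical stub bar.c that includes the variant for the current build.

    Assumes CONFIG_X86_32 is defined for 32-bit builds.
    """
    p = PurePosixPath(rel)
    return (
        "#ifdef CONFIG_X86_32\n"
        f'# include "{p.stem}_32{p.suffix}"\n'
        "#else\n"
        f'# include "{p.stem}_64{p.suffix}"\n'
        "#endif\n"
    )


def plan(rel: str, i386: Optional[str], x86_64: Optional[str]) -> Dict[str, str]:
    """Return {new path under arch/x86/: contents} for one file path.

    `rel` is relative to the arch directory, e.g. 'kernel/bar.c';
    `i386` / `x86_64` are the file's contents in each old tree (None if absent).
    """
    base = f"arch/x86/{rel}"
    if i386 is None and x86_64 is None:
        raise ValueError(f"{rel} exists in neither tree")
    if i386 is None or x86_64 is None:
        # Present in only one architecture: moved across unchanged.
        return {base: i386 if i386 is not None else x86_64}
    if i386 == x86_64:
        # Bit-for-bit identical: a single shared copy.
        return {base: i386}
    # Contents differ: keep both variants plus a stub that selects one.
    return {
        f"arch/x86/{variant_name(rel, '32')}": i386,
        f"arch/x86/{variant_name(rel, '64')}": x86_64,
        base: stub_source(rel),
    }
```

Calling `plan("kernel/bar.c", ...)` with two differing texts yields `arch/x86/kernel/bar_32.c`, `arch/x86/kernel/bar_64.c` and the stub `arch/x86/kernel/bar.c`. The sketch covers only the content rule; the Kbuild symlinks for user-space-exported headers and the Makefile/Kconfig helpers are not modeled.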
From: Jeff Garzik [email blocked]
To: Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Fri, 20 Jul 2007 18:38:39 -0400

I agree with Andi... it's quite nice to be able to leave some arch/i386 stuff, and not carry it over to arch/x86-64.

Jeff
From: Ingo Molnar [2] [email blocked]
To: Jeff Garzik [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 00:40:33 +0200

* Jeff Garzik [email blocked] wrote:

> I agree with Andi... it's quite nice to be able to leave some
> arch/i386 stuff, and not carry it over to arch/x86-64.

we can leave those few items in arch/x86 just as much. No need to keep around a legacy tree for that.

Ingo
From: Jeff Garzik [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Fri, 20 Jul 2007 18:42:42 -0400

Ingo Molnar wrote:
> * Jeff Garzik [email blocked] wrote:
>
>> I agree with Andi... it's quite nice to be able to leave some
>> arch/i386 stuff, and not carry it over to arch/x86-64.
>
> we can leave those few items in arch/x86 just as much. No need to keep
> around a legacy tree for that.

By extension it makes doing that sort of thing, in general, more difficult. Which is IMO not desirable.

Jeff
From: Linus Torvalds [email blocked]
To: Jeff Garzik [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Fri, 20 Jul 2007 15:51:51 -0700 (PDT)

On Fri, 20 Jul 2007, Jeff Garzik wrote:
> Ingo Molnar wrote:
> > * Jeff Garzik [email blocked] wrote:
> >
> > > I agree with Andi... it's quite nice to be able to leave some arch/i386
> > > stuff, and not carry it over to arch/x86-64.
> >
> > we can leave those few items in arch/x86 just as much. No need to keep
> > around a legacy tree for that.
>
> By extension it makes doing that sort of thing, in general, more difficult.
> Which is IMO not desirable.

I think it's *much* harder to carry legacy things around in an old tree that almost nobody even uses any more (probably not true yet, but for most of the main developers, I bet it will be true in a year). Especially one that just duplicates 99% of the stuff.

There really isn't that much legacy crud. There are things like random quirks, but every time I hear the (theoretical) argument about how much time and effort we save by having it duplicated somewhere else, I think about all the time we definitely waste by fixing the same bug twice (and worry about the cases where we don't).

Linus
From: Andi Kleen [email blocked]
To: Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 07:37:58 +0200

On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.

Well you know my position on this. I think it's a bad idea because it means we can never get rid of any old junk. IMNSHO arch/x86_64 is significantly cleaner and simpler in many ways than arch/i386 and I would like to preserve that. Also in general arch/x86_64 is much easier to hack than arch/i386 because it's easier to regression test and in general has to care about much less junk. And I don't know of any way to ever fix that for i386 besides splitting the old stuff off completely.

Besides radical file movements like this are bad anyways. They cause a big break in patchkits and forward/backwards porting that doesn't really help anybody.

> This causes double maintenance
> even for functionality that is conceptually the same for the 32-bit and
> the 64-bit tree. (such as support for standard PC platform architecture
> devices)

It's not really the same platform: one is PC hardware going back forever with zillions of bugs, the other is modern PC platforms which much less bugs and quirks

To see it otherwise it's more a junkification of arch/x86_64 than a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386 at all.

> How did we do it?
> -----------------
>
> As an initial matter, we made it painstakingly sure that the resulting
> .o files in a 32-bit build are bit for bit equal.

You got not a single line less code duplication then, so i don't really see the point of this.

-Andi
From: Thomas Gleixner [email blocked]
To: Andi Kleen [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 10:15:50 +0200

On Sat, 2007-07-21 at 07:37 +0200, Andi Kleen wrote:
> On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> > We are pleased to announce a project we've been working on for some
> > time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> > to solicit feedback about it.
>
> Well you know my position on this. I think it's a bad idea because
> it means we can never get rid of any old junk. IMNSHO arch/x86_64
> is significantly cleaner and simpler in many ways than arch/i386 and I would
> like to preserve that. Also in general arch/x86_64 is much easier to hack
> than arch/i386 because it's easier to regression test and in general
> has to care about much less junk. And I don't
> know of any way to ever fix that for i386 besides splitting the old
> stuff off completely.

I disagree of course. I worked on both trees quite intensive over the last years and I broke x86_64 more than once when hacking on i386 and vice versa. Your "junk" argument is nothing else than a strawman which you beat on every time when this discussion comes up.

> Besides radical file movements like this are bad anyways. They cause
> a big break in patchkits and forward/backwards porting that doesn't
> really help anybody.

Interestingly enough the folks with the big patch kits (Virtualization) would be quite happy about that move.

> > This causes double maintenance
> > even for functionality that is conceptually the same for the 32-bit and
> > the 64-bit tree. (such as support for standard PC platform architecture
> > devices)
>
> It's not really the same platform: one is PC hardware going back forever
> with zillions of bugs, the other is modern PC platforms which much less
> bugs and quirks

It _IS_ the same platform. x86_64 is PC hardware with zillions of bugs as well. And it is not modern at all. It is nothing else than a 64 bit version of the legacy x86.

> To see it otherwise it's more a junkification of arch/x86_64 than
> a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386
> at all.

We went for a 1 : 1 replacement without merging anything which is not obvious in the first place (identical files and files, which are just including some other file). That way we were able to do a binary compatible migration.

The clean up is the next step and there are enough folks out there willing to help on this.

> > As an initial matter, we made it painstakingly sure that the resulting
> > .o files in a 32-bit build are bit for bit equal.
>
> You got not a single line less code duplication then, so i don't really
> see the point of this.

Really ? The script detected 15 identical files with a simple cmp. It also unified another 10 by simply looking at the only line in there "include <the other arch/file>"

And there is more of that, when you take the time and look closely at the _32.[ch] _64.[ch] files which are created by the merge.

tglx
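Thomas's description of the auto-merge step (identical files found with a plain `cmp`, plus placeholder files whose only line includes the other architecture's copy) suggests a pairwise classification along these lines. This is a hypothetical sketch, not the actual merge script: the function names, the regular expression, and the rule that the include target must share the file's basename are all assumptions.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed shape of a placeholder file: one '#include <...>' or '#include "..."' line.
_ONLY_INCLUDE = re.compile(r'^\s*#\s*include\s+[<"]([^>"]+)[>"]\s*$')


def include_target(text: str) -> Optional[str]:
    """Return the included path if `text` is nothing but one #include line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 1:
        return None
    m = _ONLY_INCLUDE.match(lines[0])
    return m.group(1) if m else None


def classify(i386: bytes, x86_64: bytes, name: str) -> str:
    """Decide how a same-named i386/x86_64 file pair would be handled."""
    if i386 == x86_64:
        return "identical"  # byte-for-byte equal, as `cmp` would report
    for data in (i386, x86_64):
        target = include_target(data.decode("utf-8", errors="replace"))
        # A placeholder that only pulls in the other tree's copy of this file.
        if target is not None and PurePosixPath(target).name == name:
            return "include-only"
    return "split"  # contents really differ: keep _32/_64 variants
```

Under this scheme the 15 `cmp`-identical pairs and the 10 include-only placeholders Thomas mentions add up to the 25 auto-merged files reported in his statistics later in the thread.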
From: David Woodhouse [email blocked]
To: Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 11:37:46 +0100

On Sat, 2007-07-21 at 00:32 +0200, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.

Oooh, shiny. We've been talking about how useful this would be for years.

Experience with the unification of PowerPC shows that it's definitely the right thing to do -- it reduces the number of gratuitous differences between 32-bit and 64-bit code, and makes it far more easier to ensure that bug-fixes and new features get added to both at the same time.

> When should this go upstream?
> -----------------------------
>
> We actually think that the sooner we get over with this, the better.

I'm inclined to agree.

--
dwmw2
From: Matt Mackall [email blocked]
To: Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sat, 21 Jul 2007 17:25:09 -0500

On Sat, Jul 21, 2007 at 12:32:59AM +0200, Thomas Gleixner wrote:
> How is the new arch/x86 and include/asm-x86 namespace layed out? Our
> foremost concern was to enable a 100% smooth transition to the new,
> shared architecture, while still enabling much more fine-grained future
> unification of the source code. To do this we consciously aimed for the
> strictest possible unification strategy: we only 'unified' those source
> files that are already bit for bit equal between the two architectures
> today. For all other files we used the following rule: if a file came
> from arch/i386/foo/bar.c, it gets moved to arch/x86/foo/bar_32.c, if it
> came from arch/x86_64/foo/bar.c it gets moved to arch/x86/foo/bar_64.c.
> We also generated arch/x86/foo/bar.c that simply #include's those two
> files (depending on whether we do a 32-bit or a 64-bit built). If a file
> only existed in only one of the architectures, it's moved to
> arch/x86/foo/bar.c straight away. (take a look at our git repository to
> see how this works out in practice.)

Can we see some stats on:

How many files were auto-merged?
How many files got 32.c and 64.c extensions?
How many existed only in one arch?

--
Mathematics is the supreme nostalgia of our time.
From: Thomas Gleixner [email blocked]
To: Chris Wright [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sun, 22 Jul 2007 09:50:46 +0200

On Sat, 2007-07-21 at 16:51 -0700, Chris Wright wrote:
> * Matt Mackall wrote:
> > Can we see some stats on:
> >
> > How many files were auto-merged?
> > How many files got 32.c and 64.c extensions?
> > How many existed only in one arch?
>
> It's mostly about file movement first.
>
> 918 files changed, 4745 insertions(+), 2836 deletions(-)

Hmm, did you forget to make distclean ?

Numbers from the script:

include/asm-i386       240 files
include/asm-x86_64     169 files
--------------------------------
                       409 files

include/asm-x86        389 files

arch/i386              335 files
arch/x86_64            141 files
--------------------------------
                       476 files

arch/x86               484 files

The increase here is due to migration helper files which only include the (_32.x or the _64.x) variant.

Makefile helpers         9 files
Kconfig helpers          1 file
Source helpers           4 files
--------------------------------
                        14 files

Summary:

vanilla              22657 files
vanilla->x86         22649 files
--------------------------------

include/x86 has 125 _32 and 125 _64 files
arch/x86 has 55 _32 and 55 _64 files

25 files were auto-merged

Looking at include/asm-x86/*_[32/64].h there are offhand ~ 50 of the 125 which differ only minimal (white space damage, comment changes, ...), where the unification is a no brainer.

tglx
From: Matt Mackall [email blocked]
To: Thomas Gleixner [email blocked]
Subject: Re: [RFC, Announce] Unified x86 architecture, arch/x86
Date: Sun, 22 Jul 2007 07:02:13 -0500

On Sun, Jul 22, 2007 at 09:50:46AM +0200, Thomas Gleixner wrote:
>
> Numbers from the script:
>

That looks more promising than I would have expected.

For what it's worth, I was originally fairly disgusted by the _32/64.c thing, but the idea grows on me.

--
Mathematics is the supreme nostalgia of our time.
Related Links:
- Archive of above thread [3]
- KernelTrap interview with Ingo Molnar [4]