Unified x86 Architecture Code Quality

October 24, 2007 - 4:12pm
Submitted by Jeremy on October 24, 2007 - 4:12pm.
Linux news

"Can we please finish up this merge a little more before we freeze 2.6.24? The way we currently have leftovers of arch/i386/ and arch/x86_64/ is quite a nightmare and not how the other architectures were merged,

" Christoph Hellwig asked, leading to an insightful reply by Ingo Molnar. Ingo began by noting, "to answer that question one should first be aware of the fundamental code quality problems that the unified x86 architecture has inherited from the split i386 and x86_64 architectures." He then utilized the checkpatch script to generate a table of "coding style errors per one thousand lines of source code". In his table, arch/i386/ rated 77.3 errors per thousand lines of source, with arch/x86_64/ rating 96.0. The new unified arch/x86/ rated a lower but still very high 74.1. He summarized, "it is plainly obvious that the x86_64 and i386 architectures were in a dreadful state of code quality before the unification. Their code quality was almost an order of magnitude worse than that of the core kernel (!) - and their code quality was significantly worse than that of a couple of other, comparable architectures." Ingo continued:

"So to answer your question: full unification is no easy task and it is not automatic at all. The x86_64 tree has diverged from the i386 tree in the past 5 years due to their illogical, forced separation and a resulting bitrot. The two architectures have grown different sets of cleanliness problems and different sets of functions with arbitrary differences that often cover the same functionality. It's all compounded by the fact that the 64-bit code is in worse shape than the 32-bit - so it's not like we could just pick the 64-bit code and use that as the unified code. The 32-bit code is also used about 8-10 times more frequently than the 64-bit code. So there is no easy 'just unify it' path."

"The new maintainers of the x86 architecture (Thomas, Peter and me), and many other x86 developers are highly motivated to improve the x86 architecture's code quality and unify the heck out of it, and there are some real improvements in 2.6.24-rc1 already, but we _must_ be (and are) working on this carefully. So we do unifications on a case by case basis, with the highest priority being to not introduce 'unification regressions'. The x86 architecture is the most common Linux architecture after all - and users care much more about having a working kernel than they care about cleanups and unifications. So yes, we agree with you, but please be patient! :-) This cannot be realistically finished in v2.6.24, without upsetting the codebase."


From: Christoph Hellwig <hch@...>
Subject: Re: Linux v2.6.24-rc1
Date: Oct 24, 4:04 am 2007

On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> In short, we just had an unusually large amount of not just x86 merges, 

Btw, can we please finish up this merge a little more before we freeze
2.6.24?  The way we currently have leftovers of arch/i386/ and arch/x86_64/
is quite a nightmare and not how the other architectures were merged.

Thomas, what again prevents us from just killing these leftovers?

-

From: Ingo Molnar <mingo@...>
Subject: Re: Linux v2.6.24-rc1, x86 arch code quality, unifications
Date: Oct 24, 6:17 am 2007

* Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Oct 23, 2007 at 09:19:16PM -0700, Linus Torvalds wrote:
> > In short, we just had an unusually large amount of not just x86 merges,
>
> Btw, can we please finish up this merge a little more before we freeze
> 2.6.24? The way we currently have leftovers of arch/i386/ and
> arch/x86_64/ is quite a nightmare and not how the other architectures
> were merged.
>
> Thomas, what again prevents us from just killing these leftovers?

to answer that question one should first be aware of the fundamental code quality problems that the unified x86 architecture has inherited from the split i386 and x86_64 architectures.

To get objective and automated metrics about code quality, i've constructed a table of "coding style errors per one thousand lines of source code" numbers with the help of the latest checkpatch.pl. The codebases i measured are the pre-merge i386 and x86_64 tree, the post-merge arch/x86 unified architecture, and i've also added a handful of other architectures and selected core subsystems, as comparison:

 -------------------------------------------------------
               |   errors   | lines of code  | errors/KLOC
               |            |                | (smaller is better)
 --------------|------------|----------------|------------------------
 arch/i386/          5717           73865      77.3
 arch/x86_64/        2993           31155      96.0
 arch/x86/           8504          114654      74.1
 ..............|............|................|........................
 arch/ia64/          1779           64022      27.7
 arch/mips/          2110           94692      22.2
 arch/sparc64/       1387           49253      28.1
 ..............|............|................|........................
 kernel/              762           83540       9.1
 kernel/time/          15            4191       3.5
 kernel/irq/            1            2317       0.4
 mm/                  464           46324      10.0
 net/core             176           24413       7.2
 ..............|............|................|........................

a couple of observations.

Firstly, it is plainly obvious that the x86_64 and i386 architectures were in a dreadful state of code quality before the unification. Their code quality was almost an order of magnitude worse than that of the core kernel (!) - and their code quality was significantly worse than that of a couple of other, comparable architectures. (we knew this when we started the x86 unification effort - but i suspect it's even more apparent via the hard numbers in this table.)

( Note: code metrics should be taken with a grain of salt, as they often over-simplify the picture, but in this particular situation the trends are clear and the numbers match my personal impressions of code quality and robustness of these codebases. )

paradoxically, the x86_64 architecture had a _worse_ code quality than the "legacy" 32-bit code - so much for the "newer code must be better" misconception. The first, mechanical round of unifications thus brought a net degradation in quality - but we've reversed that trend in 2.6.24-rc1 already, via unifications and cleanups, as it can be seen from the table.

(and we did that while adding new features like high-resolution timers and dynticks to the x86-64bit architecture in v2.6.24-rc1 - or the new IOMMU code. So the x86 architecture is not standing still at all while the unification is going on.)

so to answer your question: full unification is no easy task and it is not automatic at all. The x86_64 tree has diverged from the i386 tree in the past 5 years due to their illogical, forced separation and a resulting bitrot. The two architectures have grown different sets of cleanliness problems and different sets of functions with arbitrary differences that often cover the same functionality. It's all compounded by the fact that the 64-bit code is in worse shape than the 32-bit - so it's not like we could just pick the 64-bit code and use that as the unified code. The 32-bit code is also used about 8-10 times more frequently than the 64-bit code. So there is no easy "just unify it" path.

The new maintainers of the x86 architecture (Thomas, Peter and me), and many other x86 developers are highly motivated to improve the x86 architecture's code quality and unify the heck out of it, and there are some real improvements in 2.6.24-rc1 already, but we _must_ be (and are) working on this carefully. So we do unifications on a case by case basis, with the highest priority being to not introduce "unification regressions". The x86 architecture is the most common Linux architecture after all - and users care much more about having a working kernel than they care about cleanups and unifications. So yes, we agree with you, but please be patient! :-) This cannot be realistically finished in v2.6.24, without upsetting the codebase.

	Ingo
-



Ingo

October 24, 2007 - 4:36pm
Anonymous (not verified)

Don't you think he's done too much? :-)

- And he receives a lot of flak for it

October 24, 2007 - 4:45pm
Anonymous (not verified)

Yep, he's doing A LOT of the important kernel development, and a lot of what he gets back is bitching from people who know almost nothing about kernel development.

I, for one, would like to extend a BIG thanks to Ingo, for keeping our kernel fast, lean, properly working and clean-looking!!!!

People that know "almost

October 24, 2007 - 5:07pm
Anonymous (not verified)

People that know "almost nothing" about kernel development - like Christoph Hellwig, you mean?

WTF? In which way has

October 24, 2007 - 6:20pm
Anonymous (not verified)

WTF? In which way has Hellwig been doing the "CFS sucks", "Ingo sucks", "It's just because Linus likes him" crap?

I'm not talking about Hellwig's wish for a quick merge of the arches (I'm pretty sure everyone wants that, including Ingo; there are just issues that make it hard to rush), I'm talking about the tons of crap being dumped by Con's army of useful idiots, and the like.

: I'm talking about the tons

October 24, 2007 - 6:48pm
Anonymous (not verified)

: I'm talking about the tons of crap being dumped by Con's army of useful idiots, and the like.

Wow, the elites of Ingo's army strike back!

*sigh*

October 25, 2007 - 1:11pm

How tedious, predictable, and boring.

I'd be curious to see, if Jeremy blocked out names for a week or two and just assigned them generic names like "Person 1", "Person 2", "Person 3", etc., how many people could match the names to the actual people without going back to the LKML archives and lining up the quotes. Or, if he randomized them per article, whether people could actually figure out who was who within the context of that article.

The kernel programmers are bright, opinionated, articulate and argumentative. I'd say they're also extremely good at what they do. It's certainly entertaining to watch, and rewarding to participate in when it happens to happen.

Down here in the threads, I sometimes feel like I'm watching monkeys fling poo at the zoo. Especially when the "Oh no! It's Ingo!" crowd shows up along with their counterparts.

--
Program Intellivision and play Space Patrol!

What are you talking about?

October 24, 2007 - 6:23pm
Anonymous (not verified)

You're the only one to mention Chris Hellwig in this sense - the post you replied to was referring to responses on CFS, not the original e-mail. There really isn't any disagreement I can see there anyway, plus the excerpts are taken out of the context of the conversation...

Not acceptable

October 25, 2007 - 12:22am
Anonymous (not verified)

That it has so many coding style errors is unacceptable.
x86 is the most used architecture. It should be the cleanest and best implemented.

An esoteric architecture

October 25, 2007 - 1:58am
Anonymous (not verified)

An esoteric architecture merits an esoteric implementation ...

32-bit Linux on 64-bit processors

October 25, 2007 - 2:59am

Ingo Molnar: "The 32-bit code is also used about 8-10 times more
frequently than the 64-bit code."

I wonder for how long? Running a 32-bit OS on the newer processors from Intel and AMD is starting to be a bit like running 16-bit MS-DOS code on 386 and 486, back in the days when Linux was born. I agree it was probably painful in the beginning, but in the latest major distros the 64-bit mode is ok. In Mandriva, whose 64-bit edition I use, they have also managed to make using legacy 32-bit binaries quite seamless. Even proprietary 32-bit browser plugins Just Work, thanks to nspluginwrapper. I predict that in a few years the kernel maintainers will find keeping the 32-bit variant around to be an unwanted legacy, and will want to split the architectures again...

I wonder for how

October 25, 2007 - 6:10am
Anonymous (not verified)


I wonder for how long?

That's one of the reasons why this unification is important IMHO. According to most usage stats, x86_64 is primarily used on servers, with spotty desktop use. The i386 arch is mostly used on desktops with lots of server use as well. As a result of that the i386 arch has acquired more code robustness (via a more varied user-base and hardware-base) and has acquired more desktop features.

Treating i386 as a 'legacy step-child of x86_64' was IMHO a mistake - a more tested and more utilized code-base is always valuable because it's easier to add features to it. (as bugs get found faster)

If the new x86 architecture goes down in flames and becomes a major disaster then Andi should pick up the old split architectures and should continue maintaining them. If it works out fine (which is my guess) then Andi should admit his mistake quickly (nobody is perfect) and should join the effort. For Linux it's a win-win situation, as the interest in the x86 architecture and activities around it are more intense than ever before - so there's enough manpower available for all the possible eventualities.

One thing must be said, the kernel devs and Linus in particular certainly have balls and don't seem to be afraid of making difficult decisions and don't seem to be afraid of making complex code changes to back up those decisions. If only Microsoft were this flexible ;)

It will be very interesting to watch how this new architecture works out. Popcorn anyone?

The sum is bigger than the two separated trees...

October 25, 2007 - 4:27am
Anonymous (not verified)

How is it possible?
If I merge 73865 lines of code from i386 with 31155 lines of code from x86_64 (73865+31155=105020) how can I get 114654 lines of code? The merge has created over 9500 lines of code :(

Just a thought...

Bye
Piero

New features have been added

October 25, 2007 - 5:10am
Anonymous (not verified)

New features have been added to the 64-bit code in 2.6.24-rc1: IOMMU, high-resolution timers, dynticks and HPET enhancements.

Could it be....

October 25, 2007 - 1:06pm

...around 3200 instances of

#ifdef CONFIG_X86_64
/* ... 64-bit code ... */
#else /* 32-bit code */
/* ... */
#endif

perhaps? ;-)

(I kid! I kid!)

--
Program Intellivision and play Space Patrol!

Haha

October 25, 2007 - 3:50pm
Beau V.C. Bellamy (not verified)

I'm almost suspicious of that myself...

I've posted before.

October 25, 2007 - 8:32pm
Anonymous (not verified)

I've previously posted a good strategy to minimize problems here:

http://kerneltrap.org/Linux/Discussing_the_x86_Merge



I'm looking forward to a good i386/x86-64 merge :)

* i386 assembly/architecture is very good for simple and small 32-bit processes.
[ Bandwidth of its bus is 1 * X MegaWord32/s. ]

* x86_64 assembly/architecture is very good for complex, multithreaded and big 64-bit processes.
It is also good for letting the kernel address more than 4 or 8 GiB of RAM.

[ Bandwidth of its bus is 0.5 * X MegaWord64/s. ]

Merging them is very good for this ILP32/LP64 (or possibly ILP64) architecture.

My idea is to have one 64-bit kernel, many 32-bit processes (ELF32) and a few big 64-bit processes (ELF64).

This needs a GCC cross-compiler for both 32-bit and 64-bit targets (-march=athlon for applications and -march=k8 for the 64-bit kernel and applications).

The memory maps of many devices, such as PCI, PCI-E, USB 2.0, AGP, sound and Ethernet, are still 32-bit.

The merged 32-bit (ignoring the upper 32 bits) and 64-bit address layout should be:

1) 0x0000'0000'0000'0000 .. 0x0000'0000'0000'0FFF is a reserved 4 KiB page that traps NULL pointer dereferences.

2) 0x0000'0000'0000'1000 .. 0x0000'0000'BFFF'FFFF is userspace for pure 32-bit tasks (3G/1G) (for 64-bit tasks, it is reserved for 32-bit emulation/translation).

3) 0x0000'0000'C000'0000 .. 0x0000'0000'FFFF'FFFF is kernelspace for pure 32-bit tasks (3G/1G) (for 64-bit tasks, it is reserved for 32-bit emulation/translation using its recovered 32-64-bit trampoline, although the running kernel is 64-bit).

4) 0x0000'0001'0000'0000 .. 0x0000'7FFF'FFFF'FFFF is userspace for pure 64-bit tasks.

5) 0xFFFF'8000'0000'0000 .. 0xFFFF'FFFF'FFFF'FFFF is kernelspace for pure 64-bit tasks.

http://en.wikipedia.org/wiki/X86-64

Paint it with colors: Green for userspace and Red for kernelspace :)

Athlon64, Opteron, etc. and the old i486 are good chips, not yet the best.



EFFORTS in merging are important!

The table below is a bad report:

Filename    i386   x86_64
acpi.c        X       X
common.c      X       X
...

It takes a lot of effort to get the merge done.

Consider instead an exhaustive table like the one below, to prevent merge errors:

Filename i386-only   x86_64-only   is merged
-------- ---------   -----------   ---------
foo.c        OK      Not-tested    No-such-file
bar.c     Not-tested      OK           OK
santo.c      OK           OK        Not-tested
inocente.c   OK           OK         Failed
paratodos.c  OK           OK           OK
mortales.c   OK      No-such-file      OK

* "is merged" means "the 32-bit and 64-bit files are modified and merged, resulting in one unique file for both architectures".
* "is merged" is OK if it works for both machines i386 and x86-64.

Possible values for the report are: OK, Failed, Not-tested, No-such-file, ...



J.C. Pizarro

And interesting discussion of LP64/ILP64.

October 25, 2007 - 8:41pm
Anonymous (not verified)

And there is an interesting discussion of LP64/ILP64 for the merge of 32-bit i386 and 64-bit x86-64:

http://kerneltrap.org/node/14159#comments



J.C. Pizarro

Hmmm...

October 26, 2007 - 4:20pm

(a) What are you smoking? I had a hard time making heads/tails of your post. It looked like an amalgam of other articles, with random bits changed.

(b) You are aware you can run a full 32-bit userspace under a 64-bit x86-64 kernel, right? That works today.

At any rate, an ILP64 version of Linux seems unlikely as long as there are ILP32 implementations around, as code that assumes 32-bit int is widespread. Code that assumes sizeof(int)==sizeof(void*) is thankfully much less prevalent due to the fact people moved to ANSI C some time ago. 32-bit int has a couple of orders of magnitude greater installed base in mainstream computing than did 16-bit int when mainstream started to shift to 32-bit.

(Note that I said "mainstream." There are billions of little embedded toys with 16-bit int.)

--
Program Intellivision and play Space Patrol!
