A couple of lengthy threads on the lkml [1] drifted away from kernel coding and into a debate over the merits of naming standards. The debate was sparked by a patch from David Woodhouse that Eric Raymond applied to Configure.help in the new CML2 configuration language. This patch changes references from the familiar MB (megabyte) and GB (gigabyte) to the NIST standard [2] MiB (mebibyte) and GiB (gibibyte). According to these standards, a megabyte (MB) is strictly a power of ten, while a mebibyte (MiB) is a power of two, appropriate for binary machines. A megabyte is then 1,000,000 bytes (10^6), and a mebibyte is the 1,048,576 bytes (2^20) that most people actually intend.
Bits, bytes, kilobytes, megabytes... and now, mebibytes? When one speaks of a kilobyte of computer memory, one means 1024 bytes. This is more an "accepted inaccuracy" than anything. Though simple enough at the kilobyte level, things get more and more confusing as memory sizes increase from kilo to mega to giga and beyond. The recent change is an attempt to move away from this "accepted inaccuracy" towards an official and accurate standard.
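To put numbers on that drift, here is a minimal Python sketch (not taken from the thread) that compares each decimal SI prefix with its binary IEC counterpart. The dictionary names and the gap_percent helper are illustrative assumptions; the byte counts themselves follow directly from the powers of ten and two involved.

```python
# Decimal (SI) prefixes versus binary (IEC) prefixes, as plain byte counts.
# The names SI, IEC and gap_percent are illustrative, not from any standard API.
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def gap_percent(binary_bytes, decimal_bytes):
    """Return how much larger the binary unit is than the decimal one, in percent."""
    return (binary_bytes - decimal_bytes) / decimal_bytes * 100


# Walk the two tables in step: kilo/kibi, mega/mebi, giga/gibi, tera/tebi.
for (si_name, si_val), (iec_name, iec_val) in zip(SI.items(), IEC.items()):
    print(
        f"1 {iec_name} = {iec_val:,} bytes; "
        f"1 {si_name} = {si_val:,} bytes; "
        f"gap {gap_percent(iec_val, si_val):.1f}%"
    )
```

Run as written, the gap is about 2.4% at the kilo level and grows to roughly 10% at the tera level, which is why the ambiguity matters more as memory and disk sizes increase.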
Some seem to support this change. Others fear it will only add confusion. What do you think?
Even Eric is not that enthused by this naming standard, but explains, "In the *absence* of a clear consensus, I will follow best practices. Best practice in editing a technical or standards document is to (a) avoid ambiguous usages, seek clarity and precision; and (b) to use, follow and reference international standards."
Alan Cox offers support for this change: "Eric using MiB seems the right thing. Its an ugly but appropriate unit, its at least recommended as a solution by a standards body. We can either redefine SI units ("You cannot change the laws of physics") or find a better label. What better than a recommended one others use."
The standards they refer to are detailed on the National Institute of Standards and Technology [3] web page titled "Definitions of the SI units: The binary prefixes [4]". You can find more on the International System of Units (SI) here [5].
CML2 [6] is targeted for early inclusion into the 2.5 development kernel.
The initial discussion began with this comment by Steven Cole:
From: Steven Cole
Subject: Changing KB, MB, and GB to KiB, MiB, and GiB in Configure.help.
Date: Thu, 20 Dec 2001 11:02:28 -0700
Greetings all,
I see that in the very latest Configure.help version, 2.76, available at
http://www.tuxedo.org/~esr/cml2/ [7]
Eric has decided to follow the following standard:
IEC 60027-2, Second edition, 2000-11, Letter symbols to be used in electrical
technology - Part 2: Telecommunications and electronics and has changed all
the abbreviations for Kilobyte (KB) to KiB, Megabyte (MB) to MiB, etc, etc.
Now, granted that this is the "standard", should there be some discussion related to this
change, or is everyone comfortable with this? It certainly made me do a double take.
Here is a snippet from the diff between versions 2.75 and 2.76 of Configure.help:
@@ -344,8 +344,8 @@
If you are compiling a kernel which will never run on a machine with
more than 960 megabytes of total physical RAM, answer "off" here
(default choice and suitable for most users). This will result in a
- "3GB/1GB" split: 3GB are mapped so that each process sees a 3GB
- virtual memory space and the remaining part of the 4GB virtual memory
+ "3GiB/1GiB" split: 3GiB are mapped so that each process sees a 3GiB
+ virtual memory space and the remaining part of the 4GiB virtual memory
space is used by the kernel to permanently map as much physical memory
as possible.
Steven
The next wave of discussion came following Eric's comments on the above subject:
From: "Eric S. Raymond"
To: Linux Kernel List
Subject: Configure.help editorial policy
Date: Thu, 20 Dec 2001 14:32:47 -0500
I guess it's a pretty quiet week in kernel-hacker land. Must be,
otherwise people would have better things to do than argue over KB
vs. KiB. The alternative would be to conclude that significant
portions of the lkml population prefer flaming to coding, and that
couldn't possibly be the case, could it?
Let me make a couple of things clear:
I am by no means in love with the new abbreviations described at
http://physics.nist.gov/cuu/Units/binary.html [8]. I have the same
reflexes as the rest of you -- they kind of make me want to gag.
If there is a clear consensus from lkml, I will be happy to back
out this change. Perhaps this terminological standard does not
meet a real need, perhaps it will be rejected by most engineers and
deserves to wither on the vine. It's happened before.
However. In the *absence* of a clear consensus, I will follow best
practices. Best practice in editing a technical or standards document
is to (a) avoid ambiguous usages, seek clarity and precision; and (b)
to use, follow and reference international standards.
In fact, the first time David Woodhouse submitted this change, some
months ago, I rejected it. I have since, reluctantly, concluded
that I was wrong to do so. So when he re-submitted, I merged in
the patch.
My personal esthetic distaste for the new terminology (gack! "kibi"
sounds like something I would feed my cat!) is less important
than following best practices. I'm hoping it will seem less ugly as it
becomes more familiar.
I don't like my duty much in this instance. But my duty is clear.
--
Eric S. Raymond [9]
From: David Garfield
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 13:43:29 -0500
Eric S. Raymond writes:
> David Garfield :
> > Another option: maybe the choice of KB vs KiB vs KKB should be a
> > configuration choice.
>
> You *must* be joking.
>
> Please tell me you're joking.
No, I'm serious. I will understand if CML2 does not support
meta-configuration. A configuration choice as I described above could
be viewed as a minor facet of a language configuration choice.
(Should kernel configuration be internationalized or at least
internationalizable?)
Choice of kB vs KB vs KiB vs KKB could also be used in some places in
the kernel. For instance, /proc/meminfo currently shows "kB".
--David
From: "Eric S. Raymond"
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 13:40:34 -0500
What, and *encourage* non-uniform terminology? No, I won't do that.
Better to have a single standard set of abbreviations, no matter how
ugly, than this.
From: Benjamin LaHaise
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 14:18:47 -0500
So, encouraging non-uniform terminology, breaking applicates *and*
confusing the hell out of everyone is better? Face it, the only
people trying to confuse things are the disk vendors. DRAM is sold
by the MB, everyone talks about MB == 1024*1024... I'm having a
hard time giving a sympathetic ear to anyone try to change the well
established, and consistent (barring the storage venduhs), standard.
-ben
From: Alan Cox
Subject: Re: Configure.help editorial policy
Date: Sat, 22 Dec 2001 00:32:54 +0000 (GMT)
> by the MB, everyone talks about MB == 1024*1024... I'm having a
> hard time giving a sympathetic ear to anyone try to change the well
> established, and consistent (barring the storage venduhs), standard.
If someone sells you 16MB of RAM and it turns out to be 16,000,000 bytes,
not only would it be appropriate use of units, it would be quite reasonable
as far as I can see to say it was in accordance with labelling of products.
The world did not begin in 1970, A-Za-z is not English collate order and
M is 1,000,000. When computing meets the rest of planet earth usages for
the odd hundred years its hard to see any reason to believe we are "right"
Eric using MiB seems the right thing. Its an ugly but appropriate unit, its
at least recommended as a solution by a standards body. We can either
redefine SI units ("You cannot change the laws of physics") or find a better
label. What better than a recommended one others use.
Alan
From: Rik van Riel
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 19:17:45 -0200 (BRST)
On Fri, 21 Dec 2001, Eric S. Raymond wrote:
> What, and *encourage* non-uniform terminology? No, I won't do that.
> Better to have a single standard set of abbreviations, no matter how
> ugly, than this.
Last I checked the purpose of language was _communication_.
Better use words people understand.
Also, the kB vs KiB mess is so ambiguous and complex that
it virtually guarantees that the _writers_ of documentation
will get it wrong occasionally and only confuse the readers
more.
As a last point, we shouldn't forget about the inconsistent
way in which the marketing departments of hardware vendors
apply these units to their products. In many cases binary
and decimal units are mixed, leading to something which is
impossible to "get right". Disk space would be one example
of this, but I'm sure there are more.
regards,
Rik
From: Stephen Satchell
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 14:53:50 -0800
At 07:17 PM 12/21/01 -0200, Rik van Riel wrote:
> As a last point, we shouldn't forget about the inconsistent
> way in which the marketing departments of hardware vendors
> apply these units to their products. In many cases binary
> and decimal units are mixed, leading to something which is
> impossible to "get right". Disk space would be one example
> of this, but I'm sure there are more.
>
> regards,
> Rik
OK, how about this silly suggestion: DON'T USE ABBREVIATIONS IN DOCUMENTATION.
For example, let me propose an "edit" to the text that has been appearing
in this thread to eliminate the ambiguity:
> # Choice: himem
> High Memory support
> CONFIG_NOHIGHMEM
> Linux can use up to 64 * 2^30 bytes (64 gigabytes) of physical memory
> on x86 systems. However, the address space of 32-bit x86 processors
> is only 4 * 2^30 bytes (four gigabytes) large. That means that, if
> you have a large amount of physical memory, not all of it can be
> "permanently mapped" by the kernel. The physical memory that's not
> permanently mapped is called "high memory".
>
> If you are compiling a kernel which will never run on a machine with
> more than 960 * 2^20 bytes (960 megabytes) of total physical RAM,
> answer "off" here (default choice and suitable for most users). This
> will result in a 3:1 split: each process sees a 3 * 2^30 byte (three
> gigabyte) virtual memory space, and the remaining 1 * 2^30 (one
> gigabyte) of virtual memory space is used by the kernel to
> permanently map as much physical memory as possible.
>
> If the machine has between one 2^30 bytes and four 2^30 bytes of
> physical RAM, then answer "4GB" here.
In this edit, the only place where the abbreviation "GB" appears is in the
symbol definition space. Configuration symbols are by custom all
upper-case, so the standard for capitalization as specified in SI is
violated in the interest of a symbol space that is consistent.
What kills me is that people forget the origin of KB as a standard
designation for "kilobyte" in the first place. Does anyone remember the
KSR-33 teletype, the early dot-matrix printers, and other output devices
that output only upper-case characters? Maybe you youngsters don't recall
that lower-case character devices were EXPENSIVE -- I still have a TI
Silent 700 terminal that did output lower-case when connected to systems
that understood the full ASCII alphabet, but those systems were few and far
between in business applications -- upper-case-only was "good enough." How
about tab machines that never did have lower-case, such as the 407 printing
accounting machine?
It got worse when you need to differentiate between "bytes" and "bits"
because you didn't have the luxury of using "B" for "byte" and "b" for
"bit" -- so that's why you see in legacy documentation a lone "K" for
"kilobit".
Have you noticed that some schematic packages *still* don't support lower
case for all output devices?
So now you know where the common usage of K, KB, MB, GB and the rest came
from -- limitations in the early tools for doing electronic documentation
of computer systems. Now, some of you may scream that we no longer have
those limitations -- but what are you going to do with the vast body of
written knowledge that is written using the accepted abbreviations of the
time? Unless someone is going to go back and re-edit 30 years of academic
papers, journal articles, and legacy documentation of equipment already in
use (can you say telephone systems?) then any argument about re-doing
abbreviations "for esthetic reasons" is either pointless or yet another
cause for confusion. Especially as this has every stigmata of becoming a
religious war.
So here we are, campers, arguing about abbreviations when in fact there is
no real NEED for abbreviations outside of the config symbol space. Why not
just take the few extra bytes (they are not a penny each anymore) to spell
out what you really mean? Why do we need MB or mB or MiB for
"megabyte"? Sometimes we get so wrapped up in our abbreviations that we
lose sight of the fact that the original job of a HELP FILE is to HELP, to
COMMUNICATE really useful information to the person trying to configure
their Linux kernel to a specific purpose. Don't forget that the audience
for configure.help goes far beyond the 30K of us (oops, I used an
abbreviation -- sorry) that work with this sort of stuff on a continuing
basis -- members of the priesthood, if you will. What would your mother or
grandmother say if confronted with this sort of stuff? Is it necessary to
obfuscate your meaning to the unwashed? (Even my edit caters to the expert
to some extent -- I admit it.)
Frankly, if the reason you use abbreviations is because you type slowly, I
can't feel any pain for you. There are lots of courses, sources, and even
gaming software around the world designed to teach you to type on a QWERTY
keyboard, far too many for me to feel any pain at all about your lack -- if
you are too lazy to learn a necessary skill for your craft or hobby, then
you might want to think about finding another career or passtime.
Oh, before someone quotes my Slashdot interview and says "Hey, it's easy
for you, you are a professional writer" I'll tell you that my high school
summer-school typing course increased my keypunching productivity by a
factor of 15 while I was still a hobbyist, and permitted me to make full
use of that KSR-33 and glass TTY and so forth as a professional...and that
I was a programmer for 12 years before I wrote my first article for
money. (I hope none of you EVER see that first article, which thank the
deity of your choice never saw publication.)
Look, I'm sorry if this comes across as harsh, but I can't believe you
people WANT to alienate users for the sake of an obscurity. Please
CONSIDER YOUR AUDIENCE -- ALL of them. Not just the top one-half of one
percent. The worst sin any writer -- professional or amateur, technical or
popular, fiction or non-fiction -- can do is lose the reader. And that,
people, is what you are doing with this debate.
Stephen Satchell
From: Nicholas Knight
Subject: Re: Configure.help editorial policy
Date: Fri, 21 Dec 2001 15:07:22 -0800
While pure reason suddenly made me realize that we really shouldn't use
abbreviations in docs, thus rendering MiB vs MB moot, this still leaves
us with another problem:
Gigs or Gibs? Kibbles or Kilos? You're still going to end up with
confusion, because outside of the (limited number of) people who heard
about this "international" standard, nobody knows what the heck a
kibibyte is.
From: Keith Owens [9]
Subject: Re: Configure.help editorial policy
Date: Sat, 22 Dec 2001 15:49:07 +1100
On Fri, 21 Dec 2001 20:12:59 +0100,
David Weinehall wrote:
> Whatever the choice ends up being, KB is always incorrect, unless you
> intend to specify some strange formula where the number of bytes (B)
> combined with the temperature in Kelvin (K) has anything to do with
> things.
The KB unit has been reserved for the temperature of a linux-kernel
flamewar multiplied by the number of bytes of network traffic wasted on
that flame-war.
:)
Having made it this far, if you're still interested, you can follow the actual threads on an online lkml archive. The Unix Workstation Support Group [10] (UWSG) hosts one such archive, and using it you can follow both threads, including the second [11].