OpenBSD: Intel Core 2 Bugs

Submitted by Jeremy
on June 30, 2007 - 1:37pm

Theo de Raadt [interview] described an active effort by OpenBSD developers to work around "serious bugs in Intel's Core 2 cpu". He went on to explain, "these processors are buggy as hell, and some of these bugs don't just cause development/debugging problems, but will *ASSUREDLY* be exploitable from userland code. As is typical, BIOS vendors will be very late providing workarounds / fixes for these processor bugs. Some bugs are unfixable and cannot be worked around. Intel only provides detailed fixes to BIOS vendors and large operating system groups. Open Source operating systems are largely left in the cold." He provided a link to the full errata (in PDF format) as well as a graphical overview, summarizing:

"Note that some errata like AI65, AI79, AI43, AI39, AI90, AI99 scare the hell out of us. Some of these are things that cannot be fixed in running code, and some are things that every operating system will do until about mid-2008, because that is how the MMU has always been managed on all generations of Intel/AMD/whoeverelse hardware. Now Intel is telling people to manage the MMU's TLB flushes in a new and different way. Yet even if we do so, some of the errata listed are unaffected by doing so.

As I said before, hiding in this list are 20-30 bugs that cannot be worked around by operating systems, and will be potentially exploitable. I would bet a lot of money that at least 2-3 of them are."


From: Theo de Raadt [email blocked]
To:  misc
Subject: Intel Core 2
Date: Wed, 27 Jun 2007 11:08:16 -0600

Various developers are busy implementing workarounds for serious bugs
in Intel's Core 2 cpu.

These processors are buggy as hell, and some of these bugs don't just
cause development/debugging problems, but will *ASSUREDLY* be
exploitable from userland code.

As is typical, BIOS vendors will be very late providing workarounds /
fixes for these processor bugs.  Some bugs are unfixable and cannot
be worked around.  Intel only provides detailed fixes to BIOS vendors
and large operating system groups.  Open Source operating systems are
largely left in the cold.

Full (current) errata from Intel:

  http://download.intel.com/design/processor/specupdt/31327914.pdf

  - We bet there are many more errata not yet announced -- every month
    this file gets larger.
  - Intel understates the impact of these errata very significantly.
    Almost all operating systems will run into these bugs.
  - Basically the MMU simply does not operate as specified/implemented
    in previous generations of x86 hardware.  It is not just buggy, but
    Intel has gone further and defined "new ways to handle page tables"
    (see page 58).
  - Some of these bugs are along the lines of "buffer overflow"; where
    a write-protect or non-execute bit for a page table entry is ignored.
    Others are floating point instruction non-coherencies, or memory
    corruptions -- outside of the range of permitted writing for the
    process -- running common instruction sequences.
  - All of this is just unbelievable to many of us.

An easier summary document for some people to read:

  http://www.geek.com/images/geeknews/2006Jan/core_duo_errata__2006_01_21__full.gif

Note that some errata like AI65, AI79, AI43, AI39, AI90, AI99 scare
the hell out of us.  Some of these are things that cannot be fixed in
running code, and some are things that every operating system will do
until about mid-2008, because that is how the MMU has always been
managed on all generations of Intel/AMD/whoeverelse hardware.  Now
Intel is telling people to manage the MMU's TLB flushes in a new and
different way.  Yet even if we do so, some of the errata listed are
unaffected by doing so.

As I said before, hiding in this list are 20-30 bugs that cannot be
worked around by operating systems, and will be potentially
exploitable.  I would bet a lot of money that at least 2-3 of them
are.

For instance, AI90 is exploitable on some operating systems (but not
OpenBSD running default binaries).

At this time, I cannot recommend purchase of any machines based on the
Intel Core 2 until these issues are dealt with (which I suspect will
take more than a year).  Intel must become more transparent.

(While here, I would like to say that AMD is becoming less helpful day
by day towards open source operating systems too, perhaps because
their serious errata lists are growing rapidly too).


Just return the CPUs

Anonymous (not verified)
on July 1, 2007 - 2:17am

At least in Europe, this counts as a defect under warranty law, which requires the seller to repair or replace the product; otherwise the sale contract can be annulled.

Users should pursue this to acquire docs for developers.

return every cpu ever made?

Anonymous (not verified)
on July 1, 2007 - 11:02am

return every cpu ever made? :)

Verdict: overblown

Matt Sayler (not verified)
on July 1, 2007 - 6:49am

Thread at RWT

Comments from Linus and Andi Kleen! It must be true!

Intel Only?

Anonymous (not verified)
on July 1, 2007 - 11:25am

Is it only Intel CPUs that have these problems and not AMD CPUs? Do similar problems exist on AMD CPUs?

Intel Only?

Blah (not verified)
on July 1, 2007 - 11:29am

Do problems like this exist on any AMD CPUs?

No.

Anonymous (not verified)
on July 1, 2007 - 11:48am

No, AMD CPUs have their own set of bugs. See the following link for Athlon 64 / Opteron bugs:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf

Errata

Anonymous (not verified)
on July 10, 2007 - 8:14am

All CPUs have errata, but not all have bugs like these, which are serious security holes that cannot be worked around in software.

I side w/ Linus on this one

on July 2, 2007 - 12:10am

I read through the errata, Theo's comments and Linus' comments. I'll have to side with Linus on this one.

I also agree with Linus' assessment of embedded CPUs vs. commodity CPUs. Your typical Intel or AMD x86 device will be much cleaner in the end (especially relative to their complexity!) than J. Random Embedded CPU, because neither AMD nor Intel has much control over the software stack running on their CPUs, and there are just way, way, way too many units out there.

There might be a ton of DSPs and ARMs out there, but the number of people writing code for them is far smaller, by one to four orders of magnitude depending on the device. I work for a company that makes embedded processors, and can vouch for the fact that we will work around CPU / chip bugs with compiler tweaks or with detailed instructions to programmers, as opposed to spending $BIGNUM on a spin. (I'm not divulging any secrets here. It's all right there in our errata. Some of those words *I* wrote!)

If the bug is big enough, we'll respin. But, it's got to be a pretty big bug. Otherwise, there goes the profitability. And that's a calculation every vendor makes. Your volumes and your revenue go a long way in determining that threshold.

Now, we do tend to fix bugs in the core IP even if we don't respin chips specifically due to those bugs. Later spins can then pick up the bug fixes "for free." That's another common practice.

And the point is...

Anonymous (not verified)
on July 2, 2007 - 5:59pm

That you give out detailed information on the bug.
Intel does not. AMD I believe is getting worse. It
is in *everyone's* best interest to have this info.

OK, *maybe* (and that's a big maybe) it is not in Intel's best
interest, at least in the short term... in the long term I believe
it is in theirs too.

Hmmm.

on July 4, 2007 - 5:01pm

I'm not sure how much more detail is actually useful. You need more detail to reproduce the bug, but not necessarily more detail to avoid the bug. And if the bug is only triggerable from kernel space, the additional detail isn't necessary to determine whether hostile user space could trigger it.

In my experience with bugs like this, sometimes it's very hard to explain, even among people intimately familiar with the device's architecture, the sequence of events that exposes a bug. Some of these sound like they could take several pages to explain, should you want enough detail to be able to reproduce the bug and know all the detailed ways the bug can manifest.

Let's take for example AI56:


AI56. Update of Read/Write (R/W) or User/Supervisor (U/S) or Present (P) bits without TLB Shootdown May Cause Unexpected Processor Behavior

Problem: Updating a page table entry by changing R/W, U/S or P bits without TLB shootdown (as defined by the 4 step procedure in "Propagation of Page Table and Page Directory Entry Changes to Multiple Processors" in volume 3A of the IA-32 Intel Architecture Software Developer's Manual), in conjunction with a complex sequence of internal processor micro-architectural events, may lead to unexpected processor behavior.

Implication: This erratum may lead to livelock, shutdown or other unexpected processor behavior. Intel has not observed this erratum with any commercially available system.

Ok, what details are missing that are relevant to a developer trying to avoid this bug? None, really. They tell you exactly where to go to get the proper procedure for shooting down the TLB so that the bug never occurs.
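
To make the "update without shootdown" case concrete, here is a rough toy model in Python. It is only a sketch under assumptions: the class and function names are made up for illustration, the barrier that stops the other processors during the documented procedure is left out, and it shows only the ordinary stale-TLB effect of skipping a shootdown, not whatever the erratum itself does inside the processor.

```python
# Toy model only: a real TLB is a hardware structure, and every name here
# (Cpu, update_with_shootdown, ...) is hypothetical.
P, RW, US = 1 << 0, 1 << 1, 1 << 2  # x86 PTE flags: Present, Read/Write, User/Supervisor


class Cpu:
    def __init__(self, page_table):
        self.page_table = page_table  # shared dict: virtual page -> flags
        self.tlb = {}                 # per-CPU cache of page table entries

    def lookup(self, vpage):
        # On a TLB hit the cached flags are used, even if the page table
        # in memory has been changed since they were cached.
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]

    def can_write(self, vpage):
        flags = self.lookup(vpage)
        return bool(flags & P) and bool(flags & RW)

    def invalidate(self, vpage):
        self.tlb.pop(vpage, None)


def update_without_shootdown(page_table, vpage, flags):
    # Changes the entry in memory only; cached copies stay as they were.
    page_table[vpage] = flags


def update_with_shootdown(page_table, cpus, vpage, flags):
    # Change the entry, then have every CPU drop its cached copy.
    page_table[vpage] = flags
    for cpu in cpus:
        cpu.invalidate(vpage)


if __name__ == "__main__":
    pt = {0x10: P | RW | US}
    cpu0, cpu1 = Cpu(pt), Cpu(pt)
    cpu1.can_write(0x10)                        # cpu1 caches the writable entry
    update_without_shootdown(pt, 0x10, P | US)  # clear R/W in memory only
    print("stale write allowed:", cpu1.can_write(0x10))
    update_with_shootdown(pt, [cpu0, cpu1], 0x10, P | US)
    print("after shootdown:", cpu1.can_write(0x10))
```

In this model the second CPU keeps honoring the old writable entry until it is told to invalidate it, which is the situation the documented procedure is there to prevent.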

But, it's not very detailed about what the bug is. There's not nearly enough detail here on how to reproduce the bug quickly (though there is enough that you could probably goof around in this area and eventually trigger it). There's also no indication of what goes wrong or the full range of misbehavior to expect. (Sure, "livelock" and "shutdown" are mentioned, but what's "other unexpected processor behavior"? 2 + 2 returning something other than 4? Thermal meltdown?)

My point is, does that matter? The severity of the bug is clear: It can crash the system. The scope of the bug is clear: This could happen if the OS doesn't follow the procedure we laid out. The required course of action is clear: Follow the directions. What's left out probably requires a deep explanation of the microarchitectural details as a starting point, likely followed by an equally complex description of the sequence(s) of events that get the machine into an odd state.

The missing part might be interesting to an architect or a designer, but not terribly so to an OS writer, unless they like playing armchair architect. I can see some of these could inspire conference papers at design verification oriented conferences. But in the errata? Do we really need all that? I suspect if that were there, most people would be saying "Just get to the point, ok?"

Oh, and I forgot to mention

on July 4, 2007 - 5:27pm

There are plenty of bugs we don't disclose. That's true of just about any vendor. Most bugs simply don't matter all that much.

Sometimes it's a simple performance bug. For instance, maybe a certain sequence of instructions should run in 10 cycles if the processor were in spec, but it happens to take 11 due to some bug. That's pretty benign, but it's still a bug. Is it worth an erratum? Probably not.

Intel Core 2 Duo the worst, Intel Core Duo the best.

Anonymous (not verified)
on July 3, 2007 - 4:40pm

IMHO, the Intel Core 2 Duo is overflowing with bugs because Intel complicated the Core Duo design by adding the new, complex x86-64 instruction set, and did it in so little time that flaws were bound to appear.

So the Intel Core Duo (which has no x86-64) is currently the best processor after the Intel Pentium M, and the Intel Core 2 Duo (which has x86-64, which hardly anybody uses) is the worst.

Flaws? Wait a moment ...

Anonymous (not verified)
on July 3, 2007 - 4:46pm

Windows Vista is still using 8086 16-bit code?

Yes, 640 k is ok for everything!!!
It still uses the A20 gate to address more than 1 MB!!!

It jokes me an asshole.

i hate subjects

Anonymous (not verified)
on July 14, 2008 - 5:05pm
