On Thu, 27 Dec 2007, Robert Hancock wrote:

There was a thread called "PCI vendor id == 1 regression?" (Kai Ruhnau was the main reporter for that one). But looking back, it seems that one didn't hit the kernel mailing list, so I guess google cannot pick it up.

I can forward all the emails if you want, but quite frankly, you don't really want to. It boils down to:

Stephen Hemminger:

   "There have been two reports with different hardware of the PCI vendor
    id of 0001 showing up. I got a report on sky2, and Francois saw similar
    problem on r8169. In one case, it happened only with 2.6.23 kernel, the
    correct id was returned by older kernels.

    This is a heads up, there may be a PCI problem. Or just some users
    smoking strange fall leaves."

And then one of the reporters:

   "Good kernel:
    02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
    00: ab 11 64 43 07 00 10 00 12 00 00 02 01 00 00 00

    Bad kernel:
    02:00.0 Ethernet controller: Unknown device 0001:4364 (rev 12)
    00: 01 00 64 43 07 00 10 00 12 00 00 02 01 00 00 00"

and after I suspected it was mmconfig-related and asked him to try "lspci -H1" to force conf1 accesses from user space on a broken kernel:

   "Bare lspci gives the wrong vendor ID, lspci -H1 the right one."

for *one* single device. In other words, just a single word in mmconfig was corrupt, but it was consistently corrupt.

The PCI device chain for that particular report was:

    00:00.0 Host bridge: ATI Technologies Inc Unknown device 7930
    ..
    00:06.0 PCI bridge: ATI Technologies Inc Unknown device 7936 (prog-if 00 [Normal decode])
    ..
    02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

and with the bug, the PCI vendor ID for that ethernet chip was 0x0001 (bogus vendor) rather than the correct Marvell ID (0x11ab).

But as mentioned, there were other reports too of the exact same bug (with different PCI devices, but the same "vendor == 0001" bogosity).
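
For readers comparing the two hex dumps quoted above: the first four bytes of PCI config space hold the vendor ID and then the device ID, each stored little-endian, so "ab 11" reads as 0x11ab and "01 00" as 0x0001. A minimal sketch of that decoding follows, assuming Python; the helper name `decode_ids` is hypothetical and only the two dump lines come from the report.

```python
def decode_ids(dump_line):
    """Decode (vendor, device) from a hex-dump line that starts at config offset 00.

    Hypothetical helper for illustration; it only understands lines shaped
    like the ones quoted in the report, e.g. "00: ab 11 64 43 ...".
    """
    offset, _, rest = dump_line.partition(":")
    if int(offset, 16) != 0:
        raise ValueError("expected the dump line for config offset 00")
    data = bytes.fromhex(rest.replace(" ", ""))
    if len(data) < 4:
        raise ValueError("need at least four bytes of config space")
    # Both IDs are 16-bit little-endian fields at offsets 0 and 2.
    vendor = int.from_bytes(data[0:2], "little")
    device = int.from_bytes(data[2:4], "little")
    return vendor, device


# The two dump lines from the report above.
good = "00: ab 11 64 43 07 00 10 00 12 00 00 02 01 00 00 00"
bad = "00: 01 00 64 43 07 00 10 00 12 00 00 02 01 00 00 00"

for label, line in (("good", good), ("bad", bad)):
    vendor, device = decode_ids(line)
    print(f"{label}: {vendor:04x}:{device:04x}")
```

The two lines differ only in the first two bytes, which matches the observation that a single word read through mmconfig was wrong while the device ID (0x4364) and everything after it came back intact.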
Googling for

    lspci "Unknown device 0001:" mmconfig

shows reports like these:

    http://lkml.org/lkml/2007/10/29/500
    http://madwifi.org/ticket/1587
    http://www.nvnews.net/vbulletin/showthread.php?t=103271
    http://naoya.g.hatena.ne.jp/naoya/20070529/1180436756
    http://bbs.archlinux.org/viewtopic.php?id=34321
    ...

which all seem to be due to this same bug with different cards (but the common theme seems to be an ATI northbridge).

The point is: mmconfig is easily broken. In surprising ways. The whole concept is stupid, it was badly done, and it has never gotten any testing what-so-ever until Linux started using it (and now Vista).

And hardware that isn't tested is broken by *definition*.

And there's no point in blaming AMD. They may have gotten it wrong, but hey, so did Intel (with machines that lock up completely), so it's not like this is some "bad AMD" case. It's a "bad mmconfig" case.

Linus
