Re: Realtek 8111c weirdness problems, apic/msi, and normal bug

From: Kasper Sandberg
Date: Saturday, April 5, 2008 - 8:03 am

Hello.

I have a Gigabyte-X48-DQ6 motherboard which contains two realtek 8111c
pci express gigabit ethernet controllers.

The driver for these is r8169.

To cut to the results that matter most (IMO): on .25-rc8-git3 the
driver detects both cards, on different interrupts, but neither NIC
works if I have both msi and apic enabled. If I boot with pci=nomsi and
apic enabled, both ports work (though one "insignificant" bug remains).
If I boot with the noapic boot parameter but msi enabled, both
controllers are again found and they work, though that insignificant
bug is present here too.

The insignificant bug is, that one of the interfaces appears to always
report as link up, despite me not having any cable in it.

So apparently the conflict is if i have BOTH apic and msi.

That leads me to a question: until this is resolved, which
configuration do I want? apic with no msi, or msi with no apic?

I also have some more information, though I do not know if it's useful
at all: on a .23 livecd, which has both msi and apic, something
different happens. Both controllers are detected, but only one of them
works. ethtool reports that the working controller properly detects
link up/down, and that it's TP at 1gbit.
This is the ethtool output for the .23 livecd with apic and msi:
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        ...
From: Andrew Morton
Date: Saturday, April 5, 2008 - 11:23 pm

Let's add some cc's.


--

From: Francois Romieu
Date: Sunday, April 6, 2008 - 7:06 am

Andrew Morton <akpm@linux-foundation.org> :

Complete dmesg and lspci -vvxn for a start.

-- 
Ueimor
--

From: Kasper Sandberg
Date: Sunday, April 6, 2008 - 9:09 pm

Hello.

I apologize, but it will have to wait a few days, as the fan on the
graphics card just broke, and it overheats almost instantly.

I will post as soon as possible, sorry for the inconvenience.


--

From: Kasper Sandberg
Date: Thursday, April 10, 2008 - 5:17 am

Okay, the box is up again, sorry for the delay..

Hmm, is this dmesg cut off? (It's right after booting, but maybe the
buffer isn't big enough?)

000 - 0xc0318df8   (2147 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 3201.59 BogoMIPS (lpj=1600795)
Mount-cache hash table entries: 512
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
using mwait in idle threads.
Compat vDSO mapped to ffffe000.
CPU: Intel(R) Celeron(R) CPU          420  @ 1.60GHz stepping 01
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 0k freed
net_namespace: 152 bytes
NET: Registered protocol family 16
PCI: PCI BIOS revision 3.00 entry at 0xfb6c0, last bus=7
PCI: Using configuration type 1
Setting up standard PCI resources
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
pci 0000:00:1f.0: Force enabled HPET at 0xfed00000
pci 0000:00:1f.0: quirk: region 0400-047f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
PCI: Transparent bridge - 0000:00:1e.0
PCI: Discovered primary peer bus ff [IRQ]
PCI: Using IRQ router PIIX/ICH [8086/2916] at 0000:00:1f.0
hpet clockevent registered
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: 0xe4000000-0xe6ffffff
  PREFETCH window: 0x00000000d0000000-0x00000000dfffffff
PCI: Bridge: 0000:00:06.0
  IO window: 9000-9fff
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.0
  IO window: a000-afff
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: ...
From: Kasper Sandberg
Date: Thursday, April 10, 2008 - 6:11 pm

Sorry for top posting, but it's just easier in this case :)

I may have just gotten some new information to share.

I just built -git8, and the NIC didn't work. By booting with
noapic/nomsi I got it running, though. Then I did some tests and
rebooted into the default (it has mostly worked for me) noapic/msi boot.
Then it worked.

I did get some interesting information though, which I think may help
to fix the problem..

If it's a boot where the NIC works, I can usually rmmod/modprobe the
module once without it giving an error; however, if it's a boot where
the NIC doesn't work, the first time I rmmod and modprobe it gives me
an error..

the error is:
PCI: cache line size of 32 is not supported by device 0000:05:00.0
ACPI: PCI interrupt for device 0000:05:00.0 disabled
r8169: probe of 0000:05:00.0 failed with error -22
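
For reference, the -22 in the probe failure is a negated errno value. A
minimal Python sketch for decoding it, assuming the standard Linux errno
numbering (the helper name decode_kernel_errno is made up for
illustration and is not part of any kernel tooling):

```python
import errno
import os

def decode_kernel_errno(code):
    """Map a negative kernel return code such as -22 to (name, message)."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names like 'EINVAL'.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)

name, message = decode_kernel_errno(-22)
print(name, "-", message)
```

This only names the error code; it does not say which check inside the
driver's probe path returned it.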

I have attached some dmesgs (sorry, but my client messes up inlines..).
Also, link detection isn't working properly; sometimes it will detect a
link as down after a while.. it's quite weird..

(oh, and this time it's without nvidia just to be 100000% sure)

I hope this can help to get it resolved; a lot of people are having
problems with these controllers.. I can however confirm that a similar
controller is working perfectly on a friend's Gigabyte motherboard;
that's with the P35 chipset though, whereas I have X48. He has only one
of them onboard.

Also, something which may be of interest: Realtek offers a modified
r8169 driver called "r8168", which supposedly fixes this. I have been
unable to get it to compile though, but I saw it on the Ubuntu forums.



From: Kasper Sandberg
Date: Friday, April 11, 2008 - 11:06 am

I've got more information again. This is dmesg and /proc/interrupts
from a livecd (systemrescuecd). It's a .23.14, and only one NIC works
(the one that doesn't work on .25 noapic/msi); the other gets detected
as fiber.. Strangely enough, link detection works on the working NIC
on .23.14.

attached are /proc/interrupts and dmesg..

oh, and this is a 64bit livecd kernel, but it appears to act the same on
the 32bit kernel on the livecd..

Does this, together with the previous information I have given, give a
clue as to what may be happening here? From what I can see, it may be
either interrupts or that PCI warning thingie...

It appears that on the livecd, which runs with both msi and apic, the
ethernet controllers get their interrupts set up via apic, whereas msi
likes to catch them on my own .25 kernel, although it's a bit weird
what happens..
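
One way to compare boots is to check which interrupt type each interface
ends up with in /proc/interrupts. A rough Python sketch, assuming the
chip-name column contains "MSI" or "IO-APIC" as on kernels of this era;
the sample lines, IRQ numbers and counts below are invented for
illustration and are not taken from the attached files:

```python
# Hypothetical sample in the style of /proc/interrupts; values are made up.
SAMPLE = """\
           CPU0
 17:      10432   IO-APIC-fasteoi   eth0
219:       5120   PCI-MSI-edge      eth1
"""

def interrupt_mode(text, device):
    """Return 'msi', 'ioapic', 'other', or None if the device is not listed."""
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header and anything that is not an "NN:" interrupt row.
        if not fields or not fields[0].endswith(":"):
            continue
        # Devices sharing a line are comma separated, e.g. "eth0, uhci_hcd:usb1".
        names = [field.strip(",") for field in fields[1:]]
        if device not in names:
            continue
        if "MSI" in line:
            return "msi"
        if "IO-APIC" in line:
            return "ioapic"
        return "other"
    return None

for dev in ("eth0", "eth1"):
    print(dev, interrupt_mode(SAMPLE, dev))
```

On a real system the text would come from reading /proc/interrupts
instead of SAMPLE.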

If you need more information, just ask :)

From: Kasper Sandberg
Date: Saturday, April 12, 2008 - 5:23 am

I've got some more information (though probably not too helpful).
Realtek has modified (I probably mentioned this earlier) the r8169
driver, under the name r8168. It might be worth checking out what
they have done. People are reporting that it works:
http://damienkane.livejournal.com/5574.html
http://www.mail-archive.com/ubuntu-bugs%40lists.ubuntu.com/msg730229.html

the driver can be found at:
http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5...
If that doesn't work, go to: http://www.realtek.com.tw/downloads ->
communication networks ics -> network interface controllers ->
10/100/1000 gigabit ethernet -> pci express.

I will try myself, but i have zero knowledge about this stuff in the
kernel, and will probably not be able to find anything of use.



--

From: Francois Romieu
Date: Saturday, April 12, 2008 - 12:22 pm

Can you try a simple 'nomsi' boot ?

The lspci in your previous message (2008/04/10) suggested that
something was very wrong at the PCI level:
[...]
05:00.0 0200: 10ec:8168 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: r8169
        Kernel modules: r8169
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

(but this was on an older kernel, right ?)
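
A config space that reads back as all ff bytes generally means the
device is not answering config reads at all. A small Python sketch for
spotting this in an `lspci -x` style dump; the function name is made up
for illustration, and it assumes dump rows of the form "00: ff ff ...":

```python
import re

# One hex-dump row: an offset, a colon, then up to 16 space-separated bytes.
ROW = re.compile(r"^[0-9a-f]{2,3}:((?: [0-9a-f]{2}){1,16})\s*$")

def config_space_unreadable(dump):
    """True if the dump has hex rows and every byte in them is 0xff."""
    data = []
    for line in dump.splitlines():
        match = ROW.match(line.strip())
        if match:
            data.extend(int(byte, 16) for byte in match.group(1).split())
    return bool(data) and all(byte == 0xFF for byte in data)

THREAD_DUMP = """\
05:00.0 0200: 10ec:8168 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
"""

print(config_space_unreadable(THREAD_DUMP))
```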

When you see such an output, can you try to see if the direct access
option of lspci (-H{1/2}) makes a difference ?





You can ask your friend to grep for the 'XID' line in the kernel log
written by the r8169 driver. If the values match, you have got the same
816x hardware.
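
The comparison can be scripted roughly as below. This is only a sketch:
it assumes the driver's log line contains the token "XID" followed by a
hex value, and the two sample lines (addresses, XID values, IRQs) are
invented for illustration rather than copied from any real log:

```python
import re

# Assumed format: "... XID <hex> ..." somewhere in the r8169 probe line.
XID_PATTERN = re.compile(r"\bXID\s+([0-9a-fA-F]+)")

def extract_xids(kernel_log):
    """Return every XID value found in the log text, lowercased."""
    return [m.group(1).lower() for m in XID_PATTERN.finditer(kernel_log)]

def same_hardware(log_a, log_b):
    """True if the two logs share at least one XID value."""
    return bool(set(extract_xids(log_a)) & set(extract_xids(log_b)))

# Hypothetical log lines; the values are placeholders.
mine = "eth0: RTL8168c/8111c at 0xf8a3e000, 00:11:22:33:44:55, XID 3c4000c0 IRQ 219"
friend = "eth0: RTL8168b/8111b at 0xf8c0a000, 00:aa:bb:cc:dd:ee, XID 38000000 IRQ 17"

print(extract_xids(mine), same_hardware(mine, friend))
```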

There are different problems with the 8168
- older kernel with broken mmconfig / msi / etc.
- newer 8168 chipsets which are currently not correctly handled
- plain 8168 driver bugs
- ...


I am looking very closely at Realtek's drivers but the diffs between
the different revisions of their 8168 driver are not always as
readable as I would hope for (things have improved though).

Realtek is working on fixing its 8168 driver for recent kernels. It
should soon be easier to compare its behavior against the in-kernel
driver on recent kernels. It will make everybody's life easier.

-- 
Ueimor
--

From: Kasper Sandberg
Date: Saturday, April 12, 2008 - 3:02 pm

First off, isn't it pci=nomsi? It still appears to be using msi when
booting with nomsi. Anyway, I did as you asked.

I have now booted with a multitude of parameters and combinations.

Okay.. well, I just thought I'd mention it; to be honest, I have more
faith in you than Realtek :)

the information i have dumped is too substantial to have inline in mail,
and also, my mail client destroys whitespace, so i have put up a
tarball, which is available here:
http://download1.kaspersandberg.com/r8169_debugging.tar.bz2

if you need more, for example, boots with "pci=nomsi" instead of just
the nomsi parameter, or combinations, as always, feel free to ask, and i
shall provide it.


--
