Hi,
I am having trouble with the 2.6.23 kernel. With all versions since
2.6.23-rc1 I have trouble with my network connection. When using the
network over a certain level (just browsing the web seems not to be
enough), e.g. when installing packages over the NFSv4 share, all
network traffic freezes for some time and syslog tells me:
Aug 13 13:16:09 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:16:39 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:17:09 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:17:57 frege NETDEV WATCHDOG: eth0: transmit timed out
Some info about my system:
/usr/src/linux-2.6.23-rc3 $ sh scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
Linux frege linux-2.6.23-rc3 #1 SMP PREEMPT Sat Aug 11 16:24:26 CEST
2007 i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel
GNU/Linux
Gnu C 4.2.0
Gnu make 3.81
binutils 2.17
util-linux 2.12r
mount 2.12r
module-init-tools 3.2.2
e2fsprogs 1.39
Linux C Library 2.5
Dynamic linker (ldd) 2.5
Procps 3.2.7
Net-tools 1.60
Kbd 1.12
Sh-utils 6.9
udev 114
Modules Loaded nvidia
lspci -vvv
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and
945GT Express Memory Controller Hub (rev 03)
Subsystem: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT
Express Memory Controller Hub
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Capabilities: [e0] Vendor Specific Information
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and
945GT Express PCI Express Root Port (rev 03) (prog-if 00 [Normal
...
(netdev Cced)
Karl Meyer <adhocrocker@gmail.com> :
Can you:
- send a complete dmesg + /proc/interrupts + .config
- use git bisect to find a suspect changeset
I do not expect any change of behavior between 2.6.22 and
25805dcf9d83098cf5492117ad2669cd14cc9b24 if it can help you narrow
things down (assuming it is an r8169 regression).
--
Ueimor
-
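For readers who have not used git bisect before, the request above can be tried out on a throwaway repository first. What follows is only a toy sketch: the repository, the file name driver.c, the ten commits and the "BROKEN" marker are all invented for illustration, and nothing here touches a kernel tree. On a real kernel the good/bad decision is made by booting each build and stressing the network, not by a grep.

```shell
#!/bin/sh
# Toy demonstration of `git bisect run` in a throwaway repository.
# Everything below (file name, commit count, BROKEN marker) is made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect@example.invalid"
git config user.name "bisect demo"

# Ten commits; the pretend regression appears in commit 7 and stays.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo "BROKEN $i" > driver.c
    else
        echo "ok $i" > driver.c
    fi
    git add driver.c
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Known-good endpoint: the root commit. Known-bad endpoint: HEAD.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" > /dev/null

# The test command exits 0 for a good tree and 1 for a bad one.
git bisect run sh -c '! grep -q BROKEN driver.c' > /dev/null

# After a finished run, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad"
```

With an intermittent failure like the one in this thread, a tree should only be marked good after a long enough stress test, since a single wrong "good" sends the bisection down the wrong half.
-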
Hi,
dmesg, interrupts and .config are attached. I will have a look at git bisect.
Karl Meyer <adhocrocker@gmail.com> :
Can you reproduce the problem when nvidia binary-only stuff is not loaded
after boot?
--
Ueimor
-
I did some additional testing, the results are:
[0e4851502f846b13b29b7f88f1250c980d57e944] r8169: merge with version
8.001.00 of Realtek's r8168 driver
does not work; after some traffic the transmit timeout occurs.
[6dccd16b7c2703e8bbf8bca62b5cf248332afbe2] r8169: merge with version
6.001.00 of Realtek's r8169 driver
Seems to be the last version to work. I did some stress testing (much
more than the level that was enough to make
[0e4851502f846b13b29b7f88f1250c980d57e944] break) and am currently
using this version, with no problems so far.
-
Thanks for the quick feedback.
Can you try the patch below on top of 2.6.23-rc3 ?
If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
tomorrow.
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b85ab4a..cdb8a08 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2749,6 +2749,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		if (!(status & tp->intr_event))
 			break;
 
+#if 0
 		/* Work around for rx fifo overflow */
 		if (unlikely(status & RxFIFOOver) &&
 		    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
@@ -2756,6 +2757,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 			rtl8169_tx_timeout(dev);
 			break;
 		}
+#endif
 
 		if (unlikely(status & SYSErr)) {
 			rtl8169_pcierr_interrupt(dev);
--
Ueimor
-
(please do not remove the netdev Cc:)
Francois Romieu <romieu@fr.zoreil.com> :
You will find attached a tgz archive containing a series of patches
(0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.
Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
still works, apply 0002 on top of 0001, etc.
--
Ueimor
-
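The step-by-step procedure described above (apply one patch, test, apply the next) can be rehearsed on a scratch repository. This is a rough sketch under invented assumptions: the repository, the file driver.c, the five toy patches and the "BROKEN" marker are hypothetical and are not the contents of the attached archive. The real check after each step is a network stress test on a rebuilt kernel.

```shell
#!/bin/sh
# Toy walk through a numbered patch series, one patch at a time.
# The repository, file name and patch contents are invented.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "series@example.invalid"
git config user.name "series demo"

echo "ok 0" > driver.c
git add driver.c
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Five follow-up commits; the third one introduces the pretend regression.
i=1
while [ "$i" -le 5 ]; do
    if [ "$i" -ge 3 ]; then
        echo "BROKEN $i" > driver.c
    else
        echo "ok $i" > driver.c
    fi
    git commit -q -a -m "step $i"
    i=$((i + 1))
done

# Export them as 0001-... to 0005-..., then rewind to the known-good base.
git format-patch -q -o ../patches "$base"
git reset -q --hard "$base"

# Apply in order and stop at the first patch after which the check fails.
first_bad_patch=""
for p in ../patches/000*.patch; do
    git am -q "$p"
    if grep -q BROKEN driver.c; then
        first_bad_patch=$(basename "$p")
        break
    fi
done
echo "first failing patch: $first_bad_patch"
```

Walking the series linearly costs more builds than a bisection would, but with only five patches the difference is small and each step stays easy to reason about.
-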
This is what happened today:
Sep 1 21:08:01 frege NETDEV WATCHDOG: eth0: transmit timed out
frege ~ # uname -r
2.6.22.5-cfs-v20.5
-
Hi,
Can you reproduce this on 2.6.22 (not 2.6.22.x - it might be a -stable
regression)?
Regards,
Michal
--
LOG
http://www.stardust.webpages.pl/log/
-
Hi,
I have been looking into this issue for some time now, but there were no
errors in 2.6.22-r2 (Gentoo speak; I guess this is 2.6.22.2
officially). I also ran git-bisect (for more information see the older
messages in this thread).
-
Karl Meyer <adhocrocker@gmail.com> :
2.6.22-r2 in gentoo is based on 2.6.22.1. It is way before
0e4851502f846b13b29b7f88f1250c980d57e944 that you reported to work.
Thus it is not surprising that it works.
Any update regarding the patchkit that I sent on 2007/08/16?
It would help to narrow down the culprit.
--
Ueimor
-
Hi,
after reading about issues with the nics on kontron boards I did a
bios upgrade,
but this did not change anything.
However, yesterday the nic (onboard) I used died. No link at all,
after switching to
the next onboard nic I got a NETDEV transmit timeout with that one on
kernel 2.6.22-r2.
It seems the whole thing is a hardware issue. I will try to figure it out
with Kontron.
Sorry :(
Karl
-
Hi Francois,
this is what I found and sent:
The error exists from patch 2 on. I did some network testing with
patch 1 and currently use it and have no errors so far.
From my experience so far, patch 1 should be error free.
Do you need additional info?
-
fyi:
I do not know whether it is related to the problem, but since using
the version you told me to use, there are these entries in my log:
frege Hangcheck: hangcheck value past margin!
frege Hangcheck: hangcheck value past margin!
frege Hangcheck: hangcheck value past margin!
-
...
BTW, I don't know whether it's related too, but I think you should try
Regards,
Jarek P.
-
I did some testing today and found that the error occurs after
applying some of the patches. However I did not figure out the exact
patch in which the error "starts", since it sometimes occurs immediately
when moving some data over the net and sometimes it takes 30 min until
I get the transmit timeout. I will be away until Sunday and will do some
more testing then.
-
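Because the timeout sometimes shows up immediately and sometimes only after 30 minutes, it helps to record for each tested patch how many timeouts were logged and when the first one appeared. The sketch below is one hedged way to do that. count_timeouts and first_timeout are hypothetical helper names, and the sample log is just the four syslog lines quoted at the start of this thread written to a temporary file; the location of the real syslog file depends on the distribution.

```shell
#!/bin/sh
# Summarise NETDEV WATCHDOG transmit timeouts for one interface.
# count_timeouts and first_timeout are hypothetical helpers.
set -e

count_timeouts() {
    # $1 = log file, $2 = interface name; prints 0 when nothing matches
    grep -c "NETDEV WATCHDOG: $2: transmit timed out" "$1" || true
}

first_timeout() {
    # Prints the syslog timestamp (month day time) of the first match
    awk -v pat="NETDEV WATCHDOG: $2: transmit timed out" \
        'index($0, pat) { print $1, $2, $3; exit }' "$1"
}

# Sample input: the log lines quoted in the first message of this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
Aug 13 13:16:09 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:16:39 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:17:09 frege NETDEV WATCHDOG: eth0: transmit timed out
Aug 13 13:17:57 frege NETDEV WATCHDOG: eth0: transmit timed out
EOF

timeouts=$(count_timeouts "$log" eth0)
first=$(first_timeout "$log" eth0)
rm -f "$log"
echo "eth0 timeouts: $timeouts, first at: $first"
```

Noting the start time of each stress test next to this output makes it easier to decide how long a patch has to survive before it is treated as good.
-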
Sorry, I was wrong, still testing....
-
Hi,
I successfully ran git bisect:
0127215c17414322b350c3c6fbd1a7d8dd13856f is first bad commit
commit 0127215c17414322b350c3c6fbd1a7d8dd13856f
Author: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue Feb 20 22:58:51 2007 +0100
r8169: small 8101 comment
Extracted from version 1.001.00 of Realtek's r8101.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
d41a52a215fb1b38ba652dda90faf6ed951bccd1 M drivers
I verified it by doing "git revert
0127215c17414322b350c3c6fbd1a7d8dd13856f" on my git clone; now I am
happily running 2.6.23-rc3-ge60a without the NETDEV WATCHDOG message.
-
