Re: sk98lin for 2.6.23-rc1

To: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 11:16 am

From http://www.krose.org/~krose/computing.html:

Since the sky2 driver continues to suck ass (which is a technical
description for "it hangs all the time under load, at least on my
hardware" :-) ), I've fixed the sk98lin driver to compile for
linux-2.6.23-rc1. Those who continue to have problems with sky2 can
still use 2.6.23-rc1, simply by doing the following:

   1. Make sure you have the headers for your kernel properly installed
      and linked to /usr/src/linux-$KVER.

   2. Download the sk98lin source from Marvell's site
      <http://www.marvell.com/drivers/search.do>.

   3. Untar the driver and run the install.sh according to the
      directions. It will fail.

   4. Look in /tmp for a directory called Sk98something. Go to
      http://www.krose.org/~krose/projects/sk98lin/ and copy the
      Makefile <http://www.krose.org/%7Ekrose/projects/sk98lin/Makefile>
      and sky2.c <http://www.krose.org/%7Ekrose/projects/sk98lin/sky2.c>
      into /tmp/Sk98something/all.

   5. Change into /tmp/Sk98something/all and execute:

          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules
          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules_install

   6. Blacklist sky2 in /etc/modprobe.d/blacklist, and (maybe not
      necessary) manually load sk98lin in /etc/modules.
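
Steps 5 and 6 could be wrapped in a couple of shell helpers. This is
only a sketch: it assumes KVER defaults to the running kernel
(uname -r), that you pass in the unpacked source directory yourself,
and that your distribution uses the blacklist file named above. The
function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any driver package.

# Kernel version to build against; assumed to be the running kernel.
KVER="${KVER:-$(uname -r)}"

# Print the two make invocations from step 5 for a given source directory.
# Printing instead of running lets you review the commands first.
build_cmds() {
    srcdir="$1"
    for target in modules modules_install; do
        printf 'sudo -H make -C /usr/src/linux-%s M=%s %s\n' \
            "$KVER" "$srcdir" "$target"
    done
}

# Print the blacklist line for step 6.
blacklist_line() {
    printf 'blacklist %s\n' "$1"
}
```

For example, `build_cmds /tmp/Sk98something/all | sh` would run the
build, and `blacklist_line sky2 | sudo tee -a /etc/modprobe.d/blacklist`
would append the blacklist entry.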

There. You're done. Stable networking at last... er, again.

Unfortunately, you lose the nicest differentiating feature of
sky2---WOL---but that's a small price to pay for networking stability of
a desktop machine. It's nice to be able to watch MythTV again without
having to sudo bash -c 'ifdown eth0; rmmod sky2; modprobe sky2; ifup
eth0' every few minutes.
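
For anyone stuck on sky2 in the meantime, the reload sequence above
could be automated with a crude watchdog. A sketch, assuming a failed
ping to a host of your choosing means the NIC has hung, and assuming
the iputils ping (where -W is a timeout in seconds). The function
names and the DRY_RUN switch are illustrative only.

```shell
#!/bin/sh
# Sketch of a NIC watchdog; function and variable names are illustrative.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same four steps as the one-liner above: down, unload, reload, up.
cycle_nic() {
    iface="$1"
    module="$2"
    run ifdown "$iface"
    run rmmod "$module"
    run modprobe "$module"
    run ifup "$iface"
}

# Cycle the NIC only if the given host stops answering a single ping.
check_and_cycle() {
    host="$1"
    iface="$2"
    module="$3"
    if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        cycle_nic "$iface" "$module"
    fi
}
```

Run as root from cron, something like `check_and_cycle <gateway> eth0
sky2` would do the job; `DRY_RUN=1 cycle_nic eth0 sky2` only prints
the commands.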


Personally, I'd like to see sk98lin remain in the kernel proper until
sky2 goes at least 6 months without reported problems.  The fact that I
am not the only one still seeing issues is a clear indication that sky2
(even with the recent patches in 2.6.23-rc1) is not...
To: Kyle Rose <krose@...>
Cc: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 7:52 pm

Bless you, extends my update capability for another version. ;-)

However, Ingo posted a patch for the thread "network dies after random 
time" which probably didn't make it into rc1. In all fairness applying 
that might fix the problem, it's possible if unlikely that the new 
driver tickles a bug the stable sk98lin driver didn't.

Does skge work for your hardware? Based on a sample size of one (four to 
go) everything worked for me except NFS, jumbo packets work with tcp, 
not with udp. I don't have everything nailed down enough for a proper 
bug report, it's just something to note. In truth there's little to 
choose between tcp and udp for machines in the same room, I could live 
with skge.

Haven't tried sky2; there was a build failure on my last server build, 
won't look at it until Monday.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To: Bill Davidsen <davidsen@...>
Cc: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 9:13 pm

> Does skge work for your hardware?

I unloaded sky2 and loaded skge at one point, but it didn't recognize my
hardware.  Perhaps it doesn't work with the 88E8053?
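
One way to answer that without loading the module is to look at the
PCI aliases it advertises. A sketch, assuming the alias format that
`modinfo -F alias` prints for PCI drivers (pci:v0000VVVVd0000DDDD...)
and taking the vendor and device IDs from `lspci -n`. The function
name is made up, and wildcard aliases are deliberately not handled.

```shell
#!/bin/sh
# Sketch: does a module's alias list (as printed by `modinfo -F alias MODULE`)
# contain a given PCI vendor:device pair? Reads alias lines on stdin.
# Wildcard vendor/device aliases (v* or d*) are not matched by this check.
claims_pci_id() {
    # modinfo prints the hex digits in upper case, so normalise the input.
    vendor="$(printf '%s' "$1" | tr 'a-f' 'A-F')"
    device="$(printf '%s' "$2" | tr 'a-f' 'A-F')"
    grep -q "^pci:v0000${vendor}d0000${device}sv"
}
```

Usage would be along the lines of
`modinfo -F alias skge | claims_pci_id VENDOR DEVICE && echo claimed`.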

Kyle
-
To: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 3:17 pm

On Thu, 26 Jul 2007 11:16:36 -0400

Just don't build it with lock debugging enabled or you will see all the
deadlocks lying below the surface.  Worse yet, read the macro hell
of sky2le.h

-
To: Kyle Rose <krose@...>
Cc: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 12:57 pm

This sounds good in theory.

The practical problem with this approach is that there are always many 
people who use the old driver when the new driver doesn't work for them 
instead of reporting their problems with the new driver.

For these people a new driver will often suck when the old driver gets 
removed, but after the removal of the old driver they are finally forced 
to report their bugs resulting in a better new driver for everyone.

The sky2 driver has been in the kernel for nearly 2 years and Stephen is 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Sunday, July 29, 2007 - 11:01 pm

The driver still (2.6.20/sky2 1.13) hangs for me (more rarely than in
the past), and cycling the module generally fixes the issues.  I have
supplied all the information that Stephen has asked for, but still no
resolution.  I am not complaining about the lack of a fix, but don't
assume that all it takes to get sky2 working is adequate bug reports.  I
have been and remain willing to test and assist debug, but after several
dropped threads, I feel like the desire or ability to fix this issue
isn't there (and remote debug of an intermittent hardware issue IS
hard), and I didn't want to be a nuisance to someone that has no
obligation to me to address the issue in the first place.

Stability has improved, it's just not there yet.

I'll switch to 1.16 soon, and respond to Stephen's request on netdev for
current issues.
-- 
Rob
To: Rob Sims <lkml-z@...>
Cc: Adrian Bunk <bunk@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Wednesday, September 5, 2007 - 5:22 am

On Sun, 29 Jul 2007 21:01:30 -0600

The only known outstanding problems on 2.6.22.6 of sky2 are:
 * problems with fibre PHY based systems
 * suspend/resume issues, missing multicast reinitialization, etc.
The previous stability problems have been addressed.
-
To: Stephen Hemminger <shemminger@...>
Cc: Rob Sims <lkml-z@...>, Adrian Bunk <bunk@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Wednesday, September 12, 2007 - 12:46 pm

Sorry to disappoint you, but it just hung for me again.
After seeing the backport of commit  c59697e06058fc2361da8cefcfa3de85ac107582 as
"sky2: restore workarounds for lost interrupts" going into 2.6.22.5 I
decided to give it another try.

First tests worked and for two days I had no trouble, but today the
network hung again, until I removed and reinserted the sky2 module.

I'm using the Gentoo kernel 2.6.22-gentoo-r6 which is based on
2.6.22.6. (All patches at
http://dev.gentoo.org/~dsd/genpatches/patches-2.6.22-7.htm )
This is an x86_64 kernel but with a 32bit userland.

My hardware:
00:00.0 Host bridge: Intel Corporation 82915G/P/GV/GL/PL/910GL Memory
Controller Hub (rev 04)
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL
Integrated Graphics Controller (rev 04)
00:02.1 Display controller: Intel Corporation 82915G Integrated
Graphics Controller (rev 04)
00:1b.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) PCI Express Port 1 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #4 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC
Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA
Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/...
To: Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>
Cc: Adrian Bunk <bunk@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Wednesday, September 5, 2007 - 3:42 pm

--- Stephen Hemminger

I pretty much agree with everything said, including 
the part about the sky2 people working hard on it. I
have noticed several bugs fixed recently in the driver
source.

However, it really DOES lock up under load. I even 
tried 2.6.23-rc4 and the absolute latest version of
the driver and it still locks up, as in

eth1: hw csum failure.

Call Trace:
 <IRQ>  [<ffffffff804779b6>] __skb_checksum_complete_head+0x43/0x56
 [<ffffffff804779d5>] __skb_checksum_complete+0xc/0x11
 [<ffffffff804a989d>] tcp_v4_rcv+0x14e/0x801
 [<ffffffff8048ff84>] ip_local_deliver+0xca/0x14c
 [<ffffffff80490472>] ip_rcv+0x46c/0x4ae
 [<ffffffff88006138>] :sky2:sky2_poll+0x72b/0x9c7
 [<ffffffff80245979>] update_wall_time+0x28c/0x39b
 [<ffffffff8047c934>] net_rx_action+0xa8/0x166
 [<ffffffff8023901c>] do_timer+0x10/0xab
 [<ffffffff80235ced>] __do_softirq+0x55/0xc4
 [<ffffffff8020c5cc>] call_softirq+0x1c/0x28
 [<ffffffff8020d6fd>] do_softirq+0x2c/0x7d
 [<ffffffff8020d9bb>] do_IRQ+0x13e/0x15f
 [<ffffffff8020a780>] mwait_idle+0x0/0x48
 [<ffffffff8020b951>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff804acdb9>] udp_poll+0x0/0xfb
 [<ffffffff8020a7c2>] mwait_idle+0x42/0x48
 [<ffffffff8020a718>] cpu_idle+0xbd/0xe0
 [<ffffffff80704a5a>] start_kernel+0x2ac/0x2b8
 [<ffffffff80704140>] _sinittext+0x140/0x144

As far as I can tell, this bug has been with the
sky2 driver all the way back to the Beforetime.
Based on it happening with various versions of the
driver back to 2.6.18 that I have tried, plus some
googling on it.
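
To put a number on how often this happens, the failures can be
counted per interface from the kernel log. A sketch, assuming the
message format shown above ("<iface>: hw csum failure.") and that the
log text arrives on stdin; the function name is made up.

```shell
#!/bin/sh
# Sketch: count "hw csum failure" lines per interface in kernel log text
# read from stdin. Assumes the interface name is the word right before "hw".
count_csum_failures() {
    awk '/hw csum failure/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "hw") {
                name = $(i - 1)
                sub(/:$/, "", name)   # strip the trailing colon
                n[name]++
                break
            }
        }
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

Something like `dmesg | count_csum_failures` would then print one line
per interface with its count.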

So while the bug reporting point is a good one, it would
be nice to have a reliable driver in the kernel until
the sky2 one is better. The alternative is to use
the vendor driver, which is less than optimal.

-J



To: James Corey <ploversegg@...>
Cc: Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Adrian Bunk <bunk@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Saturday, September 8, 2007 - 1:44 pm

I changed from the sk98lin to the previous driver Adrian said was the 
"right one," skge IIRC. Then he started pushing sky2, and I tried that. 
Like you I get hangs, but unlike you the system doesn't hang, just the 
NIC. No errors, warnings, and reboot fixes it. Acts as if the cable were 
pulled.

That was with 2.6.22.5 (or so), dropped back to an old kernel with 
sk98lin, previously had uptimes in three digit days. Up for a week or so 
now.

Haven't tried later kernels, don't intend to; while no network is really 
secure, it's not really useful.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To: Bill Davidsen <davidsen@...>
Cc: James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Saturday, September 8, 2007 - 3:11 pm

There is a real long-term advantage of removing drivers like sk98lin 
because it forces people to report bugs if the new driver doesn't work  
instead of giving them the workaround of using the obsolete driver.    
And this has the (at first sight surprising) effect that removing code  

You are a regular reader of linux-kernel, and therefore the sk98lin 
removal can hardly be a surprise for you. If you prefer whining over 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Monday, September 10, 2007 - 10:32 am

The issue is that sk98lin is only obsolete because you say so! skge 
crashes the system, as Chris reports, sky2 just stops passing bits and 
behaves as if the network cable were idle, no error messages of any 
nature, ping claims it's sending packets, tcpdump claims packets are 
being sent, the switch never blinks and systems on the switch see no 
packets. Again, no error messages, no dumps, nothing which would help 
you debug it, and it happens after some undefined time.

skge and sky2 are up to eight or ten versions now, and they still don't 

I am trying to "improve the kernel" by advocating not removing reliable 
drivers in favor of unreliable drivers. Saying a driver is better 
because it has a clean design and good code is something I would expect 
from someone who hadn't written or used code. If skge and sky2 were so 
clean you wouldn't still be chasing obscure bugs after the driver had 
been in the kernel for six+ versions, you wouldn't have me wasting time 
trying to get a more secure kernel which is still reliable, wouldn't 
have Willy Tarreau suggesting you should be marking sk98lin as obsolete 
and leaving it in, wouldn't have someone maintaining sk98lin as a patch, 
wouldn't have Chris Stromsoe getting hard lock-ups. No matter how ugly 
sk98lin looks, and how well designed skge and sky2 may be, reliability 
is not a beauty contest.

The volume of complaint should give you a hint that in this case the new 
drivers aren't usefully stable for many people, and that you are 
advocating a removal which is at least premature. If you can't admit 
you're wrong on this one, you can say you have reconsidered the timing 
of removal in light of new information.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

-
To: Bill Davidsen <davidsen@...>
Cc: James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Monday, September 10, 2007 - 11:39 am

No, it is obsolete because we have more than one driver for this 
hardware, and the people responsible for network drivers in the kernel 

A better written driver might still lack some workarounds for broken 
hardware or similar problems. Or simply contain some bugs like all 
software does.

The important word is not "reliability", it's "maintainability".

It was clear that sk98lin would go in the long term, and the only thing 
that could be discussed is the when and how of removal.

When you talk about "new information", why did this information not 
surface until after the sk98lin driver was removed?

Is there really a problem with "the timing of removal" or would we have 
faced exactly the same problems if the removal was timed a year later?

And this is really the essence when I'm saying "removing code improves 
the kernel": The goal is to get people to report if the new drivers 
aren't usefully stable for them, not to use sk98lin instead without 
sending a bug report.

Having different drivers with different sets of bugs and features is 
not a situation that should be retained for a longer time.

The underlying question is:
Is there anything better than a quick removal of the obsolete driver to 
get people to both test and report bugs with the new driver?
Keeping obsolete drivers longer only for running into exactly the same 
problem later isn't an improvement.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Tuesday, September 11, 2007 - 12:23 am

I would like to happily report that the sky2 driver works great in  
the NIC on my tablet where the sk98lin and skge drivers both fail  
utterly and hang the kernel.  On another system the sk98lin and skge  
drivers don't recognize the chipset at all (missing PCI ID?) while  
the sky2 driver works perfectly for large quantities of data  
transferred.

Cheers,
Kyle Moffett

-
To: Adrian Bunk <bunk@...>
Cc: Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Sunday, September 9, 2007 - 8:54 am

I've been trying to migrate off sk98lin to skge since earlier this year, 
without success, starting with 2.6.18 or .19.

I have several of these cards in production using the sk98lin driver:

fresno:~# lspci -vv -s 02:01
02:01.0 Ethernet controller: SysKonnect SK-9872 Gigabit Ethernet Server Adapter (SK-NET GE-ZX dual link) (rev 11)
         Subsystem: SysKonnect SK-9844 Gigabit Ethernet Server Adapter (SK-NET GE-SX dual link)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 32 bytes
         Interrupt: pin A routed to IRQ 22
         Region 0: Memory at febfc000 (32-bit, non-prefetchable) [size=16K]
         Region 1: I/O ports at e800 [size=256]
         Expansion ROM at febc0000 [disabled] [size=128K]
         Capabilities: [48] Power Management version 1
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
         Capabilities: [50] Vital Product Data

They are dual port SX fiber.  Both ports are connected.  If I do this:

fresno:~# modprobe skge
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up

the system locks up and I have to power cycle it.  The order doesn't 
matter (if I do eth3 up/down, then eth2 up kills it).

I don't have any problems with sk98lin.  This works fine:

fresno:~# modprobe sk98lin RlmtMode=DualNet
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up
fresno:~# ip li set eth3 down


I am more than happy to test various driver changes, and have tried a few 
suggested patches but nothing has worked so far.  I would like to be using 
skge instead of sk98lin, but so far haven't had any success.




-Chris
-
To: Chris Stromsoe <cbs@...>
Cc: Adrian Bunk <bunk@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 6:23 pm

On Sun, 9 Sep 2007 05:54:45 -0700 (PDT)

Please test 2.6.24-rc1 (or -rc2) because there were several fixes for skge
that made it work correctly for dual port fiber board. The worst bug in skge
was that it configured the ram buffer incorrectly.

I just submitted these for next 2.6.23.X stable release as well

-- 
Stephen Hemminger <shemminger@linux-foundation.org>
-
To: Stephen Hemminger <shemminger@...>
Cc: Adrian Bunk <bunk@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Tuesday, November 6, 2007 - 9:42 pm

I tested 2.6.24-rc1.  This series of commands

   fresno:~# modprobe skge
   fresno:~# ip li set eth2 up
   fresno:~# ip li set eth2 down
   fresno:~# ip li set eth3 up

still hard-locks the box in the same place.  Was there anything in the 
-rc2 patch for skge?



-Chris
-
To: Adrian Bunk <bunk@...>
Cc: Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>
Date: Saturday, September 8, 2007 - 10:42 pm

In my case the issue is simply one of practicality: I cannot go to the
data center 5 times per day to reboot my colo box.  Therefore, I run
sk98lin.  It's really that simple.

Kyle

-
To: Kyle Rose <krose@...>
Cc: Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>
Date: Sunday, September 9, 2007 - 7:13 am

When did you report this bug the first time?

What we need is that people when testing a new kernel they plan to use 
test the new drivers *and report the bugs if they run into any*.

What could we have done so that you reported your bug without removing 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Kyle Rose <krose@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>
Date: Tuesday, September 11, 2007 - 4:05 am

On Sun, 9 Sep 2007 13:13:26 +0200


There are several different problems in this thread:
1. The removal of old sk98lin driver caused some users to be forced to use
    skge. These users have uncovered issues with the dual port fiber based versions
    of the board.  
    Short term: The sk98lin driver should be restored to previous state, 
       and the PCI table should be used to limit the usage to only fiber systems.
       If Adrian doesn't do it, I'll do it when I return from Germany.
    Long term: I have fiber based board (thanks ebay) on the way to resolve
       skge bug.

2. Sky2 driver has its own fiber based problems.  Solve these after skge fiber.

3. Sky2 doesn't have as many workarounds for hardware problems as vendor sk98lin
    driver.
-
To: Stephen Hemminger <shemminger@...>, Adrian Bunk <bunk@...>
Cc: Kyle Rose <krose@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>
Date: Tuesday, September 11, 2007 - 6:20 pm

--- Stephen Hemminger


Hm, hope I didn't trigger a religious debate. When
you get to the point of working on the SKY2 driver
problem with DGE-550SX (Syskonnect SK-9S81) also
known as the "hw csum failure" issue, I'll be 
glad to test a patch or take debug data. Til then,
I'll stay out of the way.

-J







-
To: Stephen Hemminger <shemminger@...>
Cc: Kyle Rose <krose@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <netdev@...>
Date: Tuesday, September 11, 2007 - 7:54 am

No problem with this, but since it was Jeff's patch it should better be 
him who reverts it (and he's anyway one step nearer to Linus).

But the underlying general problem still remains:

How can we get people to test and report bugs with the new drivers 
before removing the old driver?

That's a question especially for the people who now had problems after 
sk98lin was removed.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Stephen Hemminger <shemminger@...>, Kyle Rose <krose@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <netdev@...>
Date: Tuesday, September 11, 2007 - 10:29 am

Sorry for a long answer, I'm trying to provide insight on two recent cases.

Thinking back to several drivers, when e100 was new I tried it because I 
had problems with eepro100 in the area of multiple cards, multiple 
cables on a single card, and jumbo packets. For a while I used both, 
until e100 worked where I need it. So I initially tried it because it 
had features I needed, and then dropped to older driver just to avoid 
having to decide.

With sk98lin, the driver worked flawlessly with all (3-4) systems, so I 
had no reason to try any other. When removing sk98lin was first 
proposed, I tried skge, first measurements showed it was 5-8% slower, 
NOT what I want, so I went back. For me there was no reliability issue, 
but I never tried it in a system with more than one NIC on the driver. 
Would "it's a little slower" be a valid bug report? Or would I have 
gotten "works fine for me" from people not beating it over Gbit? I 
didn't try sky2 until you suggested it, and I have reported my results 
previously, just stops working. Could it be my hardware? I tried it on 

So if you want people to try a new driver, I think it really has to have 
some benefits to the users, in terms of performance, reliability, or 
features. "Cleaner design" doesn't motivate, and it does raise the 
question of why the old driver wasn't just cleaned up. I've been doing 
software for decades, I appreciate why, but users in general just want 
to use their system. Which raises the question of why to delete drivers 
which work for many or even most users? Testing a new kernel is no 
longer a drop-in-and-boot operation if modprobe.conf must be edited to get 
the network up, and the typical user isn't going to write that shell 
script to try one or the other driver.
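
For what it's worth, such a script need not be long. A sketch,
assuming a driver counts as working once the interface shows up under
/sys/class/net (which says nothing about the intermittent hangs
discussed in this thread). LOAD, UNLOAD and PROBE are illustrative
hooks, not anything modprobe itself provides.

```shell
#!/bin/sh
# Sketch: load the first driver from a preference list that yields the
# expected interface. LOAD, UNLOAD and PROBE are overridable hooks.
LOAD="${LOAD:-modprobe}"
UNLOAD="${UNLOAD:-modprobe -r}"

# Default check: the interface exists in sysfs.
iface_present() {
    [ -e "/sys/class/net/$1" ]
}
PROBE="${PROBE:-iface_present}"

# try_drivers IFACE DRIVER...  prints the first driver that worked.
try_drivers() {
    iface="$1"
    shift
    for drv in "$@"; do
        if $LOAD "$drv" && $PROBE "$iface"; then
            echo "$drv"
            return 0
        fi
        # Load or probe failed: unload (ignoring errors) and try the next.
        $UNLOAD "$drv" 2>/dev/null
    done
    return 1
}
```

Called as `try_drivers eth0 sky2 sk98lin` it would prefer the new
driver and fall back to the old one only if the interface never
appears.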

Honestly, new drivers which offer little benefit to most users are the 
exception rather than the rule, so this may be a corner case. I would like 
to see sk98lin back in the kernel; for a while I can build my own 
kernels and patch it in, but until oth...
To: Bill Davidsen <davidsen@...>
Cc: Stephen Hemminger <shemminger@...>, Kyle Rose <krose@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <netdev@...>
Date: Tuesday, September 11, 2007 - 11:03 am

If you get less throughput that is a regression, and it should be 
reported and fixed.

I doubt anybody would have told you otherwise.


As I already explained, there is a long term advantage for all users if 
there is only one driver in the kernel. Therefore all users should 
switch away from obsolete drivers to the replacement drivers, and the 
obsolete driver will be removed at some point in time. The only question 

The typical user will let his distribution handle this.


That a new driver offers benefits that cause most users to switch isn't 
realistic.

You mention e100 as an example - well, I'm using this driver in my 
computer, but I doubt anything would be worse for me if I'd use the 
obsolete eepro100 driver instead since I'm not using any of the fancy 
e100 features you mentioned as advantages.

There is a long term advantage for all users if there is only one driver 
in the kernel. Therefore all users should switch away from obsolete 
drivers to the replacement drivers, and the obsolete driver will be 

skge and sky2 support distinct hardware.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To: Adrian Bunk <bunk@...>
Cc: Bill Davidsen <davidsen@...>, Stephen Hemminger <shemminger@...>, Kyle Rose <krose@...>, James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>, Jeff Garzik <jeff@...>, <netdev@...>
Date: Tuesday, September 11, 2007 - 6:37 pm

Not only that. You have to place the switch in its context with history.
Stephen, please correct me if I'm wrong, but sk98lin has been randomly
working for a very long time. Not 100% the driver's fault, because it
has had to work around a lot of chip bugs. The fact that this driver
supports *all* chips in the family makes it harder to identify whether
problems are caused by the hardware or by the driver because it is
bloated with tons of if/else.

I've personally encountered random data corruption on the receive path
with PCI-E hardware with sk98lin, as well as random TX stops. Sometimes
it would require one terabyte of data, sometimes just a few hundred
megs. On other hardware (skge now), UDP would simply stop being sent
and some TCP traffic was necessary to restart UDP! One guy at Marvell
once asked me for more information, but it was not easy to provide
much more, given the randomness of the problems!

Stephen has done an excellent (and thankless) job at restarting from
scratch, and the idea to separate the two chips was a good one IMHO.
The problem is that he might have thought that most of the bugs were
in the driver, while most of them are in the hardware, and this requires
a lot of workarounds, which do not always work the same for everybody
(I remember having tried to disable flow control with sk98lin because
it helped with sky2).

In parallel, sk98lin has improved on the vendor's site. v8 exhibited
all the problems I explained above, but v10 has fixed a lot of them,
making the new sk98lin more reliable. In parallel, sky2 and skge had
got wider acceptance and testing. The nastiest hardware bugs will
slowly surface, a good deal of driver bugs have been detected too
(and that's expected from any new driver).

It is possible that after 2 or 3 patches, a lot of the remaining
problems will suddenly vanish. But it's also possible that the driver
will still not work for 1% of people for 1 or 2 years because of some

Desktop users generally have no problem experimenting with ...
To: Kyle Rose <krose@...>
Cc: Adrian Bunk <bunk@...>, Bill Davidsen <davidsen@...>, James Corey <ploversegg@...>, Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, <linux-kernel@...>
Date: Sunday, September 9, 2007 - 12:48 am

Adrian generally wants to force "normal" users to test new drivers in order
to quickly find bugs and fade out older ones. While this is often possible
on the desktop, it's not possible for production servers. And not everyone
can run 2.6.16.x to get a long-term stable kernel.

I think that what is really needed is to add the opposite of "experimental"
in the config options. Something like "deprecated drivers" which would be
disabled by default. Desktop users would normally not care about that and
rely only on newer drivers. Server users would have to enable the option if
they want their old driver to be present because they have no other choice.

With each driver's help text, it would be wise to add some text indicating
what will replace the driver in question, so that their users know how to
test it on non-production machines.

But I agree with Kyle that on production systems, it is not acceptable to
have a driver hang even once a month. This generally implies loss of service
and customers going away. Ideology has no place in this area, it is quickly
replaced by pragmatism.

It was the same reason I spent time trying to get sky2 to reliably work in
2.4 ; sk98lin v8 was horribly unstable. Sky2 was fairly better but did not
support some basic operations such as ifdown/ifup. sk98lin v10 finally worked
fine, and I upgraded my customer's system with it because I needed anything
which would reliably work. It was not acceptable anymore to have the customer
phone twice a week complaining that their server had crashed again.

In the long term, I would really like to get sky2 to work well in 2.4
because I'm more confident in it, it's cleaner, less obscure and less
bloated. Having passed terabytes of data through both drivers I have
not observed any glitch with sky2 as I had with sk98lin v8.

Fortunately, sky2 chips are mostly found on desktop motherboards, so that
helps the driver stabilize very quickly. It should not take as long as
the transition from eepro100 to e100.

Willy

-
...
To: James Corey <ploversegg@...>
Cc: Stephen Hemminger <shemminger@...>, Rob Sims <lkml-z@...>, Adrian Bunk <bunk@...>, <linux-kernel@...>
Date: Wednesday, September 5, 2007 - 5:04 pm

Yich.  I'm glad I'm still using sk98lin on my unmanned colo box.

Kyle

-
To: Kyle Rose <krose@...>
Cc: James Corey <ploversegg@...>, Rob Sims <lkml-z@...>, Adrian Bunk <bunk@...>, <linux-kernel@...>
Date: Wednesday, September 5, 2007 - 7:00 pm

On Wed, 05 Sep 2007 17:04:59 -0400

Great for you. When I was testing, sk98lin crashed my machine on an
overnight stress run. My intuition is that there is a bug in sk98lin
on Yukon EC-U chips (those without a RAM buffer) and a hardware
problem on Yukon XL chips (those with a RAM buffer), and that the sky2
driver doesn't have a workaround for the RAM buffer getting stuck (yet).

I don't like putting workarounds in for problems I can't reproduce.
After KS, I'll rerun more stress tests on all the chip flavors
and see if the hang is reproducible.
-
To: Adrian Bunk <bunk@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 7:38 pm

Yes, you've grasped the reason for leaving the old driver in, so people 
can use their computers. Because when there is a new driver for 
previously unsupported hardware people will be glad to put time into 
debugging it to make the hardware useful. But when you take out a 
working driver because you (ie. the responsible developer) have a new 
idea which interests you, users don't want to use it because they have 
something which works, so you take out the working driver to make work 
for the users and create what you call a "better new driver" below.

The old driver wasn't requiring any resources to maintain, the old 
hardware wasn't changing, there was no particular benefit to users in 
breaking their configuration. This disregard for the users just gives 
Linux critics an arguing point, "the next new kernel may withdraw 
support for your hardware." Isn't that why 2.6.16 is still being 
maintained? Nobody (sane) expects new drivers to be perfect, they just 
"Better" is a very subjective thing, you see elegance of design perhaps, 
I see works or not, and when I have to use statistical methods to see 
latency or CPU overhead benefits, I frankly don't care.

Removing a working driver without a fully functional replacement forces 
people to stop upgrading their kernel, or start maintaining old drivers 
out of line. Problems of the "just occasionally goes away" type can take 
months to debug, the load can't be duplicated in most cases, and there's 
Where does sky2 come in? Does this mean that the recent suggestion to 
"just change to skge and stop complaining" is also wrong?

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To: Bill Davidsen <davidsen@...>
Cc: Adrian Bunk <bunk@...>, Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 7:41 pm

This statement proves you don't know anything at all about the situation.

	Jeff


-
To: Adrian Bunk <bunk@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 6:58 pm

I have a number of SK-9844 "SK-NET GE-SX dual link" cards.  skge has never 
worked with the cards.

The following sequence locks up the machine completely (power cycle to get 
it back) with 2.6.22.1:

fresno:~# modprobe skge
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up

This works just fine:

fresno:~# rmmod skge
fresno:~# modprobe sk98lin RlmtMode=DualNet
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up
fresno:~# ip li set eth3 down


eth2 and eth3 are ports off the sk-9844.

I've been reporting the problem since March.  If sk98lin is removed, I 
won't have networking.



-Chris
-
To: Kyle Rose <krose@...>
Cc: <linux-kernel@...>
Date: Thursday, July 26, 2007 - 12:28 pm

This breaks with O= builds. See (1).


Sorry for the nitpick, it can be done easier :)




	Jan
-- 
-
To: Jan Engelhardt <jengelh@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 12:30 pm

I'm sure it can.  I didn't want to have to figure out the kernel build
system just to get this one driver working.  Hence my desire for it to
remain in the kernel proper until sky2 utterly works. ;-)

Kyle

-
To: Kyle Rose <krose@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 12:41 pm

Oh it's really easy, have a look at
https://dev.computergmbh.de/svn/misc_kernel/oopser/trunk/Makefile
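
For readers who cannot reach that URL, the general shape of an out-of-tree
module Makefile that also copes with O= builds might look like the sketch
below. It is not a copy of the linked file. The module name sk98lin.o and the
KDIR variable are assumptions for illustration; the sketch relies on
/lib/modules/<release>/build pointing at the kernel's build directory, which
is the usual convention but should be checked on the machine in question.

```make
# Minimal external-module Makefile (sketch, not the Makefile linked above).
# Recipe lines must be indented with a tab.

ifneq ($(KERNELRELEASE),)
# Second pass: kbuild has included this file from inside the kernel build.
# Assumed module name; a multi-file driver would also list its objects
# in sk98lin-objs.
obj-m := sk98lin.o

else
# First pass: invoked directly by the user.
# The "build" symlink refers to the kernel's output directory, so this
# works whether or not the kernel was built with O=.
KDIR ?= /lib/modules/$(shell uname -r)/build

modules modules_install clean:
	$(MAKE) -C $(KDIR) M=$(CURDIR) $@

endif
```

With something like this in the driver directory, "make" builds the module and
"sudo make modules_install" installs it; KDIR can be overridden on the command
line to build against a different kernel.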


	Jan
-- 
-
To: Jan Engelhardt <jengelh@...>
Cc: Kyle Rose <krose@...>, <linux-kernel@...>
Date: Thursday, July 26, 2007 - 9:07 pm

Thanks for the pointer.  I've done this, and created an actual kernel
module tarball that is now available at
http://www.krose.org/~krose/projects/sk98lin/sk98lin.tar.gz.

Thanks,
Kyle



-